`R/utils.R`

`gen_structured_model.Rd`

Function that generates data for the different simulation studies
presented in the accompanying paper. This function requires the
`popkin` and `bnpsd` packages to be installed.
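Since both packages must be present before the simulation can run, a quick availability check can save a confusing error later. The snippet below is a minimal sketch using base R only; the variable names `required_pkgs` and `missing_pkgs` are illustrative and not part of this package.

```r
# Packages named in the description above.
required_pkgs <- c("popkin", "bnpsd")

# requireNamespace() returns TRUE/FALSE without attaching the package.
available <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!available]

if (length(missing_pkgs) > 0) {
  # Only report; installing is left to the user.
  message("Please install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```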

gen_structured_model( n, p_design, p_kinship, k, s, Fst, b0, nPC = 10, eta, sigma2, geography = c("ind", "1d", "circ"), percent_causal, percent_overlap, train_tune_test = c(0.6, 0.2, 0.2) )

Argument | Description |
---|---|
n | number of observations to simulate |
p_design | number of variables in X_test, i.e., the design matrix |
p_kinship | number of variables in X_kinship, i.e., the matrix used to calculate kinship |
k | number of intermediate subpopulations |
s | the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing |
Fst | the desired final FST of the admixed individuals. Required if sigma is missing |
b0 | the true intercept parameter |
nPC | number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects. |
eta | the true eta parameter, which has to be |
sigma2 | the true sigma2 parameter |
geography | the type of geography for simulating the kinship matrix. "ind" is independent populations where every individual is actually unadmixed, "1d" is a 1D geography and "circ" is circular geography. Default: "ind". See the functions in the |
percent_causal | percentage of |
percent_overlap | the percentage of causal SNPs that will also be included in the calculation of the kinship matrix |
train_tune_test | the proportions of the sample size used for training, tuning parameter selection, and testing. Default is a 60/20/20 split |
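To make the `train_tune_test` proportions concrete, here is a minimal base R sketch of how a sample of size `n` could be partitioned into the three sets. This is an illustration under stated assumptions, not the function's actual internals: the rounding rule and the names `labels` and `idx` are hypothetical.

```r
set.seed(1)

n <- 100
train_tune_test <- c(0.6, 0.2, 0.2)  # the documented default split

# The proportions are expected to sum to 1.
stopifnot(isTRUE(all.equal(sum(train_tune_test), 1)))

# Set sizes; the test set takes whatever remains after rounding.
n_train <- round(n * train_tune_test[1])
n_tune  <- round(n * train_tune_test[2])
n_test  <- n - n_train - n_tune

# Randomly assign each observation to exactly one set.
labels <- sample(rep(c("train", "tune", "test"),
                     times = c(n_train, n_tune, n_test)))
idx <- split(seq_len(n), labels)

lengths(idx)
```

With indices like these, the rows of a design matrix and the entries of a response vector can be subset consistently for each set.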

A list with the following elements:

- ytrain
simulated response vector for training set

- ytune
simulated response vector for tuning parameter selection set

- ytest
simulated response vector for test set

- xtrain
simulated design matrix for training set

- xtune
simulated design matrix for tuning parameter selection set

- xtest
simulated design matrix for testing set

- xtrain_lasso
simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components

- xtune_lasso
simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components

- xtest_lasso
simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components

- causal
character vector of the names of the causal SNPs

- beta
the vector of true regression coefficients

- kin_train
2 times the estimated kinship for the training set individuals

- kin_tune_train
The covariance matrix between the tuning set and the training set individuals

- kin_test_train
The covariance matrix between the test set and training set individuals

- Xkinship
the matrix of SNPs used to estimate the kinship matrix

- not_causal
character vector of the non-causal SNPs

- PC
the principal components for population structure adjustment
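The `*_lasso` matrices are described as the ordinary design matrices with the `nPC` principal components appended. The sketch below shows one way such a matrix could be assembled on toy data with base R. It is an assumption-laden illustration: the toy genotypes, the use of `prcomp()`, and the names `x`, `geno`, `pc` and `x_lasso` are not taken from the package source.

```r
set.seed(42)

n   <- 20  # toy number of individuals
p   <- 5   # toy number of design-matrix columns
nPC <- 3   # number of principal components to append

# Toy design matrix with named columns.
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("X", seq_len(p))))

# Toy genotype matrix (0/1/2 allele counts) standing in for the kinship SNPs.
geno <- matrix(rbinom(n * 50, size = 2, prob = 0.3), nrow = n, ncol = 50)

# Principal component scores of the individuals; keep the first nPC.
pc <- prcomp(geno, center = TRUE)$x[, seq_len(nPC), drop = FALSE]

# Design matrix for a lasso fit that adjusts for structure via PCs.
x_lasso <- cbind(x, pc)
dim(x_lasso)
```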

The kinship is estimated using the `popkin` function from the `popkin`
package. This function multiplies that kinship matrix by 2 to give the
expected covariance matrix, which is subsequently used in the
linear mixed models.
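The factor of 2 can be seen on a small hand-made example. The sketch below uses a toy kinship matrix in base R rather than an estimate from `popkin`; the values in `phi` are invented for illustration. For an outbred individual the self-kinship is 0.5, so doubling puts 1 on the diagonal of the covariance matrix.

```r
# Toy kinship matrix for three individuals (illustrative values only):
# individuals 1 and 2 are related (kinship 0.25), individual 3 is unrelated.
phi <- matrix(c(0.50, 0.25, 0.00,
                0.25, 0.50, 0.00,
                0.00, 0.00, 0.50),
              nrow = 3, byrow = TRUE)

# Doubling the kinship gives the covariance scale used in the mixed model.
covariance <- 2 * phi
covariance
```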

admixed <- gen_structured_model(n = 100, p_design = 50, p_kinship = 5e2,
                                geography = "1d", percent_causal = 0.10,
                                percent_overlap = "100", k = 5, s = 0.5,
                                Fst = 0.1, b0 = 0, nPC = 10, eta = 0.1,
                                sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)
#>  [1] "ytrain"         "ytune"          "ytest"          "xtrain"
#>  [5] "xtune"          "xtest"          "xtrain_lasso"   "xtune_lasso"
#>  [9] "xtest_lasso"    "Xkinship"       "kin_train"      "kin_tune_train"
#> [13] "kin_test_train" "mu_train"       "causal"         "beta"
#> [17] "not_causal"     "kinship"        "coancestry"     "PC"
#> [21] "subpops"