Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper

Function that generates data for the different simulation studies presented in the accompanying paper. This function requires the popkin and bnpsd packages to be installed.

gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)

Arguments

n	number of observations to simulate
p_design	number of variables in X_test, i.e., the design matrix
p_kinship	number of variables in X_kinship, i.e., the matrix used to calculate kinship
k	number of intermediate subpopulations.
s	the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing
Fst	The desired final FST of the admixed individuals. Required if sigma is missing
b0	the true intercept parameter
nPC	number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects.
eta	the true eta parameter, which has to be `0 < eta < 1`
sigma2	the true sigma2 parameter
geography	the type of geography for simulating the kinship matrix. "ind" is independent populations where every individual is actually unadmixed, "1d" is a 1D geography and "circ" is circular geography. Default: "ind". See the functions in the `bnpsd` package for details on how this data is actually generated.
percent_causal	percentage of `p_design` that is causal. Must be \(0 \leq percent_causal \leq 1\). The true regression coefficients are generated from a standard normal distribution.
percent_overlap	this represents the percentage of causal SNPs that will also be included in the calculation of the kinship matrix
train_tune_test	the proportions of the sample size used for training, tuning parameter selection, and testing. Default is a 60/20/20 split
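
To make the role of `train_tune_test` concrete, the following is a minimal sketch of how a vector of proportions could partition `n` observations into three disjoint sets. It uses base R only; the names `n_train`, `n_tune`, `n_test`, `shuffled` and `idx` are hypothetical, and the rounding rule is an assumption, not necessarily what `gen_structured_model` does internally.

```r
# Illustrative only: split n observation indices by train/tune/test proportions.
set.seed(1)
n <- 100
train_tune_test <- c(0.8, 0.1, 0.1)

# The proportions are expected to sum to 1.
stopifnot(isTRUE(all.equal(sum(train_tune_test), 1)))

# Set sizes; the test set takes whatever remains after rounding (an assumption).
n_train <- round(n * train_tune_test[1])
n_tune  <- round(n * train_tune_test[2])
n_test  <- n - n_train - n_tune

# Shuffle the indices once, then cut them into three disjoint blocks.
shuffled <- sample(seq_len(n))
idx <- list(
  train = shuffled[seq_len(n_train)],
  tune  = shuffled[n_train + seq_len(n_tune)],
  test  = shuffled[n_train + n_tune + seq_len(n_test)]
)

lengths(idx)
```

Every observation lands in exactly one of the three sets, which is what the separate `*train`, `*tune` and `*test` elements of the returned list rely on.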

Value

A list with the following elements

ytrain

simulated response vector for training set

ytune

simulated response vector for tuning parameter selection set

ytest

simulated response vector for test set

xtrain

simulated design matrix for training set

xtune

simulated design matrix for tuning parameter selection set

xtest

simulated design matrix for testing set

xtrain_lasso

simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components

xtune_lasso

simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components

xtest_lasso

simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components

causal

character vector of the names of the causal SNPs

beta

the vector of true regression coefficients

kin_train

2 times the estimated kinship for the training set individuals

kin_tune_train

The covariance matrix between the tuning set and the training set individuals

kin_test_train

The covariance matrix between the test set and training set individuals

Xkinship

the matrix of SNPs used to estimate the kinship matrix

not_causal

character vector of the non-causal SNPs

PC

the principal components for population structure adjustment

Details

The kinship is estimated using the popkin function from the popkin package. This function will multiply that kinship matrix by 2 to give the expected covariance matrix, which is subsequently used in the linear mixed models.
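
As a rough illustration of how the doubled kinship matrix and the `eta`, `sigma2`, `b0` and `percent_causal` arguments fit together, here is a base R sketch. It assumes the usual variance-components form, with a polygenic random effect of covariance `eta * sigma2 * Phi` and independent errors of variance `(1 - eta) * sigma2`, where `Phi` is 2 times the kinship. The toy kinship matrix, the binomial genotypes and all object names are assumptions made for illustration; the function itself estimates kinship with popkin and simulates genotypes with bnpsd.

```r
# Illustrative only: simulate a response from a toy kinship matrix.
set.seed(42)
n <- 6; p <- 10
b0 <- 0; eta <- 0.1; sigma2 <- 1
percent_causal <- 0.2

# Toy kinship matrix (assumption; stands in for the popkin estimate).
kinship <- matrix(0.05, n, n)
diag(kinship) <- 0.5
Phi <- 2 * kinship  # expected covariance matrix used in the mixed model

# Toy genotype-like design matrix with entries in {0, 1, 2}.
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), n, p,
            dimnames = list(NULL, paste0("X", seq_len(p))))

# Pick the causal columns and draw their effects from a standard normal.
n_causal <- round(percent_causal * p)
causal <- sample(colnames(X), n_causal)
beta <- setNames(rep(0, p), colnames(X))
beta[causal] <- rnorm(n_causal)

# Polygenic effect P ~ N(0, eta * sigma2 * Phi), drawn via a Cholesky factor.
R <- chol(eta * sigma2 * Phi)  # upper triangular, t(R) %*% R equals the covariance
P <- drop(t(R) %*% rnorm(n))

# Independent error E ~ N(0, (1 - eta) * sigma2 * I).
E <- rnorm(n, sd = sqrt((1 - eta) * sigma2))

y <- b0 + drop(X %*% beta) + P + E
```

`chol()` requires a positive definite matrix. The toy `Phi` here is positive definite by construction, whereas an estimated kinship matrix may not be, which is presumably why the function can fall back to `Matrix::nearPD`.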

Examples

admixed <- gen_structured_model(n = 100,
                                p_design = 50,
                                p_kinship = 5e2,
                                geography = "1d",
                                percent_causal = 0.10,
                                percent_overlap = "100",
                                k = 5, s = 0.5, Fst = 0.1,
                                b0 = 0, nPC = 10,
                                eta = 0.1, sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
#> eta * sigma2 * kin not PD, using Matrix::nearPD
names(admixed)
#>  [1] "ytrain"         "ytune"          "ytest"          "xtrain"        
#>  [5] "xtune"          "xtest"          "xtrain_lasso"   "xtune_lasso"   
#>  [9] "xtest_lasso"    "Xkinship"       "kin_train"      "kin_tune_train"
#> [13] "kin_test_train" "mu_train"       "causal"         "beta"          
#> [17] "not_causal"     "kinship"        "coancestry"     "PC"            
#> [21] "subpops"
