Generates the data for the different simulation studies presented in the accompanying paper. This function requires the popkin and bnpsd packages to be installed.

gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)

Arguments

n

number of observations to simulate

p_design

number of variables in X_test, i.e., the design matrix

p_kinship

number of variables in X_kinship, i.e., the matrix used to calculate kinship

k

number of intermediate subpopulations.

s

the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing

Fst

The desired final FST of the admixed individuals. Required if sigma is missing

b0

the true intercept parameter

nPC

number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects.

eta

the true eta parameter, which must satisfy 0 < eta < 1

sigma2

the true sigma2 parameter

geography

the type of geography used to simulate the kinship matrix. "ind" is independent subpopulations where every individual is actually unadmixed, "1d" is a 1D geography, and "circ" is a circular geography. Default: "ind". See the functions in the bnpsd package for details on how these data are actually generated.

percent_causal

proportion of the p_design variables that are causal. Must satisfy \(0 \leq \mathrm{percent\_causal} \leq 1\). The true regression coefficients are generated from a standard normal distribution.

percent_overlap

this represents the percentage of causal SNPs that will also be included in the calculation of the kinship matrix

train_tune_test

the proportions of the sample used for training, tuning parameter selection, and testing. Default is a 60/20/20 split.
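To make percent_causal and train_tune_test more concrete, here is a minimal base R sketch of how a proportion of causal SNPs and a three-way sample split could be derived from those arguments. It is an illustration only: the SNP names, the use of `round()`, and the index names `train_ind`, `tune_ind`, and `test_ind` are assumptions, and the function's internal implementation may differ.

```r
set.seed(1)  # fixed seed, for this illustration only

n <- 100
p_design <- 50
percent_causal <- 0.10
train_tune_test <- c(0.6, 0.2, 0.2)

# Hypothetical SNP names; the names used by gen_structured_model may differ.
snp_names <- paste0("snp", seq_len(p_design))

# Choose the causal SNPs and draw their effects from a standard normal.
n_causal <- round(p_design * percent_causal)
causal <- sample(snp_names, n_causal)
beta <- setNames(numeric(p_design), snp_names)
beta[causal] <- rnorm(n_causal)

# Shuffle the observations and cut them into three disjoint sets.
n_train <- round(n * train_tune_test[1])
n_tune <- round(n * train_tune_test[2])
idx <- sample(seq_len(n))
train_ind <- idx[seq_len(n_train)]
tune_ind <- idx[n_train + seq_len(n_tune)]
test_ind <- idx[(n_train + n_tune + 1):n]  # the remainder goes to the test set
```

Giving the test set whatever remains after the first two cuts guarantees that every observation is used exactly once, even when the proportions do not divide n evenly.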

Value

A list with the following elements

ytrain

simulated response vector for training set

ytune

simulated response vector for tuning parameter selection set

ytest

simulated response vector for test set

xtrain

simulated design matrix for training set

xtune

simulated design matrix for tuning parameter selection set

xtest

simulated design matrix for testing set

xtrain_lasso

simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components

xtune_lasso

simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components

xtest_lasso

simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components

causal

character vector of the names of the causal SNPs

beta

the vector of true regression coefficients

kin_train

2 times the estimated kinship for the training set individuals

kin_tune_train

The covariance matrix between the tuning set and the training set individuals

kin_test_train

The covariance matrix between the test set and training set individuals

Xkinship

the matrix of SNPs used to estimate the kinship matrix

not_causal

character vector of the non-causal SNPs

PC

the principal components for population structure adjustment
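The relationship between the plain design matrices and their `_lasso` counterparts can be sketched as follows. This is a toy illustration in base R: the genotype data are random, and computing the components with `prcomp()` on the kinship SNPs is an assumption made here, not necessarily how the function obtains PC.

```r
set.seed(2)  # fixed seed, for this illustration only

n <- 20
p_design <- 5
p_kinship <- 30
nPC <- 3

# Toy genotype matrices with entries in {0, 1, 2} (hypothetical data).
x <- matrix(rbinom(n * p_design, 2, 0.3), nrow = n,
            dimnames = list(NULL, paste0("snp", seq_len(p_design))))
x_kinship <- matrix(rbinom(n * p_kinship, 2, 0.3), nrow = n)

# First nPC principal component scores; prcomp() is an assumed choice.
pc <- prcomp(x_kinship)$x[, seq_len(nPC), drop = FALSE]

# The lasso design matrix is the SNP matrix with the components appended.
x_lasso <- cbind(x, pc)
dim(x_lasso)
```

The resulting matrix has p_design + nPC columns, which is what a standard lasso without random effects would take as input.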

Details

The kinship is estimated using the popkin function from the popkin package. This function multiplies that kinship matrix by 2 to give the expected covariance matrix, which is subsequently used in the linear mixed models.
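As a rough illustration of how twice the kinship enters the covariance of the response, the base R sketch below builds a covariance matrix from a toy kinship matrix and simulates from it. The kinship values are made up rather than produced by popkin, and the decomposition into eta * sigma2 * kin plus (1 - eta) * sigma2 * I is an assumption about the underlying mixed model. The sketch also omits the fallback to Matrix::nearPD that the example output below hints at.

```r
# Toy kinship matrix for 4 individuals (hypothetical values, not popkin output).
phi <- matrix(0.05, 4, 4)
diag(phi) <- 0.5

eta <- 0.1
sigma2 <- 1
b0 <- 0

# Twice the kinship gives the covariance structure of the random effect.
kin <- 2 * phi

# Assumed variance decomposition: a polygenic part scaled by eta and an
# independent error part scaled by (1 - eta).
Sigma <- eta * sigma2 * kin + (1 - eta) * sigma2 * diag(nrow(kin))

# Simulate an intercept-only response with covariance Sigma.
# chol() returns upper-triangular R with t(R) %*% R == Sigma,
# so t(R) %*% z has covariance Sigma when z is standard normal.
set.seed(3)
z <- rnorm(nrow(kin))
y <- b0 + drop(crossprod(chol(Sigma), z))
```

With these toy values each individual has total variance sigma2, and eta controls how much of it is shared through kinship.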

Examples

admixed <- gen_structured_model(
  n = 100,
  p_design = 50,
  p_kinship = 5e2,
  geography = "1d",
  percent_causal = 0.10,
  percent_overlap = "100",
  k = 5, s = 0.5, Fst = 0.1,
  b0 = 0, nPC = 10,
  eta = 0.1, sigma2 = 1,
  train_tune_test = c(0.8, 0.1, 0.1)
)
#> eta * sigma2 * kin not PD, using Matrix::nearPD
names(admixed)
#>  [1] "ytrain"         "ytune"          "ytest"          "xtrain"
#>  [5] "xtune"          "xtest"          "xtrain_lasso"   "xtune_lasso"
#>  [9] "xtest_lasso"    "Xkinship"       "kin_train"      "kin_tune_train"
#> [13] "kin_test_train" "mu_train"       "causal"         "beta"
#> [17] "not_causal"     "kinship"        "coancestry"     "PC"
#> [21] "subpops"