Simulation Scenario from Bhatnagar et al. (2018+) sail paper

Generates data for the simulation studies presented in the accompanying paper. This function requires the truncnorm package to be installed.

gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
  SNR, parameterIndex)

Arguments

n	number of observations
p	number of main effect variables (X)
corr	correlation between predictors
E	simulated environment vector of length `n`. Can be continuous or integer valued. Factors must be converted to numeric. Default: `truncnorm::rtruncnorm(n, a = -1, b = 1)`
betaE	exposure effect size
SNR	signal to noise ratio
parameterIndex	simulation scenario index. See details for more information.

Value

A list with the following elements:

x: matrix of dimension n x p of simulated main effects
y: simulated response vector of length n
e: simulated exposure vector of length n
Y.star: linear predictor vector of length n
f1: the function f1 evaluated at x_1 (f1(X1))
f2: the function f2 evaluated at x_2 (f2(X2))
f3: the function f3 evaluated at x_3 (f3(X3))
f4: the function f4 evaluated at x_4 (f4(X4))
betaE: the value for $\beta_E$
f1.f: the function f1
f2.f: the function f2
f3.f: the function f3
f4.f: the function f4
X1: vector of length n containing the first predictor
X2: vector of length n containing the second predictor
X3: vector of length n containing the third predictor
X4: vector of length n containing the fourth predictor
scenario: a character representing the simulation scenario identifier as described in Bhatnagar et al. (2018+)
causal: character vector of causal variable names
not_causal: character vector of noise variables

Details

We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.

Heredity Property

Truth obeys weak hierarchy (parameterIndex = 2) $$Y* = f_1(X_{1}) + f_2(X_{2}) + \beta_E * X_{E} + X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4}) $$
Truth only has interactions (parameterIndex = 3) $$Y* = X_{E} * f_3(X_{3}) + X_{E} * f_4(X_{4}) $$

Non-linearity

Truth is linear (parameterIndex = 4) $$Y* = \sum_{j=1}^{4}\beta_j X_{j} + \beta_E * X_{E} + X_{E} * X_{3} + X_{E} * X_{4} $$

Interactions

Truth only has main effects (parameterIndex = 5) $$Y* = \sum_{j=1}^{4} f_j(X_{j}) + \beta_E * X_{E} $$

The functions are from the paper by Lin and Zhang (2006):

f2: f2 <- function(t) 3 * (2 * t - 1)^2
f3: f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4: f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
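As a rough illustration of how these component functions enter a linear predictor, the sketch below builds Y* for the interaction-only scenario (parameterIndex = 3) in base R. The inputs `X3`, `X4` and `E` are hypothetical uniform draws chosen only for illustration; they are not the covariate-generating mechanism used by `gendata`. Only the functions listed above are used.

```r
# Component functions exactly as listed above.
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                         0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                         0.5 * sin(2 * pi * t)^3)

# Hypothetical inputs (assumption): predictors on [0, 1], exposure on [-1, 1].
set.seed(1)
n  <- 200
X3 <- runif(n)
X4 <- runif(n)
E  <- runif(n, min = -1, max = 1)

# parameterIndex = 3: the truth contains interactions only, no main effects.
Y.star <- E * f3(X3) + E * f4(X4)
```

Because every term is multiplied by `E`, an observation with zero exposure has a linear predictor of exactly zero in this scenario.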

The response is generated as $$Y = Y* + k*error$$ where Y* is the linear predictor, the error term is drawn from a standard normal distribution, and k is chosen so that the signal-to-noise ratio is SNR = Var(Y*)/Var(k*error), i.e., the variance of the response Y due to the scaled error is 1/SNR of the variance of Y due to Y*.
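The scaling step can be sketched in base R as follows. This is a minimal illustration that assumes sample variances are used to choose k; the placeholder `Y.star` is an arbitrary vector and does not come from any of the scenarios above.

```r
set.seed(2)
n      <- 500
Y.star <- rnorm(n, mean = 1, sd = 3)  # placeholder linear predictor (assumption)
error  <- rnorm(n)                    # standard normal noise
SNR    <- 2

# Choose k so that Var(Y.star) / Var(k * error) equals the requested SNR.
k <- sqrt(var(Y.star) / (SNR * var(error)))
Y <- Y.star + k * error

# Ratio actually achieved in this sample.
achieved <- var(Y.star) / var(k * error)
```

Since `var(k * error)` equals `k^2 * var(error)`, the achieved ratio matches `SNR` up to floating-point error, regardless of the seed.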

The covariates are simulated as described in Huang et al. (2010). First, we generate $w_1,\ldots, w_p, u, v$ independently from $Normal(0,1)$ truncated to the interval [0,1], for each observation $i=1,\ldots,n$. Then we set $x_j = (w_j + t*u)/(1 + t)$ for $j = 1,\ldots, 4$ and $x_j = (w_j + t*v)/(1 + t)$ for $j = 5,\ldots, p$, where the parameter $t$ controls the amount of correlation among predictors. This leads to a compound symmetry correlation structure within each group, where $Corr(x_j,x_k) = t^2/(1+t^2)$ for $1 \le j < k \le 4$ and for $5 \le j < k \le p$, while the covariates of the nonzero components ($j \le 4$) and of the zero components ($j \ge 5$) are independent of each other.
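The construction above can be sketched in base R. The helpers `rtnorm01` and `sim_x` are hypothetical names introduced here; `rtnorm01` uses inverse-CDF sampling as a stand-in for `truncnorm::rtruncnorm`, so this is an approximation of the described scheme rather than the code used inside `gendata`.

```r
# N(0, 1) truncated to [a, b], drawn by inverse-CDF sampling
# (assumption: base R substitute for truncnorm::rtruncnorm).
rtnorm01 <- function(n, a = 0, b = 1) qnorm(runif(n, pnorm(a), pnorm(b)))

# Build an n x p covariate matrix with two correlated blocks:
# columns 1..4 share u, columns 5..p share v.
sim_x <- function(n, p, t) {
  stopifnot(p >= 5, t >= 0)
  W <- matrix(rtnorm01(n * p), nrow = n, ncol = p)
  u <- rtnorm01(n)
  v <- rtnorm01(n)
  X <- W
  X[, 1:4] <- (W[, 1:4] + t * u) / (1 + t)  # u is recycled down each column
  X[, 5:p] <- (W[, 5:p] + t * v) / (1 + t)
  X
}

set.seed(3)
t <- 1
X <- sim_x(n = 10000, p = 8, t = t)

target        <- t^2 / (1 + t^2)     # theoretical within-block correlation
within_signal <- cor(X[, 1], X[, 2]) # both in block 1..4
within_noise  <- cor(X[, 5], X[, 6]) # both in block 5..p
across        <- cor(X[, 1], X[, 5]) # different blocks, should be near 0
```

With a large `n`, the two within-block sample correlations should sit close to `target`, and the across-block correlation close to zero.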

References

Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.

Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.

Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.

Examples
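A minimal usage sketch, assuming `gendata` is available from an installed package named sail (the package name is inferred from the title of this page) and that truncnorm is installed. The argument values are arbitrary illustrations, not settings taken from the paper. The call is guarded so the code does nothing if either package is missing.

```r
# Illustrative argument values (assumptions, not the paper's settings).
sim_args <- list(n = 100, p = 20, corr = 0, betaE = 2, SNR = 2,
                 parameterIndex = 2)

if (requireNamespace("sail", quietly = TRUE) &&
    requireNamespace("truncnorm", quietly = TRUE)) {
  set.seed(1)
  # E is left at its documented default.
  DT <- do.call(sail::gendata, sim_args)

  dim(DT$x)      # n x p matrix of simulated main effects
  length(DT$y)   # simulated response
  length(DT$e)   # simulated exposure
  DT$causal      # names of the causal variables
}
```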