Generates data for the different simulation studies presented in the accompanying paper. This function requires the truncnorm package to be installed.

gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
  SNR, parameterIndex)

Arguments

n

number of observations

p

number of main effect variables (X)

corr

correlation between predictors

E

simulated environment vector of length n. Can be continuous or integer valued. Factors must be converted to numeric. Default: truncnorm::rtruncnorm(n, a = -1, b = 1)

betaE

exposure effect size

SNR

signal to noise ratio

parameterIndex

simulation scenario index. See details for more information.

Value

A list with the following elements:

x

matrix of dimension n x p of simulated main effects

y

simulated response vector of length n

e

simulated exposure vector of length n

Y.star

linear predictor vector of length n

f1

the function f1 evaluated at x_1 (f1(X1))

f2

the function f2 evaluated at x_2 (f2(X2))

f3

the function f3 evaluated at x_3 (f3(X3))

f4

the function f4 evaluated at x_4 (f4(X4))

betaE

the value for βE

f1.f

the function f1

f2.f

the function f2

f3.f

the function f3

f4.f

the function f4

X1

an n length vector of the first predictor

X2

an n length vector of the second predictor

X3

an n length vector of the third predictor

X4

an n length vector of the fourth predictor

scenario

a character representing the simulation scenario identifier as described in Bhatnagar et al. (2018+)

causal

character vector of causal variable names

not_causal

character vector of noise variables

Details

We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.

Heredity Property

Truth obeys weak hierarchy (parameterIndex = 2): $Y = f_1(X_1) + f_2(X_2) + \beta_E X_E + X_E f_3(X_3) + X_E f_4(X_4)$

Truth only has interactions (parameterIndex = 3): $Y = X_E f_3(X_3) + X_E f_4(X_4)$

Non-linearity

Truth is linear (parameterIndex = 4): $Y = \sum_{j=1}^{4} \beta_j X_j + \beta_E X_E + X_E X_3 + X_E X_4$

Interactions

Truth only has main effects (parameterIndex = 5): $Y = \sum_{j=1}^{4} f_j(X_j) + \beta_E X_E$

The functions are from the paper by Lin and Zhang (2006):

f2

f2 <- function(t) 3 * (2 * t - 1)^2

f3

f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))

f4

f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
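As an illustration of how these component functions combine with the scenario equations listed under Heredity Property and Interactions, the following is a minimal sketch and not the package's own implementation. The helper name `linear_predictor` is hypothetical, the predictors and exposure are simple uniform stand-ins, and `f1` is a linear placeholder chosen only for illustration because its definition is not listed here. The linear scenario (parameterIndex = 4) is left out because the values of $\beta_j$ are not stated.

```r
# Component functions f2, f3, f4 exactly as listed in this documentation
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                         0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                         0.5 * sin(2 * pi * t)^3)

# ASSUMPTION: f1 is not listed in this documentation; this linear form is a
# placeholder for illustration only
f1 <- function(t) 5 * t

# Hypothetical helper: linear predictor Y* for scenarios 2, 3 and 5
linear_predictor <- function(parameterIndex, X, E, betaE) {
  X1 <- X[, 1]; X2 <- X[, 2]; X3 <- X[, 3]; X4 <- X[, 4]
  switch(as.character(parameterIndex),
    "2" = f1(X1) + f2(X2) + betaE * E + E * f3(X3) + E * f4(X4),
    "3" = E * f3(X3) + E * f4(X4),
    "5" = f1(X1) + f2(X2) + f3(X3) + f4(X4) + betaE * E,
    stop("scenario not covered by this sketch")
  )
}

set.seed(1)
n <- 50
X <- matrix(runif(n * 4), nrow = n)  # stand-in predictors on [0, 1]
E <- runif(n, min = -1, max = 1)     # stand-in exposure on [-1, 1]

ystar3 <- linear_predictor(3, X, E, betaE = 2)  # interactions only
ystar5 <- linear_predictor(5, X, E, betaE = 2)  # main effects only
```

Dispatching on the scenario index with `switch` keeps each equation on one line, so it can be compared term by term against the scenario list.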

The response is generated as $Y = Y^* + k \cdot \text{error}$, where $Y^*$ is the linear predictor, the error term is generated from a standard normal distribution, and $k$ is chosen such that the signal-to-noise ratio is $\text{SNR} = \mathrm{Var}(Y^*) / \mathrm{Var}(k \cdot \text{error})$, i.e., the variance of the response variable $Y$ due to the scaled error is 1/SNR of the variance of $Y$ due to $Y^*$.
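The choice of the scaling constant can be sketched as follows. This is a minimal illustration that assumes sample variances are used to compute $k$. The stand-in linear predictor and the variable name `empirical_snr` are hypothetical and are not taken from the package.

```r
set.seed(123)
n <- 200
SNR <- 2

# Stand-in linear predictor (hypothetical values, for illustration only)
Y.star <- rnorm(n, mean = 1, sd = 3)

# Error term drawn from a standard normal distribution
error <- rnorm(n)

# Choose k so that Var(Y.star) / Var(k * error) equals SNR
# (ASSUMPTION: sample variances are used)
k <- sqrt(var(Y.star) / (SNR * var(error)))

# Response: linear predictor plus scaled noise
Y <- Y.star + k * error

# Ratio of signal variance to scaled-noise variance
empirical_snr <- var(Y.star) / var(k * error)
```

Because $\mathrm{Var}(k \cdot \text{error}) = k^2 \mathrm{Var}(\text{error})$, the ratio stored in `empirical_snr` matches `SNR` up to floating-point error.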

The covariates are simulated as described in Huang et al. (2010). First, we generate $w_1, \ldots, w_p, u, v$ independently from Normal(0,1) truncated to the interval [0,1] for $i = 1, \ldots, n$. Then we set $x_j = (w_j + t \cdot u)/(1 + t)$ for $j = 1, \ldots, 4$ and $x_j = (w_j + t \cdot v)/(1 + t)$ for $j = 5, \ldots, p$, where the parameter $t$ controls the amount of correlation among predictors. This leads to a compound symmetry correlation structure where $\mathrm{Corr}(x_j, x_k) = t^2/(1 + t^2)$ for $1 \le j \le 4, 1 \le k \le 4$, and $\mathrm{Corr}(x_j, x_k) = t^2/(1 + t^2)$ for $5 \le j \le p, 5 \le k \le p$ (with $j \ne k$ in both cases), but the covariates of the nonzero and zero components are independent.
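The covariate construction can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper names `rtnorm01` and `sim_covariates` are hypothetical, and the truncated normal draws use inverse-CDF sampling in place of the truncnorm package so that the example has no dependencies.

```r
# Normal(0, 1) truncated to [0, 1], via inverse-CDF sampling
# (swapped in for truncnorm::rtruncnorm to keep the sketch dependency-free)
rtnorm01 <- function(n) qnorm(runif(n, min = pnorm(0), max = pnorm(1)))

# Hypothetical helper: n x p covariate matrix with correlation parameter t
sim_covariates <- function(n, p, t) {
  stopifnot(p >= 5)
  W <- matrix(rtnorm01(n * p), nrow = n, ncol = p)
  u <- rtnorm01(n)  # shared component for x_1, ..., x_4
  v <- rtnorm01(n)  # shared component for x_5, ..., x_p
  X <- W
  # A length-n vector is recycled down each column, so u (or v) is added
  # to every column in the block
  X[, 1:4] <- (W[, 1:4] + t * u) / (1 + t)
  X[, 5:p] <- (W[, 5:p] + t * v) / (1 + t)
  colnames(X) <- paste0("X", seq_len(p))
  X
}

set.seed(42)
X <- sim_covariates(n = 20000, p = 6, t = 1)

# Empirical correlations: within each block, and across the two blocks
cor_within_first  <- cor(X[, 1], X[, 2])
cor_within_second <- cor(X[, 5], X[, 6])
cor_across        <- cor(X[, 1], X[, 5])
```

With t = 1, the within-block correlations should be close to $t^2/(1 + t^2) = 0.5$ and the across-block correlation close to 0, up to sampling variability.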

References

Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.

Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.

Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.

Examples

DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)