`R/sampling.R`

`sampleCaseBase.Rd`

This function implements the case-base sampling approach described in Hanley and Miettinen (2009). The resulting dataset can be used to fit smooth-in-time parametric hazard functions via logistic regression.
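Why logistic regression recovers a hazard can be sketched with notation introduced here for illustration only: let $B$ be the total follow-up time of all individuals, $b$ the number of person-moments in the base series, and $h(t; x)$ the hazard at time $t$ for covariates $x$. Per unit of person-time at $(t, x)$, cases arise at rate $h(t; x)$ while base-series moments arise at rate $b/B$, so the odds that a sampled person-moment is a case are the ratio of the two:

```latex
% B: total follow-up time; b: base-series size; h(t; x): hazard at time t, covariates x
\frac{\Pr(\text{case} \mid t, x)}{\Pr(\text{base} \mid t, x)}
  = \frac{h(t; x)}{b / B}
\quad\Longrightarrow\quad
\operatorname{logit} \Pr(\text{case} \mid t, x) = \log h(t; x) + \log \frac{B}{b}
```

A logistic regression with offset $\log(B/b)$ therefore models $\log h(t; x)$ directly, and any smooth function of $t$ placed in the linear predictor yields a smooth-in-time parametric hazard.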

```
sampleCaseBase(
  data,
  time,
  event,
  ratio = 10,
  comprisk = FALSE,
  censored.indicator
)
```

- data
a data.frame or data.table containing the source dataset.

- time
a character string giving the name of the time variable. See Details.

- event
a character string giving the name of the event variable. See Details.

- ratio
Integer, giving the ratio of the size of the base series to that of the case series. Defaults to 10.

- comprisk
Logical. Indicates whether there are multiple event types, some of which should be treated as competing risks.

- censored.indicator
a character string of length 1 indicating which value in `event` denotes censoring. This function will use `relevel` to set `censored.indicator` as the reference level. This argument is ignored if the `event` variable is numeric.
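For a factor or character `event` variable, the effect of `relevel` can be illustrated in base R. The event labels below ("cancer", "other", and "none" standing in for `censored.indicator`) are hypothetical, and the final numeric recoding is only an illustration, not necessarily what `sampleCaseBase` does internally.

```r
# Hypothetical event labels; "none" plays the role of censored.indicator
event <- factor(c("cancer", "none", "other", "none", "cancer"))
levels(event)    # alphabetical order: "cancer" "none" "other"

# Make the censoring value the reference (first) level
event <- relevel(event, ref = "none")
levels(event)    # "none" "cancer" "other"

# Illustration only: integer codes minus 1 give 0 = censored,
# matching the 0/1/2 convention used in the example below
as.integer(event) - 1L   # 1 0 2 0 1
```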

The function returns a dataset with the same format as the source dataset, where each row corresponds to a person-moment sampled from either the case series or the base series.

The base series is sampled using a multinomial scheme: individuals are sampled proportionally to their follow-up time.
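A minimal base-R sketch of this scheme, assuming four hypothetical individuals, a base series whose size is `ratio` times the number of events (as the `ratio` argument describes), and base-series rows labelled 0; it is not the package's source code. Drawing people with replacement, with probabilities proportional to follow-up time, gives multinomial counts per person; drawing a moment uniformly within each sampled person's follow-up then makes the base series a uniform sample of the total person-time.

```r
set.seed(42)
time  <- c(2.5, 10, 4, 7.5)   # hypothetical follow-up times, one per person
event <- c(1, 0, 1, 0)        # 1 = event, 0 = censored
ratio <- 10

n_events <- sum(event)
b <- ratio * n_events          # size of the base series (here 20)

# Multinomial step: sample people proportionally to their follow-up time
who <- sample(seq_along(time), size = b, replace = TRUE,
              prob = time / sum(time))
# Uniform step: pick a moment within each sampled person's follow-up
moment <- runif(b, min = 0, max = time[who])

base_series <- data.frame(id = who, time = moment, event = 0)
case_series <- data.frame(id = which(event == 1),
                          time = time[event == 1], event = 1)
cb <- rbind(case_series, base_series)   # 2 cases + 20 base person-moments
```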

It is assumed that `data` contains the two columns corresponding to the supplied time and event variables. If either the `time` or `event` argument is missing, the function looks for columns with appropriate-looking names (see `checkArgsTimeEvent`).

The offset is calculated using the total follow-up time for all individuals in the study. Therefore, `time` must be on the original scale, not a transformed scale (e.g. logarithmic); otherwise, the offset and the resulting estimates will be wrong.
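The following base-R sketch illustrates why the offset needs the total follow-up on the original scale. It simulates exponential data (an assumption chosen so the true hazard is known), builds a case-base dataset by hand, and fits logistic regressions with offset log(B/b), where B is the total follow-up and b the base-series size. This offset form follows from the case-base odds argument; the variable names are hypothetical and the code is not the package's implementation.

```r
set.seed(2009)
n     <- 2000
rate  <- 0.1                        # true constant hazard (assumed for the simulation)
t_ev  <- rexp(n, rate = rate)
time  <- pmin(t_ev, 10)             # administrative censoring at time 10
event <- as.integer(t_ev <= 10)     # 1 = event observed, 0 = censored
ratio <- 10

d <- sum(event)                     # size of the case series
b <- ratio * d                      # size of the base series
B <- sum(time)                      # total follow-up, on the ORIGINAL time scale
offset_value <- log(B / b)          # logit(P(case)) = log(hazard) + log(B / b)

# Base series: person-moments drawn uniformly over the total person-time
who <- sample(n, size = b, replace = TRUE, prob = time / B)
cb <- data.frame(
  y           = c(rep(1, d), rep(0, b)),
  moment      = c(time[event == 1], runif(b, min = 0, max = time[who])),
  offset_term = offset_value
)

# Intercept-only model: exp(intercept) reproduces events / person-time
fit0 <- glm(y ~ 1 + offset(offset_term), family = binomial, data = cb)
# Log-linear hazard in time (a Gompertz-type form, chosen for illustration)
fit1 <- glm(y ~ moment + offset(offset_term), family = binomial, data = cb)

exp(coef(fit0))   # equals d / B, the exponential MLE of the rate
coef(fit1)        # slope on `moment` should be near 0 for a constant hazard
```

With the intercept-only model, exp(intercept) equals d/B exactly, the familiar events-per-person-time estimate. If B were instead computed from a transformed time with total B', the offset, and hence the intercept, would shift by log(B/B'), so every hazard estimate would be off by that factor.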

```
# Simulate censored survival data for two outcome types from exponential distributions
library(data.table)
library(casebase)
set.seed(12345)
nobs <- 500
tlim <- 10
# simulation parameters
b1 <- 200
b2 <- 50
# event type 0-censored, 1-event of interest, 2-competing event
# time is the observed time/endpoint
# z is a binary covariate
DT <- data.table(z = rbinom(nobs, 1, 0.5))
DT[, `:=`(
"t_event" = rweibull(nobs, 1, b1),
"t_comp" = rweibull(nobs, 1, b2)
)]
#> z t_event t_comp
#> <int> <num> <num>
#> 1: 1 312.74831 127.708526
#> 2: 1 53.25243 8.497106
#> 3: 1 34.48639 249.441113
#> 4: 1 13.62873 52.322220
#> 5: 0 78.36455 18.839434
#> ---
#> 496: 0 29.81916 142.094179
#> 497: 1 157.21649 53.021951
#> 498: 1 299.22847 36.967088
#> 499: 1 194.74603 63.880643
#> 500: 1 402.21055 55.350048
DT[, `:=`(
"event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp),
"time" = pmin(t_event, t_comp)
)]
#> z t_event t_comp event time
#> <int> <num> <num> <num> <num>
#> 1: 1 312.74831 127.708526 2 127.708526
#> 2: 1 53.25243 8.497106 2 8.497106
#> 3: 1 34.48639 249.441113 1 34.486389
#> 4: 1 13.62873 52.322220 1 13.628727
#> 5: 0 78.36455 18.839434 2 18.839434
#> ---
#> 496: 0 29.81916 142.094179 1 29.819162
#> 497: 1 157.21649 53.021951 2 53.021951
#> 498: 1 299.22847 36.967088 2 36.967088
#> 499: 1 194.74603 63.880643 2 63.880643
#> 500: 1 402.21055 55.350048 2 55.350048
DT[time >= tlim, `:=`("event" = 0, "time" = tlim)]
#> z t_event t_comp event time
#> <int> <num> <num> <num> <num>
#> 1: 1 312.74831 127.708526 0 10.000000
#> 2: 1 53.25243 8.497106 2 8.497106
#> 3: 1 34.48639 249.441113 0 10.000000
#> 4: 1 13.62873 52.322220 0 10.000000
#> 5: 0 78.36455 18.839434 0 10.000000
#> ---
#> 496: 0 29.81916 142.094179 0 10.000000
#> 497: 1 157.21649 53.021951 0 10.000000
#> 498: 1 299.22847 36.967088 0 10.000000
#> 499: 1 194.74603 63.880643 0 10.000000
#> 500: 1 402.21055 55.350048 0 10.000000
out <- sampleCaseBase(DT, time = "time", event = "event", comprisk = TRUE)
```