Create case-base dataset for use in fitting parametric hazard functions

This function implements the case-base sampling approach described in Hanley and Miettinen (2009). The resulting dataset can be used to fit smooth-in-time parametric hazard functions via logistic regression.

sampleCaseBase(
  data,
  time,
  event,
  ratio = 10,
  comprisk = FALSE,
  censored.indicator
)

Arguments

data: a data.frame or data.table containing the source dataset.
time: a character string giving the name of the time variable. See Details.
event: a character string giving the name of the event variable. See Details.
ratio: Integer, giving the ratio of the size of the base series to that of the case series. Defaults to 10.
comprisk: Logical. Indicates whether there are multiple event types, some of which should be treated as competing risks.
censored.indicator: a character string of length 1 indicating which value in event denotes censoring. This function will use relevel to set censored.indicator as the reference level. This argument is ignored if the event variable is numeric.

Value

The function returns a dataset in the same format as the source dataset, in which each row corresponds to a person-moment sampled from the case or the base series.

Details

The base series is sampled using a multinomial scheme: individuals are sampled proportionally to their follow-up time.
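
The multinomial scheme described above can be illustrated with a minimal base-R sketch. It is not the package's internal code: the toy data frame `toy` and the names `b`, `B`, `who` and `base_series` are assumptions made for illustration. Individuals are drawn with replacement with probability proportional to their follow-up time, and each draw is then assigned a moment chosen uniformly within that individual's follow-up.

```r
set.seed(1)

# Hypothetical toy data: follow-up time and event indicator (0 = censored)
toy <- data.frame(
  id    = 1:5,
  time  = c(2, 5, 1, 8, 4),
  event = c(1, 0, 1, 0, 1)
)

ratio   <- 10
n_cases <- sum(toy$event != 0)   # size of the case series
b       <- ratio * n_cases       # size of the base series
B       <- sum(toy$time)         # total follow-up time

# Multinomial draw: each individual is picked with probability time / B
who <- sample(nrow(toy), size = b, replace = TRUE, prob = toy$time / B)
base_series <- toy[who, ]

# Each sampled person-moment falls uniformly within that individual's follow-up
base_series$time  <- runif(b, min = 0, max = toy$time[who])
base_series$event <- 0           # base-series rows are not events

head(base_series)
```

Individuals with longer follow-up contribute more person-moments on average, which is what makes the base series representative of the study's person-time.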

It is assumed that data contains the two columns corresponding to the supplied time and event variables. If either the time or event argument is missing, the function looks for columns with appropriate-looking names (see checkArgsTimeEvent).

Warning

The offset is calculated using the total follow-up time for all individuals in the study. Therefore, we need time to be on the original scale, not a transformed scale (e.g. logarithmic). Otherwise, the offset and the estimation will be wrong.
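
To see why the time scale matters, here is a small sketch of the offset calculation. It assumes the offset takes the form log(B / b) used in the case-base literature, where B is the total follow-up time and b is the size of the base series; the variable names and toy values are illustrative only.

```r
# Hypothetical follow-up times on the original scale
time <- c(2, 5, 1, 8, 4)
b    <- 30                       # assumed size of the base series

# Total person-time and the resulting offset
B      <- sum(time)
offset <- log(B / b)

# If time had been log-transformed beforehand, the "total" would no longer
# be a total person-time, and the offset would be meaningless
B_log <- sum(log(time))

c(B = B, offset = offset, B_log = B_log)
```

The sum of log-transformed times is the log of their product, not an amount of person-time, so any offset derived from it does not correspond to the sampling fraction of the base series.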

Examples

# Simulate censored survival data for two outcome types from exponential
# distributions (a Weibull distribution with shape 1 is exponential)
library(data.table)
set.seed(12345)
nobs <- 500
tlim <- 10

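# simulation parameters: Weibull scale (mean event time when shape = 1)
# for the event of interest (b1) and the competing event (b2)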
b1 <- 200
b2 <- 50

# event: 0 = censored, 1 = event of interest, 2 = competing event
# time: observed time/endpoint
# z: binary covariate
DT <- data.table(z = rbinom(nobs, 1, 0.5))
DT[, `:=`(
  "t_event" = rweibull(nobs, 1, b1),
  "t_comp" = rweibull(nobs, 1, b2)
)]
#>          z   t_event     t_comp
#>      <int>     <num>      <num>
#>   1:     1 312.74831 127.708526
#>   2:     1  53.25243   8.497106
#>   3:     1  34.48639 249.441113
#>   4:     1  13.62873  52.322220
#>   5:     0  78.36455  18.839434
#>  ---                           
#> 496:     0  29.81916 142.094179
#> 497:     1 157.21649  53.021951
#> 498:     1 299.22847  36.967088
#> 499:     1 194.74603  63.880643
#> 500:     1 402.21055  55.350048
DT[, `:=`(
  "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp),
  "time" = pmin(t_event, t_comp)
)]
#>          z   t_event     t_comp event       time
#>      <int>     <num>      <num> <num>      <num>
#>   1:     1 312.74831 127.708526     2 127.708526
#>   2:     1  53.25243   8.497106     2   8.497106
#>   3:     1  34.48639 249.441113     1  34.486389
#>   4:     1  13.62873  52.322220     1  13.628727
#>   5:     0  78.36455  18.839434     2  18.839434
#>  ---                                            
#> 496:     0  29.81916 142.094179     1  29.819162
#> 497:     1 157.21649  53.021951     2  53.021951
#> 498:     1 299.22847  36.967088     2  36.967088
#> 499:     1 194.74603  63.880643     2  63.880643
#> 500:     1 402.21055  55.350048     2  55.350048
DT[time >= tlim, `:=`("event" = 0, "time" = tlim)]
#>          z   t_event     t_comp event      time
#>      <int>     <num>      <num> <num>     <num>
#>   1:     1 312.74831 127.708526     0 10.000000
#>   2:     1  53.25243   8.497106     2  8.497106
#>   3:     1  34.48639 249.441113     0 10.000000
#>   4:     1  13.62873  52.322220     0 10.000000
#>   5:     0  78.36455  18.839434     0 10.000000
#>  ---                                           
#> 496:     0  29.81916 142.094179     0 10.000000
#> 497:     1 157.21649  53.021951     0 10.000000
#> 498:     1 299.22847  36.967088     0 10.000000
#> 499:     1 194.74603  63.880643     0 10.000000
#> 500:     1 402.21055  55.350048     0 10.000000

out <- sampleCaseBase(DT, time = "time", event = "event", comprisk = TRUE)