Plot Hazards and Hazard Ratios

Introduction

In this vignette, we describe the plot method for objects of class singleEventCB which is obtained from running the fitSmoothHazard function. There are currently two types of plots: hazard functions and hazard ratios. We describe each one in detail below. Note that the plot method has only been properly tested for family="glm".

Hazard Function

The hazard function plots require the visreg package.

To illustrate hazard function plots, we will use the breast cancer dataset which contains the observations of 686 women taken from the TH.data package. This dataset is also available from the casebase package. In the following, we will show different hazard functions for different combinations of continuous, binary variables as well as their interactions.

library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
library(visreg)
library(splines)
library(ggplot2)
data("brcancer")
str(brcancer)

One binary predictor, no interactions

We first fit a main effects only model with a spline on log(time) and hormonal therapy as main effects.

mod_cb <- fitSmoothHazard(cens ~ ns(log(time), df = 3) + hormon,
                          data = brcancer,
                          time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 686 
#> Number of events: 299 
#> Number of base moments: 29900 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = cens ~ ns(log(time), df = 3) + hormon, 
#>     data = brcancer, time = "time")
#> 
#> Coefficients:
#>                        Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)            -87.7446    19.7822  -4.436 9.18e-06 ***
#> ns(log(time), df = 3)1  53.6185    13.0349   4.113 3.90e-05 ***
#> ns(log(time), df = 3)2 153.9915    38.4109   4.009 6.10e-05 ***
#> ns(log(time), df = 3)3  32.2959     7.9401   4.067 4.75e-05 ***
#> hormon                  -0.3490     0.1255  -2.780  0.00543 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3354.9  on 30198  degrees of freedom
#> Residual deviance: 3277.2  on 30194  degrees of freedom
#> AIC: 3287.2
#> 
#> Number of Fisher Scoring iterations: 10

Hazard functions on separate plots

All arguments needed for the hazard function plots are supplied through the hazard.params argument. This is a named list of arguments which will override the defaults passed to visreg::visreg(). The default arguments are list(fit = x, trans = exp, plot = TRUE, rug = FALSE, alpha = 1, partial = FALSE, overlay = TRUE). For example, if you want a 95% confidence band, specify hazard.params = list(alpha = 0.05). For a complete list of options, please see the visreg vignettes.

We first plot the hazard as a function of time, for hormon = 0 and hormon = 1. This is achieved by specifying the xvar argument, as well as the cond argument. The cond argument must be provided as a named list. Each element of that list specifies the value for one of the terms in the model; any elements left unspecified are filled in with the median/most common category. Note that even though we fit the log(time), we must specify time in the xvar argument.

par(mfrow = c(1, 2))
plot(mod_cb,
     hazard.params = list(xvar = "time",
                          cond = list(hormon = 0),
                          alpha = 0.05,
                          main = "No Hormonal Therapy Hazard Function"))

plot(mod_cb,
     hazard.params = list(xvar = "time",
                          cond = list(hormon = 1),
                          alpha = 0.05,
                          main = "Hormonal Therapy Hazard Function"))

Hazard functions on same plots

Alternatively, we can plot the hazard functions on the same plot. This is accomplished with the by argument:

plot(mod_cb,
     hazard.params = list(xvar = "time",
                          by = "hormon",
                          alpha = 0.05,
                          ylab = "Hazard"))

Note that if we want to extract the data used to construct the plot, e.g. to create our own, we simply assign the call to plot to an object (we may optionally set plot=FALSE in the hazard.params argument as to not print any plots):

plot_results <- plot(mod_cb,
     hazard.params = list(xvar = "time",
                          by = "hormon",
                          alpha = 0.10,
                          ylab = "Hazard",
                          plot = FALSE))

head(plot_results$fit)
#>           time hormon offset cens    visregFit    visregLwr    visregUpr
#> 1   0.01058959      0      0    0 7.816629e-39 5.775573e-53 1.057898e-24
#> 2  26.39601667      0      0    0 2.928569e-07 1.817244e-08 4.719521e-06
#> 3  52.78144375      0      0    0 7.300131e-06 1.572749e-06 3.388456e-05
#> 4  79.16687083      0      0    0 3.288475e-05 1.242184e-05 8.705691e-05
#> 5 105.55229791      0      0    0 8.009549e-05 4.151875e-05 1.545154e-04
#> 6 131.93772499      0      0    0 1.438305e-04 9.049528e-05 2.285999e-04

ggplot2 version

The function is flexible because you may leverage ggplot2 just by specifying gg = TRUE, the plot will return a ggplot object:

gg_object <- plot(mod_cb,
                  hazard.params = list(xvar = "time",
                                       by = "hormon",
                                       alpha = 0.20, # 80% CI
                                       ylab = "Hazard",
                                       gg = TRUE))

attr(gg_object,"class")
#> [1] "gg"     "ggplot"

Now we can use it downstream for any plot while leveraging the entire ggplot2 ecosystem of packages and functions:

gg_object + 
  theme_minimal()+
  theme(legend.position = "bottom") + 
  labs(title = "Casebase") +
  scale_x_continuous(n.breaks = 10)

One binary predictor with interaction

Next, we fit an interaction model with a time-varying covariate, i.e. to test the hypothesis that the effect of hormonal therapy on the hazard varies with time.

mod_cb_tvc <- fitSmoothHazard(cens ~ hormon * ns(log(time), df = 3),
                              data = brcancer,
                              time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 686 
#> Number of events: 299 
#> Number of base moments: 29900 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = cens ~ hormon * ns(log(time), df = 3), 
#>     data = brcancer, time = "time")
#> 
#> Coefficients:
#>                               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                     -99.01      27.22  -3.638 0.000275 ***
#> hormon                          -46.66      61.72  -0.756 0.449620    
#> ns(log(time), df = 3)1           61.38      18.02   3.405 0.000660 ***
#> ns(log(time), df = 3)2          175.57      52.80   3.325 0.000884 ***
#> ns(log(time), df = 3)3           36.62      10.90   3.361 0.000777 ***
#> hormon:ns(log(time), df = 3)1    31.04      41.10   0.755 0.450174    
#> hormon:ns(log(time), df = 3)2    89.10     119.26   0.747 0.454988    
#> hormon:ns(log(time), df = 3)3    19.09      24.84   0.768 0.442301    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3354.9  on 30198  degrees of freedom
#> Residual deviance: 3273.8  on 30191  degrees of freedom
#> AIC: 3289.8
#> 
#> Number of Fisher Scoring iterations: 11

Now we can easily plot the hazard function over time for each hormon group:

plot(mod_cb_tvc,
     hazard.params = list(xvar = "time",
                          by = "hormon",
                          alpha = 0.05,
                          ylab = "Hazard"))

One continuous predictor with interaction

Now we fit a model with an interaction between a continuous variable, estrogen receptor (in fmol), and time.

mod_cb_tvc <- fitSmoothHazard(cens ~ estrec * ns(log(time), df = 3),
                              data = brcancer,
                              time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 686 
#> Number of events: 299 
#> Number of base moments: 29900 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = cens ~ estrec * ns(log(time), df = 3), 
#>     data = brcancer, time = "time")
#> 
#> Coefficients:
#>                               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                   -79.3451    21.4077  -3.706 0.000210 ***
#> estrec                         -0.7286     0.3863  -1.886 0.059326 .  
#> ns(log(time), df = 3)1         46.9547    14.0338   3.346 0.000820 ***
#> ns(log(time), df = 3)2        139.3924    41.6332   3.348 0.000814 ***
#> ns(log(time), df = 3)3         28.0963     8.6126   3.262 0.001105 ** 
#> estrec:ns(log(time), df = 3)1   0.4957     0.2584   1.918 0.055102 .  
#> estrec:ns(log(time), df = 3)2   1.3792     0.7410   1.861 0.062695 .  
#> estrec:ns(log(time), df = 3)3   0.3099     0.1598   1.939 0.052474 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3354.9  on 30198  degrees of freedom
#> Residual deviance: 3262.7  on 30191  degrees of freedom
#> AIC: 3278.7
#> 
#> Number of Fisher Scoring iterations: 13

There are now many ways to plot the time-varying effect of estrogen receptor on the hazard function. The default is to plot the 10th, 50th and 90th quantiles of the by variable:

# computed at the 10th, 50th and 90th quantiles of estrec
plot(mod_cb_tvc,
     hazard.params = list(xvar = "time",
                          by = "estrec",
                          alpha = 1,
                          ylab = "Hazard"))

We can also show the quartiles of estrec by specifying the breaks argument. If breaks is a single number, that will be the used as the number of breaks:

# computed at quartiles of estrec
plot(mod_cb_tvc,
     hazard.params = list(xvar = c("time"),
                          by = "estrec",
                          alpha = 1,
                          breaks = 4,
                          ylab = "Hazard"))

Alternatively, if breaks is a vector, it will be used as the actual values to be used:

# computed where I want
plot(mod_cb_tvc,
     hazard.params = list(xvar = c("time"),
                          by = "estrec",
                          alpha = 1,
                          breaks = c(3,2200),
                          ylab = "Hazard"))

visreg2d(mod_cb_tvc, 
         xvar = "time",
         yvar = "estrec",
         trans = exp,
         print.cond = TRUE,
         zlab = "Hazard",
         plot.type = "image")

visreg2d(mod_cb_tvc, 
         xvar = "time",
         yvar = "estrec",
         trans = exp,
         print.cond = TRUE,
         zlab = "Hazard",
         plot.type = "persp")

# this can also work if 'rgl' is installed
# visreg2d(mod_cb_tvc, 
#          xvar = "time",
#          yvar = "estrec",
#          trans = exp,
#          print.cond = TRUE,
#          zlab = "Hazard",
#          plot.type = "rgl")

One continuous predictor with interaction and several other predictors

All the examples so far have only included two predictors in the regression equation. In this example, we fit a smooth hazard model with several predictors:

mod_cb_tvc <- fitSmoothHazard(cens ~ estrec * ns(log(time), df = 3) + 
                                horTh + 
                                age + 
                                menostat + 
                                tsize + 
                                tgrade + 
                                pnodes + 
                                progrec,
                              data = brcancer,
                              time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 686 
#> Number of events: 299 
#> Number of base moments: 29900 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = cens ~ estrec * ns(log(time), df = 3) + 
#>     horTh + age + menostat + tsize + tgrade + pnodes + progrec, 
#>     data = brcancer, time = "time")
#> 
#> Coefficients:
#>                                 Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                   -6.735e+01  1.816e+01  -3.708 0.000209 ***
#> estrec                        -4.939e-01  3.146e-01  -1.570 0.116397    
#> ns(log(time), df = 3)1         3.925e+01  1.189e+01   3.301 0.000963 ***
#> ns(log(time), df = 3)2         1.155e+02  3.539e+01   3.265 0.001095 ** 
#> ns(log(time), df = 3)3         2.377e+01  7.280e+00   3.265 0.001093 ** 
#> horThyes                      -3.381e-01  1.301e-01  -2.598 0.009384 ** 
#> age                           -8.103e-03  9.268e-03  -0.874 0.381983    
#> menostatPost                   2.448e-01  1.832e-01   1.336 0.181454    
#> tsize                          7.875e-03  4.015e-03   1.962 0.049820 *  
#> tgrade.L                       5.660e-01  1.908e-01   2.967 0.003008 ** 
#> tgrade.Q                      -2.118e-01  1.226e-01  -1.728 0.083951 .  
#> pnodes                         5.250e-02  7.857e-03   6.682 2.36e-11 ***
#> progrec                       -2.263e-03  5.812e-04  -3.893 9.91e-05 ***
#> estrec:ns(log(time), df = 3)1  3.387e-01  2.105e-01   1.609 0.107595    
#> estrec:ns(log(time), df = 3)2  9.343e-01  6.038e-01   1.547 0.121748    
#> estrec:ns(log(time), df = 3)3  2.110e-01  1.297e-01   1.626 0.103860    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3354.9  on 30198  degrees of freedom
#> Residual deviance: 3161.5  on 30183  degrees of freedom
#> AIC: 3193.5
#> 
#> Number of Fisher Scoring iterations: 13

In the following plot, we show the time-varying effect of estrec while controlling for all other variables. By default, the other terms in the model are set to their median if the term is numeric or the most common category if the term is a factor. The values of the other variables are shown in the output:

plot(mod_cb_tvc,
     hazard.params = list(xvar = "time",
                          by = "estrec",
                          alpha = 1,
                          breaks = 2,
                          ylab = "Hazard"))
#> Conditions used in construction of plot
#> estrec: 8 / 174
#> horTh: no
#> age: 53
#> menostat: Post
#> tsize: 25
#> tgrade: II
#> pnodes: 3
#> progrec: 53
#> offset: 0

You can of course set the values of the other covariates as before, i.e. by specifying the cond argument as a named list to the hazard.params argument:

plot(mod_cb_tvc,
     hazard.params = list(xvar = "time",
                          by = "estrec",
                          cond = list(tgrade = "III", age = 49),
                          alpha = 1,
                          breaks = 2,
                          ylab = "Hazard"))
#> Conditions used in construction of plot
#> estrec: 8 / 174
#> horTh: no
#> age: 49
#> menostat: Post
#> tsize: 25
#> tgrade: III
#> pnodes: 3
#> progrec: 53
#> offset: 0

Hazard Ratio

In this section we illustrate how to plot hazard ratios using the plot method for objects of class singleEventCB which is obtained from running the fitSmoothHazard function. Note that these function have only been thoroughly tested with family = "glm".

In what follows, the hazard ratio for a variable $X$ is defined as

$\frac{h\left(t | X=x_1, \mathbf{Z}=\mathbf{z_1} ; \hat{\beta}\right)}{h(t | X=x_0, \mathbf{Z}=\mathbf{z_0} ; \hat{\beta})}$ where $h(t|\cdot;\hat{\beta})$ is the hazard rate as a function of the variable $t$ (which is usually time, but can be any other continuous variable), $x_1$ is the value of $X$ for the exposed group, $x_0$ is the value of $X$ for the unexposed group, $\mathbf{Z}$ are other covariates in the model which are equal to $\mathbf{z_1}$ in the exposed and $\mathbf{z_0}$ in the unexposed group, and $\hat{\beta}$ are the estimated regression coefficients.

As indicated by the formula above, it is most instructive to plot the hazard ratio as a function of a variable $t$ only if there is an interaction between $t$ and $X$ . Otherwise, the resulting plot will simply be a horizontal line across time.

Manson Trial (eprchd)

We use data from the Manson trial (NEJM 2003) which is included in the casebase package. This randomized clinical trial investigated the effect of estrogen plus progestin (estPro) on coronary heart disease (CHD) risk in 16,608 postmenopausal women who were 50 to 79 years of age at base line. Participants were randomly assigned to receive estPro or placebo. The primary efficacy outcome of the trial was CHD (nonfatal myocardial infarction or death due to CHD).

We fit a model with the interaction between time and treatment arm. We are therefore interested in visualizing the hazard ratio of the treatment over time.

data("eprchd")
eprchd <- transform(eprchd, 
                    treatment = factor(treatment, levels = c("placebo","estPro")))
str(eprchd)
#> 'data.frame':    16608 obs. of  3 variables:
#>  $ time     : num  0.0833 0.0833 0.0833 0.0833 0.0833 ...
#>  $ status   : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ treatment: Factor w/ 2 levels "placebo","estPro": 1 1 1 1 1 1 1 1 1 1 ...

fit_mason <- fitSmoothHazard(status ~ treatment*time,
                             data = eprchd,
                             time = "time")
summary(fit_mason)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 16608 
#> Number of events: 324 
#> Number of base moments: 32400 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = status ~ treatment * time, data = eprchd, 
#>     time = "time")
#> 
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)          -6.09104    0.17501 -34.804  < 2e-16 ***
#> treatmentestPro       0.58230    0.22440   2.595  0.00946 ** 
#> time                  0.11404    0.04756   2.398  0.01650 *  
#> treatmentestPro:time -0.12315    0.06349  -1.940  0.05243 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3635.4  on 32723  degrees of freedom
#> Residual deviance: 3626.2  on 32720  degrees of freedom
#> AIC: 3634.2
#> 
#> Number of Fisher Scoring iterations: 7

To plot the hazard ratio, we must specify the newdata argument with a covariate pattern for the reference group. In this example, we treat the placebo as the reference group. Because we have fit an interaction with time, we also provide a sequence of times at which we would like to calculate the hazard ratio.

newtime <- quantile(fit_mason[["originalData"]][[fit_mason[["timeVar"]]]], 
                    probs = seq(0.01, 0.99, 0.01))

# reference category
newdata <- data.frame(treatment = factor("placebo", 
                                         levels = c("placebo", "estPro")), 
                      time = newtime)
str(newdata)
#> 'data.frame':    99 obs. of  2 variables:
#>  $ treatment: Factor w/ 2 levels "placebo","estPro": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ time     : num  0.917 1.75 2.5 3.167 3.417 ...

plot(fit_mason, 
     type = "hr", 
     newdata = newdata,
     var = "treatment",
     increment = 1,
     xvar = "time",
     ci = T,
     rug = T)

In the call to plot we specify the xvar which is the variable plotted on the x-axis, the var argument which specified the variable for which we want the hazard ratio. The increment = 1 indicates that we want to increment var by 1 level, which in this case is estPro. Alternatively, we can specify the exposed argument which should be a function that takes newdata and returns the exposed dataset. The following call is equivalent to the one above:

plot(fit_mason, 
     type = "hr", 
     newdata = newdata,
     exposed = function(data) transform(data, treatment = "estPro"),
     xvar = "time",
     ci = T,
     rug = T)

Alternatively, if we want the placebo group to be the exposed group, we can change the newdata argument to the following:

newdata <- data.frame(treatment = factor("estPro", 
                                         levels = c("placebo", "estPro")), 
                      time = newtime)
str(newdata)
#> 'data.frame':    99 obs. of  2 variables:
#>  $ treatment: Factor w/ 2 levels "placebo","estPro": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ time     : num  0.917 1.75 2.5 3.167 3.417 ...

levels(newdata$treatment)
#> [1] "placebo" "estPro"

Note that the reference category in newdata is still placebo. Therefore we must set increment = -1 in order to get the exposed dataset:

plot(fit_mason, 
     type = "hr", 
     newdata = newdata,
     var = "treatment",
     increment = -1,
     xvar = "time",
     ci = TRUE,
     rug = TRUE)

If the $X$ variable has more than two levels, than, increment works the same way, e.g. increment = 2 will provide an exposed group two levels above the value in newdata.

Save results

In order to save the data used to make the plot, you simply have to assign the call to plot to a variable. This is particularly useful if you want to really customize the plot aesthetics:

result <- plot(fit_mason, 
               type = "hr", 
               newdata = newdata,
               var = "treatment",
               increment = -1,
               xvar = "time",
               ci = TRUE,
               rug = TRUE)

head(result)
#>    treatment      time log_hazard_ratio standarderror hazard_ratio lowerbound
#> 1%    estPro 0.9166667      -0.46941047     0.1765961    0.6253708  0.4424037
#> 2%    estPro 1.7500000      -0.36678563     0.1401346    0.6929582  0.5265314
#> 3%    estPro 2.5000000      -0.27442328     0.1184545    0.7600103  0.6025468
#> 4%    estPro 3.1666667      -0.19232341     0.1133830    0.8250400  0.6606376
#> 5%    estPro 3.4166667      -0.16153596     0.1154931    0.8508359  0.6784813
#> 6%    estPro 3.9166667      -0.09996106     0.1257633    0.9048727  0.7071923
#>    upperbound
#> 1%  0.8840086
#> 2%  0.9119894
#> 3%  0.9586237
#> 4%  1.0303547
#> 5%  1.0669738
#> 6%  1.1578103

Sahir Rai Bhatnagar

2024-08-17