vignettes/plotsmoothHazard.Rmd
plotsmoothHazard.Rmd
In this vignette, we describe the plot
method for
objects of class singleEventCB
which is obtained from
running the fitSmoothHazard
function. There are currently
two types of plots: hazard functions and hazard ratios. We describe each
one in detail below. Note that the plot
method has only
been properly tested for family="glm"
.
The hazard function plots require the visreg
package.
To illustrate hazard function plots, we will use the breast cancer
dataset which contains the observations of 686 women taken from the TH.data
package. This dataset is also available from the casebase
package. In the following, we will show different hazard functions for
different combinations of continuous, binary variables as well as their
interactions.
library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
library(visreg)
library(splines)
library(ggplot2)
data("brcancer")
str(brcancer)
We first fit a main effects only model with a spline on
log(time)
and hormonal therapy as main effects.
mod_cb <- fitSmoothHazard(cens ~ ns(log(time), df = 3) + hormon,
data = brcancer,
time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb)
#> Fitting smooth hazards with case-base sampling
#>
#> Sample size: 686
#> Number of events: 299
#> Number of base moments: 29900
#> ----
#>
#> Call:
#> fitSmoothHazard(formula = cens ~ ns(log(time), df = 3) + hormon,
#> data = brcancer, time = "time")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -87.7446 19.7822 -4.436 9.18e-06 ***
#> ns(log(time), df = 3)1 53.6185 13.0349 4.113 3.90e-05 ***
#> ns(log(time), df = 3)2 153.9915 38.4109 4.009 6.10e-05 ***
#> ns(log(time), df = 3)3 32.2959 7.9401 4.067 4.75e-05 ***
#> hormon -0.3490 0.1255 -2.780 0.00543 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3354.9 on 30198 degrees of freedom
#> Residual deviance: 3277.2 on 30194 degrees of freedom
#> AIC: 3287.2
#>
#> Number of Fisher Scoring iterations: 10
All arguments needed for the hazard function plots are supplied
through the hazard.params
argument. This is a named list of
arguments which will override the defaults passed to
visreg::visreg()
. The default arguments are
list(fit = x, trans = exp, plot = TRUE, rug = FALSE, alpha = 1, partial = FALSE, overlay = TRUE)
.
For example, if you want a 95% confidence band, specify
hazard.params = list(alpha = 0.05)
. For a complete list of
options, please see the visreg
vignettes.
We first plot the hazard as a function of time, for
hormon = 0
and hormon = 1
. This is achieved by
specifying the xvar
argument, as well as the
cond
argument. The cond
argument must be
provided as a named list. Each element of that list specifies the value
for one of the terms in the model; any elements left unspecified are
filled in with the median/most common category. Note that even though we
fit the log(time)
, we must specify time
in the
xvar
argument.
Alternatively, we can plot the hazard functions on the same plot.
This is accomplished with the by
argument:
Note that if we want to extract the data used to construct the plot,
e.g.Β to create our own, we simply assign the call to plot
to an object (we may optionally set plot=FALSE
in the
hazard.params
argument as to not print any plots):
plot_results <- plot(mod_cb,
hazard.params = list(xvar = "time",
by = "hormon",
alpha = 0.10,
ylab = "Hazard",
plot = FALSE))
head(plot_results$fit)
#> time hormon offset cens visregFit visregLwr visregUpr
#> 1 0.01058959 0 0 0 7.816629e-39 5.775573e-53 1.057898e-24
#> 2 26.39601667 0 0 0 2.928569e-07 1.817244e-08 4.719521e-06
#> 3 52.78144375 0 0 0 7.300131e-06 1.572749e-06 3.388456e-05
#> 4 79.16687083 0 0 0 3.288475e-05 1.242184e-05 8.705691e-05
#> 5 105.55229791 0 0 0 8.009549e-05 4.151875e-05 1.545154e-04
#> 6 131.93772499 0 0 0 1.438305e-04 9.049528e-05 2.285999e-04
The function is flexible because you may leverage
ggplot2
just by specifying gg = TRUE
, the plot
will return a ggplot
object:
gg_object <- plot(mod_cb,
hazard.params = list(xvar = "time",
by = "hormon",
alpha = 0.20, # 80% CI
ylab = "Hazard",
gg = TRUE))
attr(gg_object,"class")
#> [1] "gg" "ggplot"
Now we can use it downstream for any plot while leveraging the entire
ggplot2
ecosystem of packages and functions:
gg_object +
theme_minimal()+
theme(legend.position = "bottom") +
labs(title = "Casebase") +
scale_x_continuous(n.breaks = 10)
Next, we fit an interaction model with a time-varying covariate, i.e.Β to test the hypothesis that the effect of hormonal therapy on the hazard varies with time.
mod_cb_tvc <- fitSmoothHazard(cens ~ hormon * ns(log(time), df = 3),
data = brcancer,
time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#>
#> Sample size: 686
#> Number of events: 299
#> Number of base moments: 29900
#> ----
#>
#> Call:
#> fitSmoothHazard(formula = cens ~ hormon * ns(log(time), df = 3),
#> data = brcancer, time = "time")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -99.01 27.22 -3.638 0.000275 ***
#> hormon -46.66 61.72 -0.756 0.449620
#> ns(log(time), df = 3)1 61.38 18.02 3.405 0.000660 ***
#> ns(log(time), df = 3)2 175.57 52.80 3.325 0.000884 ***
#> ns(log(time), df = 3)3 36.62 10.90 3.361 0.000777 ***
#> hormon:ns(log(time), df = 3)1 31.04 41.10 0.755 0.450174
#> hormon:ns(log(time), df = 3)2 89.10 119.26 0.747 0.454988
#> hormon:ns(log(time), df = 3)3 19.09 24.84 0.768 0.442301
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3354.9 on 30198 degrees of freedom
#> Residual deviance: 3273.8 on 30191 degrees of freedom
#> AIC: 3289.8
#>
#> Number of Fisher Scoring iterations: 11
Now we can easily plot the hazard function over time for each
hormon
group:
Now we fit a model with an interaction between a continuous variable,
estrogen receptor (in fmol
), and time.
mod_cb_tvc <- fitSmoothHazard(cens ~ estrec * ns(log(time), df = 3),
data = brcancer,
time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#>
#> Sample size: 686
#> Number of events: 299
#> Number of base moments: 29900
#> ----
#>
#> Call:
#> fitSmoothHazard(formula = cens ~ estrec * ns(log(time), df = 3),
#> data = brcancer, time = "time")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -79.3451 21.4077 -3.706 0.000210 ***
#> estrec -0.7286 0.3863 -1.886 0.059326 .
#> ns(log(time), df = 3)1 46.9547 14.0338 3.346 0.000820 ***
#> ns(log(time), df = 3)2 139.3924 41.6332 3.348 0.000814 ***
#> ns(log(time), df = 3)3 28.0963 8.6126 3.262 0.001105 **
#> estrec:ns(log(time), df = 3)1 0.4957 0.2584 1.918 0.055102 .
#> estrec:ns(log(time), df = 3)2 1.3792 0.7410 1.861 0.062695 .
#> estrec:ns(log(time), df = 3)3 0.3099 0.1598 1.939 0.052474 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3354.9 on 30198 degrees of freedom
#> Residual deviance: 3262.7 on 30191 degrees of freedom
#> AIC: 3278.7
#>
#> Number of Fisher Scoring iterations: 13
There are now many ways to plot the time-varying effect of estrogen
receptor on the hazard function. The default is to plot the 10th, 50th
and 90th quantiles of the by
variable:
# computed at the 10th, 50th and 90th quantiles of estrec
plot(mod_cb_tvc,
hazard.params = list(xvar = "time",
by = "estrec",
alpha = 1,
ylab = "Hazard"))
We can also show the quartiles of estrec
by specifying
the breaks
argument. If breaks
is a single
number, that will be the used as the number of breaks:
# computed at quartiles of estrec
plot(mod_cb_tvc,
hazard.params = list(xvar = c("time"),
by = "estrec",
alpha = 1,
breaks = 4,
ylab = "Hazard"))
Alternatively, if breaks
is a vector, it will be used as
the actual values to be used:
# computed where I want
plot(mod_cb_tvc,
hazard.params = list(xvar = c("time"),
by = "estrec",
alpha = 1,
breaks = c(3,2200),
ylab = "Hazard"))
visreg2d(mod_cb_tvc,
xvar = "time",
yvar = "estrec",
trans = exp,
print.cond = TRUE,
zlab = "Hazard",
plot.type = "image")
visreg2d(mod_cb_tvc,
xvar = "time",
yvar = "estrec",
trans = exp,
print.cond = TRUE,
zlab = "Hazard",
plot.type = "persp")
# this can also work if 'rgl' is installed
# visreg2d(mod_cb_tvc,
# xvar = "time",
# yvar = "estrec",
# trans = exp,
# print.cond = TRUE,
# zlab = "Hazard",
# plot.type = "rgl")
All the examples so far have only included two predictors in the regression equation. In this example, we fit a smooth hazard model with several predictors:
mod_cb_tvc <- fitSmoothHazard(cens ~ estrec * ns(log(time), df = 3) +
horTh +
age +
menostat +
tsize +
tgrade +
pnodes +
progrec,
data = brcancer,
time = "time")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_cb_tvc)
#> Fitting smooth hazards with case-base sampling
#>
#> Sample size: 686
#> Number of events: 299
#> Number of base moments: 29900
#> ----
#>
#> Call:
#> fitSmoothHazard(formula = cens ~ estrec * ns(log(time), df = 3) +
#> horTh + age + menostat + tsize + tgrade + pnodes + progrec,
#> data = brcancer, time = "time")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -6.735e+01 1.816e+01 -3.708 0.000209 ***
#> estrec -4.939e-01 3.146e-01 -1.570 0.116397
#> ns(log(time), df = 3)1 3.925e+01 1.189e+01 3.301 0.000963 ***
#> ns(log(time), df = 3)2 1.155e+02 3.539e+01 3.265 0.001095 **
#> ns(log(time), df = 3)3 2.377e+01 7.280e+00 3.265 0.001093 **
#> horThyes -3.381e-01 1.301e-01 -2.598 0.009384 **
#> age -8.103e-03 9.268e-03 -0.874 0.381983
#> menostatPost 2.448e-01 1.832e-01 1.336 0.181454
#> tsize 7.875e-03 4.015e-03 1.962 0.049820 *
#> tgrade.L 5.660e-01 1.908e-01 2.967 0.003008 **
#> tgrade.Q -2.118e-01 1.226e-01 -1.728 0.083951 .
#> pnodes 5.250e-02 7.857e-03 6.682 2.36e-11 ***
#> progrec -2.263e-03 5.812e-04 -3.893 9.91e-05 ***
#> estrec:ns(log(time), df = 3)1 3.387e-01 2.105e-01 1.609 0.107595
#> estrec:ns(log(time), df = 3)2 9.343e-01 6.038e-01 1.547 0.121748
#> estrec:ns(log(time), df = 3)3 2.110e-01 1.297e-01 1.626 0.103860
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3354.9 on 30198 degrees of freedom
#> Residual deviance: 3161.5 on 30183 degrees of freedom
#> AIC: 3193.5
#>
#> Number of Fisher Scoring iterations: 13
In the following plot, we show the time-varying effect of
estrec
while controlling for all other variables. By
default, the other terms in the model are set to their median if the
term is numeric or the most common category if the term is a factor. The
values of the other variables are shown in the output:
plot(mod_cb_tvc,
hazard.params = list(xvar = "time",
by = "estrec",
alpha = 1,
breaks = 2,
ylab = "Hazard"))
#> Conditions used in construction of plot
#> estrec: 8 / 174
#> horTh: no
#> age: 53
#> menostat: Post
#> tsize: 25
#> tgrade: II
#> pnodes: 3
#> progrec: 53
#> offset: 0
You can of course set the values of the other covariates as before,
i.e.Β by specifying the cond
argument as a named list to the
hazard.params
argument:
plot(mod_cb_tvc,
hazard.params = list(xvar = "time",
by = "estrec",
cond = list(tgrade = "III", age = 49),
alpha = 1,
breaks = 2,
ylab = "Hazard"))
#> Conditions used in construction of plot
#> estrec: 8 / 174
#> horTh: no
#> age: 49
#> menostat: Post
#> tsize: 25
#> tgrade: III
#> pnodes: 3
#> progrec: 53
#> offset: 0
In this section we illustrate how to plot hazard ratios using the
plot
method for objects of class singleEventCB
which is obtained from running the fitSmoothHazard
function. Note that these function have only been thoroughly tested with
family = "glm"
.
In what follows, the hazard ratio for a variable is defined as
where is the hazard rate as a function of the variable (which is usually time, but can be any other continuous variable), is the value of for the exposed group, is the value of for the unexposed group, are other covariates in the model which are equal to in the exposed and in the unexposed group, and are the estimated regression coefficients.
As indicated by the formula above, it is most instructive to plot the hazard ratio as a function of a variable only if there is an interaction between and . Otherwise, the resulting plot will simply be a horizontal line across time.
We use data from the Manson trial (NEJM 2003) which is included in
the casebase
package. This randomized clinical trial
investigated the effect of estrogen plus progestin (estPro
)
on coronary heart disease (CHD) risk in 16,608 postmenopausal women who
were 50 to 79 years of age at base line. Participants were randomly
assigned to receive estPro
or placebo
. The
primary efficacy outcome of the trial was CHD (nonfatal myocardial
infarction or death due to CHD).
We fit a model with the interaction between time and treatment arm. We are therefore interested in visualizing the hazard ratio of the treatment over time.
data("eprchd")
eprchd <- transform(eprchd,
treatment = factor(treatment, levels = c("placebo","estPro")))
str(eprchd)
#> 'data.frame': 16608 obs. of 3 variables:
#> $ time : num 0.0833 0.0833 0.0833 0.0833 0.0833 ...
#> $ status : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ treatment: Factor w/ 2 levels "placebo","estPro": 1 1 1 1 1 1 1 1 1 1 ...
fit_mason <- fitSmoothHazard(status ~ treatment*time,
data = eprchd,
time = "time")
summary(fit_mason)
#> Fitting smooth hazards with case-base sampling
#>
#> Sample size: 16608
#> Number of events: 324
#> Number of base moments: 32400
#> ----
#>
#> Call:
#> fitSmoothHazard(formula = status ~ treatment * time, data = eprchd,
#> time = "time")
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -6.09104 0.17501 -34.804 < 2e-16 ***
#> treatmentestPro 0.58230 0.22440 2.595 0.00946 **
#> time 0.11404 0.04756 2.398 0.01650 *
#> treatmentestPro:time -0.12315 0.06349 -1.940 0.05243 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3635.4 on 32723 degrees of freedom
#> Residual deviance: 3626.2 on 32720 degrees of freedom
#> AIC: 3634.2
#>
#> Number of Fisher Scoring iterations: 7
To plot the hazard ratio, we must specify the newdata
argument with a covariate pattern for the reference group. In this
example, we treat the placebo
as the reference group.
Because we have fit an interaction with time, we also provide a sequence
of times at which we would like to calculate the hazard ratio.
newtime <- quantile(fit_mason[["originalData"]][[fit_mason[["timeVar"]]]],
probs = seq(0.01, 0.99, 0.01))
# reference category
newdata <- data.frame(treatment = factor("placebo",
levels = c("placebo", "estPro")),
time = newtime)
str(newdata)
#> 'data.frame': 99 obs. of 2 variables:
#> $ treatment: Factor w/ 2 levels "placebo","estPro": 1 1 1 1 1 1 1 1 1 1 ...
#> $ time : num 0.917 1.75 2.5 3.167 3.417 ...
plot(fit_mason,
type = "hr",
newdata = newdata,
var = "treatment",
increment = 1,
xvar = "time",
ci = T,
rug = T)
In the call to plot
we specify the xvar
which is the variable plotted on the x-axis, the var
argument which specified the variable for which we want the hazard
ratio. The increment = 1
indicates that we want to
increment var
by 1 level, which in this case is
estPro
. Alternatively, we can specify the
exposed
argument which should be a function that takes
newdata
and returns the exposed dataset. The following call
is equivalent to the one above:
plot(fit_mason,
type = "hr",
newdata = newdata,
exposed = function(data) transform(data, treatment = "estPro"),
xvar = "time",
ci = T,
rug = T)
Alternatively, if we want the placebo
group to be the
exposed group, we can change the newdata
argument to the
following:
newdata <- data.frame(treatment = factor("estPro",
levels = c("placebo", "estPro")),
time = newtime)
str(newdata)
#> 'data.frame': 99 obs. of 2 variables:
#> $ treatment: Factor w/ 2 levels "placebo","estPro": 2 2 2 2 2 2 2 2 2 2 ...
#> $ time : num 0.917 1.75 2.5 3.167 3.417 ...
levels(newdata$treatment)
#> [1] "placebo" "estPro"
Note that the reference category in newdata
is still
placebo
. Therefore we must set increment = -1
in order to get the exposed
dataset:
plot(fit_mason,
type = "hr",
newdata = newdata,
var = "treatment",
increment = -1,
xvar = "time",
ci = TRUE,
rug = TRUE)
If the
variable has more than two levels, than, increment
works
the same way, e.g.Β increment = 2
will provide an
exposed
group two levels above the value in
newdata
.
In order to save the data used to make the plot, you simply have to
assign the call to plot
to a variable. This is particularly
useful if you want to really customize the plot aesthetics:
result <- plot(fit_mason,
type = "hr",
newdata = newdata,
var = "treatment",
increment = -1,
xvar = "time",
ci = TRUE,
rug = TRUE)
head(result)
#> treatment time log_hazard_ratio standarderror hazard_ratio lowerbound
#> 1% estPro 0.9166667 -0.46941047 0.1765961 0.6253708 0.4424037
#> 2% estPro 1.7500000 -0.36678563 0.1401346 0.6929582 0.5265314
#> 3% estPro 2.5000000 -0.27442328 0.1184545 0.7600103 0.6025468
#> 4% estPro 3.1666667 -0.19232341 0.1133830 0.8250400 0.6606376
#> 5% estPro 3.4166667 -0.16153596 0.1154931 0.8508359 0.6784813
#> 6% estPro 3.9166667 -0.09996106 0.1257633 0.9048727 0.7071923
#> upperbound
#> 1% 0.8840086
#> 2% 0.9119894
#> 3% 0.9586237
#> 4% 1.0303547
#> 5% 1.0669738
#> 6% 1.1578103