Overview

A basic component for our modelling of multivariate survival data is that many models are build around marginals that on Cox form. The marginal Cox model can be fitted efficiently in the mets package, in particular the handling of strata and robust standard errors is optimized.

The basic models assumes that each subject has a marginal on Cox-form λg(k,i)(t)exp(XkiTβ). \lambda_{g(k,i)}(t) \exp( X_{ki}^T \beta). where g(k,i)g(k,i) gives the strata for the subject.

We here discuss and show how to get robust standard errors of

  • the regression parameters

  • the baseline

and how to do goodness of fit test using

  • cumulative residuals score test

First we generate some data from the Clayton-Oakes model, with 55 members in each cluster and a variance parameter at 22

 library(mets)
 options(warn=-1)
 set.seed(1000) # to control output in simulatins for p-values below.
 n <- 1000
 k <- 5
 theta <- 2
 data <- simClaytonOakes(n,k,theta,0.3,3)

The data is on has one subject per row.

  • time : time of event
  • status : 1 for event and 0 for censoring
  • x : x is a binary covariate
  • cluster : cluster

Now we fit the model and produce robust standard errors for both regression parameters and baseline.

First, recall that the baseline for strata gg is asymptotically equivalent to Âg(t)Ag(t)=kg(i:kig0t1S0,gdMkigPg(t)βk)\begin{align} \hat A_g(t) - A_g(t) & = \sum_{k \in g} \left( \sum_{i: ki \in g} \int_0^t \frac{1}{S_{0,g}} dM_{ki}^g - P^g(t) \beta_k \right) \end{align} with Pg(t)=0tEg(s)dΛ̂g(s)P^g(t) = \int_0^t E_g(s) d \hat \Lambda_g(s) the derivative of 0t1/S0,g(s)dNg\int_0^t 1/S_{0,g}(s) dN_{\cdot g} wrt to β\beta, and β̂β=I(τ)1k(i0τ(ZkiEg)dMkig)=kβk\begin{align} \hat \beta - \beta & = I(\tau)^{-1} \sum_k ( \sum_i \int_0^\tau (Z_{ki} - E_{g}) dM_{ki}^g ) = \sum_k \beta_{k} \end{align} with Mkig(t)=Nki(t)0tYki(s)exp(Zkiβ)dΛg(k,i)(t),βk=I(τ)1i0τ(ZkiEg)dMkig\begin{align} M_{ki}^g(t) & = N_{ki}(t) - \int_0^t Y_{ki}(s) \exp( Z_{ki} \beta) d \Lambda_{g(k,i)}(t), \\ \beta_{k} & = I(\tau)^{-1} \sum_i \int_0^\tau (Z_{ki} - E_{g}) dM_{ki}^g \end{align} the basic 0-mean processes, that are martingales in the iid setting, and I(t)I(t) is the derivative of the total score, Û(t,β))\hat U(t,\beta)), with respect to β\beta evaluated at time tt.

The variance of the baseline of strata g is estimated by kg(i:kig0t1S0,g(k,i)dM̂kigPg(t)βk)2\begin{align} \sum_{k \in g} ( \sum_{i: ki \in g} \int_0^t \frac{1}{S_{0,g(k,i)}} d\hat M_{ki}^g - P^g(t) \beta_k )^2 \end{align} that can be computed using the particular structure of dM̂ikg(t)=dNik(t)1S0,g(i,k)exp(Zikβ)dNg.(t)\begin{align} d \hat M_{ik}^g(t) & = dN_{ik}(t) - \frac{1}{S_{0,g(i,k)}} \exp(Z_{ik} \beta) dN_{g.}(t) \end{align}

This robust variance of the baseline and the iid decomposition for β\beta is computed in mets as:

   out <- phreg(Surv(time,status)~x+cluster(cluster),data=data)
   summary(out)
#> 
#>     n events
#>  5000   4854
#> coeffients:
#>   Estimate     S.E.  dU^-1/2 P-value
#> x 0.287859 0.028177 0.028897       0
#> 
#> exp(coeffients):
#>   Estimate   2.5%  97.5%
#> x   1.3336 1.2619 1.4093
   # robust standard errors attached to output
   rob <- robust.phreg(out)

We can get the iid decomposition of the β̂β\hat \beta - \beta by

   # making iid decomposition of regression parameters
   betaiid <- IC(out)
   head(betaiid)
#>                x
#> [1,] -0.34616008
#> [2,] -1.44918926
#> [3,] -0.03898156
#> [4,]  0.42156050
#> [5,]  0.34253904
#> [6,] -0.07706668
   # robust standard errors
   crossprod(betaiid/NROW(betaiid))^.5
#>            x
#> x 0.02817714
   # same as 

We now look at the plot with robust standard errors

  bplot(rob,se=TRUE,robust=TRUE,col=3)

We can also make survival prediction with robust standard errors using the phreg.

  pp <-  predict(out,data[1:20,],se=TRUE,robust=TRUE)
  plot(pp,se=TRUE,whichx=1:10)

Finally, just to check that we can recover the model we also estimate the dependence parameter

tt <- twostageMLE(out,data=data)
summary(tt)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                 Coef.         SE        z P-val Kendall tau        SE
#> dependence1 0.5316753 0.03497789 15.20032     0   0.2100093 0.0109146
#> 
#> $type
#> NULL
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Goodness of fit

The observed score process is given by U(t,β̂)=ki0t(ZkiÊg)dM̂kig\begin{align} U(t,\hat \beta) & = \sum_k \sum_i \int_0^t (Z_{ki} - \hat E_g ) d \hat M_{ki}^g \end{align} where gg is strata g(k,i)g(k,i). The observed score has the iid decomposition Û(t)=ki0t(ZkiEg)dMkigI(t)kβk\begin{align} \hat U(t) = \sum_k \sum_i \int_0^t (Z_{ki} - E_g) dM_{ki}^g - I(t) \sum_k \beta_k \end{align} where βk\beta_k is the iid decomposition of the score process for the true β\betaβk=I(τ)1i0τ(ZkiEg)dMkig\begin{align} \beta_k & = I(\tau)^{-1} \sum_i \int_0^\tau (Z_{ki} - E_g ) d M_{ki}^g \end{align} and I(t)I(t) is the derivative of the total score, Û(t,β))\hat U(t,\beta)), with respect to β\beta evaluated at time tt.

This observed score can be resampled given it is on iid form in terms of clusters.

Now using the cumulative score process for checking proportional hazards

gout <- gof(out)
gout
#> Cumulative score process test for Proportionality:
#>   Sup|U(t)|  pval
#> x  30.24353 0.401

The p-value reflects wheter the observed score process is consistent with the model.

  plot(gout)

Computational aspects

The score processes can be resampled as in Lin, Wei, Ying (1993) using the martingale structure, such that the observed score process is resampled by ki0tgki(ZkiEg)dNkiI(t)I1(τ)gki0τ(ZkiEg)dNki.\begin{align} \sum_k \sum_i \int_0^t g_{ki} (Z_{ki} - E_g) dN_{ki} - I(t) I^{-1}(\tau) g_{ki} \int_0^{\tau} (Z_{ki} - E_g) dN_{ki} . \end{align} where gkig_{ki} are i.i.d. standard normals.

Based on the zero mean processes we more generally with clusters can resample the score process. For resampling of score process we need U(t,β)=kigk0t(ZkiEg)dMkig\begin{align} U(t,\beta) & = \sum_k \sum_i g_k \int_0^t (Z_{ki} - E_g ) dM_{ki}^g \end{align} where gg is strata. We write gkg_k as gkig_{ki} and thus repeating gkg_k within each cluster.

Computations are done using that 0t(ZkiEg)dMkig=0t(ZkiEg)dNkig0t(ZkiEg)Yki(u)dΛg(u)\begin{align*} \int_0^t (Z_{ki} - E_{g}) dM_{ki}^g & = \int_0^t (Z_{ki} - E_{g}) dN_{ki}^g - \int_0^{t} (Z_{ki} - E_{g}) Y_{ki}(u) d\Lambda^g(u) \end{align*} therefore and summing the compensator part with the gkig_{ki} multipliers then gives for each strata gg0tS1gw(u)S0g(u)dNg.(v)0tEg(u)S0gw(u)S0g(u)dNg.(v)\begin{align*} & \int_0^t \frac{S_{1g}^w(u)}{S_{0g}(u)} dN_{g.}(v) - \int_0^t E_{g}(u) \frac{S_{0g}^w(u)}{S_{0g}(u)} dN_{g.}(v) \end{align*} with
Sjgw(t)=kigexp(Zkiβ)ZkijYki(t)gkiSjg(t)=kigexp(Zkiβ)ZkijYki(t).\begin{align*} S_{jg}^w(t) & = \sum_{ki \in g} \exp(Z_{ki} \beta) Z_{ki}^j Y_{ki}(t) g_{ki} \\ S_{jg}(t) & = \sum_{ki \in g} \exp(Z_{ki} \beta) Z_{ki}^j Y_{ki}(t). \end{align*}

Cluster stratified Cox models

For clustered data it is possible to estimate the regression coefficient within clusters by using Cox’s partial likelihood stratified on clusters.

Note, here that the data is generated with a different subject specific structure, so we will not recover the β\beta at 0.3 and the model will not be a proportional Cox model, we we would also expect to reject “proportionality” with the gof-test.

The model can be thought of as λg(k,i)(t)exp(XkiTβ) \lambda_{g(k,i)} (t) \exp( X_{ki}^T \beta) where λg(t)\lambda_g(t) is some cluster specific baseline.

The regression coefficient β\beta can be estimated by using the partial likelihood for clusters.

 out <- phreg(Surv(time,status)~x+strata(cluster),data=data)
 summary(out)
#> 
#>     n events
#>  5000   4854
#> coeffients:
#>   Estimate     S.E.  dU^-1/2 P-value
#> x 0.406307 0.032925 0.039226       0
#> 
#> exp(coeffients):
#>   Estimate   2.5%  97.5%
#> x   1.5013 1.4074 1.6013

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mets_1.3.5     timereg_2.0.6  survival_3.7-0
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.3           knitr_1.49          rlang_1.1.4        
#>  [4] xfun_0.50           textshaping_0.4.1   jsonlite_1.8.9     
#>  [7] listenv_0.9.1       future.apply_1.11.3 lava_1.8.0         
#> [10] htmltools_0.5.8.1   ragg_1.3.3          sass_0.4.9         
#> [13] rmarkdown_2.29      grid_4.4.2          evaluate_1.0.3     
#> [16] jquerylib_0.1.4     fastmap_1.2.0       mvtnorm_1.3-3      
#> [19] numDeriv_2016.8-1.1 yaml_2.3.10         lifecycle_1.0.4    
#> [22] compiler_4.4.2      codetools_0.2-20    fs_1.6.5           
#> [25] Rcpp_1.0.13-1       future_1.34.0       systemfonts_1.1.0  
#> [28] lattice_0.22-6      digest_0.6.37       R6_2.5.1           
#> [31] parallelly_1.41.0   parallel_4.4.2      splines_4.4.2      
#> [34] bslib_0.8.0         Matrix_1.7-1        tools_4.4.2        
#> [37] globals_0.16.3      pkgdown_2.1.1       cachem_1.1.0       
#> [40] desc_1.4.3