Binomial Regression for Survival and Competing Risks Data

Binomial Regression for censored data

Details

The binreg function does direct binomial regression for one time-point, $t$ , fitting the model $\begin{align*} P(T \leq t, \epsilon=1 | X ) & = \mbox{expit}( X^T \beta) = F_1(t,X,\beta) \end{align*}$ the estimation is based on IPCW weighted EE with response $Y(t)=I(T \leq t, \epsilon=1 )$ $\begin{align*} U(\beta) = & X ( Y(t) \frac{ \Delta(t) }{G_c(T_i \wedge t)} - \mbox{expit}( X^T \beta)) = 0 \end{align*}$ for IPCW adjusted responses and with $\Delta$ indicator of death and $G_c$ censoring survival distribution. With $\Delta(t) = I( C_i > T_i \wedge t)$ . The default type=“II” is to augment with the censoring term, that is solve $\begin{align*} & U(\beta) + \int_0^t X \frac{E(Y(t) | T>u)}{G_c(u)} d\hat M_c(u) =0 \end{align*}$ where $M_c(u)$ is the censoring martingale, this typically improves the performance. This is equivlent to the pseudo-value approach (see Overgaard (2024)).

The influence function of type=“II” is $\begin{align*} U(\beta) - \int_0^t X \frac{E(Y| T>u)}{G_c(u)} d M_c(u) + \int_0^t \frac{E(X| T>u) E(Y| T>u)}{G_c(u)} d M_c(u) + \int_0^t \frac{E( X Y| T>u)}{G_c(u)} d M_c(u) \end{align*}$ and for type=“I” $\begin{align*} & U(\beta) + \int_0^t \frac{E( X Y| T>u)}{G_c(u)} d M_c(u). \end{align*}$ The means such as $E(X Y| T>u)$ are estimated by IPCW estimators among survivors.

The function logitIPCW instead considers $\begin{align*} U^{glm}(\beta) = & \frac{ \Delta(t) }{G_c(T_i \wedge t)} X ( Y(t) - \mbox{expit}( X^T \beta)) = 0. \end{align*}$ The two score equations are quite similar, and exactly the same when the censoring model is fully-nonparametric.

The logitIPCW has influence function $\begin{align*} & U^{glm}(\beta) + \int_0^t \frac{E( X ( Y - F_1(t,\beta)) | T>u)}{G_c(u)} d M_c(u) \end{align*}$

Which estimator performs the best depends on the censoring distribution and it seems that the binreg with type=“II” performs overall quite nicely (see Blanche et al (2023) and Overgaard (2024)). For the full estimated censoring model all estimators have the same influence function (see Blanche et al (2023)).

Additional functions logitATE, and binregATE computes the average treatment effect. We demonstrate this in another vignette.

The function logitATE can be used there is no censoring and we thus have simple binary outcome.

Variance is based on sandwich formula with IPCW adjustment (using the influence functions), and naive.var is variance under known censoring model. The influence functions are stored in the output.

Examples

 library(mets)
 options(warn=-1)
 set.seed(1000) # to control output in simulatins for p-values below.

 data(bmt)
 bmt$time <- bmt$time+runif(nrow(bmt))*0.01
 # logistic regresion with IPCW binomial regression 
 out <- binreg(Event(time,cause)~tcell+platelet,bmt,time=50)
 summary(out)
#> 
#>    n events
#>  408    160
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept) -0.180338  0.126748 -0.428760  0.068084  0.1548
#> tcell       -0.418545  0.345480 -1.095675  0.258584  0.2257
#> platelet    -0.437644  0.240978 -0.909952  0.034665  0.0694
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  0.83499 0.65132 1.0705
#> tcell        0.65800 0.33431 1.2951
#> platelet     0.64556 0.40254 1.0353

We can also compute predictions using the estimates

 predict(out,data.frame(tcell=c(0,1),platelet=c(1,1)),se=TRUE)
#>        pred         se     lower     upper
#> 1 0.3502406 0.04847582 0.2552280 0.4452533
#> 2 0.2618207 0.06969334 0.1252217 0.3984196

Further the censoring model can depend on strata

 outs <- binreg(Event(time,cause)~tcell+platelet,bmt,time=50,cens.model=~strata(tcell,platelet))
 summary(outs)
#> 
#>    n events
#>  408    160
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept) -0.180697  0.127414 -0.430424  0.069030  0.1561
#> tcell       -0.365928  0.350632 -1.053154  0.321299  0.2967
#> platelet    -0.433494  0.240270 -0.904415  0.037428  0.0712
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  0.83469 0.65023 1.0715
#> tcell        0.69355 0.34884 1.3789
#> platelet     0.64824 0.40478 1.0381

Absolute risk differences and ratio

Now for illustrations I wish to consider the absolute risk difference depending on tcell

 outs <- binreg(Event(time,cause)~tcell,bmt,time=50,cens.model=~strata(tcell))
 summary(outs)
#> 
#>    n events
#>  408    160
#> 
#>  408 clusters
#> coeffients:
#>             Estimate  Std.Err     2.5%    97.5% P-value
#> (Intercept) -0.30054  0.11153 -0.51914 -0.08194  0.0070
#> tcell       -0.51741  0.33981 -1.18342  0.14860  0.1278
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  0.74042 0.59503 0.9213
#> tcell        0.59606 0.30623 1.1602

the risk difference is

ps <-  predict(outs,data.frame(tcell=c(0,1)),se=TRUE)
ps
#>        pred         se     lower     upper
#> 1 0.4254253 0.02726306 0.3719897 0.4788609
#> 2 0.3061988 0.06819019 0.1725461 0.4398516
sum( c(1,-1) * ps[,1])
#> [1] 0.1192264

Getting the standard errors are easy enough since the two-groups are independent. In the case where we in addition had adjusted for other covariates, however, we would need the to apply the delta-theorem thus using the relevant covariances along the lines of

dd <- data.frame(tcell=c(0,1))
p <- predict(outs,dd)

riskdifratio <- function(p,contrast=c(1,-1)) {
   outs$coef <- p
   p <- predict(outs,dd)[,1]
   pd <- sum(contrast*p)
   r1 <- p[1]/p[2]
   r2 <- p[2]/p[1]
   return(c(pd,r1,r2))
}
     
estimate(outs,f=riskdifratio,dd,null=c(0,1,1))
#>      Estimate Std.Err     2.5%  97.5% P-value
#> [p1]   0.1192 0.07344 -0.02471 0.2632 0.10448
#> [p2]   1.3894 0.32197  0.75833 2.0204 0.22652
#> [p3]   0.7197 0.16679  0.39284 1.0467 0.09291
#> 
#>  Null Hypothesis: 
#>   [p1] = 0
#>   [p2] = 1
#>   [p3] = 1 
#>  
#> chisq = 12.0249, df = 3, p-value = 0.007298

same as

run <- 0
if (run==1) {
library(prodlim)
pl <- prodlim(Hist(time,cause)~tcell,bmt)
spl <- summary(pl,times=50,asMatrix=TRUE)
spl
}

Augmenting the Binomial Regression

Rather than using a larger censoring model we can also compute an augmentation term and then fit the binomial regression model based on this augmentation term. Here we compute the augmentation based on stratified non-parametric estimates of $F_1(t,S(X))$ , where $S(X)$ gives strata based on $X$ as a working model.

Computes the augmentation term for each individual as well as the sum $\begin{align*} A & = \int_0^t H(u,X) \frac{1}{S^*(u,s)} \frac{1}{G_c(u)} dM_c(u) \end{align*}$ with $\begin{align*} H(u,X) & = F_1^*(t,S(X)) - F_1^*(u,S(X)) \end{align*}$ using a KM for $G_c(t)$ and a working model for cumulative baseline related to $F_1^*(t,s)$ and $s$ is strata, $S^*(t,s) = 1 - F_1^*(t,s) - F_2^*(t,s)$ .

Standard errors computed under assumption of correct but estimated $G_c(s)$ model.

 data(bmt)
 dcut(bmt,breaks=2) <- ~age 
 out1<-BinAugmentCifstrata(Event(time,cause)~platelet+agecat.2+
              strata(platelet,agecat.2),data=bmt,cause=1,time=40)
 summary(out1)
#> 
#>    n events
#>  408    157
#> 
#>  408 clusters
#> coeffients:
#>                      Estimate  Std.Err     2.5%    97.5% P-value
#> (Intercept)          -0.51295  0.17090 -0.84791 -0.17799  0.0027
#> platelet             -0.63011  0.23585 -1.09237 -0.16785  0.0075
#> agecat.2(0.203,1.94]  0.55926  0.21211  0.14353  0.97500  0.0084
#> 
#> exp(coeffients):
#>                      Estimate    2.5%  97.5%
#> (Intercept)           0.59873 0.42831 0.8370
#> platelet              0.53253 0.33542 0.8455
#> agecat.2(0.203,1.94]  1.74938 1.15434 2.6512

 out2<-BinAugmentCifstrata(Event(time,cause)~platelet+agecat.2+
     strata(platelet,agecat.2)+strataC(platelet),data=bmt,cause=1,time=40)
 summary(out2)
#> 
#>    n events
#>  408    157
#> 
#>  408 clusters
#> coeffients:
#>                      Estimate  Std.Err     2.5%    97.5% P-value
#> (Intercept)          -0.51346  0.17109 -0.84879 -0.17814  0.0027
#> platelet             -0.63636  0.23653 -1.09996 -0.17276  0.0071
#> agecat.2(0.203,1.94]  0.56280  0.21229  0.14672  0.97889  0.0080
#> 
#> exp(coeffients):
#>                      Estimate    2.5%  97.5%
#> (Intercept)           0.59842 0.42793 0.8368
#> platelet              0.52922 0.33288 0.8413
#> agecat.2(0.203,1.94]  1.75559 1.15803 2.6615

SessionInfo

sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mets_1.3.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5           knitr_1.50          rlang_1.1.6        
#>  [4] xfun_0.52           textshaping_1.0.0   jsonlite_2.0.0     
#>  [7] listenv_0.9.1       future.apply_1.11.3 lava_1.8.1         
#> [10] htmltools_0.5.8.1   ragg_1.4.0          sass_0.4.10        
#> [13] rmarkdown_2.29      grid_4.5.0          evaluate_1.0.3     
#> [16] jquerylib_0.1.4     fastmap_1.2.0       numDeriv_2016.8-1.1
#> [19] yaml_2.3.10         mvtnorm_1.3-3       lifecycle_1.0.4    
#> [22] timereg_2.0.6       compiler_4.5.0      codetools_0.2-20   
#> [25] fs_1.6.6            Rcpp_1.0.14         future_1.40.0      
#> [28] lattice_0.22-6      systemfonts_1.2.2   digest_0.6.37      
#> [31] R6_2.6.1            parallelly_1.43.0   parallel_4.5.0     
#> [34] splines_4.5.0       Matrix_1.7-3        bslib_0.9.0        
#> [37] tools_4.5.0         globals_0.17.0      survival_3.8-3     
#> [40] pkgdown_2.1.1       cachem_1.1.0        desc_1.4.3