Overview

Simulation of survival data is important for both theoretical and practical work. In a practical setting we might wish to validate that standard errors are valid even in a rather small sample, or validate that a more complicated procedure is doing as intended. Therefore it is useful to have simple tools for generating survival data that looks as much as possible like particular data. In a theoretical setting we often are interested in evaluating the finite sample properties of a new procedure in different settings that often are motivated by a specific practical problem. The aim is provide such tools.

Bender et al. in a nice recent paper also discussed how to generate survival data based on the Cox model, and restricted attention to some of the many useful parametric survival models (weibull, exponential).

Different survival models can be cooked, and we here give recipes for hazard and cumulative incidence based simulations. More recipes are given in vignette about recurrent events.

  • hazard based.
  • cumulative incidence.
  • recurrent events (see recurrent events vignette).
 library(mets)
#> Loading required package: timereg
#> Loading required package: survival
#> 
#> Attaching package: 'mets'
#> The following objects are masked from 'package:timereg':
#> 
#>     Event, event.split, kmplot, plotConfregion
 options(warn=-1)
 set.seed(10) # to control output in simulations

Hazard based, Cox models

Given a survival time TT with cumulative hazard Λ(t)=0tλ(s)ds\Lambda(t)=\int_0^t \lambda(s) ds, it follows that with EExp(1)E \sim Exp(1) (exponential with rate 1), that Λ1(E)\Lambda^{-1}(E) will have the same distribution as TT.

This provides the basis for simulations of survival times with a given hazard and is a consequence of this simple calculation P(Λ1(E)>t)=P(E>Λ(t))=exp(Λ(t))=P(T>t). P(\Lambda^{-1}(E) > t) = P(E > \Lambda(t)) = \exp( - \Lambda(t)) = P(T > t).

Similarly if TT given XX have hazard on Cox form λ0(t)exp(XTβ) \lambda_0(t) \exp( X^T \beta) where β\beta is a pp-dimensional regression coefficient and λ0(t)\lambda_0(t) a baseline hazard funcion, then it is useful to observe also that Λ1(E/HR)\Lambda^{-1}(E/HR) with HR=exp(XTβ)HR=\exp(X^T \beta) has the same distribution as TT given XX.

Therefore if the inverse of the cumulative hazard can be computed we can generate survival with a specified hazard function. One useful observation is note that for a piecewise linear continuous cumulative hazard on an interval [0,τ][0,\tau]Λl(t)\Lambda_l(t) it is easy to compute the inverse.

Further, we can approximate any cumulative hazard with a piecewise linear continous cumulative hazard and then simulate data according to this approximation. Recall that fitting the Cox model to data will give a piecewise constant cumulative hazard and the regression coefficients so with these at hand we can first approximate the piecewise constant “Breslow”-estimator with a linear upper (or lower bound) by simply connecting the values by straight lines.

Delayed entry

If TT given XX have hazard on Cox form λ0(t)exp(XTβ) \lambda_0(t) \exp( X^T \beta) and we wish to generate data according to this hazard for those that are alive at time ss, that is draw from the distribution of TT given T>sT>s (all given XX ), then we note that
Λ01(Λ0(s)+E/HR)) \Lambda_0^{-1}( \Lambda_0(s) + E/HR)) with HR=exp(XTβ))HR=\exp(X^T \beta)) and with EExp(1)E \sim Exp(1) has the distributiion we are after.

This is again a consequence of a simple calculation PX(Λ1(Λ(s)+E/HR)>t)=PX(E>HR(Λ(t)Λ(s)))=PX(T>t|T>s) P_X(\Lambda^{-1}(\Lambda(s)+ E/HR) > t) = P_X(E > HR( \Lambda(t) - \Lambda(s)) ) = P_X(T>t | T>s)

The engine is to simulate data with a given linear cumulative hazard.

 nsim <- 100
 chaz <-  c(0,1,1.5,2,2.1)
 breaks <- c(0,10,   20,  30,   40)
 cumhaz <- cbind(breaks,chaz)
 X <- rbinom(nsim,1,0.5)
 beta <- 0.2
 rrcox <- exp(X * beta)
 
 pctime <- rchaz(cumhaz,n=nsim)
 pctimecox <- rchaz(cumhaz,rrcox)

Now we generate data that resemble Cox models for the bmt data

 data(bmt); 
 cox1 <- phreg(Surv(time,cause==1)~tcell+platelet,data=bmt)
 cox2 <- phreg(Surv(time,cause==2)~tcell+platelet,data=bmt)

 X1 <- bmt[,c("tcell","platelet")]
 n <- nsim
 xid <- sample(1:nrow(X1),n,replace=TRUE)
 Z1 <- X1[xid,]
 Z2 <- X1[xid,]
 rr1 <- exp(as.matrix(Z1) %*% cox1$coef)
 rr2 <- exp(as.matrix(Z2) %*% cox2$coef)

 d <-  rcrisk(cox1$cum,cox2$cum,rr1,rr2)
 dd <- cbind(d,Z1)

 scox1 <- phreg(Surv(time,status==1)~tcell+platelet,data=dd)
 scox2 <- phreg(Surv(time,status==2)~tcell+platelet,data=dd)
 par(mfrow=c(1,2))
 plot(cox1); plot(scox1,add=TRUE,col=2)
 plot(cox2); plot(scox2,add=TRUE,col=2)

 cbind(cox1$coef,scox1$coef,cox2$coef,scox2$coef)
#>                [,1]       [,2]       [,3]       [,4]
#> tcell    -0.4232606 -0.1004993  0.3991068  0.7390719
#> platelet -0.5654438 -0.4042542 -0.2461474 -0.2459575

Now model with no covariates and specific call of sim.base function

 data(sTRACE)
 dtable(sTRACE,~chf+diabetes)
#> 
#>     diabetes   0   1
#> chf                 
#> 0            223  16
#> 1            230  31
 coxs <-   phreg(Surv(time,status==9)~strata(diabetes,chf),data=sTRACE)
 strata <- sample(0:3,nsim,replace=TRUE)
 simb <- sim.base(coxs$cumhaz,nsim,stratajump=coxs$strata.jumps,strata=strata)
 cc <-   phreg(Surv(time,status)~strata(strata),data=simb)
 plot(coxs,col=1); plot(cc,add=TRUE,col=2)

More Cox games

 cox <-  coxph(Surv(time,status==9)~vf+chf+wmi,data=sTRACE)
 sim1 <- sim.cox(cox,nsim,data=sTRACE)
 cc <- coxph(Surv(time,status)~vf+chf+wmi,data=sim1)
 cbind(cox$coef,cc$coef)
#>           [,1]       [,2]
#> vf   0.2970218 -0.2279416
#> chf  0.8018334  0.6783073
#> wmi -0.8920005 -1.4844731
 cor(sim1[,c("vf","chf","wmi")])
#>             vf        chf        wmi
#> vf   1.0000000  0.2067080 -0.0977774
#> chf  0.2067080  1.0000000 -0.5084292
#> wmi -0.0977774 -0.5084292  1.0000000
 cor(sTRACE[,c("vf","chf","wmi")])
#>              vf        chf         wmi
#> vf   1.00000000  0.1346711 -0.08966805
#> chf  0.13467109  1.0000000 -0.37464791
#> wmi -0.08966805 -0.3746479  1.00000000
 
 cox <-  phreg(Surv(time, status==9)~vf+chf+wmi,data=sTRACE)
 sim3 <- sim.cox(cox,nsim,data=sTRACE)
 cc <-  phreg(Surv(time, status)~vf+chf+wmi,data=sim3)
 cbind(cox$coef,cc$coef)
#>           [,1]        [,2]
#> vf   0.2970218 -10.5088431
#> chf  0.8018334   0.9662815
#> wmi -0.8920005  -0.8485974
 plot(cox,se=TRUE); plot(cc,add=TRUE,col=2)

 
 coxs <-  phreg(Surv(time,status==9)~strata(chf,vf)+wmi,data=sTRACE)
 sim3 <- sim.phreg(coxs,nsim,data=sTRACE)
 cc <-   phreg(Surv(time, status)~strata(chf,vf)+wmi,data=sim3)
 cbind(coxs$coef,cc$coef)
#>           [,1]       [,2]
#> wmi -0.8683355 -0.1482848
 plot(coxs,col=1); plot(cc,add=TRUE,col=2)

More Cox games with cause specific hazards

 data(bmt)
 # coxph          
 cox1 <- coxph(Surv(time,cause==1)~tcell+platelet,data=bmt)
 cox2 <- coxph(Surv(time,cause==2)~tcell+platelet,data=bmt)
 coxs <- list(cox1,cox2)
 dd <- sim.cause.cox(coxs,nsim,data=bmt)
 scox1 <- coxph(Surv(time,status==1)~tcell+platelet,data=dd)
 scox2 <- coxph(Surv(time,status==2)~tcell+platelet,data=dd)
 cbind(cox1$coef,scox1$coef)
#>                [,1]       [,2]
#> tcell    -0.4231551 -0.3833041
#> platelet -0.5646181 -0.5517000
 cbind(cox2$coef,scox2$coef)
#>                [,1]       [,2]
#> tcell     0.3991911 -0.3693576
#> platelet -0.2456203  0.4904355

Stratified Cox models using phreg

 ## stratified with phreg 
 cox0 <- phreg(Surv(time,cause==0)~tcell+platelet,data=bmt)
 cox1 <- phreg(Surv(time,cause==1)~tcell+platelet,data=bmt)
 cox2 <- phreg(Surv(time,cause==2)~strata(tcell)+platelet,data=bmt)
 coxs <- list(cox0,cox1,cox2)
 dd <- sim.cause.cox(coxs,nsim,data=bmt)
 scox0 <- phreg(Surv(time,status==1)~tcell+platelet,data=dd)
 scox1 <- phreg(Surv(time,status==2)~tcell+platelet,data=dd)
 scox2 <- phreg(Surv(time,status==3)~strata(tcell)+platelet,data=dd)
 cbind(cox0$coef,scox0$coef)
#>               [,1]       [,2]
#> tcell    0.1912407  0.1017686
#> platelet 0.1563789 -0.4795122
 cbind(cox1$coef,scox1$coef)
#>                [,1]       [,2]
#> tcell    -0.4232606 -0.8654778
#> platelet -0.5654438 -0.3978295
 cbind(cox2$coef,scox2$coef)
#>                [,1]         [,2]
#> platelet -0.2271912 -0.004601463
 par(mfrow=c(1,3))
 plot(cox0); plot(scox0,add=TRUE,col=2); 
 plot(cox1); plot(scox1,add=TRUE,col=2); 
 plot(cox2); plot(scox2,add=TRUE,col=2); 

 
 cox1 <- phreg(Surv(time,cause==1)~strata(tcell)+platelet,data=bmt)
 cox2 <- phreg(Surv(time,cause==2)~tcell+strata(platelet),data=bmt)
 coxs <- list(cox1,cox2)
 dd <- sim.cause.cox(coxs,nsim,data=bmt)
 scox1 <- phreg(Surv(time,status==1)~strata(tcell)+platelet,data=dd)
 scox2 <- phreg(Surv(time,status==2)~tcell+strata(platelet),data=dd)
 cbind(cox1$coef,scox1$coef)
#>                [,1]       [,2]
#> platelet -0.5658612 -0.5197637
 cbind(cox2$coef,scox2$coef)
#>            [,1]      [,2]
#> tcell 0.4153706 0.2740442
 par(mfrow=c(1,2))
 plot(cox1); plot(scox1,add=TRUE); 
 plot(cox2); plot(scox2,add=TRUE); 

  • sim.phreg only for phreg, but a bit more flexible, can deal with strata
  • sim.cox can only deal with covariates that can be identified from the names of its coefficients (so factors should be coded accordingly) or use sim.phreg
 library(mets)
 n <- 100
 data(bmt)
 bmt$bmi <- rnorm(408)
 dcut(bmt) <- gage~age
 data <- bmt
 cox1 <- phreg(Surv(time,cause==1)~strata(tcell)+platelet,data=bmt)
 cox2 <- phreg(Surv(time,cause==2)~strata(gage)+tcell+platelet,data=bmt)
 cox3 <- phreg(Surv(time,cause==0)~strata(platelet)+bmi,data=bmt)
 coxs <- list(cox1,cox2,cox3)

 dd <- sim.phregs(coxs,1000,data=bmt,extend=0.002)
 scox1 <- phreg(Surv(time,status==1)~strata(tcell)+platelet,data=dd)
 scox2 <- phreg(Surv(time,status==2)~strata(gage)+tcell+platelet,data=dd)
 scox3 <- phreg(Surv(time,status==3)~strata(platelet)+bmi,data=dd)
 cbind(coef(cox1),coef(scox1), coef(cox2),coef(scox2), coef(cox3),coef(scox3))
#>                [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
#> tcell    -0.5658612 -0.4683225  0.3034864  0.4545260 -0.0837082 -0.08223882
#> platelet -0.5658612 -0.4683225 -0.2159670 -0.4755456 -0.0837082 -0.08223882
 par(mfrow=c(1,3))
 plot(scox1,col=2); plot(cox1,add=TRUE,col=1)
 plot(scox2,col=2); plot(cox2,add=TRUE,col=1)
 plot(scox3,col=2); plot(cox3,add=TRUE,col=1)

Multistate models: The Illness Death model

Using a hazard based simulation with delayed entry we can then simulate data from for example the general illness-death model. Here the cumulative hazards need to be specified.

First we set up some cumulative hazards, then we simulate some data and re-estimate the cumulative baselines

 data(base1cumhaz)
 data(base4cumhaz)
 data(drcumhaz)
 dr <- drcumhaz
 dr2 <- drcumhaz
 dr2[,2] <- 1.5*drcumhaz[,2]
 base1 <- base1cumhaz
 base4 <- base4cumhaz
 cens <- rbind(c(0,0),c(2000,0.5),c(5110,3))

 iddata <- simMultistate(nsim,base1,base1,dr,dr2,cens=cens)
 dlist(iddata,.~id|id<3,n=0)
#> id: 1
#>       time status entry death from to start     stop
#> 1 123.8971      3     0     1    1  3     0 123.8971
#> ------------------------------------------------------------ 
#> id: 2
#>            time status       entry death from to       start        stop
#> 2      6.802368      2    0.000000     0    1  2    0.000000    6.802368
#> 101  408.278262      1    6.802368     0    2  1    6.802368  408.278262
#> 154 1354.051677      2  408.278262     0    1  2  408.278262 1354.051677
#> 195 1455.213519      1 1354.051677     0    2  1 1354.051677 1455.213519
#> 231 1820.282337      2 1455.213519     0    1  2 1455.213519 1820.282337
#> 259 2129.840288      1 1820.282337     0    2  1 1820.282337 2129.840288
#> 278 2396.584549      2 2129.840288     0    1  2 2129.840288 2396.584549
#> 292 3388.354776      1 2396.584549     0    2  1 2396.584549 3388.354776
#> 305 3990.632013      2 3388.354776     0    1  2 3388.354776 3990.632013
#> 315 3991.456202      1 3990.632013     0    2  1 3990.632013 3991.456202
#> 324 5110.000000      0 3991.456202     0    1  0 3991.456202 5110.000000
  
 ### estimating rates from simulated data  
 c0 <- phreg(Surv(start,stop,status==0)~+1,iddata)
 c3 <- phreg(Surv(start,stop,status==3)~+strata(from),iddata)
 c1 <- phreg(Surv(start,stop,status==1)~+1,subset(iddata,from==2))
 c2 <- phreg(Surv(start,stop,status==2)~+1,subset(iddata,from==1))
 ###
 par(mfrow=c(2,2))
 bplot(c0)
 lines(cens,col=2) 
 bplot(c3,main="rates 1-> 3 , 2->3")
 lines(dr,col=1,lwd=2)
 lines(dr2,col=2,lwd=2)
 ###
 bplot(c1,main="rate 1->2")
 lines(base1,lwd=2)
 ###
 bplot(c2,main="rate 2->1")
 lines(base1,lwd=2)

Cumulative incidence

In this section we discuss how to simulate competing risks data that have a specfied cumulative incidence function. We consider for simplicity a competing risks model with two causes and denote the cumulative incidence curves as F1(t)=P(T<t,ϵ=1)F_1(t) = P(T < t, \epsilon=1) and F2(t)=P(T<t,ϵ=2)F_2(t) = P(T < t, \epsilon=2).

To generate data with the required cumulative incidence functions a simple approach is to first figure out if the subject dies and then from what cause, then finally draw the survival time according to the conditional distribution.

For simplicity we consider survival times in a fixed interval [0,τ][0,\tau], and first flip a coin with and probabilities 1F1(τ)F2(τ)1-F_1(\tau)-F_2(\tau) to decide if the subject is a survivor or dies. If the subject dies we then flip a coin with probabilities F1(τ)/(F1(τ)+F2(τ))F_1(\tau)/(F_1(\tau)+F_2(\tau)) and F2(τ)/(F1(τ)+F2(τ))F_2(\tau)/(F_1(\tau)+F_2(\tau)) to decide if ϵ=1\epsilon=1 or ϵ=2\epsilon=2, and finally draw a T=(F̃11(U)T = (\tilde F_1^{-1}(U) with F̃1(s)=F1(s)/F1(τ)\tilde F_1(s) = F_1(s)/F_1(\tau) and UU is a uniform.

We again note that if F̃1(s)\tilde F_1(s) and F1(s)F_1(s) are piecewise linear continuous functions then the inverses are easy to compute.

Cumulative incidence I

We here simulate two causes of death with two binary covariates

cif1 <- cbind(c(0,10,20,100),c(0,0.1,0.15,0.2))
cif2 <- cbind(c(0,10,20,100),c(0,0.4,0.45,0.5))

n <- 100; lrr1=c(0.2,0.1); lrr2=c(0.2,0.1); cens=NULL
### A binary, L binary
A <- rbinom(n,1,0.5)
L <- rbinom(n,1,0.5)
###
rr1 <- exp(cbind(A,L) %*% lrr1)
rr2 <- exp(cbind(A,L) %*% lrr2)
## model is fine
mmm<-max(rr1)*max(cif1[,2])+max(rr2)*max(cif2[,2])
mcif1 <- max(cif1[,2])
mcif2 <- max(cif2[,2])
if (mmm>1) warning(" models not satisfying sum <=1\n")
### here log-link model 
T1 <- simsubdist(cif1,rr1,type="cif")
T2 <- simsubdist(cif2,rr2,type="cif")
###
dies <- rbinom(n,1,rr1*mcif1+rr2*mcif2)
sel1 <- rbinom(n,1,mcif2/(mcif1+mcif2))+1
epsilon  <- dies*(sel1)
T1$epsilon <- epsilon
###
T1$A <- A; T1$L <- L
## times given 
T1$time <- T1$timecause
T1$time2 <- T2$timecause
T1$status <- epsilon
T1 <- dtransform(T1,time=100,epsilon==0)
T1 <- dtransform(T1,status=0,epsilon==0)
###
T1 <- dtransform(T1,time=time2,epsilon==2)
T1 <- dtransform(T1,status=2,epsilon==2)

dtable(T1,~status)
#> 
#> status
#>  0  1  2 
#> 25 21 54

par(mfrow=c(1,2))
lrr1=c(0.2,0.1);lrr2=c(0.2,0.1)
pcif1 <- cif(Event(time,status)~strata(A,L),T1,cause=1)
pcif2 <- cif(Event(time,status)~strata(A,L),T1,cause=2)
###
newd <- data.frame(expand.grid(A=0:1,L=0:1))
rr1 <- c(exp(as.matrix(newd) %*% lrr1))
rr2 <- c(exp(as.matrix(newd) %*% lrr2))
###
cifm1 <- cbind(cif1[,1],cif1[,2] %o% rr1)
cifm2 <- cbind(cif2[,1],cif2[,2] %o% rr2)
###
par(mfrow=c(1,2))
plot(pcif1,ylim=c(0,0.3)); 
matlines(cifm1[,1],cifm1[,-1],col=1,lwd=2)
###
plot(pcif2,ylim=c(0,0.7))
matlines(cifm2[,1],cifm2[,-1],col=1,lwd=2)

Cumulative incidence regression models

Now assume that given covariates F1(t;X)=P(T<t,ϵ=1|X)F_1(t;X) = P(T < t, \epsilon=1|X) and F2(t;X)=P(T<t,ϵ=2|X)F_2(t;X) = P(T < t, \epsilon=2|X) are two cumulative incidence functions that satistifes the needed constraints.

Possibly F1(t;X)=1exp(Λ1(t)exp(XTβ1)F_1(t;X) = 1 - \exp( \Lambda_1(t) \exp( X^T \beta_1)F2(t;X)=1exp(Λ2(t)exp(XTβ2)F_2(t;X) = 1 - \exp( \Lambda_2(t) \exp( X^T \beta_2) given estimators of Λ1\Lambda_1 and λ2\lambda_2 and β1\beta_1 and β2\beta_2. We can obtain a piecewise linear continuous approximation, F1L(t;X)F_1^L(t;X)
by linearly connecting estimates F̂1(tj;X)=1exp(Λ̂1(t)exp(XTβ̂1)\hat F_1(t_j;X) = 1 - \exp( \hat \Lambda_1(t) \exp( X^T \hat \beta_1). Now with these at hand
F1L(t;X)F_1^L(t;X) and F2L(t;X)F_2^L(t;X) we can generate data with these cumulative incidence functions.

Here both the cumulative incidence are on the specified form if the restriction is not important. Using sim.cifs but sim.cifs enforces the restriction. Here F1F_1 will be on the specified form, and F2F_2 not.

 data(bmt)
 ################################################################
 #  simulating several causes with specific cumulatives 
 ################################################################
 cif1 <-  cifreg(Event(time,cause)~tcell+age,data=bmt,cause=1)
 cif2 <-  cifreg(Event(time,cause)~tcell+age,data=bmt,cause=2)

 ## dd <- sim.cifs(list(cif1,cif2),nsim,data=bmt)
 dds <- sim.cifsRestrict(list(cif1,cif2),nsim,data=bmt)

 scif1 <-  cifreg(Event(time,cause)~tcell+age,data=dds,cause=1)
 scif2 <-  cifreg(Event(time,cause)~tcell+age,data=dds,cause=2)
    
 cbind(cif1$coef,scif1$coef)
#>             [,1]        [,2]
#> tcell -0.7959291 0.003453075
#> age    0.4160844 0.582797245
 cbind(cif2$coef,scif2$coef)
#>              [,1]       [,2]
#> tcell  0.66730051  1.7314537
#> age   -0.03250819 -0.5701337
 par(mfrow=c(1,2))   
 plot(cif1); plot(scif1,add=TRUE,col=2)
 plot(cif2); plot(scif2,add=TRUE,col=2)

  • Parametric form with baselines
    • Fine-Gray form
    • logistic link

We assumed that F1(t,X)=1exp(Λ1(t)exp(XTβ1))F_1(t,X) = 1-\exp( \Lambda_1(t) \exp( X^T \beta_1)) with Λ1(t)=ρ1(1exp(t))\Lambda_1(t) = \rho_1 \cdot (1-exp(-t)) and β1=(0,0.1)\beta_1 = (0,-0.1), and that the other cause was given by
F2(t,X)=1exp(Λ2(t)exp(XTβ2))(1F1(+,X))F_2(t,X) = 1-\exp( \Lambda_2(t) \exp( X^T \beta_2)) ( 1 - F_1(+\infty,X)) with Λ2(t)=ρ2(1exp(t))\Lambda_2(t) = \rho_2 \cdot (1-exp(-t)) and β2=(0.5,0.3)\beta_2 = (-0.5,0.3), a parametrization that satisfies the constraint F1+F21F_1+F_2 \leq 1.

 set.seed(100)
 rho1 <- 0.2; rho2 <- 10
 n <- nsim
 beta=c(0.0,-0.1,-0.5,0.3)
 dats <- simul.cifs(n,rho1,rho2,beta,rc=0.2)
 dtable(dats,~status)
#> 
#> status
#>  0  1  2 
#>  6 13 81
 dsort(dats) <- ~time
 fg <- cifreg(Event(time,status)~Z1+Z2,data=dats,cause=1,propodds=NULL)
 summary(fg)
#> 
#>    n events
#>  100     13
#> 
#>  100 clusters
#> coeffients:
#>    Estimate     S.E.  dU^-1/2 P-value
#> Z1 -0.25559  0.27563  0.28698  0.3538
#> Z2  0.43883  0.55113  0.57407  0.4259
#> 
#> exp(coeffients):
#>    Estimate    2.5%  97.5%
#> Z1  0.77446 0.45121 1.3293
#> Z2  1.55089 0.52658 4.5677

CIF Delayed entry

Now assume that given covariates F1(t;X)=P(T<t,ϵ=1|X)F_1(t;X) = P(T < t, \epsilon=1|X) and F2(t;X)=P(T<t,ϵ=2|X)F_2(t;X) = P(T < t, \epsilon=2|X) are two cumulative incidence functions that satistifes the needed constraints. We wish to generate data that follows these two piecewise linear cumulative indidence functions with delayed entry at time ss. We should thus generate data that follows the cumulative incidence functions F̃1(t,s;X)=F1(t;X)F1(s;;X)1F1(s;X)F2(s;X) \tilde F_1(t,s;X)= \frac{F_1(t;X) - F_1(s;;X)}{ 1 - F_1(s;X) - F_2(s;X)} and F̃2(t,s;X)=F2(t;X)F2(s;;X)1F1(s;X)F2(s;X) \tilde F_2(t,s;X)= \frac{F_2(t;X) - F_2(s;;X)}{ 1 - F_1(s;X) - F_2(s;X)} this can be done according to the recipe in the previous section.
To be specific (ignoring the XX in the formula) F11(F1(s)+U(1F1(s;X)F2(s;X))) F_1^{-1}( F_1(s) + U \cdot (1 - F_1(s;X) - F_2(s;X)) ) where UU is a uniform, will have distribution given by F̃1(t,s)\tilde F_1(t,s).

Recurrent events

See also recurrent events vignette

 data(base1cumhaz)
 data(base4cumhaz)
 data(drcumhaz)
 dr <- drcumhaz
 base1 <- base1cumhaz
 base4 <- base4cumhaz

 n <- 100
 rr <- simRecurrent(n,base1,death.cumhaz=dr)
 ###
 par(mfrow=c(1,3))
 showfitsim(causes=1,rr,dr,base1,base1,which=1:2)

 rr <- simRecurrentII(n,base1,base4,death.cumhaz=dr)
 dtable(rr,~death+status)
#> 
#>       status   0   1   2
#> death                   
#> 0             17 299  41
#> 1             83   0   0
 par(mfrow=c(2,2))

 showfitsim(causes=2,rr,dr,base1,base4,which=1:2)

 cumhaz <- list(base1,base1,base4)
 drl <- list(dr,base4)
 rr <- simRecurrentIII(n,cumhaz,death.cumhaz=drl)
 dtable(rr,~death+status)
#> 
#>       status   0   1   2   3
#> death                       
#> 0              6 191 183  17
#> 1             72   0   0   0
#> 2             22   0   0   0
 showfitsimIII(rr,cumhaz,drl) 

  • sim.recurrent can simulate based on cox hazard for events and death based on phreg
    • similar to sim.phreg

SessionInfo

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mets_1.3.5     timereg_2.0.6  survival_3.7-0
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.3           knitr_1.49          rlang_1.1.4        
#>  [4] xfun_0.50           textshaping_0.4.1   jsonlite_1.8.9     
#>  [7] listenv_0.9.1       future.apply_1.11.3 lava_1.8.0         
#> [10] htmltools_0.5.8.1   ragg_1.3.3          sass_0.4.9         
#> [13] rmarkdown_2.29      grid_4.4.2          evaluate_1.0.3     
#> [16] jquerylib_0.1.4     fastmap_1.2.0       mvtnorm_1.3-3      
#> [19] numDeriv_2016.8-1.1 yaml_2.3.10         lifecycle_1.0.4    
#> [22] compiler_4.4.2      codetools_0.2-20    fs_1.6.5           
#> [25] Rcpp_1.0.13-1       future_1.34.0       systemfonts_1.1.0  
#> [28] lattice_0.22-6      digest_0.6.37       R6_2.5.1           
#> [31] parallelly_1.41.0   parallel_4.4.2      splines_4.4.2      
#> [34] bslib_0.8.0         Matrix_1.7-1        tools_4.4.2        
#> [37] globals_0.16.3      pkgdown_2.1.1       cachem_1.1.0       
#> [40] desc_1.4.3