Overview

When analysing multivariate survival data with the aim of learning about the dependence that is present, possibly after adjusting for covariates, several approaches are available in the mets package:

  • Binary models with adjustment for censoring by inverse probability of censoring weighting
    • biprobit model
  • Bivariate survival models of Clayton-Oakes type
    • With regression structure on the dependence parameter
    • With additive gamma distributed random effects
    • Special functionality for polygenic random effects modelling such as ACE, ADE, AE and so forth.
  • Plackett OR model
    • With regression structure on the OR dependence parameter
  • Cluster stratified Cox

In many settings it is hard or impossible to specify random effects models with special structure among the parameters of the random effects. Our specification of the random effects models makes this possible.

To be concrete about the model structure assume that we have paired survival data \((T_1, \delta_1, T_2, \delta_2, X_1, X_2)\) where the censored survival responses are \((T_1, \delta_1, T_2, \delta_2)\) and the covariates are \((X_1, X_2)\).

The basic models assume that each subject has a marginal hazard of Cox form \[ \lambda_{s(k,i)}(t) \exp( X_{ki}^T \beta) \] where \(s(k,i)\) is a strata variable.

Gamma distributed frailties

The focus of this vignette is to describe how to analyse bivariate survival data using additive gamma random effects models. We present two different ways of specifying the dependence structure.

  • Univariate models with a single random effect for each cluster and with a regression design on the variance.

  • Multivariate models with multiple random effects for each cluster.

For the univariate models, given a cluster random effect \(Z_k\) with variance \(\theta\), the joint survival function is given by the Clayton copula, on the form \[ \psi\big(\theta, \psi^{-1}(\theta,S_1(t_1,X_{k1})) + \psi^{-1}(\theta,S_2(t_2,X_{k2}))\big) \] where \(\psi\) is the Laplace transform of a gamma distributed random variable with mean 1 and variance \(\theta\).
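
As a minimal base-R sketch (not the mets implementation), the gamma Laplace transform \(\psi\), its inverse and the resulting Clayton copula can be written out directly; the function names psi, psi_inv and clayton are hypothetical, and \(\theta\) is the frailty variance.

```r
# Laplace transform of a gamma variable with mean 1 and variance theta
psi <- function(theta, s) (1 + theta * s)^(-1 / theta)
# Inverse of psi in its second argument
psi_inv <- function(theta, u) (u^(-theta) - 1) / theta
# Clayton copula: joint survival from the two marginal survival probabilities
clayton <- function(theta, s1, s2) psi(theta, psi_inv(theta, s1) + psi_inv(theta, s2))

clayton(2, 0.6, 0.7)  # equals (0.6^-2 + 0.7^-2 - 1)^(-1/2)
clayton(2, 0.6, 1)    # a survival probability of 1 leaves the other margin unchanged
```

Positive dependence shows up as a joint survival probability above the independence value \(0.6 \cdot 0.7\).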

We then model the variance within clusters by a cluster-specific regression design such that \[ \theta = h(z_j^T \alpha) \] where \(z_j\) is the regression design (specified by theta.des in the software) and \(h\) is a link function, either \(\exp\) or the identity.

This model can be fitted using either of the functions

  • twostage (pairwise composite likelihood)

  • twostageMLE (pseudo-likelihood)

To make the twostage approach possible we need a model with a specific structure for the marginals. We therefore assume that, given the random effect of the cluster, the survival times within a cluster are independent with \[ P(T_j > t| X_j,Z) = \exp( -Z \cdot \Psi^{-1}(\nu^{-1},S(t|X_j)) ) \] where \(\Psi\) is the Laplace transform of the gamma distribution with mean 1 and variance \(1/\nu\).
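
The conditional model is constructed so that averaging over the frailty gives back the marginal survival \(S(t|X_j)\). A small Monte Carlo check in base R illustrates this, assuming a hypothetical value \(\nu = 2\) and a target survival probability of 0.4; the inverse Laplace transform is written out by hand here rather than taken from mets.

```r
set.seed(1)
nu <- 2                                  # frailty variance 1/nu = 0.5
u  <- 0.4                                # marginal survival probability S(t | X_j)
# Psi^{-1}: inverse Laplace transform of a gamma with mean 1 and variance 1/nu
s  <- nu * (u^(-1 / nu) - 1)
Z  <- rgamma(1e6, shape = nu, rate = nu) # frailty with mean 1 and variance 1/nu
p_marg <- mean(exp(-Z * s))              # average of the conditional survival exp(-Z s)
p_marg                                   # close to u
```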

Additive Gamma frailties

For the multivariate models each cluster has a multivariate random effect \(Z=(Z_1,...,Z_d)\) with \(d\) components. The total random effect for subject \(j\) in a cluster is specified by a regression design on these random effects: with a design vector \(V_j\), the total random effect is \(V_j^T (Z_1,...,Z_d)\). The elements of \(V_j\) are 0/1. The random effects \((Z_1,...,Z_d)\) have associated parameters \((\lambda_1,...,\lambda_d)\), and \(Z_j\) is gamma distributed with

  • mean \(\lambda_j/V_1^T \lambda\)

  • variance \(\lambda_j/(V_1^T \lambda)^2\)

The key assumption that makes two-stage fitting possible is that \[\begin{align*} \nu =V_j^T \lambda \end{align*}\] is constant within clusters. As a consequence, the total random effect for each subject within a cluster, \(V_j^T (Z_1,...,Z_d)\), is gamma distributed with mean 1 and variance \(1/\nu\).
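
A quick simulation illustrates this consequence. The sketch uses hypothetical parameters \(\lambda = (0.5, 0.5, 1.5)\) and two subjects with designs \(V_1=(1,0,1)\) and \(V_2=(0,1,1)\), both giving \(\nu = 2\); each \(Z_l\) is drawn as a gamma variable with shape \(\lambda_l\) and rate \(\nu\), which has the mean and variance listed above.

```r
set.seed(1)
lambda <- c(0.5, 0.5, 1.5)            # hypothetical parameters lambda_1..lambda_3
V <- rbind(c(1, 0, 1),                # subject 1 uses random effects 1 and 3
           c(0, 1, 1))                # subject 2 uses random effects 2 and 3
nu <- drop(V %*% lambda)              # V_j^T lambda: equal (= 2) for both subjects
n <- 1e6
# independent Z_l ~ Gamma(shape = lambda_l, rate = nu): mean lambda_l/nu, variance lambda_l/nu^2
Z <- sapply(lambda, function(l) rgamma(n, shape = l, rate = nu[1]))
W <- Z %*% t(V)                       # n x 2 matrix of total random effects V_j^T Z
colMeans(W)                           # both close to 1
apply(W, 2, var)                      # both close to 1/nu = 0.5
```

The two columns of W share \(Z_3\), which is what induces the dependence within the cluster.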

The DEFAULT parametrization (var.par=1) uses the variances of the random effects, \[\begin{align*} \theta_j = \lambda_j/\nu^2. \end{align*}\] Alternatively, one can specify that the parameters are \(\theta_j=\lambda_j\) with the argument var.par=0.

Finally, the parameters \((\theta_1,...,\theta_d)\) are related to the parameters of the model by a regression construction \(M\) (\(d \times k\)) that links the \(d\) \(\theta\) parameters with the \(k\) underlying \(\alpha\) parameters, \[\begin{align*} \theta & = M \alpha. \end{align*}\] The default is a diagonal matrix \(M\). This can be used to make structural assumptions about the variances of the random effects, as is needed for the ACE model, for example. In the software \(M\) is called theta.des.
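
The mapping from the underlying parameters \(\alpha\) to the variances \(\theta\), and from there to \(\nu\) and \(\lambda\) under the default var.par=1, is plain arithmetic. A sketch with a hypothetical \(3 \times 2\) matrix M, \(\alpha=(2,1)\) and one subject's design V:

```r
# Hypothetical design: d = 3 random effects driven by k = 2 parameters alpha
M <- rbind(c(1, 0),      # theta_1 = alpha_1
           c(0.5, 0),    # theta_2 = 0.5 * alpha_1
           c(0, 1))      # theta_3 = alpha_2
alpha <- c(2, 1)
theta <- drop(M %*% alpha)      # variances (2, 1, 1)

V <- c(1, 0, 1)                 # random effects used by one subject
nu <- 1 / sum(V * theta)        # total variance V^T theta = 1/nu = 3
lambda <- theta * nu^2          # var.par = 1: theta_j = lambda_j / nu^2
sum(V * lambda)                 # recovers nu = V^T lambda
```

With var.par=0 the same \(\theta\) values would instead be read directly as \(\lambda\).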

Assume that the marginal survival distribution for subject \(i\) within cluster \(k\) is given by \(S_{X_{k,i}}(t)\) given covariates \(X_{k,i}\).

Now, given the random effects of the cluster \(Z_k\) and the covariates \(X_{k,i}\), \(i=1,\dots,n_k\), we assume that the subjects within the cluster are independent with survival distributions \[\begin{align*} \exp(- ( V_{k,i}^T Z_k) \Psi^{-1} (\nu,S_{X_{k,i}}(t)) ). \end{align*}\]

A consequence of this is that the hazard given the covariates \(X_{k,i}\) and the random effects \(Z_k\) is \[\begin{align} \lambda_{k,i}(t;X_{k,i},Z_k) = ( V_{k,i}^T Z_k) D_2 \Psi^{-1} (\nu,S_{X_{k,i}}(t)) D_t S_{X_{k,i}}(t) \label{eq-cond-haz} \end{align}\] where \(D_t\) and \(D_2\) denote the partial derivatives with respect to \(t\) and the second argument of \(\Psi^{-1}\), respectively.

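Further, averaging over the independent components of \(Z_k\), we can express the multivariate survival distribution as \[\begin{align} S(t_1,\dots,t_m) & = E\Big[ \exp\Big( -\sum_{i=1}^m (V_{k,i}^T Z_k) \Psi^{-1}(\nu,S_{X_{k,i}}(t_i)) \Big) \Big] \nonumber \\ & = \prod_{l=1}^d \Psi_l\Big( \sum_{i=1}^m V_{k,i,l} \Psi^{-1}(\nu,S_{X_{k,i}}(t_i)) \Big), \label{eq-multivariate-surv} \end{align}\] where \(\Psi_l(s) = (1+s/\nu)^{-\lambda_l}\) is the Laplace transform of the \(l\)th random effect (gamma with shape \(\lambda_l\) and rate \(\nu\)) and \(V_{k,i,l}\) is the \(l\)th element of \(V_{k,i}\). In the case of considering just pairs, we write this function as \(C(S_{k,i}(t),S_{k,j}(t))\).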

In addition to the survival times from this model, we assume independent right censoring times \(U_{k,i}\): given \(Z_k\) and the covariates \(X_{k,i}\), \(i=1,\dots,n_k\), the censoring times \((U_{k,1},\dots,U_{k,n_k})\) are independent of the survival times \((T_{k,1},\dots,T_{k,n_k})\), and the conditional censoring distribution does not depend on \(Z_k\).

One consequence of the model structure is that Kendall’s tau for two subjects \((i,j)\) can be computed across two clusters, 1 and 2, as \[\begin{align} E\left( \frac{( V_{1i} Z_1- V_{2i}Z_2)( V_{1j}Z_1 - V_{2j}Z_2 )}{( V_{1i}Z_1 + V_{2i}Z_2 ) ( V_{1j}Z_1 + V_{2j}Z_2 )} \right) \end{align}\] under the assumption that we compare pairs with equivalent marginals, \(S_{X_{1,i}}(t)= S_{X_{2,i}}(t)\) and \(S_{X_{1,j}}(t)= S_{X_{2,j}}(t)\), and that \(S_{X_{1,i}}(\infty)= S_{X_{1,j}}(\infty)=0\). Here we also use that \(\nu\) is the same across clusters. Kendall’s tau is therefore the same for all pairs with the same additive structure for the frailty terms, and the random effects thus have the same interpretation in terms of Kendall’s tau.
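
The expectation above can be evaluated by Monte Carlo. The function kendall_additive below is a hypothetical base-R sketch, not the mets routine: it assumes the var.par=1 parametrization (theta holds the variances of the random effects), draws each component of \(Z_1\) and \(Z_2\) as gamma with shape \(\lambda_l=\theta_l\nu^2\) and rate \(\nu\), and averages the ratio for two subjects with designs Vi and Vj. For a single shared effect it should approach \(\theta/(\theta+2)\).

```r
kendall_additive <- function(theta, Vi, Vj, n = 2e5) {
  tot <- sum(Vi * theta)
  # two-stage structure: both subjects must have the same total variance 1/nu
  stopifnot(isTRUE(all.equal(tot, sum(Vj * theta))))
  nu <- 1 / tot
  lambda <- theta * nu^2            # var.par = 1: theta_l = lambda_l / nu^2
  d <- length(theta)
  # n independent clusters: one row per cluster, one column per random effect
  draw <- function() {
    matrix(rgamma(n * d, shape = rep(lambda, each = n), rate = nu), nrow = n, ncol = d)
  }
  Z1 <- draw()
  Z2 <- draw()
  W1i <- drop(Z1 %*% Vi); W2i <- drop(Z2 %*% Vi)   # subject i in clusters 1 and 2
  W1j <- drop(Z1 %*% Vj); W2j <- drop(Z2 %*% Vj)   # subject j in clusters 1 and 2
  mean((W1i - W2i) * (W1j - W2j) / ((W1i + W2i) * (W1j + W2j)))
}

set.seed(1)
tau_1  <- kendall_additive(theta = 1, Vi = 1, Vj = 1)                   # one shared effect
tau_mz <- kendall_additive(theta = c(2, 1), Vi = c(1, 1), Vj = c(1, 1)) # everything shared
tau_dz <- kendall_additive(theta = c(2, 1, 1, 1, 1),                    # ACE-type DZ design
                           Vi = c(0, 1, 1, 0, 1), Vj = c(0, 1, 0, 1, 1))
c(tau_1, tau_mz, tau_dz)
```

With one shared effect of variance 1 the result is close to 1/3; with all random effects shared and total variance 3 it is close to 3/5, and the DZ-type design, where only part of the variance is shared, gives a smaller positive value.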

Univariate gamma (Clayton-Oakes) twostage models

We start by fitting simple Clayton-Oakes models to the data, that is, with an overall random effect that is gamma distributed with variance \(\theta\). We can fit the model by a pseudo-MLE (twostageMLE) and by a pairwise composite likelihood approach (twostage).

The pseudo-likelihood and the composite pairwise likelihood should give the same estimates for this model since we have paired data. The log-parametrization is illustrated with the var.link=1 option, and we specify that we want a “clayton.oakes” model. Note that the standard errors differ because twostage does not include the variance due to the baseline parameters for this type of modelling, so here it is better to use twostageMLE.

 library(mets)
 data(diabetes)
 
 # Marginal Cox model  with treat as covariate
 margph <- phreg(Surv(time,status)~treat+cluster(id),data=diabetes)
 # Clayton-Oakes, MLE 
 fitco1<-twostageMLE(margph,data=diabetes,theta=1.0)
 summary(fitco1)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                 Coef.        SE       z       P-val Kendall tau         SE
#> dependence1 0.9526614 0.3543033 2.68883 0.007170289    0.322645 0.08127892
#> 
#> $type
#> NULL
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
 
 # Clayton-Oakes
 fitco2 <- survival.twostage(margph,data=diabetes,theta=0.0,detail=0,
                  clusters=diabetes$id,var.link=1,model="clayton.oakes")
 summary(fitco2)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects 
#> With log-link
#> $estimates
#>               log-Coef.       SE          z     P-val Kendall tau         SE
#> dependence1 -0.04849523 0.328524 -0.1476155 0.8826462   0.3226451 0.07179736
#> 
#> $vargam
#>             Estimate Std.Err   2.5% 97.5%  P-value
#> dependence1   0.9527   0.313 0.3392 1.566 0.002335
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
 fitco3 <- survival.twostage(margph,data=diabetes,theta=1.0,detail=0,
                  clusters=diabetes$id,var.link=0,model="clayton.oakes")
 summary(fitco3)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                 Coef.        SE        z       P-val Kendall tau         SE
#> dependence1 0.9526619 0.3129723 3.043917 0.002335193   0.3226451 0.07179736
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
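
The log-link and identity-link fits are related by the delta method. As a sketch, the log-scale estimate printed for fitco2 can be converted by hand; the conversion of \(\theta\) to Kendall’s tau uses the Clayton-Oakes relation \(\tau = \theta/(\theta+2)\).

```r
log_coef <- -0.04849523   # log-Coef. for dependence1 in summary(fitco2)
se_log   <- 0.328524      # its standard error
theta    <- exp(log_coef)                   # variance on the original scale
se_theta <- theta * se_log                  # delta method: d exp(x)/dx = exp(x)
tau      <- theta / (theta + 2)             # Kendall's tau = 1/(1 + 2/theta)
se_tau   <- 2 / (theta + 2)^2 * se_theta    # delta method for tau
round(c(theta = theta, se_theta = se_theta, tau = tau, se_tau = se_tau), 4)
```

Under these assumptions the converted values agree, to the printed precision, with the identity-link fit fitco3 (estimate 0.9527, SE 0.313, Kendall’s tau 0.3226 with SE 0.0718).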

The marginal model can be either a structured Cox model or, as here, a model with a separate baseline for each stratum. This gives results quite similar to those above.

  # without covariates but marginal model stratified 
  marg <- phreg(Surv(time,status)~+strata(treat)+cluster(id),data=diabetes)
  fitcoa <- survival.twostage(marg,data=diabetes,theta=1.0,clusters=diabetes$id,
           model="clayton.oakes")
  summary(fitcoa)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects 
#> With log-link
#> $estimates
#>               log-Coef.        SE          z     P-val Kendall tau         SE
#> dependence1 -0.05683996 0.3279956 -0.1732949 0.8624196   0.3208241 0.07146893
#> 
#> $vargam
#>             Estimate Std.Err   2.5% 97.5%  P-value
#> dependence1   0.9447  0.3099 0.3374 1.552 0.002297
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Piecewise constant Clayton-Oakes model

Let the cross-hazard ratio (CHR) be defined as \[\begin{align} \eta(t_1,t_2) = \frac{ \lambda_1(t_1| T_2=t_2)}{ \lambda_1(t_1| T_2 \ge t_2)} = \frac{ \lambda_2(t_2| T_1=t_1)}{ \lambda_2(t_2| T_1 \ge t_1)} \end{align}\] where \(\lambda_1\) and \(\lambda_2\) are the conditional hazard functions of \(T_1\) and \(T_2\) given covariates. For the Clayton-Oakes model this ratio is \(\eta(t_1,t_2) = 1+\theta\); consequently, if the co-twin has died at any time, we would increase our risk assessment on the hazard scale by the constant factor \(\eta(t_1,t_2)\). The Clayton-Oakes model also has the nice property that Kendall’s tau is linked directly to the dependence parameter \(\theta\) and is \(1/(1+2/\theta)\).
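
The constant CHR of the Clayton-Oakes model can be checked numerically. For a bivariate survival function \(S(t_1,t_2)\) the CHR equals \(S \, \partial_1\partial_2 S / (\partial_1 S \, \partial_2 S)\); the base-R sketch below evaluates this with central finite differences, assuming hypothetical exponential marginals and \(\theta = 2\).

```r
theta <- 2
S1 <- function(t) exp(-t)            # hypothetical marginal survival of T1
S2 <- function(t) exp(-0.5 * t)      # hypothetical marginal survival of T2
# Clayton-Oakes joint survival function
S  <- function(t1, t2) (S1(t1)^(-theta) + S2(t2)^(-theta) - 1)^(-1 / theta)

chr <- function(t1, t2, h = 1e-3) {
  d1  <- (S(t1 + h, t2) - S(t1 - h, t2)) / (2 * h)   # dS/dt1
  d2  <- (S(t1, t2 + h) - S(t1, t2 - h)) / (2 * h)   # dS/dt2
  d12 <- (S(t1 + h, t2 + h) - S(t1 + h, t2 - h) -
          S(t1 - h, t2 + h) + S(t1 - h, t2 - h)) / (4 * h^2)  # d2S/dt1dt2
  S(t1, t2) * d12 / (d1 * d2)
}
c(chr(0.3, 1.2), chr(2, 0.5))        # both approximately 1 + theta = 3
```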

A very useful extension of the constant cross-hazard ratio (CHR) model is the piecewise constant CHR model for bivariate survival data, and this model has also been extended to competing risks.

In the survival setting we let the CHR be piecewise constant, \[\begin{align} \eta(t_1,t_2) & = \sum \eta_{i,j} I(t_1 \in I_i, t_2 \in I_j) \end{align}\] where \(I_1, I_2, \dots\) are intervals on the time axis.

The model lets the CHR be constant in different parts of the plane. This can also be thought of as having a separate Clayton-Oakes model for each region of the plane, here specified by the cut-points \(c(0,0.5,2)\), which give the intervals 0-0.5 and 0.5-2 on each axis and thus define 4 regions.
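
A piecewise constant CHR amounts to looking up the region that contains \((t_1,t_2)\). A minimal sketch with the cut-points c(0,0.5,2), i.e. two intervals on each axis, hypothetical region-specific variances, and \(\eta_{i,j} = 1+\theta_{i,j}\) as in the Clayton-Oakes model; the function chr_piecewise is hypothetical and treats the intervals as left-open.

```r
cuts <- c(0, 0.5, 2)
# hypothetical variances per region (rows: interval of t1, columns: interval of t2)
theta <- matrix(c(1.9, 1.9, 2.5, 2.2), nrow = 2)
eta <- 1 + theta                                  # Clayton-Oakes CHR in each region

chr_piecewise <- function(t1, t2, cuts, eta) {
  i <- findInterval(t1, cuts, left.open = TRUE)   # index of interval (c_{i-1}, c_i]
  j <- findInterval(t2, cuts, left.open = TRUE)
  i[i < 1 | i >= length(cuts)] <- NA              # outside the cut-points: no region
  j[j < 1 | j >= length(cuts)] <- NA
  eta[cbind(i, j)]                                # NA rows give NA
}
out <- chr_piecewise(c(0.3, 1.0, 2.5), c(1.0, 0.3, 0.3), cuts, eta)
out   # 3.5, 2.9, NA
```

Under the plain Clayton-Oakes model all entries of eta would be equal.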

This provides a constructive goodness-of-fit test of whether the Clayton-Oakes model is valid: if it is, the parameter should be the same in all regions.

First we generate data for 2000 pairs from the Clayton-Oakes model with frailty variance 2, and fit the corresponding model.

 d <- simClaytonOakes(2000,2,0.5,0,3)
  margph <- phreg(Surv(time,status)~x+cluster(cluster),data=d)
 # Clayton-Oakes, MLE 
 fitco1<-twostageMLE(margph,data=d)
 summary(fitco1)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.         SE        z P-val Kendall tau         SE
#> dependence1 2.104941 0.09329868 22.56132     0   0.5127823 0.01107367
#> 
#> $type
#> NULL
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Now we cut the plane at the cut-points \(c(0,0.5,2)\), thus defining 4 regions, and fit a separate model for each region.
We see that the parameter is indeed rather constant over the 4 regions. A formal test of equal parameters across the regions can be constructed.

 udp <- piecewise.twostage(c(0,0.5,2),data=d,score.method="optimize",
 id="cluster",timevar="time",status="status",model="clayton.oakes",silent=0)
#> Data-set  1 out of  4
#>   Number of joint events: 509 of  2000
#> Data-set  2 out of  4
#>   Number of joint events: 276 of  1230
#> Data-set  3 out of  4
#>   Number of joint events: 257 of  1212
#> Data-set  4 out of  4
#>   Number of joint events: 609 of  951
 summary(udp)
#> [1] 1
#> Dependence parameter for Clayton-Oakes model 
#> Score of log-likelihood for parameter estimates (too large?)
#>             0 - 0.5      0.5 - 2
#> 0 - 0.5 0.002657042 8.842547e-05
#> 0.5 - 2 0.001730392 1.953300e-04
#> 
#> 
#> log-coefficient for dependence parameter (SE) 
#>          0 - 0.5        0.5 - 2      
#> 0 - 0.5  0.647 (0.070)  0.915 (0.083)
#> 0.5 - 2  0.649 (0.097)  0.784 (0.058)
#> 
#> Kendall's tau (SE) 
#>          0 - 0.5        0.5 - 2      
#> 0 - 0.5  0.488 (0.018)  0.555 (0.020)
#> 0.5 - 2  0.489 (0.024)  0.523 (0.014)

Multivariate gamma twostage models

To illustrate how the multivariate models can be used, we first set up some twin data with an ACE structure, that is, two types of shared random effects: a genetic effect with variance \(\sigma_g^2\) and an environmental effect with variance \(\sigma_e^2\). Monozygotic (MZ) twins share all their genes, whereas dizygotic (DZ) twins share only half of their genes. This can be expressed, for example, via 5 random effects for each twin pair. We start by setting this up.

The pardes matrix describes how the parameters of the 5 random effects are related. Here the first random effect has parameter \(\theta_1\) (here \(\sigma_g^2\)), the next 3 random effects have parameter \(0.5 \theta_1\) (here \(0.5 \sigma_g^2\)), and the last random effect has its own parameter \(\theta_2\) (here \(\sigma_e^2\)).

 data <- simClaytonOakes.twin.ace(2000,2,1,0,3)

 out <- twin.polygen.design(data,id="cluster")
 pardes <- out$pardes
 pardes 
#>      [,1] [,2]
#> [1,]  1.0    0
#> [2,]  0.5    0
#> [3,]  0.5    0
#> [4,]  0.5    0
#> [5,]  0.0    1

The last part of the model structure is to decide how the random effects are shared within the different pairs (MZ and DZ); this is specified by the random effects design (\(V_1\) and \(V_2\)) for each pair. Here it is given by an overall design matrix with one row per subject (since each subject enters all pairs with the same random effects design).

An MZ pair shares the full genetic random effect and the full environmental random effect. In contrast, a DZ pair shares the 2nd random effect with half the genetic variance, each twin has a non-shared genetic random effect with half the variance, and finally the pair fully shares the environmental random effect.

 des.rv <- out$des.rv
 # MZ
 head(des.rv,2)
#>   MZ DZ DZns1 DZns2 env
#> 1  1  0     0     0   1
#> 2  1  0     0     0   1
 # DZ 
 tail(des.rv,2)
#>      MZ DZ DZns1 DZns2 env
#> 3999  0  1     1     0   1
#> 4000  0  1     0     1   1
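
As a check of the design, the per-subject total variance and the variance shared within a pair can be computed from pardes and the rows of des.rv printed above. The sketch uses copies under new names (so the objects used by twostage are left untouched) and hypothetical values \(\sigma_g^2=2\) and \(\sigma_e^2=1\); the constant total variance is exactly the requirement that \(\nu\) be the same for all subjects.

```r
pardes_demo <- rbind(c(1, 0), c(0.5, 0), c(0.5, 0), c(0.5, 0), c(0, 1))
des_demo <- rbind(MZ1 = c(1, 0, 0, 0, 1), MZ2 = c(1, 0, 0, 0, 1),
                  DZ1 = c(0, 1, 1, 0, 1), DZ2 = c(0, 1, 0, 1, 1))
colnames(des_demo) <- c("MZ", "DZ", "DZns1", "DZns2", "env")
alpha <- c(2, 1)                          # hypothetical (sigma_g^2, sigma_e^2)
theta <- drop(pardes_demo %*% alpha)      # variances of the 5 random effects
total <- drop(des_demo %*% theta)         # total variance 1/nu for each subject
# variance of the random effects shared by two subjects
shared <- function(a, b) sum(des_demo[a, ] * des_demo[b, ] * theta)
total                                     # 3 for every subject
c(MZ = shared("MZ1", "MZ2"), DZ = shared("DZ1", "DZ2"))   # 3 and 0.5 * 2 + 1 = 2
```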

Now we call the twostage function. We roughly recover the true values. The $h part of the output gives the relative sizes of the genetic and environmental variances, \(\sigma_g^2/(\sigma_g^2+\sigma_e^2)\) and \(\sigma_e^2/(\sigma_g^2+\sigma_e^2)\); the first of these is sometimes called the heritability. In addition, the total variance for each subject is computed ($vartot) and is here around \(3\), as constructed in the simulation.

 aa <- phreg(Surv(time,status)~x+cluster(cluster),data=data)
 ts <- twostage(aa,data=data,clusters=data$cluster,detail=0,
      theta=c(2,1),var.link=0,step=0.5,random.design=des.rv,theta.des=pardes)
 summary(ts)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                 Coef.        SE         z        P-val Kendall tau         SE
#> dependence1 2.1188583 0.2098684 10.096130 0.000000e+00   0.5144285 0.02474134
#> dependence2 0.7306435 0.1667240  4.382353 1.174044e-05   0.2675719 0.04471963
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.7436  0.0591 0.6278 0.8594 2.673e-36
#> dependence2   0.2564  0.0591 0.1406 0.3722 1.435e-05
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err 2.5% 97.5%   P-value
#> p1     2.85  0.1374 2.58 3.119 1.629e-95
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
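
The $h and $vartot entries can be reproduced by hand from the two estimated variances in the output above; this sketch only does the point estimates, not their standard errors.

```r
theta  <- c(2.1188583, 0.7306435)  # estimates copied from summary(ts) above
vartot <- sum(theta)               # total variance of each subject's frailty
h      <- theta / vartot           # relative size of genetic and environmental variance
round(c(h, vartot = vartot), 4)
```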

The estimates can be transformed into Kendall’s tau for MZ and DZ twins. The Kendall’s tau column in the output above gives, for each random effect separately, the value implied by a single gamma distributed random effect with that variance in the standard Clayton-Oakes model. The Kendall’s tau for MZ and DZ pairs, however, should reflect both random effects.

We compute these by simulation. The Kendall’s tau is around 0.60 for MZ twins and around 0.32 for DZ twins. Both are quite high, reflecting a large shared environmental effect and a large genetic effect.

kendall.ClaytonOakes.twin.ace(ts$theta[1],ts$theta[2],K=10000) 
#> $mz.kendall
#> [1] 0.5961616
#> 
#> $dz.kendall
#> [1] 0.3170802

Family data

For family data, things are quite similar since we use only the pairwise structure. We show how the designs are specified.

First we simulate data from an ACE model: 1000 families with two parents, who share only the environment, and two children, who share genes with their parents.

library(mets)
set.seed(1000)
data <- simClaytonOakes.family.ace(1000,2,1,0,3)
head(data)
#>        time status x cluster   type   mintime lefttime truncated
#> 1 0.2474705      1 1       1 mother 0.2474705        0         0
#> 2 0.9687073      1 0       1 father 0.2474705        0         0
#> 3 1.2621224      1 0       1  child 0.2474705        0         0
#> 4 0.6995364      1 0       1  child 0.2474705        0         0
#> 5 2.4438829      1 0       2 mother 0.6631202        0         0
#> 6 3.0000000      0 0       2 father 0.6631202        0         0
data$number <- c(1,2,3,4)
data$child <- 1*(data$number==3)

Helper functions are available to set up the random effects. Here we set up the ACE model with 9 random effects: one shared environmental effect (the last random effect) and 4 genetic random effects for each parent, each with variance \(\sigma_g^2/4\).

The random effects are again set up with an overall design matrix, because the design for each subject is the same in all comparisons across family members. Below we demonstrate how the model can be specified in various other ways.

Each child shares 2 genetic random effects with each parent, and also shares 2 genetic random effects with his/her sibling.

out <- ace.family.design(data,member="type",id="cluster")
out$pardes
#>       [,1] [,2]
#>  [1,] 0.25    0
#>  [2,] 0.25    0
#>  [3,] 0.25    0
#>  [4,] 0.25    0
#>  [5,] 0.25    0
#>  [6,] 0.25    0
#>  [7,] 0.25    0
#>  [8,] 0.25    0
#>  [9,] 0.00    1
head(out$des.rv,4)
#>      m1 m2 m3 m4 f1 f2 f3 f4 env
#> [1,]  1  1  1  1  0  0  0  0   1
#> [2,]  0  0  0  0  1  1  1  1   1
#> [3,]  1  1  0  0  1  1  0  0   1
#> [4,]  1  0  1  0  1  0  1  0   1

Then we fit the model

pa <- phreg(Surv(time,status)~+1+cluster(cluster),data=data)

# fit the ACE model using the random effects design from ace.family.design
ts <- twostage(pa,data=data,clusters=data$cluster,
    var.par=1,var.link=0,theta=c(2,1),
        random.design=out$des.rv,theta.des=out$pardes)
summary(ts)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 2.054484 0.2546601 8.067555 6.661338e-16   0.5067190 0.03098273
#> dependence2 1.097735 0.1129562 9.718236 0.000000e+00   0.3543669 0.02354244
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.6518 0.04206 0.5693 0.7342 3.710e-54
#> dependence2   0.3482 0.04206 0.2658 0.4307 1.237e-16
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    3.152  0.2423 2.677 3.627 1.053e-38
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

The model can also be fitted by specifying the pairs to be used in the pairwise likelihood, via the pairs argument. We start by considering all pairs, as we also did before.

All pairs can be listed by calling the familycluster.index function.

There are 6000 pairs to consider, and the 6 pairs of the first family are shown here.

# now specify fitting via specific pairs 
# first all pairs 
mm <- familycluster.index(data$cluster)
head(mm$familypairindex,n=12)
#>  [1] 1 2 1 3 1 4 2 3 2 4 3 4
pairs <- matrix(mm$familypairindex,ncol=2,byrow=TRUE)
head(pairs,n=6)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    3
#> [3,]    1    4
#> [4,]    2    3
#> [5,]    2    4
#> [6,]    3    4
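
The within-family pairs can also be enumerated in base R with combn(). This is a sketch, not the familycluster.index implementation; it assumes 1000 families of 4 members, as in the simulated data, and reproduces the pair ordering shown above for the first family.

```r
cluster <- rep(1:1000, each = 4)     # hypothetical family id for each row of the data
rows_by_family <- split(seq_along(cluster), cluster)
# keep families with at least two members: combn(k, 2) on a single number k enumerates 1:k
rows_by_family <- rows_by_family[lengths(rows_by_family) >= 2]
pairs_all <- do.call(rbind, lapply(rows_by_family, function(i) t(combn(i, 2))))
nrow(pairs_all)                      # 6000
head(pairs_all, 6)
```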

Then we fit the model using only the specified pairs.

ts <- twostage(pa,data=data,clusters=data$cluster, theta=c(2,1),var.link=0,step=1.0,
        random.design=out$des.rv, theta.des=out$pardes,pairs=pairs)
summary(ts)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE         z        P-val Kendall tau         SE
#> dependence1 2.054484 0.2565939  8.006755 1.110223e-15   0.5067190 0.03121800
#> dependence2 1.097735 0.1083008 10.135980 0.000000e+00   0.3543669 0.02257216
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.6518 0.04171 0.5700 0.7335 4.729e-55
#> dependence2   0.3482 0.04171 0.2665 0.4300 6.828e-17
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    3.152   0.242 2.678 3.626 8.595e-39
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Now we use only a random sample of 2000 of the pairs. The sampled pairs still refer to the rows of the data given in the data argument, and the clusters (families) are specified as before.

ssid <- sort(sample(1:nrow(pairs),2000))
tsd <- twostage(pa,data=data,clusters=data$cluster,
    theta=c(2,1)/10,var.link=0,step=1.0, random.design=out$des.rv,
   theta.des=out$pardes,pairs=pairs[ssid,])
summary(tsd)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.772343 0.3410861 5.196176 2.034297e-07   0.4698255 0.04793708
#> dependence2 1.195634 0.1374651 8.697726 0.000000e+00   0.3741461 0.02692207
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.5972 0.05703 0.4854 0.7089 1.164e-25
#> dependence2   0.4028 0.05703 0.2911 0.5146 1.614e-12
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    2.968  0.3502 2.282 3.654 2.359e-17
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Sometimes one only has the data on the pairs, in addition to, for example, a cohort estimate of the marginal survival model. We now demonstrate how this is dealt with. Everything is essentially as before, but the design must be organized differently, since before we specified the design for everybody in the cohort. In addition, we do not here include the uncertainty from the baseline in the estimates. This is formally possible, but when the data for the marginal model and the twostage data are not the same, we must specify that we do not want the decomposition of the uncertainty due to the baseline (baseline.iid=0).

ids <- sort(unique(c(pairs[ssid,])))

pairsids <- c(pairs[ssid,])
pair.new <- matrix(fast.approx(ids,c(pairs[ssid,])),ncol=2)
head(pair.new)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    3
#> [3,]    2    4
#> [4,]    5    6
#> [5,]    5    7
#> [6,]    8    9

# this requires that pair.new refers to rows of dataid (time, status and so forth)
# random.design and theta.des are constructed from the individual-level specification given by ace.family.design
dataid <- dsort(data[ids,],"cluster")
outid <- ace.family.design(dataid,member="type",id="cluster")
outid$pardes
#>       [,1] [,2]
#>  [1,] 0.25    0
#>  [2,] 0.25    0
#>  [3,] 0.25    0
#>  [4,] 0.25    0
#>  [5,] 0.25    0
#>  [6,] 0.25    0
#>  [7,] 0.25    0
#>  [8,] 0.25    0
#>  [9,] 0.00    1
head(outid$des.rv)
#>      m1 m2 m3 m4 f1 f2 f3 f4 env
#> [1,]  1  1  1  1  0  0  0  0   1
#> [2,]  0  0  0  0  1  1  1  1   1
#> [3,]  1  1  0  0  1  1  0  0   1
#> [4,]  1  0  1  0  1  0  1  0   1
#> [5,]  1  1  1  1  0  0  0  0   1
#> [6,]  1  1  0  0  1  1  0  0   1
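
The re-indexing done with fast.approx above maps each sampled row index to its position in the sorted vector ids. Since every value looked up is present in ids exactly, base R's match() should give the same mapping; a sketch with hypothetical sampled pairs (the name ids_demo avoids overwriting ids):

```r
# hypothetical sampled pairs: row indices into the full data
pairs_s <- rbind(c(1, 2), c(1, 3), c(6, 8), c(9, 10), c(9, 11))
ids_demo <- sort(unique(c(pairs_s)))                 # rows kept in the reduced data
pairs_reduced <- matrix(match(c(pairs_s), ids_demo), ncol = 2)
pairs_reduced   # rows (1,2), (1,3), (4,5), (6,7), (6,8)
```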

Now we fit the model using only the pair data.

tsdid <- twostage(pa,data=dataid,clusters=dataid$cluster,theta=c(2,1)/10,
      var.link=0,baseline.iid=0,
          random.design=outid$des.rv,theta.des=outid$pardes,pairs=pair.new)
summary(tsdid)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.772343 0.3304783 5.362964 8.186736e-08   0.4698255 0.04644624
#> dependence2 1.195634 0.1336389 8.946747 0.000000e+00   0.3741460 0.02617273
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.5972 0.05633 0.4868 0.7076 2.943e-26
#> dependence2   0.4028 0.05633 0.2924 0.5132 8.576e-13
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    2.968  0.3332 2.315 3.621 5.251e-19
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

paid <- phreg(Surv(time,status)~+1+cluster(cluster),data=dataid)
tsdidb <- twostage(paid,data=dataid,clusters=dataid$cluster,theta=c(2,1)/10,
  var.link=0,random.design=outid$des.rv,theta.des=outid$pardes,pairs=pair.new)
summary(tsdidb)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.804711 0.3493009 5.166638 2.383419e-07   0.4743359 0.04825988
#> dependence2 1.186006 0.1395368 8.499590 0.000000e+00   0.3722548 0.02749323
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.6034 0.05716 0.4914 0.7155 4.690e-26
#> dependence2   0.3966 0.05716 0.2845 0.5086 3.974e-12
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    2.991  0.3594 2.286 3.695 8.724e-17
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
coef(tsdidb)
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.804711 0.3493009 5.166638 2.383419e-07   0.4743359 0.04825988
#> dependence2 1.186006 0.1395368 8.499590 0.000000e+00   0.3722548 0.02749323

The estimates differ between the two fits: the first uses the marginal model from the full data, in which case the standard errors do not reflect the uncertainty from the baseline, while the second uses the marginal model estimated from the sub-sample only, which is slightly different.

Pairwise specification of random effects and variances

Now we illustrate how one can also specify the random.design and theta.des directly for each pair, rather than using an overall specification for the whole family in which the relevant pairs use the rows of des.rv. This can be much simpler in some situations.

pair.types <- matrix(dataid[c(t(pair.new)),"type"],byrow=TRUE,ncol=2)
head(pair.new)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    3
#> [3,]    2    4
#> [4,]    5    6
#> [5,]    5    7
#> [6,]    8    9
head(pair.types)
#>      [,1]     [,2]    
#> [1,] "mother" "father"
#> [2,] "mother" "child" 
#> [3,] "father" "child" 
#> [4,] "mother" "child" 
#> [5,] "mother" "child" 
#> [6,] "mother" "child"

theta.des  <- rbind( c(rbind(c(1,0),c(1,0),c(0,1),c(0,0))),
        c(rbind(c(0.5,0),c(0.5,0),c(0.5,0),c(0,1))))
random.des <- rbind( 
        c(1,0,1,0),c(0,1,1,0),
        c(1,1,0,1),c(1,0,1,1))
mf <- 1*(pair.types[,1]=="mother" & pair.types[,2]=="father")
##          pair, rv related to pairs,  theta.des related to pair 
pairs.new <- cbind(pair.new,(mf==1)*1+(mf==0)*3,(mf==1)*2+(mf==0)*4,(mf==1)*1+(mf==0)*2,(mf==1)*3+(mf==0)*4)

pairs.new is a matrix with

  • columns 1:2 giving the indices of the data points

  • columns 3:4 giving the row indices of random.design for the two members of the pair

  • column 5 giving the row index of theta.des (each row holds a theta.des matrix written as a row)

  • column 6 giving the number of random effects for this pair
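
The meaning of the columns can be illustrated by decoding a row by hand. The helper decode_pair below is hypothetical: it takes a row laid out like pairs.new, looks up the two random-effect designs and the theta.des matrix for the pair, and returns each member's total variance and the shared variance, assuming \(\sigma_g^2=2\) and \(\sigma_e^2=1\). The copies theta_des and random_des hold the same values as theta.des and random.des above, and the two rows are those the pairs.new code produces for a mother-father pair and a mother-child pair.

```r
# copies of the designs defined above (same values, new names)
theta_des <- rbind(c(rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 0))),
                   c(rbind(c(0.5, 0), c(0.5, 0), c(0.5, 0), c(0, 1))))
random_des <- rbind(c(1, 0, 1, 0), c(0, 1, 1, 0),
                    c(1, 1, 0, 1), c(1, 0, 1, 1))
alpha <- c(2, 1)   # hypothetical (sigma_g^2, sigma_e^2)

decode_pair <- function(row) {
  Vi <- random_des[row[3], ]                  # design of first member (column 3)
  Vj <- random_des[row[4], ]                  # design of second member (column 4)
  M  <- matrix(theta_des[row[5], ], nrow = 4) # theta.des matrix for this pair (column 5)
  use <- seq_len(row[6])                      # number of random effects used (column 6)
  theta <- drop(M %*% alpha)[use]
  c(total_i = sum(Vi[use] * theta), total_j = sum(Vj[use] * theta),
    shared  = sum(Vi[use] * Vj[use] * theta))
}
decode_pair(c(1, 2, 1, 2, 1, 3))   # mother-father: totals 3 and 3, shared 1
decode_pair(c(1, 3, 3, 4, 2, 4))   # mother-child:  totals 3 and 3, shared 2
```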

Looking at the first three rows, we see that the composite likelihood is based on the data points (1,2), (1,3) and (2,4), which are (mother, father), (mother, child) and (father, child), respectively.

head(pairs.new[1:3,])
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    2    1    2    1    3
#> [2,]    1    3    3    4    2    4
#> [3,]    2    4    3    4    2    4
head(dataid)
#>        time status x cluster   type   mintime lefttime truncated number child
#> 1 0.2474705      1 1       1 mother 0.2474705        0         0      1     0
#> 2 0.9687073      1 0       1 father 0.2474705        0         0      2     0
#> 3 1.2621224      1 0       1  child 0.2474705        0         0      3     1
#> 4 0.6995364      1 0       1  child 0.2474705        0         0      4     0
#> 5 2.4438829      1 0       2 mother 0.6631202        0         0      1     0
#> 7 0.6631202      1 1       2  child 0.6631202        0         0      3     1

The random effects for these pairs are specified by the rows of random.design, namely rows (1,2), (3,4) and (3,4), respectively, and their variances are given by rows 1, 2 and 2 of theta.des, respectively. For the first pair (1,2), the (mother, father) pair, the random effect designs and their variances are given by

random.des[1,]
#> [1] 1 0 1 0
random.des[2,]
#> [1] 0 1 1 0
matrix(theta.des[1,],4,2)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    0    1
#> [4,]    0    0

thus sharing only the third random effect with variance \(\sigma_e^2\), having two non-shared random effects (one for each parent) with variance \(\sigma_g^2\), and finally a 4th random effect with variance \(0\) that could thus have been omitted.

Now consider a parent and child: they share the first random effect with variance \(0.5 \sigma_g^2\), each has a non-shared random effect with variance \(0.5 \sigma_g^2\), and finally they share the environmental effect with variance \(\sigma_e^2\).

head(dataid)
#>        time status x cluster   type   mintime lefttime truncated number child
#> 1 0.2474705      1 1       1 mother 0.2474705        0         0      1     0
#> 2 0.9687073      1 0       1 father 0.2474705        0         0      2     0
#> 3 1.2621224      1 0       1  child 0.2474705        0         0      3     1
#> 4 0.6995364      1 0       1  child 0.2474705        0         0      4     0
#> 5 2.4438829      1 0       2 mother 0.6631202        0         0      1     0
#> 7 0.6631202      1 1       2  child 0.6631202        0         0      3     1
matrix(theta.des[2,],4,2)
#>      [,1] [,2]
#> [1,]  0.5    0
#> [2,]  0.5    0
#> [3,]  0.5    0
#> [4,]  0.0    1
random.des[3,]
#> [1] 1 1 0 1
random.des[4,]
#> [1] 1 0 1 1

We then fit the same model as before.

tsdid2 <- twostage(pa,data=dataid,clusters=dataid$cluster,
       theta=c(2,1)/10,var.link=0,step=1.0,random.design=random.des,
       baseline.iid=0, theta.des=theta.des,pairs=pairs.new,dim.theta=2)
summary(tsdid2)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.772343 0.3304783 5.362964 8.186733e-08   0.4698255 0.04644624
#> dependence2 1.195634 0.1336389 8.946747 0.000000e+00   0.3741461 0.02617273
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.5972 0.05633 0.4868 0.7076 2.943e-26
#> dependence2   0.4028 0.05633 0.2924 0.5132 8.576e-13
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    2.968  0.3332 2.315 3.621 5.251e-19
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
tsd$theta
#>                 [,1]
#> dependence1 1.772343
#> dependence2 1.195634
tsdid2$theta
#>                 [,1]
#> dependence1 1.772343
#> dependence2 1.195634
tsdid$theta
#>                 [,1]
#> dependence1 1.772343
#> dependence2 1.195634
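The derived columns of the summary can be reproduced by hand from the estimated variances. For a gamma frailty with variance \(\theta\), Kendall's tau in the Clayton-Oakes model is \(\theta/(\theta+2)\), and the printed `$vartot` and `$h` entries are consistent with the sum of the two variance components and each component's share of that sum. The sketch below checks this using only the printed estimates; it is a reconstruction from the output, not code taken from mets.

```r
# Estimated variances of the gamma random effects, as printed for tsdid2
theta <- c(dependence1 = 1.772343, dependence2 = 1.195634)

# Kendall's tau of the Clayton-Oakes (gamma frailty) model with variance theta
kendall_clayton <- function(theta) theta / (theta + 2)

vartot <- sum(theta)   # total random-effect variance
h <- theta / vartot    # share of the total variance for each component

print(round(kendall_clayton(theta), 7))
print(round(c(vartot = vartot, h), 4))
```

The same relation gives the Kendall tau of about 0.3226 reported for the diabetes fit `fitco1` further down.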

Finally, the same model structure can be set up from kinship coefficients.

# kinship value per pair: 0 for a (mother, father) pair, 0.5 for all other pairs
kinship  <- c()
for (i in 1:nrow(pair.new)) {
  if (pair.types[i,1]=="mother" & pair.types[i,2]=="father") pk1 <- 0 else pk1 <- 0.5
  kinship <- c(kinship,pk1)
}
head(kinship,n=10)
#>  [1] 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.0 0.5 0.5

out <- make.pairwise.design(pair.new,kinship,type="ace") 
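As an illustration of how a single kinship value can encode the ACE structure, the hypothetical helper `ace_pair_design()` below (a sketch, not the internals of `make.pairwise.design()`) builds a per-pair design with four gamma effects: a shared genetic effect with variance \(k\sigma_g^2\), one non-shared genetic effect per member with variance \((1-k)\sigma_g^2\), and a shared environmental effect with variance \(\sigma_e^2\). For \(k=0.5\) it reproduces the parent-child rows printed earlier; for \(k=0\) it is equivalent to the mother-father design up to the ordering of the effects.

```r
# Hypothetical helper: ACE design for one pair with kinship value k.
# Effects: 1 = shared genetic, 2/3 = own genetic for member 1/2, 4 = shared environment.
# Columns of theta_des correspond to (sigma_g^2, sigma_e^2).
ace_pair_design <- function(k) {
  theta_des <- rbind(c(k,     0),
                     c(1 - k, 0),
                     c(1 - k, 0),
                     c(0,     1))
  random_des <- rbind(member1 = c(1, 1, 0, 1),
                      member2 = c(1, 0, 1, 1))
  list(theta_des = theta_des, random_des = random_des)
}

# Variance of the effects shared by both members: k * sigma_g^2 + sigma_e^2
shared_var <- function(d, theta) {
  v <- as.vector(d$theta_des %*% theta)
  sum(v[d$random_des[1, ] == 1 & d$random_des[2, ] == 1])
}

pc <- ace_pair_design(0.5)
print(pc$theta_des)    # compare with matrix(theta.des[2,],4,2) above
print(pc$random_des)   # compare with random.des[3,] and random.des[4,] above
shared_var(ace_pair_design(0), c(1.772343, 1.195634))
```

Dropping effect 4 and the second column gives the analogous AE structure, in which a pair shares only \(k\sigma_g^2\).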

Fitting the model with this design gives the same estimates as before:

tsdid3 <- twostage(pa,data=dataid,clusters=dataid$cluster,
   theta=c(2,1)/10,var.link=0,step=1.0,random.design=out$random.design,
   baseline.iid=0,theta.des=out$theta.des,pairs=out$new.pairs,dim.theta=2)
summary(tsdid3)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z        P-val Kendall tau         SE
#> dependence1 1.772343 0.3304783 5.362964 8.186733e-08   0.4698255 0.04644624
#> dependence2 1.195634 0.1336389 8.946747 0.000000e+00   0.3741461 0.02617273
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err   2.5%  97.5%   P-value
#> dependence1   0.5972 0.05633 0.4868 0.7076 2.943e-26
#> dependence2   0.4028 0.05633 0.2924 0.5132 8.576e-13
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err  2.5% 97.5%   P-value
#> p1    2.968  0.3332 2.315 3.621 5.251e-19
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
tsdid2$theta
#>                 [,1]
#> dependence1 1.772343
#> dependence2 1.195634
tsdid$theta
#>                 [,1]
#> dependence1 1.772343
#> dependence2 1.195634

Now we fit the AE model, which drops the “C” (shared environment) component:

out <- make.pairwise.design(pair.new,kinship,type="ae") 
tsdid4 <- twostage(pa,data=dataid,clusters=dataid$cluster,
   theta=c(2,1)/10,var.link=0,step=1.0,random.design=out$random.design,
   baseline.iid=0,theta.des=out$theta.des,pairs=out$new.pairs,dim.theta=1)
summary(tsdid4)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                Coef.        SE        z P-val Kendall tau         SE
#> dependence1 4.259319 0.4792785 8.886939     0   0.6804764 0.02446605
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> $h
#>             Estimate Std.Err 2.5% 97.5% P-value
#> dependence1        1       0    1     1       0
#> 
#> $vare
#> NULL
#> 
#> $vartot
#>    Estimate Std.Err 2.5% 97.5%   P-value
#> p1    4.259  0.4793 3.32 5.199 6.282e-19
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

Univariate Plackett two-stage models

The copula known as the Plackett distribution is of the form \[\begin{align} C(u,v; \theta) = \begin{cases} \frac{ S - \sqrt{S^2 - 4 u v \theta (\theta-1)}}{2 (\theta -1)} & \mbox{ if } \theta \ne 1 \\ u v & \mbox{ if } \theta = 1 \end{cases} \end{align}\] with \(S=1+(\theta-1) (u + v)\). With marginal survival functions \(S_i\), the bivariate survival function is defined as \(H(t_1,t_2)=C(S_1(t_1),S_2(t_2))\), that is, \(C(u_1,u_2)\) with \(u_i=S_i(t_i)\).
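A direct base-R transcription of the Plackett copula, \(C = \{S - \sqrt{S^2 - 4uv\theta(\theta-1)}\}/\{2(\theta-1)\}\), is sketched below. The names `plackett_copula()` and `H()` are hypothetical (not mets functions), and the unit-rate exponential marginals are an arbitrary choice for illustration.

```r
# Plackett copula C(u, v; theta) for theta > 0
plackett_copula <- function(u, v, theta) {
  if (abs(theta - 1) < 1e-12) return(u * v)   # independence case
  S <- 1 + (theta - 1) * (u + v)
  (S - sqrt(S^2 - 4 * u * v * theta * (theta - 1))) / (2 * (theta - 1))
}

# Bivariate survival function H(t1, t2) = C(S1(t1), S2(t2)),
# here with assumed unit-rate exponential marginals S_i(t) = exp(-t)
H <- function(t1, t2, theta) plackett_copula(exp(-t1), exp(-t2), theta)

u <- seq(0.1, 0.9, by = 0.2)
plackett_copula(u, 1, 3)       # margin: C(u, 1) = u
plackett_copula(0.5, 0.5, 3)   # joint survival when both marginals equal 0.5
H(0.5, 1, 3)
```

The margin check \(C(u,1)=u\) holds for any \(\theta>0\), since the discriminant then reduces to \((\theta-(\theta-1)u)^2\).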

The dependence parameter \(\theta\) has the convenient interpretation that it equals the odds ratio of every \(2 \times 2\) table obtained by cutting the plane at any point \((t_1,t_2)\), that is \[ \theta = \frac{ P(T_1 > t_1 | T_2 >t_2) P(T_1 \leq t_1 | T_2 \leq t_2) }{P(T_1 \leq t_1 | T_2 > t_2) P(T_1 > t_1 | T_2 \leq t_2 ) }. \]
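The odds-ratio interpretation can be checked numerically. The base-R sketch below (function names hypothetical, copula transcribed from the standard Plackett formula) fills the four cells of the \(2\times 2\) table at several cut points \((u,v) = (S_1(t_1), S_2(t_2))\) and confirms that the cross ratio equals \(\theta\) at every cut. Conditioning on \(T_2\) only rescales the columns of the table, so the odds ratio of the conditional probabilities is the same.

```r
plackett_copula <- function(u, v, theta) {
  if (abs(theta - 1) < 1e-12) return(u * v)
  S <- 1 + (theta - 1) * (u + v)
  (S - sqrt(S^2 - 4 * u * v * theta * (theta - 1))) / (2 * (theta - 1))
}

# Odds ratio of the 2 x 2 table defined by the cut (t1, t2),
# with u = P(T1 > t1) and v = P(T2 > t2)
cross_ratio <- function(u, v, theta) {
  p11 <- plackett_copula(u, v, theta)  # P(T1 >  t1, T2 >  t2)
  p10 <- u - p11                       # P(T1 >  t1, T2 <= t2)
  p01 <- v - p11                       # P(T1 <= t1, T2 >  t2)
  p00 <- 1 - u - v + p11               # P(T1 <= t1, T2 <= t2)
  (p11 * p00) / (p10 * p01)
}

grid <- expand.grid(u = seq(0.1, 0.9, by = 0.2), v = seq(0.1, 0.9, by = 0.2))
or_vals <- cross_ratio(grid$u, grid$v, theta = 3)
range(or_vals)   # constant and equal to theta, up to rounding
```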

An additional convenient feature of the odds-ratio measure is that it is directly linked to the Spearman correlation \(\rho\), which can be computed as \[\begin{align} \rho = \frac{\theta+1}{\theta -1} - \frac{2 \theta}{(\theta-1)^2} \log(\theta) \end{align}\] when \(\theta \ne 1\); if \(\theta=1\) then \(\rho=0\).
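The Spearman formula is easy to evaluate directly. The base-R sketch below (the name `spearman_plackett()` is hypothetical) implements it with the \(\theta=1\) case handled separately, and applies it to the log-odds-ratio estimate 1.14188 reported for the diabetes Plackett fit `fitp` further down, whose summary lists a Spearman correlation of 0.3648.

```r
# Spearman correlation implied by a Plackett odds ratio theta (> 0)
spearman_plackett <- function(theta) {
  ifelse(abs(theta - 1) < 1e-8, 0,
         (theta + 1) / (theta - 1) - 2 * theta * log(theta) / (theta - 1)^2)
}

spearman_plackett(exp(1.14188))    # log-OR estimate of fitp
spearman_plackett(c(0.5, 1, 2))    # theta < 1 gives negative correlation
```

The formula is antisymmetric in \(\log\theta\): replacing \(\theta\) by \(1/\theta\) flips the sign of \(\rho\).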

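The dependence parameter of this model is less restricted than that of the Clayton-Oakes model: \(\theta\) can take any positive value, so negative association (\(\theta < 1\)) is also allowed, whereas the gamma-frailty Clayton-Oakes model only describes positive association.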

 library(mets)
 data(diabetes)
 
 # Marginal Cox model  with treat as covariate
 margph <- phreg(Surv(time,status)~treat+cluster(id),data=diabetes)
 # Clayton-Oakes, MLE 
 fitco1<-twostageMLE(margph,data=diabetes,theta=1.0)
 summary(fitco1)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects
#> $estimates
#>                 Coef.        SE       z       P-val Kendall tau         SE
#> dependence1 0.9526614 0.3543033 2.68883 0.007170289    0.322645 0.08127892
#> 
#> $type
#> NULL
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
 
 # Plackett model
 mph <- phreg(Surv(time,status)~treat+cluster(id),data=diabetes)
 fitp <- survival.twostage(mph,data=diabetes,theta=3.0,Nit=40,
                clusters=diabetes$id,var.link=1,model="plackett")
 summary(fitp)
#> Dependence parameter for Odds-Ratio (Plackett) model 
#> With log-link
#> $estimates
#>             log-Coef.        SE        z        P-val Spearman Corr.         SE
#> dependence1   1.14188 0.2754994 4.144764 3.401635e-05      0.3648217 0.08073474
#> 
#> $or
#>             Estimate Std.Err  2.5% 97.5%   P-value
#> dependence1    3.133   0.863 1.441 4.824 0.0002837
#> 
#> $type
#> [1] "plackett"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
 
 # without covariates, but with the baseline stratified by treat
 marg <- phreg(Surv(time,status)~+strata(treat)+cluster(id),data=diabetes)
 fitpa <- survival.twostage(marg,data=diabetes,theta=1.0,
                 clusters=diabetes$id,score.method="optimize")
 summary(fitpa)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects 
#> With log-link
#> $estimates
#>               log-Coef.        SE          z     P-val Kendall tau         SE
#> dependence1 -0.05683487 0.3239422 -0.1754476 0.8607279   0.3208252 0.07058583
#> 
#> $vargam
#>             Estimate Std.Err   2.5% 97.5%  P-value
#> dependence1   0.9448   0.306 0.3449 1.545 0.002022
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
 
 fitcoa <- survival.twostage(marg,data=diabetes,theta=1.0,clusters=diabetes$id,
                  model="clayton.oakes")
 summary(fitcoa)
#> Dependence parameter for Clayton-Oakes model
#> Variance of Gamma distributed random effects 
#> With log-link
#> $estimates
#>               log-Coef.        SE          z     P-val Kendall tau         SE
#> dependence1 -0.05683996 0.3279956 -0.1732949 0.8624196   0.3208241 0.07146893
#> 
#> $vargam
#>             Estimate Std.Err   2.5% 97.5%  P-value
#> dependence1   0.9447  0.3099 0.3374 1.552 0.002297
#> 
#> $type
#> [1] "clayton.oakes"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"

With a regression design

 mm <- model.matrix(~-1+factor(adult),diabetes)
 fitp <- survival.twostage(mph,data=diabetes,theta=3.0,Nit=40,
                clusters=diabetes$id,var.link=1,model="plackett",
        theta.des=mm)
 summary(fitp)
#> Dependence parameter for Odds-Ratio (Plackett) model 
#> With log-link
#> $estimates
#>                log-Coef.        SE        z       P-val Spearman Corr.
#> factor(adult)1  1.098333 0.3356654 3.272106 0.001067497      0.3519988
#> factor(adult)2  1.231962 0.4708683 2.616363 0.008887198      0.3909505
#>                        SE
#> factor(adult)1 0.09930816
#> factor(adult)2 0.13514292
#> 
#> $or
#>                Estimate Std.Err   2.5% 97.5% P-value
#> factor(adult)1    2.999   1.007 1.0260 4.972 0.00289
#> factor(adult)2    3.428   1.614 0.2643 6.592 0.03369
#> 
#> $type
#> [1] "plackett"
#> 
#> attr(,"class")
#> [1] "summary.mets.twostage"
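The `$or` table appears to contain the back-transformed log-coefficients with delta-method standard errors, \(\mathrm{SE}(e^{\hat\beta}) \approx e^{\hat\beta}\,\mathrm{SE}(\hat\beta)\), and Wald intervals on the odds-ratio scale. The base-R sketch below reproduces those numbers from the printed log-coefficients and standard errors; it is a reconstruction from the output, not taken from the mets source.

```r
# Printed log-OR estimates and standard errors for factor(adult)1 and factor(adult)2
log_or <- c(adult1 = 1.098333, adult2 = 1.231962)
se_log <- c(adult1 = 0.3356654, adult2 = 0.4708683)

or    <- exp(log_or)   # odds ratios
se_or <- or * se_log   # delta method: d exp(b) / db = exp(b)
z     <- qnorm(0.975)
tab <- cbind(Estimate = or, Std.Err = se_or,
             lower = or - z * se_or, upper = or + z * se_or)
print(round(tab, 4))
```

Intervals built this way can extend below 1 (here for adult group 2) even when the log-scale estimate is clearly positive; an interval computed on the log scale and then exponentiated would stay positive.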
 # Piecewise constant cross hazards ratio modelling

 d <- subset(simClaytonOakes(2000,2,0.5,0,stoptime=2,left=0),!truncated)
 udp <- piecewise.twostage(c(0,0.5,2),data=d,score.method="optimize",
                           id="cluster",timevar="time",
                           status="status",model="plackett",silent=0)
#> Data-set  1 out of  4
#>   Number of joint events: 522 of  2000
#> Data-set  2 out of  4
#>   Number of joint events: 263 of  1194
#> Data-set  3 out of  4
#>   Number of joint events: 278 of  1209
#> Data-set  4 out of  4
#>   Number of joint events: 579 of  925
 summary(udp)
#> [1] 1
#> Dependence parameter for Plackett model 
#> Score of log-likelihood for parameter estimates (too large?)
#>               0 - 0.5       0.5 - 2
#> 0 - 0.5 -0.0005197763 -0.0004069175
#> 0.5 - 2 -0.0002803075  0.0010056645
#> 
#> 
#> log-coefficient for dependence parameter (SE) 
#>          0 - 0.5        0.5 - 2      
#> 0 - 0.5  1.589 (0.081)  1.75  (0.124)
#> 0.5 - 2  1.584 (0.124)  2.025 (0.099)
#> 
#> Spearman Correlation (SE) 
#>          0 - 0.5        0.5 - 2      
#> 0 - 0.5  0.489 (0.021)  0.53  (0.031)
#> 0.5 - 2  0.488 (0.032)  0.595 (0.022)

SessionInfo

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/openblas-base/libblas.so.3
#> LAPACK: /usr/lib/libopenblasp-r0.2.18.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mets_1.2.8.1    lava_1.6.7      timereg_1.9.7   survival_3.1-12
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5          cpp11_0.2.1         knitr_1.29         
#>  [4] magrittr_1.5        splines_4.0.2       lattice_0.20-41    
#>  [7] R6_2.4.1            ragg_0.3.1          rlang_0.4.7        
#> [10] stringr_1.4.0       tools_4.0.2         grid_4.0.2         
#> [13] xfun_0.17           htmltools_0.5.0     systemfonts_0.3.1  
#> [16] yaml_2.2.1          assertthat_0.2.1    rprojroot_1.3-2    
#> [19] digest_0.6.25       numDeriv_2016.8-1.1 pkgdown_1.6.1      
#> [22] crayon_1.3.4        Matrix_1.2-18       fs_1.5.0           
#> [25] memoise_1.1.0       evaluate_0.14       rmarkdown_2.3      
#> [28] stringi_1.5.3       compiler_4.0.2      desc_1.2.0         
#> [31] backports_1.1.10    mvtnorm_1.1-1