Conditional Average Treatment Effect estimation with cross-fitting.

cate(
  response_model,
  propensity_model,
  cate_model = ~1,
  contrast = c(1, 0),
  data,
  nfolds = 1,
  rep = 1,
  silent = FALSE,
  stratify = FALSE,
  mc.cores,
  ...
)

Arguments

response_model

formula or ml_model object (a formula is fitted as a glm)

propensity_model

formula or ml_model object (a formula is fitted as a glm)

cate_model

formula specifying regression design for conditional average treatment effects

contrast

treatment contrast (default 1 vs 0)

data

data.frame

nfolds

Number of folds used for cross-fitting

rep

Number of replications of cross-fitting procedure

silent

suppress all messages and progress bars

stratify

If TRUE the response_model will be stratified by treatment

mc.cores

Optional number of cores. If specified, parallel::mcmapply is used instead of future.apply::future_mapply

...

additional arguments to future.apply::future_mapply

Value

cate.targeted object

Details

We have observed data \((Y,A,W)\), where \(Y\) is the response variable, \(A\) the binary treatment, and \(W\) the covariates. We further let \(V\) be a subset of the covariates. Define the conditional potential mean outcome $$\psi_{a}(P)(V) = E_{P}[E_{P}(Y\mid A=a, W)\mid V]$$ and let \(m(V; \beta)\) denote a parametric working model. The target parameter is then the minimizer of the mean squared error between the treatment contrast and the working model: $$\beta(P) = \operatorname{argmin}_{\beta} E_{P}\left[\left(\{\psi_{1}(P)(V)-\psi_{0}(P)(V)\} - m(V; \beta)\right)^{2}\right]$$
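
As a worked special case (a sketch derived from the definitions above, not a statement about how the package computes the estimate), consider the intercept-only working model \(m(V; \beta) = \beta\), which corresponds to the default cate_model = ~1. The squared-error criterion is then minimized by the mean of the contrast, and the tower property removes the conditioning on \(V\): $$\beta(P) = E_{P}[\psi_{1}(P)(V) - \psi_{0}(P)(V)] = E_{P}[E_{P}(Y\mid A=1, W)] - E_{P}[E_{P}(Y\mid A=0, W)],$$ which is the average treatment effect under the usual identification assumptions. More generally, for a working model that is linear in \(\beta\), say \(m(V; \beta) = \beta^{\top}X\) with \(X = (1, V^{\top})^{\top}\), and assuming \(E_{P}[XX^{\top}]\) is invertible, the minimizer is the least-squares projection $$\beta(P) = \{E_{P}[XX^{\top}]\}^{-1}E_{P}[X\{\psi_{1}(P)(V) - \psi_{0}(P)(V)\}].$$ In the simulation used in the Examples, \(E(Y\mid A=1, W) - E(Y\mid A=0, W) = 1 + w_{2}\), so with \(V = w_{2}\) the target is \(\beta(P) = (1, 1)^{\top}\).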

References

Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.

Author

Klaus Kähler Holst, Andreas Nordland

Examples

sim1 <- function(n=1000, ...) {
  w1 <- rnorm(n)
  w2 <- rnorm(n)
  a <- rbinom(n, 1, expit(-1 + w1))
  y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n)
  data.frame(y, a, w1, w2)
}

d <- sim1(5000)
## ATE
cate(cate_model=~1,
     response_model=y~a*(w1+w2),
     propensity_model=a~w1+w2,
     data=d)
#>             Estimate Std.Err   2.5%  97.5%    P-value
#> E[y(1)]       1.8047 0.04831 1.7100 1.8994 2.099e-305
#> E[y(0)]       0.8308 0.01984 0.7919 0.8697  0.000e+00
#> ───────────                                          
#> (Intercept)   0.9740 0.05491 0.8664 1.0816  2.175e-70
## CATE
cate(cate_model=~1+w2,
     response_model=y~a*(w1+w2),
     propensity_model=a~w1+w2,
     data=d)
#>             Estimate Std.Err   2.5%  97.5%    P-value
#> E[y(1)]       1.8047 0.04831 1.7100 1.8994 2.099e-305
#> E[y(0)]       0.8308 0.01984 0.7919 0.8697  0.000e+00
#> ───────────                                          
#> (Intercept)   0.9502 0.05280 0.8467 1.0536  2.093e-72
#> w2            1.0377 0.04756 0.9445 1.1309 1.586e-105

## superlearner example (not run)
if (FALSE) {
  # candidate learners for the response model
  mod1 <- list(
    glm=predictor_glm(y~w1+w2),
    gam=predictor_gam(y~s(w1) + s(w2))
  )
  # combine the candidates into a superlearner
  s1 <- predictor_sl(mod1, nfolds=5)
  cate(cate_model=~1,
       response_model=s1,
       propensity_model=predictor_glm(a~w1+w2, family=binomial),
       data=d,
       stratify=TRUE)
}