Conditional Average Treatment Effect estimation with cross-fitting.
cate(
response.model,
propensity.model,
cate.model = ~1,
contrast = c(1, 0),
data,
nfolds = 1,
rep = 1,
rep.type = c("nuisance", "average"),
silent = FALSE,
stratify = FALSE,
mc.cores = NULL,
second.order = TRUE,
response_model = deprecated,
cate_model = deprecated,
propensity_model = deprecated,
treatment = deprecated,
...
)
formula or learner object (formula => learner_glm)
formula or learner object (formula => learner_glm)
formula specifying regression design for conditional average treatment effects
treatment contrast (default 1 vs 0)
data.frame
number of folds
number of replications of cross-fitting procedure
repeated cross-fitting applied by averaging nuisance models (rep.type="nuisance") or by averaging estimates from each replication (rep.type="average").
suppress all messages and progress bars
if TRUE the response.model will be stratified by treatment
(optional) number of cores; when set, parallel::mcmapply is used instead of future
add second order term to IF to handle misspecification of outcome models
Deprecated. Use response.model instead.
Deprecated. Use cate.model instead.
Deprecated. Use propensity.model instead.
Deprecated. Use cate.model instead.
additional arguments to future.apply::future_mapply
cate.targeted object
We have observed data \((Y,A,W)\) where \(Y\) is the response variable, \(A\) the binary treatment, and \(W\) covariates. We further let \(V\) be a subset of the covariates. Define the conditional potential mean outcome $$\psi_{a}(P)(V) = E_{P}[E_{P}(Y\mid A=a, W)\mid V]$$ and let \(m(V; \beta)\) denote a parametric working model. The target parameter is then the minimizer of the mean-squared error $$\beta(P) = \operatorname{argmin}_{\beta} E_{P}\left[\left(\{\psi_{1}(P)(V)-\psi_{0}(P)(V)\} - m(V; \beta)\right)^{2}\right]$$
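As a worked special case, consider the intercept-only working model \(m(V; \beta) = \beta\), which corresponds to cate.model = ~1. The shorthand \(D(V) = \psi_{1}(P)(V) - \psi_{0}(P)(V)\) is introduced here for illustration only. Setting the derivative of the squared-error criterion to zero and applying the tower property gives $$\frac{\partial}{\partial \beta} E_{P}\left[\{D(V) - \beta\}^{2}\right] = -2\,E_{P}[D(V) - \beta] = 0 \quad\Longrightarrow\quad \beta(P) = E_{P}[D(V)] = E_{P}[E_{P}(Y\mid A=1, W)] - E_{P}[E_{P}(Y\mid A=0, W)]$$ so in this case the target parameter reduces to the difference between the two marginal mean outcomes, i.e. the average treatment effect under the usual identification assumptions.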
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
sim1 <- function(n=1000, ...) {
w1 <- rnorm(n)
w2 <- rnorm(n)
a <- rbinom(n, 1, plogis(-1 + w1))
y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n)
data.frame(y, a, w1, w2)
}
d <- sim1(5000)
## ATE
cate(cate.model=~1,
response.model=y~a*(w1+w2),
propensity.model=a~w1+w2,
data=d)
#> Estimate Std.Err 2.5% 97.5% P-value
#> E[y(1)] 1.8508 0.04186 1.7687 1.9328 0.000e+00
#> E[y(0)] 0.8279 0.02054 0.7876 0.8681 0.000e+00
#> ───────────
#> (Intercept) 1.0229 0.04768 0.9295 1.1164 4.287e-102
## CATE
cate(cate.model=~1+w2,
response.model=y~a*(w1+w2),
propensity.model=a~w1+w2,
data=d)
#> Estimate Std.Err 2.5% 97.5% P-value
#> E[y(1)] 1.8508 0.04186 1.7687 1.9328 0.000e+00
#> E[y(0)] 0.8279 0.02054 0.7876 0.8681 0.000e+00
#> ───────────
#> (Intercept) 0.9999 0.04654 0.9087 1.0911 2.146e-102
#> w2 0.9879 0.04304 0.9035 1.0723 1.451e-116
## superlearner example (not run)
if (FALSE) {
  mod1 <- list(
    glm = learner_glm(y~w1+w2),
    gam = learner_gam(y~s(w1) + s(w2))
  )
  s1 <- learner_sl(mod1, nfolds=5)
  cate(cate.model=~1,
       response.model=s1,
       propensity.model=learner_glm(a~w1+w2, family=binomial),
       data=d,
       stratify=TRUE)
}
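To make the target parameter concrete, the following base-R sketch estimates the same linear projection with a doubly robust (AIPW) pseudo-outcome regression, without cross-fitting. This is one standard approach shown for illustration under stated assumptions; it is not claimed to reproduce the internals of cate(), and the names q_fit, g_fit, and phi are hypothetical. The simulation mirrors sim1 above, where the true treatment effect given w2 is 1 + w2.

```r
set.seed(1)
n <- 5000
w1 <- rnorm(n)
w2 <- rnorm(n)
a <- rbinom(n, 1, plogis(-1 + w1))
y <- cos(w1) + w2 * a + 0.2 * w2^2 + a + rnorm(n)
d <- data.frame(y, a, w1, w2)

# Nuisance models (assumed here): linear outcome model, logistic propensity model
q_fit <- lm(y ~ a * (w1 + w2), data = d)
g_fit <- glm(a ~ w1 + w2, family = binomial, data = d)

# Predicted outcomes under a = 1 and a = 0, and estimated propensity scores
q1 <- predict(q_fit, newdata = transform(d, a = 1))
q0 <- predict(q_fit, newdata = transform(d, a = 0))
g <- predict(g_fit, type = "response")

# AIPW pseudo-outcome: plug-in contrast plus inverse-probability-weighted residuals
d$phi <- (q1 - q0) +
  d$a / g * (d$y - q1) -
  (1 - d$a) / (1 - g) * (d$y - q0)

# Project the pseudo-outcome onto the working model ~ 1 + w2
fit <- lm(phi ~ w2, data = d)
coef(fit)  # the true values in this simulation are 1 (intercept) and 1 (w2)
```

The standard errors reported by lm() here ignore the estimation of the nuisance models, so they should not be compared directly with the influence-function-based standard errors printed by cate().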