This function creates a predictor object (class learner) from a list of existing learner objects. When the model is estimated, a stacked prediction is created by weighting together the predictions of each of the initial learners. The weights are learned using cross-validation.
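For intuition, the stacking step amounts to the following (a minimal sketch, not the actual implementation): with pred denoting the n x p matrix of cross-validated predictions from the p learners and w the weight vector returned by the meta-learner, the stacked prediction is

yhat <- pred %*% w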
Usage
superlearner(
  learners,
  data,
  nfolds = 10,
  meta.learner = metalearner_nnls,
  model.score = mse,
  mc.cores = NULL,
  future.seed = TRUE,
  silent = TRUE,
  name.prefix = NULL,
  ...
)

Arguments
- learners
(list) List of learner objects (e.g., learner_glm).
- data
(data.frame) Data containing the response variable and covariates.
- nfolds
(integer) Number of folds to use in cross-validation to estimate the ensemble weights.
- meta.learner
(function) Algorithm used to learn the ensemble weights (default: non-negative least squares). Must be a function of the response (n x 1 vector), y, and the predictions (n x p matrix), pred, where p is the number of learners. Alternatively, this can be set to the character value "discrete", in which case the Discrete Super-Learner is applied: the learner with the lowest risk (model score) is given weight 1 and all other learners weight 0. A sketch of this interface is given after the argument list.
- model.score
(function) Model scoring method (see learner).
- mc.cores
(integer) If not NULL, then parallel::mcmapply is used with mc.cores cores for parallelization instead of future.apply::future_lapply. Parallelization is disabled with mc.cores = 1.
- future.seed
(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly.
- silent
(logical) Suppress all messages and progress bars.
- name.prefix
(character) Prefix used to name learner objects in
learners without names. If NULL, the name is obtained from the info field of each learner.
- ...
Additional arguments to parallel::mclapply or future.apply::future_lapply.
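To illustrate the meta.learner interface (a minimal sketch under the signature described above; metalearner_uniform is a hypothetical name, and it is assumed that the meta-learner returns a numeric vector of p ensemble weights), a meta-learner giving every learner equal weight could be written as:

metalearner_uniform <- function(y, pred) {
  # y: response (n x 1 vector), unused in this sketch
  # pred: learner predictions (n x p matrix)
  p <- ncol(pred)  # number of learners
  rep(1 / p, p)    # equal weights summing to one
}

which could then be passed via the meta.learner argument, e.g. superlearner(m, data, meta.learner = metalearner_uniform).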
References
Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.
Examples
# Simulate data where the response depends non-linearly on x1 only
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5^0.5)
  data.frame(y, x1, x2)
}
# Candidate learners: an intercept-only model and a linear model
m <- list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1 + x2)
)
# Fit the super-learner with 2-fold cross-validation
sl <- superlearner(m, data = sim1(), nfolds = 2)
# Stacked (ensemble) predictions on new data
predict(sl, newdata = sim1(n = 5))
#> [1] -0.65248151 2.36100714 0.01658889 -1.72611835 -0.67508451
# Predictions from each individual learner
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
#> mean glm
#> 1 0.04057825 1.8983151
#> 2 0.04057825 -2.0044428
#> 3 0.04057825 -0.1975450
#> 4 0.04057825 -4.4728980
#> 5 0.04057825 -0.3934162
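
# The Discrete Super-Learner (see the meta.learner argument) selects the
# single learner with the lowest cross-validated risk rather than a
# weighted combination:
sl_disc <- superlearner(m, data = sim1(), nfolds = 2, meta.learner = "discrete")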
