This function creates a predictor object (class learner) from a list of existing learner objects. When the model is estimated, a stacked prediction is formed by weighting together the predictions of the individual learners; the weights are learned using cross-validation.
superlearner(
learners,
data,
nfolds = 10,
meta.learner = metalearner_nnls,
model.score = mse,
mc.cores = NULL,
future.seed = TRUE,
silent = TRUE,
name.prefix = NULL,
...
)
learners	(list) List of learner objects (e.g., learner_glm).
data	(data.frame) Data containing the response variable and covariates.
nfolds	(integer) Number of folds to use in the cross-validation that estimates the ensemble weights.
meta.learner	(function) Algorithm for learning the ensemble weights (default: non-negative least squares, metalearner_nnls). Must be a function of the response, y (an n x 1 vector), and the predictions, pred (an n x p matrix), with p being the number of learners. Alternatively, this can be set to the character value "discrete", in which case the discrete super-learner is applied: the model with the lowest risk (model score) is given weight 1 and all other learners weight 0.
model.score	(function) Model scoring method (see learner).
mc.cores	(integer) If not NULL, then parallel::mcmapply is used with mc.cores cores for parallelization instead of future.apply::future_lapply. Parallelization is disabled with mc.cores = 1.
future.seed	(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed; otherwise one is created randomly.
silent	(logical) Suppress all messages and progress bars.
name.prefix	(character) Prefix used to name unnamed learner objects in learners. If NULL, the name is obtained from the info field of each learner.
...	Additional arguments to parallel::mclapply or future.apply::future_lapply.
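As noted above, a custom meta.learner must be a function of the response y (n x 1 vector) and the prediction matrix pred (n x p matrix) returning the ensemble weights. A minimal sketch (illustrative only, not part of the package; the built-in equivalent is meta.learner = "discrete") replicating the discrete super-learner strategy:

```r
## Hypothetical user-supplied meta-learner mimicking the "discrete"
## super-learner: all weight on the learner with the lowest MSE.
metalearner_discrete <- function(y, pred) {
  risk <- apply(pred, 2, function(p) mean((y - p)^2)) # per-learner MSE
  w <- numeric(ncol(pred))
  w[which.min(risk)] <- 1 # weight 1 to the risk minimizer, 0 elsewhere
  w
}
```

Such a function could then be passed directly, e.g. superlearner(m, data = sim1(), meta.learner = metalearner_discrete).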
Luedtke & van der Laan (2016). Super-Learning of an Optimal Dynamic Treatment Rule. The International Journal of Biostatistics.
sim1 <- function(n = 5e2) {
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = sqrt(0.5))
data.frame(y, x1, x2)
}
m <- list(
"mean" = learner_glm(y ~ 1),
"glm" = learner_glm(y ~ x1 + x2)
)
sl <- superlearner(m, data = sim1(), nfolds = 2)
predict(sl, newdata = sim1(n = 5))
#> [1] -0.65248151 2.36100714 0.01658889 -1.72611835 -0.67508451
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
#> mean glm
#> 1 0.04057825 1.8983151
#> 2 0.04057825 -2.0044428
#> 3 0.04057825 -0.1975450
#> 4 0.04057825 -4.4728980
#> 5 0.04057825 -0.3934162
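The same ensemble can also be fit as a discrete super-learner, as described under the meta.learner argument: the single learner with the lowest cross-validated risk receives weight 1. A sketch continuing the example above (output omitted since it depends on the simulated data):

```r
## Discrete super-learner: picks the single best learner by CV risk
sl_d <- superlearner(m, data = sim1(), nfolds = 2, meta.learner = "discrete")
predict(sl_d, newdata = sim1(n = 5))
```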