This function creates a predictor object (class learner) from a list of existing learner objects. When the model is estimated, a stacked prediction is formed by weighting together the predictions of each of the initial learners. The weights are learned using cross-validation.
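
As a rough illustration of the stacking step (hypothetical numbers, not the package internals): given an n x p matrix of learner predictions and a weight vector learned by the meta-learner, the stacked prediction is their weighted combination.

# Illustrative sketch only; the values below are made up
pred <- cbind(mean = c(0.04, 0.04, 0.04), glm = c(1.90, -2.00, -0.20)) # n x p learner predictions
w <- c(0.25, 0.75)                                                     # weights learned by cross-validation
drop(pred %*% w)                                                       # stacked prediction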

superlearner(
  learners,
  data,
  nfolds = 10,
  meta.learner = metalearner_nnls,
  model.score = mse,
  mc.cores = NULL,
  future.seed = TRUE,
  silent = TRUE,
  name.prefix = NULL,
  ...
)

Arguments

learners

(list) List of learner objects (e.g., learner_glm)

data

(data.frame) Data containing the response variable and covariates.

nfolds

(integer) Number of folds to use in cross-validation to estimate the ensemble weights.

meta.learner

(function) Algorithm used to learn the ensemble weights (default: non-negative least squares). Must be a function of the response (n x 1 vector), y, and the predictions (n x p matrix), pred, with p being the number of learners; see the sketch after this argument list. Alternatively, this can be set to the character value "discrete", in which case the Discrete Super-Learner is applied: the model with the lowest risk (model score) is given weight 1 and all other learners weight 0.

model.score

(function) Model scoring method (see learner)

mc.cores

(integer) If not NULL, parallelization is done with parallel::mcmapply using mc.cores cores instead of future.apply::future_lapply. Parallelization is disabled with mc.cores = 1.

future.seed

(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly.

silent

(logical) Suppress all messages and progress bars.

name.prefix

(character) Prefix used to name learner objects in learners without names. If NULL, the name is obtained from the info field of each learner.

...

Additional arguments to parallel::mclapply or future.apply::future_lapply.
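
To illustrate the meta.learner interface described above, here is a minimal sketch of a custom meta-learner (assuming the function should return a numeric vector of p ensemble weights): ordinary least squares fitted to the cross-validated predictions, with negative weights truncated at zero and the result renormalized.

# Minimal sketch of a custom meta-learner; assumes a numeric weight vector
# of length p is the expected return value
metalearner_ols_trunc <- function(y, pred) {
  w <- coef(lm(y ~ -1 + pred)) # unconstrained least-squares weights
  w <- pmax(w, 0)              # truncate negative weights at zero
  w / sum(w)                   # renormalize so the weights sum to one
}
# Hypothetical usage: superlearner(learners, data, meta.learner = metalearner_ols_trunc)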

References

Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.

Examples

# Simulate data: y depends nonlinearly on x1; x2 is a noise covariate
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
# Candidate learners: an intercept-only model and a linear model
m <- list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1 + x2)
)
# Estimate the ensemble weights with 2-fold cross-validation
sl <- superlearner(m, data = sim1(), nfolds = 2)
predict(sl, newdata = sim1(n = 5))
#> [1] -0.65248151  2.36100714  0.01658889 -1.72611835 -0.67508451
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
#>         mean        glm
#> 1 0.04057825  1.8983151
#> 2 0.04057825 -2.0044428
#> 3 0.04057825 -0.1975450
#> 4 0.04057825 -4.4728980
#> 5 0.04057825 -0.3934162
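
The Discrete Super-Learner described under meta.learner can be requested by passing the character value "discrete"; the output is omitted here since it depends on the simulated data.

# Give weight 1 to the learner with the lowest cross-validated risk
sl_discrete <- superlearner(m, data = sim1(), nfolds = 2, meta.learner = "discrete")
predict(sl_discrete, newdata = sim1(n = 5))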