This function creates a predictor object (class learner) from a list of existing learner objects. When the model is estimated, a stacked prediction is formed by weighting together the predictions of each of the initial learners. The weights are learned using cross-validation.
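
As a rough illustration of the stacking step (hypothetical numbers, not the package internals): given an n x p matrix of learner predictions and a weight vector learned by the meta-learner, the stacked prediction is their weighted combination.

# Illustrative sketch only; the values below are made up
pred <- cbind(mean = c(0.04, 0.04, 0.04), glm = c(1.90, -2.00, -0.20)) # n x p learner predictions
w <- c(0.25, 0.75)                                                     # weights learned by cross-validation
drop(pred %*% w)                                                       # stacked prediction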

superlearner(
  learners,
  data,
  nfolds = 10,
  meta.learner = metalearner_nnls,
  model.score = mse,
  mc.cores = NULL,
  future.seed = TRUE,
  silent = TRUE,
  name.prefix = NULL,
  ...
)

Arguments

learners

(list) List of learner objects (e.g., learner_glm)

data

(data.frame) Data containing the response variable and covariates.

nfolds

(integer) Number of folds to use in cross-validation to estimate the ensemble weights.

meta.learner

(function) Algorithm used to learn the ensemble weights (default: non-negative least squares). Must be a function of the response (n x 1 vector), y, and the predictions (n x p matrix), pred, with p being the number of learners; see the sketch after this argument list. Alternatively, this can be set to the character value "discrete", in which case the Discrete Super-Learner is applied: the model with the lowest risk (model score) is given weight 1 and all other learners weight 0.

model.score

(function) Model scoring method (see learner)

mc.cores

(integer) If not NULL, parallelization is done with parallel::mcmapply using mc.cores cores instead of future.apply::future_lapply. Parallelization is disabled with mc.cores = 1.

future.seed

(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly.

silent

(logical) Suppress all messages and progress bars.

name.prefix

(character) Prefix used to name learner objects in learners without names. If NULL, the name is obtained from the info field of each learner.

...

Additional arguments to parallel::mclapply or future.apply::future_lapply.
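
To illustrate the meta.learner interface described above, here is a minimal sketch of a custom meta-learner (assuming the function should return a numeric vector of p ensemble weights): ordinary least squares fitted to the cross-validated predictions, with negative weights truncated at zero and the result renormalized.

# Minimal sketch of a custom meta-learner; assumes a numeric weight vector
# of length p is the expected return value
metalearner_ols_trunc <- function(y, pred) {
  w <- coef(lm(y ~ -1 + pred)) # unconstrained least-squares weights
  w <- pmax(w, 0)              # truncate negative weights at zero
  w / sum(w)                   # renormalize so the weights sum to one
}
# Hypothetical usage: superlearner(learners, data, meta.learner = metalearner_ols_trunc)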

References

Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.

Examples

# Simulate data: y depends nonlinearly on x1; x2 is a noise covariate
sim1 <- function(n = 5e2) {
  x1 <- rnorm(n, sd = 2)
  x2 <- rnorm(n)
  y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
  data.frame(y, x1, x2)
}
# Candidate learners: an intercept-only model and a linear model
m <- list(
  "mean" = learner_glm(y ~ 1),
  "glm" = learner_glm(y ~ x1 + x2)
)
# Estimate the ensemble weights with 2-fold cross-validation
sl <- superlearner(m, data = sim1(), nfolds = 2)
predict(sl, newdata = sim1(n = 5))
#> [1] -0.65248151  2.36100714  0.01658889 -1.72611835 -0.67508451
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
#>         mean        glm
#> 1 0.04057825  1.8983151
#> 2 0.04057825 -2.0044428
#> 3 0.04057825 -0.1975450
#> 4 0.04057825 -4.4728980
#> 5 0.04057825 -0.3934162
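
The Discrete Super-Learner described under meta.learner can be requested by passing the character value "discrete"; the output is omitted here since it depends on the simulated data.

# Give weight 1 to the learner with the lowest cross-validated risk
sl_discrete <- superlearner(m, data = sim1(), nfolds = 2, meta.learner = "discrete")
predict(sl_discrete, newdata = sim1(n = 5))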