Constructs a learner class object for fitting entire lasso or
elastic-net regularization paths for various linear and non-linear regression
models with glmnet::cv.glmnet. Predictions are returned for the value of
lambda that gives the minimum cvm; that is, glmnet::predict.cv.glmnet is
called with s = "lambda.min".
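Conceptually, the learner wraps calls along the following lines (a minimal
sketch, not the actual implementation; it assumes the glmnet package is
installed and uses a toy design matrix in place of the one derived from
formula):

# sketch: the cv.glmnet fit and lambda.min prediction the learner wraps
x <- matrix(rnorm(200), nrow = 100, ncol = 2)    # toy design matrix
y <- rnorm(100)                                  # toy response
fit <- glmnet::cv.glmnet(x, y, family = gaussian(), alpha = 1, nfolds = 10)
predict(fit, newx = x[1:2, ], s = "lambda.min")  # lambda minimizing cvm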
learner_glmnet_cv(
formula,
info = "glmnet::cv.glmnet",
family = gaussian(),
lambda = NULL,
alpha = 1,
nfolds = 10,
learner.args = NULL,
...
)
(formula) Formula specifying response and design matrix.
(character) Optional information to describe the instantiated learner object.
Either a character string representing
one of the built-in families, or else a glm() family object. For more
information, see the Details section and the documentation on response
types in glmnet::glmnet.
Optional user-supplied lambda sequence; the default is NULL, in which case
glmnet chooses its own sequence. Note that this is done for the full model
(the master sequence), and separately for each fold. The fits are then
aligned using the master sequence (see the alignment argument of
glmnet::cv.glmnet for additional details). Adapting lambda for each fold
leads to better convergence. When lambda is supplied, the same sequence
is used everywhere, but in some GLMs this can lead to convergence issues.
(A sketch supplying a custom sequence appears after the examples below.)
The elastic-net mixing parameter, with \(0\le\alpha\le 1\).
The penalty is defined as
$$(1-\alpha)/2||\beta||_2^2+\alpha||\beta||_1.$$
alpha = 1 is the lasso penalty, and alpha = 0 the ridge penalty.
Number of folds; the default is 10. Although nfolds can be
as large as the sample size (leave-one-out CV), this is not recommended for
large datasets. The smallest value allowed is nfolds = 3.
(list) Additional arguments to learner$new().
Other arguments that can be passed to glmnet, for example alpha, nlambda,
etc. See glmnet::glmnet for details.
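As an illustration, such arguments can be forwarded directly at construction
(a hypothetical call consistent with the usage above; nlambda is a documented
glmnet argument controlling the length of the default lambda sequence):

# forward extra glmnet arguments through ...
lr <- learner_glmnet_cv(y ~ x1 + x2, nlambda = 50, alpha = 0.5)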
A learner object.
# continuous outcome
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x1 + x2*x1 + cos(x1)
y <- rnorm(n, lp, sd = 2)
d0 <- data.frame(y, x1, x2)
lr <- learner_glmnet_cv(y ~ x1 + x2)
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = c(0, 1), x2 = 1))
#> [1] 0.1958814 1.3180883
# count outcome with different exposure time
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(0.5 * x1 - 1 * x2 + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x1, x2, w)
lr <- learner_glmnet_cv(y ~ x1 + x2 + offset(log(w)), family = "poisson")
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = 1, x2 = 1, w = c(1, 5)))
#> [1] 0.9681549 4.8407746
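# For comparison, a ridge fit with a user-supplied lambda sequence can be
# constructed as sketched below (illustrative values; it reuses d0 from the
# count-outcome example above and ignores the offset for simplicity).
lam <- exp(seq(log(10), log(0.01), length.out = 50))
lr_ridge <- learner_glmnet_cv(y ~ x1 + x2, alpha = 0, lambda = lam)
lr_ridge$estimate(d0, nfolds = 3)
lr_ridge$predict(data.frame(x1 = c(0, 1), x2 = 1))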