Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.
The following list provides an overview of constructors for many commonly used models.
Regression and classification: learner_glm, learner_gam, learner_grf, learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost, learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl
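For example, a GLM learner can be constructed, fitted, and used for prediction as follows (a minimal sketch on the iris data; learner_glm is assumed to default to a gaussian family, as suggested by the examples further below):
lr <- learner_glm(Sepal.Length ~ Petal.Length + Species)
lr$estimate(iris)                  # fit the model
predict(lr, newdata = head(iris))  # predictions for new data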
info
Optional information/name of the model
clear
Remove fitted model from the learner object
fit
Return estimated model object.
formula
Return model formula. Use learner$update() to update the formula.
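A short sketch of how these bindings may be used (the behaviour of 'clear' is an assumption based on its description above):
lr <- learner_glm(Sepal.Length ~ Petal.Length)
lr$info     # name/description of the model
lr$formula  # model formula
lr$estimate(iris)
lr$fit      # the fitted model object
lr$clear    # assumption: accessing this binding removes the stored fit again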
new()
Create a new prediction model object
formula
formula specifying outcome and design matrix
estimate
function for fitting the model. This must be a function taking the response, 'y', and design matrix, 'x', as arguments. Alternatively, a function with 'formula' and 'data' arguments. See the examples section.
predict
prediction function (must be a function of model object, 'object', and new design matrix, 'newdata')
predict.args
optional arguments to prediction function
estimate.args
optional arguments to estimate function
info
optional description of the model
specials
optional specials terms (weights, offset, id, subset, ...) passed on to design
formula.keep.specials
if TRUE then special terms defined by specials will be removed from the formula before it is passed to the estimate function
intercept
(logical) include intercept in design matrix
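To illustrate the estimate.args and predict.args arguments, here is a hypothetical sketch of a classification learner (it assumes the randomForest package is installed; compare with the grf example further below):
lr_rf <- learner$new(Species ~ .,
  info = "randomForest::randomForest",
  estimate = function(formula, data, ...) {
    randomForest::randomForest(formula, data = data, ...)
  },
  predict = function(object, newdata, ...) {
    predict(object, newdata = newdata, ...)
  },
  estimate.args = list(ntree = 100),  # fixed arguments passed to 'estimate'
  predict.args = list(type = "prob")  # fixed arguments passed to 'predict'
)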
estimate()
Estimation method
predict()
Prediction method
update()
Update formula
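For example, the formula of an existing learner can be updated and the model re-fitted (a small sketch continuing the GLM learner from above):
lr <- learner_glm(Sepal.Length ~ Petal.Length)
lr$update(Sepal.Length ~ Petal.Length + Species)  # replace the model formula
lr$estimate(iris)                                 # re-fit with the updated formula
predict(lr, newdata = head(iris))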
summary()
Summary method to provide more extensive information than learner$print().
Returns a summarized_learner object, which is a list with the following elements:
info: description of the learner
formula: formula specifying outcome and design matrix
estimate: function for fitting the model
estimate.args: arguments to the estimate function
predict: function for making predictions from the fitted model
predict.args: arguments to the predict function
specials: provided special terms
intercept: include intercept in design matrix
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
response()
Extract response from data
data
data.frame
eval
when FALSE, return the untransformed outcome (i.e., return 'a' if the formula is defined as I(a==1) ~ ...)
...
additional arguments to design
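A sketch of the eval argument for a learner with a transformed outcome (it assumes learner_glm accepts a family argument in the same way as family = "nb" in the summary example below):
lr <- learner_glm(I(Species == "setosa") ~ Sepal.Length, family = binomial)
head(lr$response(iris))                # transformed outcome (TRUE/FALSE)
head(lr$response(iris, eval = FALSE))  # untransformed outcome (the Species factor)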
design()
Generate design object (design matrix and response) from data
data
data.frame
...
additional arguments to design
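For example (a sketch; the returned object is assumed to be a design object, as produced by the design function referenced above):
lr <- learner_glm(Sepal.Length ~ Petal.Length + Species)
des <- lr$design(head(iris))
str(des)  # inspect the design matrix and response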
data(iris)
rf <- function(formula, ...) {
  learner$new(formula,
    info = "grf::probability_forest",
    estimate = function(x, y, ...) {
      grf::probability_forest(X = x, Y = y, ...)
    },
    predict = function(object, newdata) {
      predict(object, newdata)$predictions
    },
    estimate.args = list(...)
  )
}
args <- expand.list(
  num.trees = c(100, 200), mtry = 1:3,
  formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width)
)
models <- lapply(args, function(par) do.call(rf, par))
x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata = head(iris))
#> setosa versicolor virginica
#> [1,] 0.9820000 0.001833333 0.016166667
#> [2,] 0.9431825 0.052595238 0.004222222
#> [3,] 0.9858730 0.009047619 0.005079365
#> [4,] 0.9783730 0.016547619 0.005079365
#> [5,] 0.9820000 0.001833333 0.016166667
#> [6,] 0.9119127 0.060492063 0.027595238
# \donttest{
# Reduce Ex. timing
a <- targeted::cv(models, data = iris)
cbind(coef(a), attr(args, "table"))
#> brier -logscore num.trees mtry
#> model1 0.09174485 0.2054654 100 1
#> model2 0.09520162 0.2144675 200 1
#> model3 0.08360182 0.1833965 100 2
#> model4 0.08541814 0.1845019 200 2
#> model5 0.07572792 0.1602475 100 3
#> model6 0.07764519 0.1647030 200 3
#> model7 0.34626400 0.5640032 100 1
#> model8 0.34910402 0.5639633 200 1
#> model9 0.34603105 0.5566681 100 2
#> model10 0.34599329 0.5632747 200 2
#> model11 0.34506560 0.5573672 100 3
#> model12 0.34878419 0.5620248 200 3
#> formula
#> model1 Species ~ .
#> model2 Species ~ .
#> model3 Species ~ .
#> model4 Species ~ .
#> model5 Species ~ .
#> model6 Species ~ .
#> model7 Species ~ Sepal.Length + Sepal.Width
#> model8 Species ~ Sepal.Length + Sepal.Width
#> model9 Species ~ Sepal.Length + Sepal.Width
#> model10 Species ~ Sepal.Length + Sepal.Width
#> model11 Species ~ Sepal.Length + Sepal.Width
#> model12 Species ~ Sepal.Length + Sepal.Width
# }
# defining a learner via a function with arguments y (response)
# and x (design matrix)
f1 <- learner$new(
  estimate = function(y, x) lm.fit(x = x, y = y),
  predict = function(object, newdata) newdata %*% object$coefficients
)
# defining the learner via arguments formula and data
f2 <- learner$new(
  estimate = function(formula, data, ...) glm(formula, data = data, ...)
)
# generic learner defined from a function (predict method derived by default
# from stats::predict)
f3 <- learner$new(
  estimate = function(dt, ...) {
    lm(y ~ x, data = dt)
  }
)
## ------------------------------------------------
## Method `learner$summary`
## ------------------------------------------------
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
#> ────────── learner object ──────────
#> glm
#>
#> formula: y ~ x <environment: 0x55ff0e203de0>
#> estimate: formula, data, family, ...
#> estimate.args: family=nb
#> predict: object, newdata, ...
#> predict.args:
#> specials:
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
#> [1] "formula" "info" "estimate.args" "predict.args"
#> [5] "estimate" "predict" "specials" "intercept"
print(lr_sum)
#> ────────── learner object ──────────
#> glm
#>
#> formula: y ~ x <environment: 0x55ff0e203de0>
#> estimate: formula, data, family, ...
#> estimate.args: family=nb
#> predict: object, newdata, ...
#> predict.args:
#> specials: