Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.
The following list provides an overview of constructors for many commonly used models.
Regression and classification: learner_glm, learner_gam, learner_grf,
learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost,
learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl
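For instance (a minimal sketch, assuming the targeted package is attached and using the iris data purely for illustration), a learner is first constructed and then fitted and used for prediction:

library(targeted)
lr <- learner_glm(Sepal.Length ~ Sepal.Width + Species)
lr$estimate(iris)                   # fit the model on the supplied data
head(predict(lr, newdata = iris))   # predictions for (new) data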
info: Optional information/name of the model
clear: Remove fitted model from the learner object
fit: Return estimated model object.
formula: Return model formula. Use learner$update() to update the formula.
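Continuing the sketch above, these fields can be inspected directly on a fitted learner (output not shown):

lr$info      # description of the model
lr$formula   # model formula
lr$fit       # fitted model object returned by the estimate function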
new(): Create a new prediction model object
formula: formula specifying outcome and design matrix
estimate: function for fitting the model. This must be a function with a response argument, 'y', and a design matrix argument, 'x'. Alternatively, a function with formula and data arguments. See the examples section.
predict: prediction function (must be a function of the model object, 'object', and a new design matrix, 'newdata')
predict.args: optional arguments to the prediction function
estimate.args: optional arguments to the estimate function
info: optional description of the model
specials: optional special terms (weights, offset, id, subset, ...) passed on to design
formula.keep.specials: if TRUE, special terms defined by specials will be removed from the formula before it is passed to the estimate function
intercept: (logical) include intercept in design matrix
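As a small sketch of these arguments (the wrapper below is purely illustrative and not part of the package), a learner can be defined through the formula/data calling convention, with extra arguments forwarded via estimate.args; the predict method is then derived by default from stats::predict:

lr_lm <- learner$new(Sepal.Length ~ Sepal.Width,
  info = "ordinary least squares",
  estimate = function(formula, data, ...) lm(formula, data = data, ...),
  estimate.args = list(model = FALSE)  # extra argument passed on to lm()
)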
estimate(): Estimation method
predict(): Prediction method
update(): Update formula
summary(): Summary method to provide more extensive information than learner$print().
Returns a summarized_learner object, which is a list with the following elements:
info: description of the learner
formula: formula specifying outcome and design matrix
estimate: function for fitting the model
estimate.args: arguments to the estimate function
predict: function for making predictions from the fitted model
predict.args: arguments to the predict function
specials: provided special terms
intercept: include intercept in design matrix
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
response(): Extract response from data
data: data.frame
eval: when FALSE return the untransformed outcome (i.e., return 'a' if the formula is defined as I(a==1) ~ ...)
...: additional arguments to design
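A brief illustration (using the hypothetical lr_lm learner sketched above; output not shown):

head(lr_lm$response(iris))   # outcome (Sepal.Length) extracted according to the formula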
design(): Generate design object (design matrix and response) from data
data: data.frame
...: additional arguments to design
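And similarly for design(), again with the hypothetical lr_lm learner (the exact structure of the returned design object is not shown here):

dd <- lr_lm$design(iris)   # design matrix and response derived from the formula
class(dd)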
data(iris)
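# constructor returning an (unfitted) learner wrapping grf::probability_forest;
# tuning parameters supplied through '...' are stored in estimate.args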
rf <- function(formula, ...) {
learner$new(formula,
info = "grf::probability_forest",
estimate = function(x, y, ...) {
grf::probability_forest(X = x, Y = y, ...)
},
predict = function(object, newdata) {
predict(object, newdata)$predictions
},
estimate.args = list(...)
)
}
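# grid of tuning parameters and model formulas (one list element per combination)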
args <- expand.list(
num.trees = c(100, 200), mtry = 1:3,
formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width)
)
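# one learner object per parameter combination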
models <- lapply(args, function(par) do.call(rf, par))
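# fit a copy of the first learner on the full data and predict class probabilities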
x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata = head(iris))
#> setosa versicolor virginica
#> [1,] 0.9820000 0.001833333 0.016166667
#> [2,] 0.9431825 0.052595238 0.004222222
#> [3,] 0.9858730 0.009047619 0.005079365
#> [4,] 0.9783730 0.016547619 0.005079365
#> [5,] 0.9820000 0.001833333 0.016166667
#> [6,] 0.9119127 0.060492063 0.027595238
# \donttest{
# Reduce Ex. timing
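# cross-validated model performance; coef() returns the Brier score and the
# negative log-score for each model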
a <- targeted::cv(models, data = iris)
cbind(coef(a), attr(args, "table"))
#> brier -logscore num.trees mtry
#> model1 0.09174485 0.2054654 100 1
#> model2 0.09520162 0.2144675 200 1
#> model3 0.08360182 0.1833965 100 2
#> model4 0.08541814 0.1845019 200 2
#> model5 0.07572792 0.1602475 100 3
#> model6 0.07764519 0.1647030 200 3
#> model7 0.34626400 0.5640032 100 1
#> model8 0.34910402 0.5639633 200 1
#> model9 0.34603105 0.5566681 100 2
#> model10 0.34599329 0.5632747 200 2
#> model11 0.34506560 0.5573672 100 3
#> model12 0.34878419 0.5620248 200 3
#> formula
#> model1 Species ~ .
#> model2 Species ~ .
#> model3 Species ~ .
#> model4 Species ~ .
#> model5 Species ~ .
#> model6 Species ~ .
#> model7 Species ~ Sepal.Length + Sepal.Width
#> model8 Species ~ Sepal.Length + Sepal.Width
#> model9 Species ~ Sepal.Length + Sepal.Width
#> model10 Species ~ Sepal.Length + Sepal.Width
#> model11 Species ~ Sepal.Length + Sepal.Width
#> model12 Species ~ Sepal.Length + Sepal.Width
# }
# defining learner via function with arguments y (response)
# and x (design matrix)
f1 <- learner$new(
estimate = function(y, x) lm.fit(x = x, y = y),
predict = function(object, newdata) newdata %*% object$coefficients
)
# defining the learner via arguments formula and data
f2 <- learner$new(
estimate = function(formula, data, ...) glm(formula, data, ...)
)
# generic learner defined from a function (predict method derived by default
# from stats::predict)
f3 <- learner$new(
estimate = function(dt, ...) {
lm(y ~ x, data = dt)
}
)
## ------------------------------------------------
## Method `learner$summary`
## ------------------------------------------------
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
#> ────────── learner object ──────────
#> glm
#>
#> formula: y ~ x <environment: 0x55ff0e203de0>
#> estimate: formula, data, family, ...
#> estimate.args: family=nb
#> predict: object, newdata, ...
#> predict.args:
#> specials:
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
#> [1] "formula" "info" "estimate.args" "predict.args"
#> [5] "estimate" "predict" "specials" "intercept"
print(lr_sum)
#> ────────── learner object ──────────
#> glm
#>
#> formula: y ~ x <environment: 0x55ff0e203de0>
#> estimate: formula, data, family, ...
#> estimate.args: family=nb
#> predict: object, newdata, ...
#> predict.args:
#> specials: