Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.

The following list provides an overview of constructors for many commonly used models.

Regression and classification: learner_glm, learner_gam, learner_grf, learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost, learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl

Author

Klaus Kähler Holst, Benedikt Sommer

Public fields

info

Optional information/name of the model

Active bindings

clear

Remove fitted model from the learner object

fit

Return estimated model object.

formula

Return model formula. Use learner$update() to update the formula.
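
For example (an illustrative sketch, not part of the package examples; assumes a learner constructed with learner_glm, which wraps a gaussian glm by default):

lr <- learner_glm(Sepal.Length ~ Sepal.Width)
lr$estimate(iris)   # fit and store the model (see learner$estimate() below)
lr$fit              # the stored glm fit
lr$formula          # Sepal.Length ~ Sepal.Width
lr$clear            # remove the stored fit from the learner again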

Methods


Method new()

Create a new prediction model object

Usage

learner$new(
  formula = NULL,
  estimate,
  predict = stats::predict,
  predict.args = NULL,
  estimate.args = NULL,
  info = NULL,
  specials = c(),
  formula.keep.specials = FALSE,
  intercept = FALSE
)

Arguments

formula

formula specifying outcome and design matrix

estimate

function for fitting the model. This must be a function with arguments for the response, 'y', and the design matrix, 'x'. Alternatively, it can be a function with 'formula' and 'data' arguments. See the examples section.

predict

prediction function (must be a function of model object, 'object', and new design matrix, 'newdata')

predict.args

optional arguments to prediction function

estimate.args

optional arguments to estimate function

info

optional description of the model

specials

optional special terms (weights, offset, id, subset, ...) passed on to design

formula.keep.specials

if TRUE, the special terms defined by specials are removed from the formula before it is passed to the estimate function

intercept

(logical) include intercept in design matrix
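
As an illustration (a sketch, not taken from the package examples), fixed arguments for the fitting and prediction steps can be supplied via estimate.args and predict.args:

# Poisson regression via the formula/data interface; predictions use the
# default stats::predict with type = "response" supplied via predict.args.
lr_pois <- learner$new(
  formula = y ~ x,
  estimate = function(formula, data, ...) glm(formula, data = data, ...),
  estimate.args = list(family = poisson),
  predict.args = list(type = "response"),
  info = "glm (poisson, illustrative)"
)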


Method estimate()

Estimation method

Usage

learner$estimate(data, ..., store = TRUE)

Arguments

data

data.frame

...

Additional arguments to estimation method

store

Logical determining if the estimated model should be stored inside the learner object.
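
A typical call (illustrative; lr is the learner_glm(Sepal.Length ~ Sepal.Width) object from the sketch under Active bindings):

lr$estimate(iris)                 # fit the model and store it in the learner
lr$estimate(iris, store = FALSE)  # fit without storing the model object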


Method predict()

Prediction method

Usage

learner$predict(newdata, ..., object = NULL)

Arguments

newdata

data.frame

...

Additional arguments to prediction method

object

Optional model fit object
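
For example (illustrative; assuming lr has been estimated on iris as above):

lr$predict(newdata = head(iris))  # predictions for the first six rows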


Method update()

Update formula

Usage

learner$update(formula)

Arguments

formula

formula or character which defines the new response
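
For example (illustrative; lr is a learner object as above):

lr$update(Petal.Length ~ Sepal.Width)  # replace the model formula
lr$formula                             # inspect the updated formula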


Method print()

Print method

Usage

learner$print()


Method summary()

Summary method to provide more extensive information than learner$print().

Usage

learner$summary()

Returns

summarized_learner object, which is a list with the following elements:

info

description of the learner

formula

formula specifying outcome and design matrix

estimate

function for fitting the model

estimate.args

arguments to estimate function

predict

function for making predictions from fitted model

predict.args

arguments to predict function

specials

provided special terms

intercept

include intercept in design matrix

Examples

lr <- learner_glm(y ~ x, family = "nb")
lr$summary()

lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)


Method response()

Extract response from data

Usage

learner$response(data, eval = TRUE, ...)

Arguments

data

data.frame

eval

when FALSE, return the untransformed outcome (i.e., return 'a' if the formula is defined as I(a==1) ~ ...)

...

additional arguments to design
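
An illustrative sketch with a transformed outcome (not from the package examples):

lr_bin <- learner_glm(I(Sepal.Length > 5) ~ Sepal.Width, family = "binomial")
head(lr_bin$response(iris))                # transformed outcome (logical)
head(lr_bin$response(iris, eval = FALSE))  # untransformed variable Sepal.Length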


Method design()

Generate design object (design matrix and response) from data

Usage

learner$design(data, ...)

Arguments

data

data.frame

...

additional arguments to design
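
For example (illustrative; lr is a learner object as above):

des <- lr$design(head(iris))  # design object (design matrix and response)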


Method opt()

Get options

Usage

learner$opt(arg)

Arguments

arg

name of option to get value of


Method clone()

The objects of this class are cloneable with this method.

Usage

learner$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

data(iris)
rf <- function(formula, ...) {
  learner$new(formula,
    info = "grf::probability_forest",
    estimate = function(x, y, ...) {
      grf::probability_forest(X = x, Y = y, ...)
    },
    predict = function(object, newdata) {
      predict(object, newdata)$predictions
    },
    estimate.args = list(...)
  )
}

args <- expand.list(
  num.trees = c(100, 200), mtry = 1:3,
  formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width)
)
models <- lapply(args, function(par) do.call(rf, par))

x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata = head(iris))
#>         setosa  versicolor   virginica
#> [1,] 0.9820000 0.001833333 0.016166667
#> [2,] 0.9431825 0.052595238 0.004222222
#> [3,] 0.9858730 0.009047619 0.005079365
#> [4,] 0.9783730 0.016547619 0.005079365
#> [5,] 0.9820000 0.001833333 0.016166667
#> [6,] 0.9119127 0.060492063 0.027595238

# \donttest{
# Reduce Ex. timing
a <- targeted::cv(models, data = iris)
cbind(coef(a), attr(args, "table"))
#>              brier -logscore num.trees mtry
#> model1  0.09174485 0.2054654       100    1
#> model2  0.09520162 0.2144675       200    1
#> model3  0.08360182 0.1833965       100    2
#> model4  0.08541814 0.1845019       200    2
#> model5  0.07572792 0.1602475       100    3
#> model6  0.07764519 0.1647030       200    3
#> model7  0.34626400 0.5640032       100    1
#> model8  0.34910402 0.5639633       200    1
#> model9  0.34603105 0.5566681       100    2
#> model10 0.34599329 0.5632747       200    2
#> model11 0.34506560 0.5573672       100    3
#> model12 0.34878419 0.5620248       200    3
#>                                      formula
#> model1                           Species ~ .
#> model2                           Species ~ .
#> model3                           Species ~ .
#> model4                           Species ~ .
#> model5                           Species ~ .
#> model6                           Species ~ .
#> model7  Species ~ Sepal.Length + Sepal.Width
#> model8  Species ~ Sepal.Length + Sepal.Width
#> model9  Species ~ Sepal.Length + Sepal.Width
#> model10 Species ~ Sepal.Length + Sepal.Width
#> model11 Species ~ Sepal.Length + Sepal.Width
#> model12 Species ~ Sepal.Length + Sepal.Width
# }

# defining learner via function with arguments y (response)
# and x (design matrix)
f1 <- learner$new(
  estimate = function(y, x) lm.fit(x = x, y = y),
  predict = function(object, newdata) newdata %*% object$coefficients
)
# defining the learner via arguments formula and data
f2 <- learner$new(
  estimate = function(formula, data, ...) glm(formula, data, ...)
)
# generic learner defined from a function (predict method derived by default
# from stats::predict)
f3 <- learner$new(
  estimate = function(dt, ...) {
    lm(y ~ x, data = dt)
  }
)

## ------------------------------------------------
## Method `learner$summary`
## ------------------------------------------------

lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
#> ────────── learner object ──────────
#> glm 
#> 
#> formula: y ~ x <environment: 0x55ff0e203de0> 
#> estimate: formula, data, family, ... 
#> estimate.args: family=nb 
#> predict: object, newdata, ... 
#> predict.args:   
#> specials:  

lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
#> [1] "formula"       "info"          "estimate.args" "predict.args" 
#> [5] "estimate"      "predict"       "specials"      "intercept"    
print(lr_sum)
#> ────────── learner object ──────────
#> glm 
#> 
#> formula: y ~ x <environment: 0x55ff0e203de0> 
#> estimate: formula, data, family, ... 
#> estimate.args: family=nb 
#> predict: object, newdata, ... 
#> predict.args:   
#> specials: