Chapter 2 Introduction


Introductory Material

Our objective in this chapter is to estimate the average causal treatment effect. In the nomenclature of potential outcomes, the treatment effect is defined as

\[
\begin{aligned}
\delta^{*} = E\left\{Y^{*}(1)\right\} -
E\left\{Y^{*}(0)\right\};
\end{aligned}
\]

that is, the treatment effect is the difference in the average potential outcomes if all patients were to receive treatment \(A=1\) and that if all patients were to receive treatment \(A=0\).
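For instance, the naive estimator reviewed later in this chapter contrasts the sample mean outcomes of the two treatment groups. As a minimal sketch, assuming a data.frame data with a binary treatment indicator a and observed outcome y (hypothetical names), it is

```r
# naive estimate of the average treatment effect: the difference in
# sample mean outcomes between patients receiving a = 1 and a = 0;
# in general this is unbiased only under randomization
deltaHat <- mean(x = data$y[data$a == 1L]) - mean(x = data$y[data$a == 0L])
```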

In the following pages, we provide implementations for several of the estimators of the treatment effect that were discussed in Chapter 2 of the book: the naive, the outcome regression, the stratification, the inverse probability weighted, and the doubly robust estimators. For each method, we

  • provide a quick review of the form of the estimator,
  • define an estimator-specific form for the standard error estimate,
  • present R code illustrating how one can implement the method and the standard error estimator being discussed, and
  • show the results of those implementations for a variety of models.

The implementations discussed are not general: they were written for the example data set, and they do not incorporate the validation steps that we believe are important in all code development. We provide more general and robust implementations through the R package SomeCleverName, which is available for download under the R Codes section accessed through the header above. It is hoped that these more general implementations will be useful in practice with no additional code development required by the reader.

We assume a basic level of knowledge about the R statistical computing environment as well as awareness of common regression methods available through R. If you are new to R, there are numerous online tutorials, discussion groups, and blogs available; see, for example, R For Beginners by Emmanuel Paradis.

Code Items in Text

We use a variety of font decorations when referring in text passages to packages, functions, and input arguments. Package names are printed in bold packageName. Functions that are defined by a package in R are highlighted in blue font, functionName(). Functions that we define through the website are highlighted in green; functionName(). Finally, input arguments for functions are in bold text-type argumentName.

If a function exists in R but is not provided in the base (default) package, we include the package name using the double colon notation, which is the convention of R. For example, lm(), which fits linear models, is defined by package stats and is referenced as stats::lm() in both text and code.

Function Arguments

As mentioned previously, function argument names in text sections are bold text-type, argumentName. In addition, for code blocks, we always use the named argument input convention (name = object). Many functions allow for unnamed, positional inputs; however, we believe that named inputs provide more clarity.

For example, we will write the code to generate 100 random numbers from a normal distribution with mean 10 and standard deviation 2 as

stats::rnorm(n = 100, mean = 10, sd = 2)
  [1]  9.893237 10.507688 10.012688  8.707147  9.548533  8.982988  8.066013
  [8] 10.205411 10.397472  7.819400  9.336019  9.339630  8.867144  9.542909
 [15] 11.762094 10.739503 10.438619  9.391717 11.680585 12.787967 13.701214
 [22]  9.848661  9.717627 11.654497 10.164340 10.575105 10.022330  8.149515
 [29] 12.045776 12.352308 14.178142  6.689610 11.020653  9.162687  9.094733
 [36]  6.420989  8.193088  8.233419 10.169635  9.511913  8.952925  8.208049
 [43]  8.624921  8.897360  7.426237 10.897978 14.522398 10.113130 13.442828
 [50] 11.570495  8.757094  7.942937 11.493073 11.205697 10.903682 10.431809
 [57] 10.723979  8.893913  9.772852  9.078384  8.752082 11.302833  8.076248
 [64]  9.626894 12.942590  9.523719  6.318247 12.348221  8.277386  8.936629
 [71]  9.510566 10.220424  5.725107  9.738102 10.966470  9.057370 11.275425
 [78] 14.878563  8.573202  7.608465 12.093278  9.748635  9.122764  8.007697
 [85]  8.870171  8.727354  9.020773  9.752939 11.027651 11.415775 14.261205
 [92] 13.618538  6.605776  9.958352  7.414723  7.152316 12.656295  7.991488
 [99]  8.789417 10.951555

though R would also understand

stats::rnorm(100, 10, 2)
  [1]  8.961146  7.261345  9.823388 10.906348 13.455027  9.048661  6.465663
  [8]  6.749245  6.052349 10.772465  7.888255 10.666305  7.767421  9.086291
 [15]  7.886885  5.927129 10.249050  9.627808  7.940593 13.131739  9.737940
 [22]  6.096359 10.347302 12.520441 10.291120  9.322992  9.771034 12.995770
 [29]  9.508409 13.539901  8.239068 11.459178 12.549406 11.473786  7.475902
 [36] 11.224677  9.293729  7.563118 11.139834  6.727723 13.305250  8.854498
 [43]  8.117358  5.139411  7.026799 10.847335 10.943316 12.779382  9.072890
 [50]  9.233614 15.936254 11.208169 11.365441  7.079556 12.145652  9.411931
 [57]  9.859032  8.530485 10.991720  8.503772  8.988956  8.965980 11.300199
 [64]  7.001940 12.698560  9.188789 14.192957  9.627838 11.948128 11.748514
 [71] 13.884357  8.571470  7.855020  9.834301  8.921868 11.098543 10.285607
 [78]  8.319610 12.028411  9.867531  9.946538  8.527994  8.623172 11.092249
 [85] 11.874513  9.410972  9.827292  8.233111 11.976960 13.546498 10.340028
 [92]  8.806128  7.252567 10.523667 11.528861  8.806399  7.799699 10.901113
 [99]  4.984706 10.524021

to be equivalent. (Though the calls to stats::rnorm() are equivalent, the results are not because we did not reset the seed.)

Numeric vs Integer

All numeric values include a decimal. All integers are indicated using R’s “L” notation; i.e. the integer value 1 is coded as 1L. This allows for more efficient and robust tests for equality.
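For example, integer literals carry type integer and support exact equality tests, whereas comparisons between doubles can be confounded by floating-point representation:

```r
# 1L is stored as type integer; 1 is stored as type double
is.integer(x = 1L)  # TRUE
is.integer(x = 1)   # FALSE
# floating-point arithmetic can defeat naive tests for equality
{0.1 + 0.2} == 0.3  # FALSE
```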

Curly Brackets vs Parentheses

We group expressions using curly brackets. For example, our convention is to write

res <- {a + b} * {c + d}

in contrast to

res <- (a + b) * (c + d)

Both are acceptable in R, but there are circumstances under which curly brackets can lead to improved speed.

Help with Functions

If you are not familiar with a function used in our implementations (for illustration, assume the method is stats::lm()), you can type ?lm or ?stats::lm at the command prompt of R to obtain the official documentation, or type "lm R CRAN" into your favorite search engine to locate blog posts, articles, or tutorials.

Most methods discussed in this chapter rely on outcome and/or propensity score regressions. For such methods, we make use of R’s modelObj package, which facilitates the implementation of statistical methods using a modeling object framework. This framework separates the details of a regression step from the implementation of a new method. We briefly introduce the features of this package that we use throughout the book.

For the methods developed in this chapter, the modelObj framework may seem a bit heavy-handed. However, in later chapters we discuss the methods for dynamic treatment regimes as implemented in package DynTxRegime, which was built on the modelObj framework. It is our hope that introducing the framework early in our discussion will facilitate their use in those more complex settings.

Introduction

A modeling object can be thought of as a Class, such as those encountered in high-level languages C++ and Java. In the traditional language of classes, a modeling object has ‘state variables’ that include the postulated model, the method to be used to estimate parameters, and the method to be used to make predictions. The object also has behaviors such as ‘obtain parameter estimates’ and ‘make predictions.’

This framework essentially separates the implementation of a statistical method that requires a regression step from the details of the regression step. Specifically, a developer need not hard-code a specific regression method (and its inputs!) such as stats::lm() or stats::nls(), but simply passes a modeling object and the appropriate data to a generic 'fit' method; the package takes care of the rest. Through this framework, developers avoid building artificial limitations on the supported regression methods into their code.

Users of methods developed on the modeling object framework provide the details of a regression step as a compact input variable. The defined object contains all control parameters for regression and prediction.

This modeling object framework has been implemented in R package modelObj. Users of methods developed using modelObj define modeling objects through function modelObj::buildModelObj().

Function buildModelObj()

modelObj::buildModelObj(model,
                        solver.method = NULL, 
                        solver.args = NULL,
                        predict.method = NULL, 
                        predict.args = NULL)

Input model is a standard R formula object specifying the model to be used for the regression step, e.g., y ~ x. Note that the left-hand side of the formula will be ignored.

Inputs solver.method and solver.args specify the method to be used to obtain parameter estimates. solver.method is a character string specifying the R function, e.g., "lm", "nls", or "glm". solver.args is an optional named list that is used to modify default values of arguments passed to the function specified in solver.method. See the examples below for further details.

Similarly, inputs predict.method and predict.args specify the method to be used to obtain predictions. predict.method is a character string specifying the R function, e.g., "predict.lm", "predict", or "predict.glm". predict.args is an optional named list that is used to modify default values of arguments passed to the function specified in predict.method.

Examples

Continuous Outcome Variable

To illustrate, assume that our data set, data, contains covariates \(x1\) and \(x2\) and a continuous outcome variable, \(y\). We postulate a model \(y \sim \beta_0 + \beta_1~x1 + \beta_2~x2\) and want to obtain parameter estimates using ordinary least squares through R's stats::lm() function. Typically, this is coded in R as

fit <- stats::lm(formula = y ~ x1 + x2, data = data)

And, predictions from that analysis would be obtained as

pred <- stats::predict.lm(object = fit)

The modeling object defining these regression and prediction steps is specified as

mo <- modelObj::buildModelObj(model = ~ x1 + x2, 
                              solver.method = "lm",
                              predict.method = "predict.lm")

Notice that we did not explicitly include the package name stats when specifying solver.method or predict.method. Because we specify the function names using character strings, we cannot include the package name. With this choice of input style, the stats package must be loaded into your R environment prior to calling modelObj::buildModelObj(). Users can instead provide the function itself (solver.method = stats::lm); however, this passes the entire function as input and results in cumbersome screen prints. Our choice here is purely for aesthetics.

Binary Outcome Variable

For a binary outcome, \(y\), we postulate the logistic regression model \(\text{logit}(y) \sim \beta_0 + \beta_1~x1 + \beta_2~x2\). Parameter estimates are obtained using maximum likelihood through R's stats::glm() function.

fit <- stats::glm(formula = y ~ x1 + x2, 
                  data = data, 
                  family = binomial)

Notice that an additional input is required; namely, family=binomial. This input specifies the family of the model and the link function; the default for stats::glm() is family=gaussian.

Predictions from the above analysis would be obtained as

pred <- stats::predict.glm(object = fit, type = "response")

Again there is a change to the default inputs; i.e., type = "response". The default input for stats::predict.glm() is type = "link", which returns the predictions on the scale of the linear predictor, not on the scale of the outcome variable.
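For a logistic regression, the two prediction scales are connected through the inverse logit (expit) function. Assuming fit is the stats::glm() result above, the relationship can be sketched as:

```r
# predictions on the scale of the linear predictor
predLink <- stats::predict.glm(object = fit, type = "link")
# predictions on the scale of the outcome (probabilities)
predResp <- stats::predict.glm(object = fit, type = "response")
# the response scale is the inverse logit of the link scale,
# expit(x) = exp(x) / {1.0 + exp(x)}, available in R as stats::plogis()
all.equal(target = stats::plogis(q = predLink), current = predResp)
```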

The modeling object defining these regression and prediction steps is specified as

mo <- modelObj::buildModelObj(model = ~ x1 + x2, 
                              solver.method = "glm",
                              solver.args = list(family = "binomial"),
                              predict.method = "predict.glm",
                              predict.args = list(type = "response"))

Limitations of modelObj

There are two important limitations to this framework that must be kept in mind whenever using packages or functions that are built on the modelObj framework.

  1. There is no built-in model checking. It is the responsibility of the user to define models responsibly.

  2. Care must be taken in specifying the scale of the predictions, as in the binary example above.

Methods for Modeling Objects

Package modelObj has several methods available. In this chapter, we will make extensive use of only three of them: modelObj::fit(), modelObj::predict(), and modelObj::fitObject().

modelObj::fit()

modelObj::fit(object, data, response, ...)

Function modelObj::fit() performs the regression step. It takes as input object, a modeling object returned by a call to modelObj::buildModelObj(); data, a data.frame containing all relevant covariates; and response, the outcome of interest. This function returns an object of class "modelObjFit", which includes the regression results.
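As a hypothetical illustration using the continuous-outcome modeling object mo defined earlier, with data containing x1, x2, and outcome y, the regression step might be carried out as:

```r
# carry out the regression step encapsulated in the modeling object 'mo';
# 'mo' and 'data' are the (assumed) objects from the earlier examples
fitObj <- modelObj::fit(object = mo, data = data, response = data$y)
```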

modelObj::predict()

modelObj::predict(object, ...)

Function modelObj::predict() obtains predictions. This function takes as input object, an object returned by modelObj::fit(), and an optional input newdata containing a data.frame for which predictions are to be made. The value returned is generally a vector or matrix of predictions; the exact structure is determined by the prediction method specified in the modeling object.
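For example, assuming fitObj is a value returned by modelObj::fit(), predictions for the original data and for a hypothetical new data set newData might be obtained as:

```r
# predictions for the data used in the regression step
pred <- modelObj::predict(object = fitObj)
# predictions for a new data set (here, a hypothetical data.frame 'newData')
predNew <- modelObj::predict(object = fitObj, newdata = newData)
```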

modelObj::fitObject()

modelObj::fitObject(object, ...)

Function modelObj::fitObject() is a utility function that strips away the modeling object framework and returns the value object of the regression method. This allows users to complete post-analysis steps in the expected way. For example, calls to R's stats::lm() method return objects of class "lm", to which methods such as stats::residuals() and stats::plot.lm() can be applied.
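As a sketch, assuming fitObj resulted from a call to modelObj::fit() in which the solver was stats::lm(), the underlying "lm" object can be recovered and examined in the usual way:

```r
# strip away the modeling object framework to recover the "lm" object
lmObj <- modelObj::fitObject(object = fitObj)
# standard post-analysis steps can then be applied directly
res <- stats::residuals(object = lmObj)
plot(x = lmObj)
```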
