Skip to contents

Main Features

The propertee package, Prognostic Regression Offsets with Propagation of ERrors (for Treatment Effect Estimation), facilitates direct adjustment for experiments and observational studies with design-informed standard errors and flexible options for covariance adjustment. It uses explicit specification of study design to provide probability of assignment weights and standard errors that appropriately reflect the design. For covariance adjustment of its Hajek and (one-way) fixed effects estimates, it enables offsetting the outcome against predictions from a dedicated covariance model, with standard error calculations propagating error as appropriate from the covariance model.

The main workflow consists of two main steps and one optional step:

  1. Generate a Design object which encodes the study design, including the unit of assignment, treatment status of each unit of assignment, and optionally block information. This is accomplished with the obs_design(), rct_design() or rd_design() functions.
  2. Optionally, fit a covariate adjustment model.
  3. Fit a model to estimate a treatment effect, accounting for the design information, and optionally the covariate adjustment. This is done via the lmitt() function.

Example Data

The example dataset comes from the state of Tennessee’s Student-Teacher Achievement Ratio (STAR) experiment. Students were randomly assigned to three possible classroom conditions: small (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher’s aide).

data(STARdata)
table(STARdata$stark)
#> 
#>      regular        small regular+aide 
#>         2194         1900         2231

For simplicity for this first example, we will examine a single binary treatment - “small” classrooms versus “regular” and “regular+aide” classrooms.

STARdata$starkbinary <- STARdata$stark == "small"
table(STARdata$starkbinary)
#> 
#> FALSE  TRUE 
#>  4425  1900

After this basic example, we will see how propertee makes it easy to handle non-binary treatment variables by introducing dichotomys.

The outcome of interest is a reading score at the end of kindergarten.

summary(STARdata$readk)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>   315.0   414.0   433.0   436.7   453.0   627.0    5809

The students were blocked into schools via the schoolidk variable:

length(unique(STARdata$schoolidk))
#> [1] 80
head(table(STARdata$schoolidk))
#> 
#>   1   2   3   4   5   6 
#>  74  54 100  60  61  54

Students were the assigned units, so we need a unique identifier per student. If this does not currently exist, it can easily be generated:

STARdata$studentid <- seq_len(nrow(STARdata))
head(STARdata$studentid)
#> [1] 1 2 3 4 5 6

A Basic Example

Defining the Design

The three _design functions (rct_design(), obj_design(), and rd_design()) operate similarly. The first argument is the most important, and encodes all the design information through the use of a formula. The left-hand side of the formula identifies the treatment variable. The right-hand side of the formula consists of the following potential pieces of information:

  1. unit_of_assignment(): This identifies the variable(s) which indicate the units of assignment. This is required for all designs. The alias uoa() can be used in its place.
  2. block(): The identifies the variable(s) which contain block information. Optional.
  3. forcing(): In regression discontinuity designs (rd_design()), this identifies the variable(s) which contain forcing information.

To define a Design in our example:

des <- obs_design(starkbinary ~ unit_of_assignment(studentid) + block(schoolidk),
                  data = STARdata, na.fail = FALSE)
summary(des)
#> Observational Study
#> 
#>  Structure          Variables  
#>  ---------          ---------  
#>  Treatment          starkbinary
#>  Unit of Assignment studentid  
#>  Block              schoolidk  
#> 
#> Number of units per Treatment group: 
#>  Txt Grp Num Units
#>    FALSE      4425
#>     TRUE      1900

Should more than one variable be needed to identify the unit of assignment, block, or forcing, they can be included. For example, perhaps schoolidk may be unique within district, but potentially not unique across districts. Then we’d use something like block(districtid, schoolidk) in the _design function.

Estimating the treatment effect

The main function for estimating treatment effects is the lmitt() function. It takes in three main required arguments:

  1. A formula specifying the outcome and the desired treatment effect.
  2. The data set containing the outcome information.
  3. A Design.

Note that the data set does not need to be the same data set which generated the Design; it does however need to include the same variables to identify the units of assignment. (If the variable names differ, the by= argument can be used to link them, though we recommend renaming to reduce the likelihood of issues.)

For example, you may have one dataset containing school-level information, and a separate dataset containing student-level information. Assume school is the unit of assignment. While you could of course merge those two data-sets, propertee can instead use the school-level data to define the Design, and the student-level data to estimate the treatment effect.

The formula entering lmitt() can take on one of two forms:

y ~ 1

will estimate the main treatment effect on outcome variable y, and

y ~ x

will estimate subgroup specific treatment effects for each level of x for the outcome y, if x is categorical. For continuous x, a main effect and a treatment-x interaction effect is estimated.

Therefore, to estimate the treatment effect in our example, we can run:

te <- lmitt(readk ~ 1, data = STARdata, design = des)
#> The Design object contains block-level information, but it is not used in this model. Block information is used when weights are defined via `ate()` or `ett()` or if the `absorb=TRUE` argument is passed.
summary(te)
#> 
#> Call:
#> lmitt(readk ~ 1, data = STARdata, design = des)
#> 
#>  Treatment Effects :
#>                  Estimate Std. Error t value Pr(>|t|)    
#> starkbinary.TRUE   5.4632     0.9207   5.934 3.13e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Std. Error calculated via type "CR0"

The data includes ethnicity; we can estimate subgroup effects by ethnicity:

te_s <- lmitt(readk ~ ethnicity, data = STARdata, design = des)
#> The Design object contains block-level information, but it is not used in this model. Block information is used when weights are defined via `ate()` or `ett()` or if the `absorb=TRUE` argument is passed.
summary(te_s)
#> Warning: The following subgroups do not have sufficient degrees of freedom for
#> standard error estimates and will be returned as NA: ethnicityamindian
#> 
#> Call:
#> lmitt(readk ~ ethnicity, data = STARdata, design = des)
#> 
#>  Treatment Effects :
#>                                    Estimate Std. Error t value Pr(>|t|)    
#> starkbinary.TRUE_ethnicitycauc        4.717      1.147   4.112 3.97e-05 ***
#> starkbinary.TRUE_ethnicityafam        6.607      1.468   4.500 6.94e-06 ***
#> starkbinary.TRUE_ethnicityasian      -9.939     21.021  -0.473   0.6364    
#> starkbinary.TRUE_ethnicityhispanic   35.667     18.635   1.914   0.0557 .  
#> starkbinary.TRUE_ethnicityamindian   19.000         NA      NA       NA    
#> starkbinary.TRUE_ethnicityother      13.333     26.857   0.496   0.6196    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Std. Error calculated via type "CR0"

Including design weights

Design weights can be easily included in this estimation. propertee supports average treatment effect (ATE) and effect of the treatment on the treated (ETT) weights.

To include one of the weights, simply include the weights = "ate" or weights = "ett" argument to lmitt():

lmitt(readk ~ 1, data = STARdata, design = des, weights = "ate")
#> starkbinary.TRUE 
#>         6.116683
lmitt(readk ~ 1, data = STARdata, design = des, weights = "ett")
#> starkbinary.TRUE 
#>         5.650771

Internally, these call the ate() or ett() functions which can be used directly.

head(ate(des, data = STARdata))
#> [1] 0.000000 3.965517 4.705882 0.000000 1.445946 0.000000

When included inside lmitt(), you do not need to specify any additional arguments to ate() or ett(), enabling easy functions of weights. For example if you had some other weight variable, say wgt, you could include weights = wgt*ate() in the lmitt() call.

Covariance Adjustment models

By itself, lmitt() does not allow for other covariates; e.g. something like lmitt(y ~ 1 + control_var,... will fail. To adjust for covariates, a separate covariate model should be fit. Any model which supports a predict() function should work.

camod <- lm(readk ~ gender + birth + lunchk, data = STARdata)

The cov_adj() function can be used to process the covariance adjustment model and produce the required values; and its output can be passed as an offset=.

lmitt(readk ~ 1, data = STARdata, design = des,
      weights = "ate", offset = cov_adj(camod))
#> Warning in validityMethod(object): Some covariance adjustments are NA; be
#> careful of dropping these observations when fitting the ITT effect model
#> starkbinary.TRUE 
#>         6.050889

Similarly to the weight functions, cov_adj() attempts to locate the correct arguments (in this case, mainly the data= argument) to use in the model command; while cov_adj() does fall back to using the data which is in the covariance model, its safer to use the newdata= argument if calling cov_adj() outside of the model.

head(cov_adj(camod, newdata = STARdata))
#> [1]       NA 448.8500 450.2802       NA 425.4395       NA

Also, similarly to weights, cov_adj() can be used in normal modeling commands as well.

lm(readk ~ starkbinary, data = STARdata, weights = ate(des),
   offset = cov_adj(camod))
#> Warning in validityMethod(object): Some covariance adjustments are NA; be
#> careful of dropping these observations when fitting the ITT effect model
#> 
#> Call:
#> lm(formula = readk ~ starkbinary, data = STARdata, weights = ate(des), 
#>     offset = cov_adj(camod))
#> 
#> Coefficients:
#>     (Intercept)  starkbinaryTRUE  
#>          -1.684            6.051

Absorbing Blocks

If fixed effects for blocks are desired, which can be absorbed away to avoid estimating, the absorb=TRUE argument can be passed.

lmitt(readk ~ 1, data = STARdata, design = des, absorb = TRUE)
#> starkbinary.TRUE 
#>          5.14933