Combining Experimental and Non-Experimental Design in Causal Inference

Kari Lock Morgan
Department of Statistics, Penn State University
Rao Prize Conference, May 12th, 2017

A Tribute to Don
- Design trumps analysis
- Motivated by a real study
- Experimental design & rerandomization
- Observational study & propensity scores
- Rubin causal model & potential outcomes
- Educational testing (AP scores)
- (Missing data)
- (Noncompliance)

Design Trumps Analysis
"For objective causal inference, design trumps analysis" (Rubin, 2008)
X = covariates, W = treatment, Y = outcome(s)
- Design: W | X (balance covariates)
- Analysis: Y | W, X
As much as possible should be done without observed outcomes!

Knowledge in Action
Goal: estimate the causal effect of Knowledge in Action (KIA), a form of project-based learning, in AP classes on AP scores and other outcomes.
- Part 1 (Efficacy Study): randomize schools to KIA or control; compare outcomes after 1 year
- Part 2 (Maturation Study): continue to follow schools for another year (experimental & observational)

- Districts (blocks): District 1, ..., District 5
  *In this talk I'll just focus on one district
- Schools (clusters): RANDOMIZATION happens at this level
- Teachers
- Students: OUTCOMES are measured at this level

Covariates
Covariates available at randomization:
- School covariates (e.g., Title 1 status, type)
- Teacher covariates (e.g., years of experience)
- Previous student (class) covariates: race/ethnicity, poverty status, parental education, PSAT scores, 8th grade standardized test scores, total number of students, number of students who took the AP exam
2 covariates (x1 and x2) were used for randomization.
If covariates are available, we should use them when we randomize!

Rerandomization
1. Collect covariate data
2. Specify criteria for acceptable balance, here $|\bar{x}_{1,T} - \bar{x}_{1,C}| < 0.05$ and $|\bar{x}_{2,T} - \bar{x}_{2,C}| < 0.05$
3. (Re)randomize units to treatment groups
4. Check balance: if unacceptable, return to step 3; if acceptable, continue
5. Conduct experiment
6. Analyze results
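The accept/reject loop above can be sketched in Python. This is a minimal illustration, not the study's code: the helper name `rerandomize`, the use of NumPy's random `Generator`, and the simulated standard-normal covariates are all assumptions, and the 0.05 thresholds from the slide only make sense on the scale of the real covariates.

```python
import numpy as np

def rerandomize(x, n_treated, thresholds, rng, max_tries=100_000):
    """Hypothetical helper: redraw complete randomizations until every
    covariate satisfies |mean_T - mean_C| < its threshold."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    thresholds = np.asarray(thresholds, dtype=float)
    for _ in range(max_tries):
        # Complete randomization: exactly n_treated units are assigned to treatment
        w = np.zeros(n, dtype=bool)
        w[rng.choice(n, size=n_treated, replace=False)] = True
        # Balance check on each covariate's difference in means
        diff = x[w].mean(axis=0) - x[~w].mean(axis=0)
        if np.all(np.abs(diff) < thresholds):
            return w, diff  # acceptable: conduct the experiment with this allocation
    raise RuntimeError("no acceptable allocation found; consider looser thresholds")

# Illustration with simulated covariates (not study data)
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
w, diff = rerandomize(x, n_treated=20, thresholds=[0.05, 0.05], rng=rng)
```

The `max_tries` cap guards against a criterion so strict that acceptable allocations are vanishingly rare.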

Covariate Balance: Empirical
Percent reduction in variance (PRIV):
$$\mathrm{PRIV} = \frac{\mathrm{var}(\bar{x}_{j,T} - \bar{x}_{j,C}) - \mathrm{var}(\bar{x}_{j,T} - \bar{x}_{j,C} \mid \text{rerand.})}{\mathrm{var}(\bar{x}_{j,T} - \bar{x}_{j,C})}$$
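The empirical PRIV can be approximated by Monte Carlo: draw many complete randomizations, keep the ones that pass the balance check, and compare the variance of the mean differences across all draws with the variance across accepted draws. A sketch under those assumptions (the helper `empirical_priv` and its arguments are hypothetical, and the threshold criterion is the one from the rerandomization slide):

```python
import numpy as np

def empirical_priv(x, n_treated, thresholds, n_draws=2000, seed=0):
    """Hypothetical helper: Monte Carlo PRIV of each covariate's difference
    in means under threshold rerandomization vs. complete randomization."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    thresholds = np.asarray(thresholds, dtype=float)
    all_diffs, accepted = [], []
    while len(accepted) < n_draws:
        w = np.zeros(n, dtype=bool)
        w[rng.choice(n, size=n_treated, replace=False)] = True
        diff = x[w].mean(axis=0) - x[~w].mean(axis=0)
        if len(all_diffs) < n_draws:
            all_diffs.append(diff)          # pure randomization keeps every draw
        if np.all(np.abs(diff) < thresholds):
            accepted.append(diff)           # rerandomization keeps acceptable draws only
    var_rand = np.var(np.array(all_diffs), axis=0)
    var_rerand = np.var(np.array(accepted), axis=0)
    return 1 - var_rerand / var_rand        # one PRIV per covariate
```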

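Covariate Balance: Theoretical
Suppose $\bar{x}_{j,T} - \bar{x}_{j,C} \sim \text{Normal}$ for $j = 1, \ldots, k$ (covariates $x_1, x_2, \ldots, x_k$), and rerandomize if $|\bar{x}_{j,T} - \bar{x}_{j,C}| \geq a_j$ for some $j \in \{1, \ldots, k\}$. Then the PRIV for $x_j$ is
$$p_{x_j} = 1 - \frac{2\,\gamma\left(\frac{3}{2},\ \frac{a_j^2}{2\,\mathrm{var}(x_j)\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}\right)}{\gamma\left(\frac{1}{2},\ \frac{a_j^2}{2\,\mathrm{var}(x_j)\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}\right)}, \quad \text{where } \gamma(b, c) = \int_0^c y^{b-1} e^{-y}\, dy.$$
For the two randomization covariates: $p_{x_1} = 0.984$ and $p_{x_2} = 0.973$.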
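As a numerical check on the formula, the sketch below evaluates $p_{x_j}$ with only Python's standard library. It relies on the identities $\gamma(1/2, t) = \sqrt{\pi}\,\mathrm{erf}(\sqrt{t})$ and $\gamma(3/2, t) = \tfrac{1}{2}\gamma(1/2, t) - \sqrt{t}\,e^{-t}$, which reduce the ratio of incomplete gamma functions to a closed form; the function name `theoretical_priv` is hypothetical, and `a` must be positive.

```python
import math

def theoretical_priv(a, var_x, n_t, n_c):
    """PRIV for covariate x_j when its difference in means is Normal and we
    rerandomize whenever |mean_T - mean_C| >= a (formula on the slide)."""
    # Second argument of both incomplete gamma functions on the slide
    t = a**2 / (2 * var_x * (1 / n_t + 1 / n_c))
    # gamma(1/2, t) = sqrt(pi) * erf(sqrt(t)) and
    # gamma(3/2, t) = gamma(1/2, t) / 2 - sqrt(t) * exp(-t), so
    # 1 - 2 * gamma(3/2, t) / gamma(1/2, t) = 2 * sqrt(t) * exp(-t) / gamma(1/2, t)
    lower_gamma_half = math.sqrt(math.pi) * math.erf(math.sqrt(t))
    return 2 * math.sqrt(t) * math.exp(-t) / lower_gamma_half
```

Equivalently, this is one minus the variance ratio of a standard Normal truncated to $|Z| < c$, with $c = a_j / \mathrm{sd}(\bar{x}_{j,T} - \bar{x}_{j,C})$; tighter thresholds (smaller `a`) give larger PRIV.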

Outcome PRIV
If rerandomization is equal percent variance reducing (EPVR), then the PRIV for the outcome difference in means is
$$\mathrm{PRIV}_Y = R^2 \times \mathrm{PRIV}_X,$$
where $R^2$ is the squared multiple correlation between the outcome and the covariates.
Here, $R^2 \approx 0.75$ and $\mathrm{PRIV}_X \approx 98\%$, so $\mathrm{PRIV}_Y \approx 0.75 \times 0.98 \approx 74\%$.
Precision increases by a factor of $1/(1 - 0.74) \approx 3.85$: equivalent to almost quadrupling n!!! (Effective sample size goes from 76 to 293!)
NOTE: This is the TRUE variance! We need randomization-based inference to reflect this.
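Because rerandomization changes the set of possible allocations, a randomization test should draw its reference distribution from the acceptable allocations only. A minimal sketch of a Fisher randomization test of the sharp null of no effect under that assumption (the function name, the simple difference-in-means statistic, and reusing the threshold criterion from the rerandomization slide are illustrative choices):

```python
import numpy as np

def rerandomization_test(y, w_obs, x, thresholds, n_perm=1000, seed=0):
    """Hypothetical helper: Fisher randomization test of the sharp null of
    no treatment effect, using only allocations that pass the same balance
    criterion (|mean_T - mean_C| < threshold for every covariate)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    w_obs = np.asarray(w_obs, dtype=bool)
    thresholds = np.asarray(thresholds, dtype=float)
    n, n_t = len(y), int(w_obs.sum())
    observed = y[w_obs].mean() - y[~w_obs].mean()
    null_stats = []
    while len(null_stats) < n_perm:
        w = np.zeros(n, dtype=bool)
        w[rng.choice(n, size=n_t, replace=False)] = True
        diff = x[w].mean(axis=0) - x[~w].mean(axis=0)
        if np.all(np.abs(diff) < thresholds):
            # Under the sharp null, the observed y would be unchanged by reassignment
            null_stats.append(y[w].mean() - y[~w].mean())
    null_stats = np.array(null_stats)
    p_value = float(np.mean(np.abs(null_stats) >= abs(observed)))  # two-sided
    return float(observed), p_value
```

Using ordinary complete randomizations as the reference distribution would ignore the balance constraint and would typically give conservative p-values.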

Correlational Structure

Affine Invariance
Affine invariance: rerandomization stays the same under any affine transformation $a + bx$ of the covariates.
If the rerandomization criterion is affinely invariant and x is ellipsoidally symmetric, then:
1. $E(\bar{X}_T - \bar{X}_C \mid \text{rerand.}) = E(\bar{X}_T - \bar{X}_C) = 0$, so rerandomization leads to unbiased estimates for any linear function of x.
2. $\mathrm{cov}(\bar{X}_T - \bar{X}_C \mid \text{rerand.}) \propto \mathrm{cov}(\bar{X}_T - \bar{X}_C)$, which preserves the correlations of $\bar{X}_T - \bar{X}_C$: the balance improvement is equal for each $x_j$ (equal percent variance reducing).
(Morgan and Rubin, Annals of Statistics, 2012)

Mahalanobis Distance
$$M = (\bar{X}_T - \bar{X}_C)^\top \, \mathrm{cov}(x)^{-1} \, (\bar{X}_T - \bar{X}_C)$$
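A sketch of the Mahalanobis balance measure in NumPy, assuming $\mathrm{cov}(x)$ is estimated by the sample covariance of the covariates across all units (the helper name `mahalanobis_balance` is hypothetical). Because $M$ is affinely invariant, an acceptance rule such as "accept if $M < a$" fits the conditions on the Affine Invariance slide; the last line checks numerically that $M$ is unchanged by $x \mapsto \text{shift} + xB$ for an invertible $B$.

```python
import numpy as np

def mahalanobis_balance(x, w):
    """M = (xbar_T - xbar_C)^T cov(x)^{-1} (xbar_T - xbar_C), with cov(x)
    taken as the sample covariance of the covariates over all units."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=bool)
    d = x[w].mean(axis=0) - x[~w].mean(axis=0)
    s = np.cov(x, rowvar=False)           # columns are covariates
    return float(d @ np.linalg.solve(s, d))

# Affine-invariance check on simulated data (not study data)
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 3))
w = np.arange(30) < 15                    # first 15 units treated
B = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [1.0, 0.0, 3.0]])  # invertible
shift = np.array([10.0, -4.0, 0.5])
print(np.isclose(mahalanobis_balance(x, w), mahalanobis_balance(shift + x @ B, w)))  # True
```

For a fixed treated-group size this differs from the version scaled by $\mathrm{cov}(\bar{X}_T - \bar{X}_C)^{-1}$ only by a constant factor, so either form gives the same ranking of allocations.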

Knowledge in Action
- Part 1 (Efficacy Study): randomize schools to KIA or control; compare outcomes after 1 year
- Part 2 (Maturation Study): continue to follow schools for another year (experimental & observational)

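- Schools in the RCT: covariate data → RANDOMIZE into two waves
  - Wave 1: KIA 1st year in 2016-2017, KIA 2nd year in 2017-2018
  - Wave 2: no KIA in 2016-2017, KIA 1st year in 2017-2018
- Schools not in the RCT: covariate data → MATCHING → matched sample (no KIA)
School years shown: 2015-2016, 2016-2017, 2017-2018
Comparisons:
- 2016-2017, Wave 1 vs Wave 2: 1 year of KIA vs no KIA
- 2017-2018, Wave 1 vs Wave 2: 2 years of KIA vs 1 year of KIA
- 2017-2018, Wave 1 vs matched sample: 2 years of KIA vs no KIA?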

Two ways to estimate the effect of 2 years of KIA vs no KIA:
- Non-experimental direct approach: compare Wave 1 (KIA 2nd year, 2017-2018) with the matched sample (no KIA)
- Experimental indirect approach: add the 1-year effect (Wave 1 vs Wave 2, 2016-2017: 1 year of KIA vs no KIA) to the 2-vs-1-year effect (Wave 1 vs Wave 2, 2017-2018: 2 years of KIA vs 1 year of KIA)
WHICH IS BETTER???

Potential Outcomes & Estimands
$Y_j(W_j, t)$ = potential outcome for school j under treatment $W_j$ (years of KIA: 0, 1, or 2) in year t.
Causal effect: compare potential outcomes under different treatments:
$$\tau_{1,t} = \bar{Y}(1,t) - \bar{Y}(0,t) = \frac{1}{n}\sum_{j=1}^{n} Y_j(1,t) - \frac{1}{n}\sum_{j=1}^{n} Y_j(0,t)$$
$$\tau_{2-1,t} = \bar{Y}(2,t) - \bar{Y}(1,t) = \frac{1}{n}\sum_{j=1}^{n} Y_j(2,t) - \frac{1}{n}\sum_{j=1}^{n} Y_j(1,t)$$
$$\tau_{2,t} = \bar{Y}(2,t) - \bar{Y}(0,t) = \frac{1}{n}\sum_{j=1}^{n} Y_j(2,t) - \frac{1}{n}\sum_{j=1}^{n} Y_j(0,t)$$

Estimators
$$\hat{\tau}_{1,2017} = \frac{\sum_{j=1}^{n} W_j Y_j(1,2017)}{\sum_{j=1}^{n} W_j} - \frac{\sum_{j=1}^{n} (1-W_j)\, Y_j(0,2017)}{\sum_{j=1}^{n} (1-W_j)}$$
$$\hat{\tau}_{2-1,2018} = \frac{\sum_{j=1}^{n} I(W_j=2)\, Y_j(2,2018)}{\sum_{j=1}^{n} I(W_j=2)} - \frac{\sum_{j=1}^{n} I(W_j=1)\, Y_j(1,2018)}{\sum_{j=1}^{n} I(W_j=1)}$$
$$\hat{\tau}_{2,2018} = \frac{\sum_{j=1}^{n} I(W_j=2)\, Y_j(2,2018)}{\sum_{j=1}^{n} I(W_j=2)} - \frac{\sum_{j=1}^{n} I(W_j=0)\, Y_j(0,2018)}{\sum_{j=1}^{n} I(W_j=0)}$$
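All three estimators are differences in group means, selected by the number of years of KIA. A minimal sketch, assuming one observed outcome per school per year; the helper `diff_in_means` and the toy numbers are hypothetical. As on the slide, the 2017 treatment code is binary (Wave 1 = 1, Wave 2 = 0), while the 2018 code counts years of KIA, with 0 marking the matched sample.

```python
import numpy as np

def diff_in_means(y, years, a, b):
    """Mean observed outcome among schools with `a` years of KIA minus the
    mean among schools with `b` years of KIA."""
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    return float(y[years == a].mean() - y[years == b].mean())

# Toy numbers for illustration only (not study data)
y_2017, w_2017 = [78, 80, 74, 76], [1, 1, 0, 0]                # Wave 1 = 1, Wave 2 = 0
y_2018, w_2018 = [80, 82, 75, 77, 70, 72], [2, 2, 1, 1, 0, 0]  # 0 = matched sample

tau_1_2017 = diff_in_means(y_2017, w_2017, 1, 0)    # 4.0
tau_21_2018 = diff_in_means(y_2018, w_2018, 2, 1)   # 5.0
tau_2_2018 = diff_in_means(y_2018, w_2018, 2, 0)    # 10.0
```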

Propensity Score Matching
$W_j = 1$ if in Wave 1 of the experiment; $W_j = 0$ if not in the experiment.
Propensity score: $e_j = P(W_j = 1 \mid x_j)$
Match each Wave 1 teacher with a control with a similar propensity score.
Criteria for success:
- Quality of observed covariate data: we can only balance observed covariates
- Good matches available: adequate overlap between groups and a large enough pool of potential controls
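A bare-bones version of the two steps (estimate $e_j$, then match) can be sketched in NumPy. The specific choices here are assumptions, not the study's procedure: a logistic model for the propensity score fit by Newton-Raphson, and greedy 1:1 nearest-neighbor matching without replacement. The `ValueError` reflects the "large enough pool of potential controls" criterion; overlap still has to be checked separately.

```python
import numpy as np

def fit_propensity(x, w, n_iter=25):
    """Logistic regression of W on x (with intercept) fit by Newton-Raphson;
    returns estimated propensity scores e_j = P(W_j = 1 | x_j)."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (w - p)                           # score of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])      # observed information
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X @ beta))

def match_nearest(e, w):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without
    replacement; returns (treated index, matched control index) pairs."""
    e = np.asarray(e, dtype=float)
    w = np.asarray(w, dtype=bool)
    available = set(np.flatnonzero(~w).tolist())
    if len(available) < w.sum():
        raise ValueError("pool of potential controls is too small")
    pairs = []
    for i in np.flatnonzero(w):
        j = min(available, key=lambda k: abs(e[k] - e[i]))
        pairs.append((int(i), j))
        available.remove(j)
    return pairs
```

Greedy matching depends on the order in which treated units are processed; optimal matching or calipers are common refinements.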

Propensity Score Matching
If we have good matches, we can balance observed covariates.
Key point: unless we have data on all relevant covariates (which we won't), there will still be bias (baseline differences).
This bias is usually hard to quantify, BUT we have a very rare feature!!

In 2016-2017, Wave 1 has 1 year of KIA while both Wave 2 and the matched sample have none:
- Experimental: Wave 1 vs Wave 2 (1 year of KIA vs no KIA)
- Non-experimental: Wave 1 vs matched sample (1 year of KIA vs no KIA)
We can validate the nonexperimental approach by comparing these 1-year impact estimates!
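The validation check can be written out directly. A minimal sketch, assuming 1:1 matching so that the matched sample's mean serves as the non-experimental comparison; the helper name is hypothetical. Since both 1-year estimates use the same Wave 1 mean, their gap reduces to the baseline difference between the randomized controls (Wave 2) and the matched controls. Subject to sampling noise, a gap near zero supports the matched design for year 1, though it need not equal the bias in year 2.

```python
import numpy as np

def validation_gap(y_wave1, y_wave2, y_matched):
    """Difference between the non-experimental and experimental 1-year
    (2016-2017) impact estimates for the same Wave 1 schools."""
    tau_exp = np.mean(y_wave1) - np.mean(y_wave2)     # randomized comparison
    tau_obs = np.mean(y_wave1) - np.mean(y_matched)   # matched comparison
    # The Wave 1 mean cancels, so the gap equals mean(y_wave2) - mean(y_matched)
    return float(tau_obs - tau_exp)
```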

WHICH IS BETTER??? The non-experimental direct approach (Wave 1 vs matched sample in 2017-2018: 2 years of KIA vs no KIA), which can be checked through the 1-year comparison of Wave 1 with the matched sample in 2016-2017, or the experimental indirect approach (Wave 1 vs Wave 2 in both 2016-2017 and 2017-2018)?

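Experimental Indirect Approach
$$\tau_{2-1,2018} + \tau_{1,2017} = \left[\bar{Y}(2,2018) - \bar{Y}(1,2018)\right] + \left[\bar{Y}(1,2017) - \bar{Y}(0,2017)\right]$$
Critical assumption: potential outcomes may depend on year, but treatment effects do not.
That is, $\bar{Y}(1,2017)$ may differ from $\bar{Y}(1,2018)$, but $\tau_{1,2017} = \tau_{1,2018} \equiv \tau_1$.
Since $\tau_{1,2018} + \tau_{2-1,2018} = \bar{Y}(2,2018) - \bar{Y}(0,2018) = \tau_{2,2018}$, this implies $\tau_1 + \tau_{2-1} = \tau_2$.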

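Unbiased
Define $\hat{\tau}_2 \equiv \hat{\tau}_1 + \hat{\tau}_{2-1}$.
Theorem: Assuming treatment effects do not vary by year, $E(\hat{\tau}_2) = \tau_2$.
Proof: Each difference in means is unbiased under randomization, so $E(\hat{\tau}_2) = E(\hat{\tau}_1) + E(\hat{\tau}_{2-1}) = \tau_1 + \tau_{2-1} = \tau_2$.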

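Variance
$$\mathrm{var}(\hat{\tau}_2) = \mathrm{var}(\hat{\tau}_1 + \hat{\tau}_{2-1}) = \mathrm{var}(\hat{\tau}_1) + \mathrm{var}(\hat{\tau}_{2-1}) + 2\,\mathrm{cov}(\hat{\tau}_1, \hat{\tau}_{2-1})$$
Both estimates compare the same teachers (Wave 1 vs Wave 2), so they are likely to be highly positively correlated.
The result is more than double the variance of each individual estimate (when the two variances are similar).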

Constant Treatment Effect?
Suppose a constant treatment effect, so $Y_j(1,t) = Y_j(0,t) + \tau_1$ and $Y_j(2,t) = Y_j(1,t) + \tau_{2-1}$ for all j. Then:
- $\hat{\tau}_1 = \tau_1 + \bar{Y}_{\text{Wave 1}}(0,2017) - \bar{Y}_{\text{Wave 2}}(0,2017)$
- $\hat{\tau}_{2-1} = \tau_{2-1} + \bar{Y}_{\text{Wave 1}}(0,2018) - \bar{Y}_{\text{Wave 2}}(0,2018)$
Under additivity, and if we again assume that differences over time cancel in comparisons within the same year, then $\hat{\tau}_1$ and $\hat{\tau}_{2-1}$ are perfectly correlated!
$$\mathrm{var}(\hat{\tau}_2) = \mathrm{var}(\hat{\tau}_1) + \mathrm{var}(\hat{\tau}_{2-1}) + 2\sqrt{\mathrm{var}(\hat{\tau}_1)\,\mathrm{var}(\hat{\tau}_{2-1})}$$
If $\mathrm{var}(\hat{\tau}_1) \approx \mathrm{var}(\hat{\tau}_{2-1})$, then $\mathrm{var}(\hat{\tau}_2) \approx 4\,\mathrm{var}(\hat{\tau}_1)$.
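The perfect-correlation argument can be checked by simulation over repeated randomizations of schools into waves. This is a sketch under stated assumptions, all hypothetical: constant additive effects, control outcomes drawn once from a Normal(70, 5²), and 2018 control outcomes equal to 2017 ones plus a common shift (so "differences over time cancel" holds exactly). Under these assumptions $\hat{\tau}_{2-1} - \tau_{2-1} = \hat{\tau}_1 - \tau_1$ in every draw, so $\mathrm{var}(\hat{\tau}_2) = 4\,\mathrm{var}(\hat{\tau}_1)$, while $\hat{\tau}_2$ stays centered on $\tau_1 + \tau_{2-1}$.

```python
import numpy as np

def simulate_indirect(n=20, tau1=2.0, tau21=1.0, delta=3.0, n_sims=5000, seed=0):
    """Hypothetical simulation of the indirect estimator under constant,
    additive effects and a common year shift `delta`."""
    rng = np.random.default_rng(seed)
    y0_2017 = rng.normal(70, 5, size=n)   # hypothetical control outcomes
    y0_2018 = y0_2017 + delta             # common shift: cancels within a year
    est1, est21 = [], []
    for _ in range(n_sims):
        wave1 = np.zeros(n, dtype=bool)
        wave1[rng.choice(n, size=n // 2, replace=False)] = True
        # 2017: Wave 1 has 1 year of KIA, Wave 2 has none
        y2017 = y0_2017 + tau1 * wave1
        # 2018: Wave 1 has 2 years of KIA, Wave 2 has 1 year
        y2018 = y0_2018 + tau1 + tau21 * wave1
        est1.append(y2017[wave1].mean() - y2017[~wave1].mean())
        est21.append(y2018[wave1].mean() - y2018[~wave1].mean())
    est1, est21 = np.array(est1), np.array(est21)
    return est1, est21, est1 + est21      # tau_hat_1, tau_hat_{2-1}, tau_hat_2

e1, e21, e2 = simulate_indirect()
print(np.corrcoef(e1, e21)[0, 1], np.var(e2) / np.var(e1))
```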

WHICH IS BETTER???
- Non-experimental direct approach (Wave 1 vs matched sample, 2017-2018: 2 years of KIA vs no KIA): a single comparison with lower variance, but biased by any baseline differences on unobserved covariates
- Experimental indirect approach (Wave 1 vs Wave 2 in 2016-2017 and 2017-2018): unbiased if treatment effects do not vary by year, but with higher variance
BIAS-VARIANCE TRADEOFF! The two approaches are complementary!

Other Interesting Tidbits
- Student-level versus school-level analysis
- Combined analyses?
- Student/parental consent, leading to missing data
- Joiners
- Noncompliance
- Teachers switching schools/courses
- Anticipation bias
- and more!

Conclusion
- Rerandomization can improve experimental design
- Propensity score matching can improve observational studies
- There is a bias-variance tradeoff for estimating the 2-year impact
- Lots of fun statistics in rich applied problems!

klm47@psu.edu
Funded by the George Lucas Educational Foundation
Joint work with Anna Saavedra, Amie Rappaport, Ying Liu, and Juan Saavedra

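Weighting
Option 1: weight schools equally
$$\hat{\tau}_1 = \frac{\sum_{j=1}^{n} W_j Y_j(1)}{\sum_{j=1}^{n} W_j} - \frac{\sum_{j=1}^{n} (1-W_j)\, Y_j(0)}{\sum_{j=1}^{n} (1-W_j)}$$
Option 2: weight students equally
$$\hat{\tau}_1 = \frac{\sum_{j=1}^{n} W_j n_j Y_j(1)}{\sum_{j=1}^{n} W_j n_j} - \frac{\sum_{j=1}^{n} (1-W_j)\, n_j Y_j(0)}{\sum_{j=1}^{n} (1-W_j)\, n_j} = \frac{\sum_{j=1}^{n} W_j \sum_{i=1}^{n_j} Y_i(1)}{\sum_{j=1}^{n} W_j n_j} - \frac{\sum_{j=1}^{n} (1-W_j) \sum_{i=1}^{n_j} Y_i(0)}{\sum_{j=1}^{n} (1-W_j)\, n_j}$$
where $Y_j$ is school j's mean outcome and $n_j$ is its number of students.
Schools have differing numbers of students (3 to 127), and τ may vary with class size.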
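The two options differ only in the weights placed on school means. A minimal sketch with NumPy, where `ybar` holds school mean outcomes and `n_students` holds enrollments; the function names and toy numbers are hypothetical (only the 3 and 127 sizes echo the range on the slide).

```python
import numpy as np

def school_weighted(ybar, w):
    """Option 1: every school's mean outcome counts equally."""
    ybar, w = np.asarray(ybar, dtype=float), np.asarray(w, dtype=bool)
    return float(ybar[w].mean() - ybar[~w].mean())

def student_weighted(ybar, n_students, w):
    """Option 2: school means weighted by enrollment, i.e. every student counts equally."""
    ybar, w = np.asarray(ybar, dtype=float), np.asarray(w, dtype=bool)
    n = np.asarray(n_students, dtype=float)
    return float(np.average(ybar[w], weights=n[w]) - np.average(ybar[~w], weights=n[~w]))

# Toy numbers (not study data): a tiny and a large treated school
ybar, sizes, w = [3.0, 4.0, 2.0, 2.5], [3, 127, 10, 10], [1, 1, 0, 0]
print(school_weighted(ybar, w))          # 1.25
print(student_weighted(ybar, sizes, w))  # about 1.727: the large school dominates
```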

Multilevel Model
Student level: $Y_i \mid W_{j[i]} \sim N\left(\mu_j(W_j) + \beta_1 x_1,\ \sigma_Y^2\right)$
School level: $\mu_j \mid W_j \sim N\left(\alpha_k + \tau W_j + \beta_2 x_2,\ \sigma_\mu^2\right)$
District level: $\alpha_k \sim N\left(\alpha + \beta_3 x_3,\ \sigma_\alpha^2\right)$
Smaller schools shrink more, so this falls in between the two weighting extremes.
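The "in between" claim can be made concrete in a simplified version of the model: one district, no covariates, and variances treated as known (all assumptions for illustration; the variance values below are hypothetical). A school mean then has variance $\sigma_\mu^2 + \sigma_Y^2/n_j$, so the GLS estimate of $\tau$ weights school means within each group by $1/(\sigma_\mu^2 + \sigma_Y^2/n_j)$, which lies between equal weights (Option 1) and weights proportional to $n_j$ (Option 2).

```python
import numpy as np

def shrinkage_factor(n_j, sigma_y2, sigma_mu2):
    """Weight on school j's own sample mean in the posterior mean of mu_j
    (simplified model: no covariates, variances treated as known)."""
    n_j = np.asarray(n_j, dtype=float)
    return sigma_mu2 / (sigma_mu2 + sigma_y2 / n_j)

def school_weights(n_j, sigma_y2, sigma_mu2):
    """Normalized GLS weights 1 / (sigma_mu2 + sigma_y2 / n_j) on school means
    within a treatment group under the same simplified model."""
    n_j = np.asarray(n_j, dtype=float)
    v = 1 / (sigma_mu2 + sigma_y2 / n_j)
    return v / v.sum()

sizes = np.array([3, 127])  # smallest and largest school sizes on the weighting slide
print(shrinkage_factor(sizes, sigma_y2=1.0, sigma_mu2=0.1))  # small school shrinks more
print(school_weights(sizes, sigma_y2=1.0, sigma_mu2=0.1))    # between 1/2 each and n_j / sum(n_j)
```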