Lecture 13 Estimating Regression Discontinuity

Nick Huntington-Klein

2021-01-29

Recap

  • Regression discontinuity is a design that can be used when treatment is applied based on a cutoff
  • Above the cutoff? Treated! Below the cutoff? Not treated! (or below/above)
  • By comparing people right around the cutoff, we are effectively closing all back doors
  • Isolating the treatment effect!

Today

  • Surely we aren’t just comparing averages above and below!
  • How can we actually implement regression discontinuity (presumably with regression)?
  • What do we need to keep in mind?
  • What about close variations? What if the cutoff doesn’t assign treatment perfectly?

Regression Discontinuity in Regression

  • How can we make a model for RDD?
  • We want to look for a jump at a cutoff point
  • So we need as good an idea as possible of what the outcome is just on either side of the cutoff
  • So…

Regression Discontinuity in Regression

Let’s start with the simple linear version:

\[ Y = \beta_0 + \beta_1(X-Cutoff) + \beta_2Treated + \beta_3Treated\times(X-Cutoff)+\varepsilon \]

  • This formulation basically allows there to be two lines: one to the left of the cutoff ( \(\beta_0 + \beta_1(X-Cutoff)\) ), and one to the right ( \((\beta_0 + \beta_2) + (\beta_1 + \beta_3)(X-Cutoff)\) )
  • The jump at the cutoff is given by \(\beta_2\) - that’s our RDD estimate
  • We use \(X\) relative to the cutoff so that we can easily locate the jump in the \(\beta_2\) coefficient
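
As a minimal sketch of this regression in R: the data below are simulated purely for illustration (the cutoff of .5, the jump of .3, and all variable names are assumptions, not part of the original slides).

```r
# Hypothetical simulated data: cutoff at .5, true jump of .3 (illustrative values only)
set.seed(1000)
d <- data.frame(X = runif(500))
d$Treated <- d$X > .5            # sharp design: treatment is fully determined by the cutoff
d$XC <- d$X - .5                 # running variable centered at the cutoff
d$Y <- 1 + .8 * d$XC + .3 * d$Treated + rnorm(500, 0, .25)

# Y = b0 + b1*(X - Cutoff) + b2*Treated + b3*Treated*(X - Cutoff)
m_linear <- lm(Y ~ XC * Treated, data = d)

# b2, the coefficient on Treated, is the estimated jump at the cutoff
rdd_estimate <- coef(m_linear)["TreatedTRUE"]
print(rdd_estimate)
```

Because `XC` is zero exactly at the cutoff, the interaction term drops out there and the `TreatedTRUE` coefficient can be read directly as the jump.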

Regression Discontinuity in Regression

Choices!

  • This is of course the simplest version!
  • Things to consider:
  • Bandwidth
  • Functional form
  • Controls

Bandwidth

  • The idea of RDD is that people just around the cutoff are very much comparable
  • Basically random if your test score is 79 vs. 81 if the cutoff is 80, for example
  • So people far away from the cutoff aren’t too informative! At best they help determine the slope of the fitted lines
  • So… drop ’em!

Bandwidth

  • RDD generally uses data only from the observations in a given range around the cutoff
  • (Or at least weights them less the further away they are from the cutoff)
  • How wide should the bandwidth be?
  • There’s a big wide literature on optimal bandwidth selection, which balances bias (a wider bandwidth adds people far from the cutoff who may have open back doors) against variance (a narrower bandwidth leaves fewer observations, so the estimate is less precise)
  • We won’t be doing this by hand; we can often rely on an RDD command to do this for us
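
To see what applying a bandwidth looks like mechanically, here is a rough sketch on simulated data. The bandwidth of .1 is picked by hand purely for illustration and is not an optimal choice; the data and variable names are assumptions.

```r
# Hypothetical simulated data: curved relationship, cutoff at .5, true jump of .3
set.seed(1000)
d <- data.frame(X = runif(1000))
d$Treated <- d$X > .5
d$XC <- d$X - .5
d$Y <- 1.5 * d$X - .6 * d$X^2 + .3 * d$Treated + rnorm(1000, 0, .25)

bw <- .1                                   # hypothetical bandwidth, chosen by hand here
d_local <- d[abs(d$XC) <= bw, ]            # drop everyone farther than bw from the cutoff

m_full  <- lm(Y ~ XC * Treated, data = d)        # all observations
m_local <- lm(Y ~ XC * Treated, data = d_local)  # only observations near the cutoff

# Standard errors of the estimated jump in each model
se_full  <- summary(m_full)$coefficients["TreatedTRUE", "Std. Error"]
se_local <- summary(m_local)$coefficients["TreatedTRUE", "Std. Error"]

print(c(n_full = nrow(d), n_local = nrow(d_local)))
print(c(jump_full = unname(coef(m_full)["TreatedTRUE"]),
        jump_local = unname(coef(m_local)["TreatedTRUE"])))
print(c(se_full = se_full, se_local = se_local))
```

The narrower sample leans less on the straight-line assumption far from the cutoff, at the cost of fewer observations and a larger standard error.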

Functional Form

  • Why fit a straight line on either side? If the true relationship is curvy this will give us the wrong result!
  • We can be much more flexible! As long as we fit some sort of line on either side, we can look for the jump
  • One way to do this is with polynomials ( \(\tilde{X} = X-Cutoff\), \(T = Treated\) ):

\[ Y = \beta_0 + \beta_1\tilde{X} + \beta_2 \tilde{X}^2 + \beta_3T + \beta_4\tilde{X}T + \beta_5 \tilde{X}^2T+\varepsilon \]
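
A sketch of how the order-2 equation above could be written with lm(). The simulated data and the names `Xt` (for \(\tilde{X}\)) and `Tr` (for \(T\)) are assumptions made for this example.

```r
# Hypothetical simulated data: cutoff at .5, true jump of .3
set.seed(1000)
d <- data.frame(X = runif(500))
d$Tr <- as.numeric(d$X > .5)   # T in the equation above, coded 0/1
d$Xt <- d$X - .5               # X-tilde: running variable centered at the cutoff
d$Y <- 1.5 * d$X - .6 * d$X^2 + .3 * d$Tr + rnorm(500, 0, .25)

# Y = b0 + b1*Xt + b2*Xt^2 + b3*T + b4*Xt*T + b5*Xt^2*T
m_quad <- lm(Y ~ Xt + I(Xt^2) + Tr + Xt:Tr + I(Xt^2):Tr, data = d)

# b3, the coefficient on T, is the jump at the cutoff (where Xt = 0)
print(coef(m_quad)["Tr"])
```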

Functional Form

  • (by the way, you can take this basic interaction-with-cutoff design idea and use it to look at how anything changes before and after cutoff, not just the level of \(Y\)! You could look at how the slope changes (“regression kink”), or how some other identified effect changes, or just about anything! The beauty of flexible design)

Functional Form

  • The interpretation is the same as before - look for the jump!
  • We do want to be careful with polynomials though, and not add too many
  • Remember, the more polynomial terms we add, the stranger the behavior of the line at either end of the range of data
  • And the cutoff is at the far-right end of the pre-cutoff data and the far-left end of the post-cutoff data!
  • So we can get illusory effects generated by having too many terms

Functional Form

  • A common approach is to use non-parametric regression or local linear regression
  • This doesn’t impose any particular shape! And it’s easy to get a prediction on either side of the cutoff
  • This allows for non-straight lines without dealing with the issues polynomials bring us
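
One way to get a feel for local linear regression is a weighted straight-line fit in which observations count for less the farther they are from the cutoff. The sketch below uses a triangular kernel and a hand-picked bandwidth of .15; both choices, along with the simulated data, are assumptions for illustration. It leaves out the bandwidth selection and bias correction that dedicated RDD software adds.

```r
# Hypothetical simulated data: curved relationship, cutoff at .5, true jump of .3
set.seed(1000)
d <- data.frame(X = runif(1000))
d$Treated <- d$X > .5
d$XC <- d$X - .5
d$Y <- 1.5 * d$X - .6 * d$X^2 + .3 * d$Treated + rnorm(1000, 0, .25)

h <- .15                                  # hypothetical bandwidth
d$w <- pmax(0, 1 - abs(d$XC) / h)         # triangular kernel: weight 1 at the cutoff, 0 at distance h or more

# Weighted linear fit on each side, using only observations with positive weight
m_ll <- lm(Y ~ XC * Treated, data = d, weights = w, subset = w > 0)

# Estimated jump at the cutoff
print(coef(m_ll)["TreatedTRUE"])
```

Only the predictions right at the cutoff matter here, so the line is never asked to describe the curve over the whole range of the data.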

Different Functional Forms

  • Let’s look at the same data with a few different functional forms
  • Remember, the RDD effect is the jump at the cutoff. The TRUE effect here will be \(.3\), and the TRUE model is an order-2 polynomial
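library(tidyverse)
# Simulate 200 observations: quadratic in Running, plus a jump of .3 at the cutoff of .5
tb <- tibble(Running = runif(200)) %>%
  mutate(Y = 1.5*Running - .6*Running^2 + .3*(Running > .5) + rnorm(200, 0, .25)) %>%
  # RC is the running variable centered at the cutoff
  mutate(RC = Running - .5, Treated = Running > .5)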
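
A rough sketch of how the jump could be estimated under several polynomial orders. It re-creates data with the same design in base R so that it runs on its own; the seed, the helper `jump_for_order`, and the particular orders compared are assumptions made for this example, so the numbers will not match any particular draw of `tb`.

```r
set.seed(1000)   # arbitrary seed, chosen only to make this sketch reproducible
n <- 200
sim <- data.frame(Running = runif(n))
sim$Y <- 1.5 * sim$Running - .6 * sim$Running^2 + .3 * (sim$Running > .5) + rnorm(n, 0, .25)
sim$RC <- sim$Running - .5
sim$Treated <- sim$Running > .5

# Estimated jump at the cutoff when fitting an order-k polynomial on each side
jump_for_order <- function(k) {
  m <- lm(Y ~ poly(RC, degree = k, raw = TRUE) * Treated, data = sim)
  unname(coef(m)["TreatedTRUE"])
}

orders <- c(1, 2, 6)
jumps <- sapply(orders, jump_for_order)
names(jumps) <- paste0("order_", orders)
print(jumps)   # the true jump in this simulation is .3
```

Rerunning with different seeds gives a sense of how much each specification bounces around from sample to sample.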

Functional Form:

So:

  • Avoid higher-order polynomials
  • Even the “true model” can be worse than something simpler sometimes (although if I rerun this with different random data, linear > squared doesn’t always remain true)
  • (And fewer terms makes more sense too once we apply a bandwidth and zoom in)
  • Be very suspicious if your fit veers wildly off right around the cutoff
  • Consider a nonparametric approach

Controls

  • Generally you don’t need control variables in an RDD
  • If the design is valid, you’ve closed all back doors. That’s sort of the whole point!
  • Although maybe we want some if we have a wide bandwidth - this will remove some of the bias
  • Still, we can get real value from having access to control variables. How?

Controls

  • Control variables allow us to perform placebo tests of our RDD model
  • RDD should close all back doors… but what if it doesn’t? What if we missed something?
  • We can rerun our RDD model, but simply use a control variable as the outcome
  • If we find an effect… uh oh, that shouldn’t happen! (outside of the levels expected by normal sampling variation)
  • You can run these for every control variable you have!
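
A placebo test along these lines might look like the following sketch. The control variable `Age` and the simulated data are hypothetical; by construction `Age` trends with the running variable but has no jump at the cutoff.

```r
set.seed(1000)
d <- data.frame(X = runif(500))
d$Treated <- d$X > .5
d$XC <- d$X - .5
# Hypothetical control variable: related to the running variable, but with no jump at the cutoff
d$Age <- 40 + 10 * d$X + rnorm(500, 0, 5)

# Same RDD specification, with the control variable in place of the outcome
m_placebo <- lm(Age ~ XC * Treated, data = d)

# A large, statistically significant "jump" here would be a warning sign
placebo_row <- summary(m_placebo)$coefficients["TreatedTRUE", ]
print(placebo_row)
```

With many control variables, a few placebo tests will come up significant by chance alone, so the overall pattern matters more than any single p-value.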

Fuzzy Regression Discontinuity

  • Commonly, treatment isn’t entirely assigned on the basis of a cutoff
  • But it becomes much more/less common at the cutoff
  • We can still work with this!
  • This is called fuzzy regression discontinuity

Fuzzy Regression Discontinuity

  • We can start by making sure there’s actually a jump in treatment at the cutoff, by running RDD with treatment as the outcome
  • There has to at least be a jump (up or down) in treatment probability at the cutoff
  • If there isn’t (or if there is but it’s tiny - we’ll be dividing by this later and don’t want to divide by something close-to-zero) that’s a problem!
  • Statistically won’t work, and theoretically implies we were wrong about our RDD design
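
A sketch of that first check, using simulated data in which the probability of treatment rises from 20% to 80% at the cutoff (these numbers and the variable names are assumptions for illustration):

```r
set.seed(1000)
n <- 1000
d <- data.frame(X = runif(n))
d$Above <- as.numeric(d$X > .5)
d$XC <- d$X - .5
# Hypothetical fuzzy assignment: treatment probability is 20% below the cutoff and 80% above it
d$Treated <- rbinom(n, 1, .2 + .6 * d$Above)

# RDD with treatment as the outcome (a linear probability model)
first_stage <- lm(Treated ~ XC * Above, data = d)

# The coefficient on Above is the estimated jump in treatment probability at the cutoff
print(summary(first_stage)$coefficients["Above", ])
```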

Fuzzy Regression Discontinuity

  • So what happens if we just do RDD as normal?
  • The effect is understated because we have some untreated in the post-cutoff and treated in the pre.
  • So with a positive effect the pre-cutoff value goes up (because we mix some treatment effect in there) and the post-cutoff value goes down (since we mix some untreated in there), bringing them closer together and shrinking the effect estimate
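
The shrinkage can be seen in a small simulation. Everything below is a hypothetical setup: a true effect of 2, with 20% of people treated below the cutoff and 80% above.

```r
set.seed(1000)
n <- 2000
d <- data.frame(X = runif(n))
d$Above <- as.numeric(d$X > .5)
d$XC <- d$X - .5
d$Treated <- rbinom(n, 1, .2 + .6 * d$Above)   # hypothetical: 20% treated below, 80% above
d$Y <- 1 + d$XC + 2 * d$Treated + rnorm(n)     # hypothetical true effect of 2

# "RDD as normal": treat being above the cutoff as if it were treatment itself
naive <- lm(Y ~ XC * Above, data = d)

# Expected to land near 2 * .6 = 1.2 rather than the true effect of 2
print(coef(naive)["Above"])
```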

Fuzzy Regression Discontinuity

  • This is simulated data, the true effect is 2.
                      Y
(Intercept)           0.980***
                     (0.189)
Running               2.574***
                     (0.638)
AboveTRUE             1.113**
                     (0.554)
Running × AboveTRUE  -0.677
                     (0.935)
Num.Obs.              150
R2                    0.590
* p < 0.1, ** p < 0.05, *** p < 0.01

Fuzzy Regression Discontinuity

  • We can scale by how much the treatment prevalence jumped… if the chance of being treated only went up by 50 percentage points at the cutoff, then the effect we see should be only 50% as large as the true effect, so let’s adjust that away!

Fuzzy Regression Discontinuity

  • We can try literally dividing the effect on \(Y\) by the effect on \(Treated\)
                      Y           Treated
(Intercept)           0.980***    0.003
                     (0.189)     (0.069)
Running               2.574***    0.647***
                     (0.638)     (0.231)
AboveTRUE             1.113**     0.663***
                     (0.554)     (0.201)
Running × AboveTRUE  -0.677      -0.287
                     (0.935)     (0.339)
Num.Obs.              150         150
R2                    0.590       0.627
* p < 0.1, ** p < 0.05, *** p < 0.01
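
Worked out with the two (rounded) AboveTRUE coefficients from this table, the division is:

\[ \frac{\text{jump in } Y}{\text{jump in } Treated} = \frac{1.113}{0.663} \approx 1.68 \]

That is noticeably closer to the true effect of 2 than the unscaled jump of 1.113.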

Fuzzy Regression Discontinuity

  • Or we can use instrumental variables (IV) for this (which we’ll get to later), with being above the cutoff as an instrument for treatment
                  Instrumental Variables
(Intercept)       0.970***
                 (0.132)
Running           1.599***
                 (0.564)
Treat             1.616***
                 (0.439)
Running × Treat  -0.235
                 (0.586)
Num.Obs.          150
R2                0.822
* p < 0.1, ** p < 0.05, *** p < 0.01
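
To show how the IV approach connects to dividing the two jumps, here is a by-hand two-stage sketch in base R on simulated data. It is a simplified specification (the slope is allowed to differ above the cutoff, but the running variable is not interacted with treatment), so it is not the same model as the table above. The data and variable names are assumptions.

```r
set.seed(1000)
n <- 2000
d <- data.frame(X = runif(n))
d$Above <- as.numeric(d$X > .5)
d$XC <- d$X - .5
d$XC_Above <- d$XC * d$Above                    # lets the slope differ above the cutoff
d$Treated <- rbinom(n, 1, .2 + .6 * d$Above)    # hypothetical fuzzy assignment
d$Y <- 1 + d$XC + 2 * d$Treated + rnorm(n)      # hypothetical true effect of 2

# First stage: predict treatment using the instrument (Above) and the other regressors
first_stage <- lm(Treated ~ XC + Above + XC_Above, data = d)
d$Treated_hat <- fitted(first_stage)

# Second stage: replace Treated with its first-stage prediction
second_stage <- lm(Y ~ XC + Treated_hat + XC_Above, data = d)
iv_estimate <- unname(coef(second_stage)["Treated_hat"])

# With one instrument, this matches the "divide the two jumps" calculation
reduced_form <- lm(Y ~ XC + Above + XC_Above, data = d)
wald_ratio <- unname(coef(reduced_form)["Above"] / coef(first_stage)["Above"])

print(c(iv = iv_estimate, ratio = wald_ratio))
# Note: standard errors from second_stage are not valid; dedicated IV routines correct them
```

The point estimate is fine to compute this way, but the standard errors need the adjustment that proper IV commands apply.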

But Really…

  • There are additional estimation details that are difficult to do yourself
  • There are optimal bandwidth selection operators
  • There is bias introduced by taking points away from the cutoff, but also available corrections for that bias
  • We probably want to use a command that does this stuff for us

rdrobust

  • The rdrobust package has the rdrobust function which runs regression discontinuity with:
  • Options for fuzzy RD
  • Optimal bandwidth selection
  • Bias correction
  • Lots of options (including control variables, through the covs argument)
  • Unfortunately doesn’t work with modelsummary
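
A sketch of two of these options in use: `fuzzy` for fuzzy RD and `h` for setting the bandwidth by hand. The simulated data (true effect of 2, treatment probability rising from 20% to 80% at the cutoff) are an assumption for illustration, and the block skips itself if rdrobust is not installed.

```r
# Skips quietly if the rdrobust package is not installed
if (requireNamespace("rdrobust", quietly = TRUE)) {
  library(rdrobust)

  # Hypothetical fuzzy design: 20% treated below the cutoff, 80% above, true effect of 2
  set.seed(1000)
  n <- 2000
  X <- runif(n)
  Treated <- rbinom(n, 1, .2 + .6 * (X > .5))
  Y <- 1 + (X - .5) + 2 * Treated + rnorm(n)

  # fuzzy = takes the observed treatment; the cutoff indicator is built from x and c
  m_fuzzy <- rdrobust(Y, X, c = .5, fuzzy = Treated)
  summary(m_fuzzy)

  # h = sets the bandwidth by hand instead of using the data-driven default
  m_manual <- rdrobust(Y, X, c = .5, fuzzy = Treated, h = .2)
  summary(m_manual)
}
```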

rdrobust

  • Remember the simulated data we had earlier with the true effect of .3?
library(rdrobust)
m <- rdrobust(tb$Y, tb$Running, c = .5)
summary(m)

rdrobust

## Call: rdrobust
## 
## Number of Obs.                  200
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 104          96
## Eff. Number of Obs.             23          19
## Order est. (p)                   1           1
## Order bias  (q)                  2           2
## BW est. (h)                  0.146       0.146
## BW bias (b)                  0.247       0.247
## rho (h/b)                    0.593       0.593
## Unique Obs.                    104          96
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     0.322     0.170     1.897     0.058    [-0.011 , 0.655]     
##         Robust         -         -     1.797     0.072    [-0.034 , 0.772]     
## =============================================================================

rdplot

  • Or, easily plot the results! Note that rdplot defaults to an order-4 polynomial, unlike rdrobust, which defaults to local linear
rdplot(tb$Y, tb$Running, c = .5)

And for Special Cases

  • We’ll probably be actually estimating RDD models with rdrobust - going through the by-hand stuff is important for knowing what is going on though!
  • rdrobust is one of a family of packages for different kinds of RDD:
  • rdpower for power analyses of regression discontinuity models (do this!)
  • rdmulti for RDD with multiple cutoffs
  • And the wonkier rdlocrand and rddensity

Practice

  • Discuss: one place where RDD is used frequently is in politics, where vote share is used as the running variable and a .5 cutoff determines who is elected
  • Why does this work? What assumptions do we need to make? What issues might there be? Is this ‘fuzzy’ or not?

Practice

  • Load the rdrobust package and the rdrobust_RDsenate data.
  • Perform a linear and order-2 polynomial RDD using lm() with a bandwidth of .05 on either side
  • Then use rdrobust and rdplot to do the same