Lecture 13 Estimating Regression Discontinuity

Nick Huntington-Klein

2021-01-29

Recap

Regression discontinuity is a design that can be used when treatment is applied based on a cutoff
Above the cutoff? Treated! Below the cutoff? Not treated! (or the other way around)
By comparing people right around the cutoff, we are effectively closing all back doors
Isolating the treatment effect!

Today

Surely we aren’t just comparing averages above and below!
How can we actually implement regression discontinuity (presumably with regression?)
What do we need to keep in mind?
What about close variations? What if the cutoff doesn’t assign treatment perfectly?

Regression Discontinuity in Regression

How can we make a model for RDD?
We want to: look for a jump at a cutoff point
Get as good an idea as possible of what the outcome is just on either side of the cutoff
So…

Regression Discontinuity in Regression

Let’s start with the simple linear version:

\[ Y = \beta_0 + \beta_1(X-Cutoff) + \beta_2Treated + \beta_3Treated\times(X-Cutoff)+\varepsilon \]

This formulation basically allows there to be two lines: one to the left of the cutoff ( \(\beta_0 + \beta_1(X-Cutoff)\) ), and one to the right ( \((\beta_0 + \beta_2) + (\beta_1 + \beta_3)(X-Cutoff)\) )
The jump at the cutoff is given by \(\beta_2\) - that’s our RDD estimate
We use \(X\) relative to the cutoff so that we can easily locate the jump in the \(\beta_2\) coefficient
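As a minimal sketch of the linear version in base R: the data here are simulated, and the cutoff of .5, the true jump of .3, and all variable names are assumptions made for illustration only.

```r
# Hypothetical simulated data: cutoff at .5, true jump of .3
set.seed(1000)
n <- 1000
Running <- runif(n)
Treated <- Running > .5
Y <- 1 + 1.5 * Running + .3 * Treated + rnorm(n, 0, .25)

# Center the running variable so the Treated coefficient is the jump AT the cutoff
RC <- Running - .5

# RC * Treated expands to RC + Treated + RC:Treated, i.e. a separate line on each side
m_linear <- lm(Y ~ RC * Treated)

# beta_2 in the equation above: the estimated jump at the cutoff
jump <- unname(coef(m_linear)["TreatedTRUE"])
print(jump)
```

If we skipped the centering step, the TreatedTRUE coefficient would be the gap between the two lines at Running = 0 rather than at the cutoff.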

Regression Discontinuity in Regression

Choices!

This is of course the simplest version!
Things to consider:
Bandwidth
Functional form
Controls

Bandwidth

The idea of RDD is that people just around the cutoff are very much comparable
Basically random if your test score is 79 vs. 81 if the cutoff is 80, for example
So people far away from the cutoff aren’t too informative! At best they help determine the slope of the fitted lines
So… drop ’em!

Bandwidth

RDD generally uses data only from the observations in a given range around the cutoff
(Or at least weights them less the further away they are from cutoff)
How wide should the bandwidth be?
There’s a big wide literature on optimal bandwidth selection which balances the addition of bias (from adding people far away from the cutoff who may have back doors) vs. variance (from adding more people so as to improve estimator precision)
We won’t be doing this by hand; we can often rely on an RDD command to do this for us
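To see what a bandwidth does mechanically, here is a rough sketch that just drops everyone more than a hand-picked distance from the cutoff. The bandwidth of .1 is an arbitrary assumption, not an optimal choice, and the data are simulated.

```r
# Hypothetical simulated data with a curved true relationship and a jump of .3 at .5
set.seed(1000)
n <- 2000
Running <- runif(n)
Treated <- Running > .5
Y <- 1.5 * Running - .6 * Running^2 + .3 * Treated + rnorm(n, 0, .25)
d <- data.frame(Y = Y, RC = Running - .5, Treated = Treated)

# Hand-picked bandwidth (an assumption for illustration, NOT an optimal bandwidth)
bw <- .1
d_bw <- subset(d, abs(RC) <= bw)

# Same linear RDD model, estimated on all the data and inside the bandwidth
m_full <- lm(Y ~ RC * Treated, data = d)
m_bw <- lm(Y ~ RC * Treated, data = d_bw)

# Fewer observations inside the bandwidth means a noisier estimate of the jump
se_full <- summary(m_full)$coefficients["TreatedTRUE", "Std. Error"]
se_bw <- summary(m_bw)$coefficients["TreatedTRUE", "Std. Error"]
print(c(n_full = nrow(d), n_bw = nrow(d_bw)))
print(c(se_full = se_full, se_bw = se_bw))
```

The narrower sample leans less on people far from the cutoff, and pays for it with a larger standard error. That is the tradeoff the bandwidth-selection literature is trying to balance.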

Functional Form

Why fit a straight line on either side? If the true relationship is curvy this will give us the wrong result!
We can be much more flexible! As long as we fit some sort of line on either side, we can look for the jump
One way to do this is with polynomials ( \(\tilde{X} = X-Cutoff\), \(T = Treated\) ):

\[ Y = \beta_0 + \beta_1\tilde{X} + \beta_2 \tilde{X}^2 + \beta_3T + \beta_4\tilde{X}T + \beta_5 \tilde{X}^2T+\varepsilon \]
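One way the order-2 polynomial version might look in lm(), again on simulated data with an assumed cutoff of .5 and an assumed true jump of .3:

```r
# Hypothetical simulated data: true model is an order-2 polynomial with a jump of .3
set.seed(1000)
n <- 1000
Running <- runif(n)
Treated <- Running > .5
Y <- 1.5 * Running - .6 * Running^2 + .3 * Treated + rnorm(n, 0, .25)
RC <- Running - .5

# I() protects the square so lm() treats RC^2 as arithmetic, not formula syntax
# Interacting both polynomial terms with Treated gives a separate curve on each side
m_sq <- lm(Y ~ RC + I(RC^2) + Treated + RC:Treated + I(RC^2):Treated)

# beta_3 in the equation above: the jump at the cutoff
jump_sq <- unname(coef(m_sq)["TreatedTRUE"])
print(jump_sq)
```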

Functional Form

(by the way, you can take this basic interaction-with-cutoff design idea and use it to look at how anything changes before and after cutoff, not just the level of \(Y\)! You could look at how the slope changes (“regression kink”), or how some other identified effect changes, or just about anything! The beauty of flexible design)

Functional Form

The interpretation is the same as before - look for the jump!
We do want to be careful with polynomials though, and not add too many
Remember, the more polynomial terms we add, the stranger the behavior of the line at either end of the range of data
And the cutoff is at the far-right end of the pre-cutoff data and the far-left end of the post-cutoff data!
So we can get illusory effects generated by having too many terms

Functional Form

A common approach is to use non-parametric regression or local linear regression
This doesn’t impose any particular shape! And it’s easy to get a prediction on either side of the cutoff
This allows for non-straight lines without dealing with the issues polynomials bring us
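A bare-bones sketch of the local linear idea: a weighted linear regression near the cutoff, with weights that shrink as observations get further away. The triangular kernel and the bandwidth of .15 are assumptions picked by hand here; dedicated RDD commands choose these more carefully and add bias corrections that this sketch leaves out.

```r
# Hypothetical simulated data: curved true relationship, jump of .3 at .5
set.seed(1000)
n <- 2000
Running <- runif(n)
Treated <- Running > .5
Y <- 1.5 * Running - .6 * Running^2 + .3 * Treated + rnorm(n, 0, .25)
d <- data.frame(Y = Y, RC = Running - .5, Treated = Treated)

# Hand-picked bandwidth (an assumption, not an optimal choice)
h <- .15

# Triangular kernel: weight near 1 at the cutoff, falling linearly to 0 at distance h
d$w <- pmax(0, 1 - abs(d$RC) / h)

# Keep only observations with positive weight, then fit a weighted line on each side
d_h <- d[d$w > 0, ]
m_ll <- lm(Y ~ RC * Treated, data = d_h, weights = w)

jump_ll <- unname(coef(m_ll)["TreatedTRUE"])
print(jump_ll)
```

Because only nearby points get much weight, a straight line can approximate a curvy relationship well without any polynomial terms.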

Different Functional Forms

Let’s look at the same data with a few different functional forms
Remember, the RDD effect is the jump at the cutoff. The TRUE effect here will be \(.3\), and the TRUE model is an order-2 polynomial

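library(tidyverse)
# Simulated data: true jump of .3 at Running = .5, true model is an order-2 polynomial
tb <- tibble(Running = runif(200)) %>%
  mutate(Y = 1.5*Running - .6*Running^2 + .3*(Running > .5) + rnorm(200, 0, .25)) %>%
  # RC is the running variable centered at the cutoff
  mutate(RC = Running - .5, Treated = Running > .5)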

Different Functional Forms

(Figure: the same data fit with different functional forms on either side of the cutoff)

So:

Avoid higher-order polynomials
Even the “true model” can be worse than something simpler sometimes (although if I rerun this with different random data, linear > squared doesn’t always remain true)
(And fewer terms makes more sense too once we apply a bandwidth and zoom in)
Be very suspicious if your fit veers wildly off right around the cutoff
Consider a nonparametric approach

Controls

Generally you don’t need control variables in an RDD
If the design is valid, you’ve closed all back doors. That’s sort of the whole point!
Although maybe we want some if we have a wide bandwidth - this will remove some of the bias
Still, we can get real value from having access to control variables. How?

Controls

Control variables allow us to perform placebo tests of our RDD model
RDD should close all back doors… but what if it doesn’t? What if we missed something?
We can rerun our RDD model, but simply use a control variable as the outcome
If we find an effect… uh oh, that shouldn’t happen! (outside of the levels expected by normal sampling variation)
You can run these for every control variable you have!
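A placebo test might look something like this. Age is a made-up control variable that moves smoothly with the running variable and, by construction, is not affected by treatment.

```r
# Hypothetical simulated data: Age is a control that treatment cannot affect
set.seed(1000)
n <- 1000
Running <- runif(n)
Treated <- Running > .5
Age <- 30 + 10 * Running + rnorm(n, 0, 5)
RC <- Running - .5

# Same RDD specification as usual, but with the control variable as the outcome
m_placebo <- lm(Age ~ RC * Treated)

# Estimate, standard error, t-statistic, and p-value for the "jump" in Age
placebo <- summary(m_placebo)$coefficients["TreatedTRUE", ]
print(placebo)
```

A large, clearly significant jump here would be a warning sign. Keep in mind that if you run many of these, a few will look significant by chance alone.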

Fuzzy Regression Discontinuity

Commonly, treatment isn’t entirely assigned on the basis of a cutoff
But it becomes much more/less common at the cutoff
We can still work with this!
This is called fuzzy regression discontinuity

Fuzzy Regression Discontinuity

We can start by making sure there’s actually a jump in treatment at the cutoff, by running RDD with treatment as the outcome
There has to at least be a jump (up or down) in treatment probability at the cutoff
If there isn’t (or if there is but it’s tiny - we’ll be dividing by this later and don’t want to divide by something close-to-zero) that’s a problem!
Statistically won’t work, and theoretically implies we were wrong about our RDD design
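Checking for a jump in treatment could be sketched like this. The treatment probabilities of .2 below the cutoff and .8 above it are invented for the simulation.

```r
# Hypothetical fuzzy assignment: treatment probability jumps from .2 to .8 at the cutoff
set.seed(1000)
n <- 2000
Running <- runif(n)
Above <- Running > .5
Treated <- rbinom(n, 1, ifelse(Above, .8, .2))
RC <- Running - .5

# RDD with treatment as the outcome: AboveTRUE is the jump in treatment probability
first_stage <- lm(Treated ~ RC * Above)
treat_jump <- summary(first_stage)$coefficients["AboveTRUE", ]
print(treat_jump)
```

Here the estimated jump should land near .6, comfortably away from zero, so dividing by it later is not a problem.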

Fuzzy Regression Discontinuity

So what happens if we just do RDD as normal?
The effect is understated because some untreated people are in the post-cutoff group and some treated people are in the pre-cutoff group.
So with a positive effect the pre-cutoff value goes up (because we mix some treatment effect in there) and the post-cutoff value goes down (since we mix some untreated in there), bringing them closer together and shrinking the effect estimate

Fuzzy Regression Discontinuity

This is simulated data, the true effect is 2.

	Y
(Intercept)	0.980***
	(0.189)
Running	2.574***
	(0.638)
AboveTRUE	1.113**
	(0.554)
Running × AboveTRUE	-0.677
	(0.935)
Num.Obs.	150
R2	0.590
* p < 0.1, ** p < 0.05, *** p < 0.01

Fuzzy Regression Discontinuity

We can scale by how much the treatment prevalence jumped… if the chance of being treated only went up by 50 percentage points, then the effect we see should be 50% as large, so let’s adjust that away!

Fuzzy Regression Discontinuity

We can try literally dividing the effect on \(Y\) by the effect on \(Treated\)

	Y	Treated
(Intercept)	0.980***	0.003
	(0.189)	(0.069)
Running	2.574***	0.647***
	(0.638)	(0.231)
AboveTRUE	1.113**	0.663***
	(0.554)	(0.201)
Running × AboveTRUE	-0.677	-0.287
	(0.935)	(0.339)
Num.Obs.	150	150
R2	0.590	0.627
* p < 0.1, ** p < 0.05, *** p < 0.01
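The division itself, using the two AboveTRUE coefficients from the table above (the variable names are just labels for this sketch):

```r
# Estimates copied from the table above
jump_Y <- 1.113        # AboveTRUE coefficient with Y as the outcome
jump_Treated <- 0.663  # AboveTRUE coefficient with Treated as the outcome

# Scale the jump in Y by the jump in treatment probability
fuzzy_estimate <- jump_Y / jump_Treated
print(round(fuzzy_estimate, 3))
```

That comes to about 1.68, which is much closer to the true effect of 2 than the unscaled 1.113.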

Fuzzy Regression Discontinuity

Or we can use instrumental variables (IV) for this (which we’ll get to later), with being above the cutoff as an instrument for treatment

	Instrumental Variables
(Intercept)	0.970***
	(0.132)
Running	1.599***
	(0.564)
Treat	1.616***
	(0.439)
Running × Treat	-0.235
	(0.586)
Num.Obs.	150
R2	0.822
* p < 0.1, ** p < 0.05, *** p < 0.01

But Really…

There are additional estimation details that are difficult to do yourself
There are optimal bandwidth selection procedures
There is bias introduced by taking points away from the cutoff, but also available corrections for that bias
We probably want to use a command that does this stuff for us

rdrobust

The rdrobust package has the rdrobust function which runs regression discontinuity with:
Options for fuzzy RD
Optimal bandwidth selection
Bias correction
Lots of options (including control variables, via the covs argument)
Unfortunately doesn’t work with modelsummary

rdrobust

Remember the simulated data we had earlier with the true effect of .3?

library(rdrobust)
m <- rdrobust(tb$Y, tb$Running, c = .5)
summary(m)

rdrobust

## Call: rdrobust
## 
## Number of Obs.                  200
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 104          96
## Eff. Number of Obs.             23          19
## Order est. (p)                   1           1
## Order bias  (q)                  2           2
## BW est. (h)                  0.146       0.146
## BW bias (b)                  0.247       0.247
## rho (h/b)                    0.593       0.593
## Unique Obs.                    104          96
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     0.322     0.170     1.897     0.058    [-0.011 , 0.655]     
##         Robust         -         -     1.797     0.072    [-0.034 , 0.772]     
## =============================================================================

rdplot

Or, easily plot the results! Note that rdplot defaults to an order-4 polynomial, unlike rdrobust, which is local linear

rdplot(tb$Y, tb$Running, c = .5)

And for Special Cases

We’ll probably be actually estimating RDD models with rdrobust - going through the by-hand stuff is important for knowing what is going on though!
rdrobust is one of a family of packages for different kinds of RDD:
rdpower for power analyses of regression discontinuity models (do this!)
rdmulti for RDD with multiple cutoffs
And the wonkier rdlocrand and rddensity

Practice

Discuss: one place where RDD is used frequently is in politics, where vote share is used as the running variable and a .5 cutoff determines who is elected
Why does this work? What assumptions do we need to make? What issues might there be? Is this ‘fuzzy’ or not?

Practice

Load the rdrobust package and the rdrobust_RDsenate data.
Perform a linear and order-2 polynomial RDD using lm() with a bandwidth of .05 on either side
Then use rdrobust and rdplot to do the same