Lecture 22 Regression Discontinuity

Recap

We’ve been going over ways in which we can use control groups to isolate causal effects
We can select similar control groups using matching or controlling (what economists call “selection on observables”)
We can use a treated group at a different time as its own control group with fixed effects
When a treatment is applied at a particular time, we can select a reasonable control to account for the effects of time using difference-in-difference (a “natural experiment”)

Today

We’re going to go over one other way in which we can find and isolate a very convincing control group
Like DID, it’s also an example of a natural experiment
Regression discontinuity

Regression Discontinuity

For regression discontinuity to work, we need the Treatment to be assigned based on a cutoff of what’s called a “running variable”
For example, imagine we want to know the effects of being in a Gifted and Talented (GATE) program on your adult earnings
Being admitted to the program is based on your test score (running variable)
If you score above 75, you’re in the program. 75 or below, you’re out!
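A minimal sketch of what "assigned based on a cutoff" means, using made-up test scores (the seed, the 5-point bins, and the names `gate.example` and `gate.shares` are assumptions for illustration only). Within every bin the share of students in GATE should be either 0 or 1, because nothing but the side of the cutoff decides treatment.

```r
library(tidyverse)
set.seed(1000)  # arbitrary seed, only so the sketch is reproducible

# Simulated test scores between 0 and 100 (made-up data, not real students)
gate.example <- tibble(test = runif(1000)*100) %>%
  # Treatment depends ONLY on which side of the cutoff you're on
  mutate(GATE = test > 75)

# Share of students in GATE within each 5-point bin of test score
gate.shares <- gate.example %>%
  mutate(bin = cut(test, breaks = seq(0, 100, by = 5), include.lowest = TRUE)) %>%
  group_by(bin) %>%
  summarize(share_in_GATE = mean(GATE))
gate.shares
```

The share jumps from 0 to 1 at the cutoff, with no gradual rise on either side.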

Regression Discontinuity

Notice that the y-axis here is In GATE, not the outcome

Regression Discontinuity

Here’s how it looks when we look at the actual outcome

Regression Discontinuity

Now, we have a bit of a problem!
If we look at the relationship between treatment and earnings, we’ll be picking up the fact that higher test scores lead to higher earnings anyway

Regression Discontinuity

Except, that’s not actually what the diagram looks like! Test only affects GATE to the extent that it puts you above the 75 cutoff!

Regression Discontinuity

What can we do with that information?
Well, imagine that we looked at the area just around the cutoff
Say, the cutoff is 75, so we look at 73 to 77
Within that group, it’s basically random whether you fall on one side of the line or another

Regression Discontinuity

Someone with a 75 is, on average, almost exactly the same as someone with a 76, except that one got the treatment and the other didn’t!
Heck, that tiny test score difference could be due to just having a bad day before the test
So we have two groups - the just-barely-missed-outs and the just-barely-made-its, that are basically exactly the same except that one happened to get treatment
A perfect description of what we’re looking for in a control group!

Regression Discontinuity

So we look directly around the cutoff, and compare just below to just above.
This is our way of controlling for test score and closing the GATE <- Above <- Test -> earn back door
Why not just control for Test in the normal way?
Because if we really think that, right around the cutoff, it’s random whether you’re on one side or the other, we don’t just close the Test back door, we have effectively random assignment, like an experiment!
We’re not just closing the Test back door, we’re closing all back doors
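To see why the comparison has to be local, here is a rough sketch on simulated data where the true effect of GATE is set to 10 and test score raises earnings on its own. The sample size, seed, and object names (`sim`, `naive.diff`, `local.diff`) are assumptions for illustration. Comparing everyone in GATE to everyone outside it mixes the treatment effect with the effect of test score, while the comparison just around the cutoff should land much closer to 10.

```r
library(tidyverse)
set.seed(1000)  # arbitrary seed, only so the sketch is reproducible

# Simulated data: true effect of GATE is 10, and test raises earnings too
sim <- tibble(test = runif(10000)*100) %>%
  mutate(GATE = test >= 75) %>%
  mutate(earn = runif(10000)*40 + 10*GATE + test/2)

# Naive comparison: everyone in GATE vs. everyone not in GATE
naive <- sim %>% group_by(GATE) %>% summarize(earn = mean(earn))
naive.diff <- naive$earn[2] - naive$earn[1]

# Local comparison: only people within 2 points of the cutoff
local <- sim %>% filter(abs(test - 75) < 2) %>%
  group_by(GATE) %>% summarize(earn = mean(earn))
local.diff <- local$earn[2] - local$earn[1]

c(naive = naive.diff, local = local.diff)
```

The naive difference is far above 10 because the GATE group has much higher test scores on average. The local difference still carries a little of the test score effect, since even within the window the above group scores slightly higher.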

In Practice

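#Load the tidyverse for tibble(), mutate(), filter(), and %>%
library(tidyverse)
#Simulate data: GATE raises earnings by 10, and test score raises earnings on its own
rdd.data <- tibble(test = runif(1000)*100) %>%
  mutate(GATE = test >= 75) %>% mutate(earn = runif(1000)*40+10*GATE+test/2)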
#Choose a "bandwidth" of how wide around the cutoff to look (arbitrary in our example)
#Bandwidth of 2 with a cutoff of 75 means we look from 75-2 to 75+2
bandwidth <- 2
#Just look within the bandwidth
rdd <- rdd.data %>% filter(abs(75-test) < bandwidth) %>%
  #Create a variable indicating we're above the cutoff
  mutate(above = test >= 75) %>%
  #And compare our outcome just below the cutoff to just above
  group_by(above) %>% summarize(earn = mean(earn))
rdd
#Our effect looks just about right (10 is the truth)
rdd$earn[2] - rdd$earn[1]

## # A tibble: 2 x 2
##   above  earn
##   <lgl> <dbl>
## 1 FALSE  55.2
## 2 TRUE   66.0

## [1] 10.80055
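The bandwidth of 2 above was arbitrary. One way to get a feel for that choice is to repeat the same comparison at several bandwidths, sketched below. The helper `rdd_estimate()` and the list of bandwidths are hypothetical, written only for this illustration, and the data is re-simulated with an arbitrary seed.

```r
library(tidyverse)
set.seed(1000)  # arbitrary seed, only so the sketch is reproducible

sim <- tibble(test = runif(1000)*100) %>%
  mutate(GATE = test >= 75) %>%
  mutate(earn = runif(1000)*40 + 10*GATE + test/2)

# Hypothetical helper: mean earnings just above minus just below the cutoff
rdd_estimate <- function(data, bandwidth, cutoff = 75) {
  means <- data %>%
    filter(abs(cutoff - test) < bandwidth) %>%
    mutate(above = test >= cutoff) %>%
    group_by(above) %>%
    summarize(earn = mean(earn), n = n())
  tibble(bandwidth = bandwidth,
         estimate = means$earn[2] - means$earn[1],
         n_used = sum(means$n))
}

# Same comparison at several bandwidths, stacked into one table
bandwidth.check <- bind_rows(
  lapply(c(1, 2, 5, 10, 25), function(b) rdd_estimate(sim, b))
)
bandwidth.check
```

Narrow bandwidths use few observations, so the estimate is noisy. Wide bandwidths use more data but compare people whose test scores differ a lot, so the estimate drifts away from the true effect of 10.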

Graphically
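One way a picture like this might be drawn, as a sketch on re-simulated data rather than the original slide's figure. The seed, the object names, and the choice of a separate straight-line fit on each side of the cutoff are assumptions for illustration.

```r
library(tidyverse)
set.seed(1000)  # arbitrary seed, only so the sketch is reproducible

rdd.plot.data <- tibble(test = runif(1000)*100) %>%
  mutate(GATE = test >= 75) %>%
  mutate(earn = runif(1000)*40 + 10*GATE + test/2)

rdd.plot <- ggplot(rdd.plot.data, aes(x = test, y = earn, color = GATE)) +
  geom_point(alpha = .3) +
  # Coloring by GATE means a separate straight line is fit on each side
  geom_smooth(method = 'lm', se = FALSE) +
  # Mark the cutoff in the running variable
  geom_vline(xintercept = 75, linetype = 'dashed') +
  labs(x = 'Test Score', y = 'Earnings', color = 'In GATE')
rdd.plot
```

The jump between the two fitted lines at the dashed cutoff is the visual counterpart of the just-below vs. just-above comparison.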

Example: Corporate Social Responsibility

Corporate Social Responsibility (CSR) is when corporations engage in the kind of behavior that nonprofits usually do - community outreach, charity, etc.
Is this good for the corporation? Or would it make more sense to just send the money they spend to actual nonprofits if they just want to do good?
This is a causal question

Example: Corporate Social Responsibility

Conveniently for our purposes, CSR policies are voted on by shareholder boards
If a board votes 49% in favor, it fails. 51% in favor? It passes!
Sounds like a regression discontinuity to me!
“Close votes” is a common application of regression discontinuity

Example: Corporate Social Responsibility

So how do CSR policy announcements affect stock prices?

Example: Corporate Social Responsibility

Caroline Flammer studies this topic
Looking at the “abnormal return” (stock price return minus what’s expected given the market) comparing CSR votes that just won vs. CSR votes that just lost
So what should we do?
Focus just around the cutoff and compare abnormal returns just above and just below.

Example: Corporate Social Responsibility

Example: Corporate Social Responsibility

Looks like abnormal returns are about .02 higher for CSR votes that just won than for CSR votes that just lost!
Seems like the market likes seeing those CSRs and values them
And all those things that we might expect to correlate with both stock price growth and CSRs - tech-savvy, youthful leadership, etc., we’ve closed those back doors too!

Balance

Have we really closed those back doors?
One thing that’s so great about RDD is that, since it’s basically random whether you’re on one side of the cutoff or another, there shouldn’t be other back doors
We can check this by seeing if other variables differ on either side of the line
This is our way of testing our diagram - if our diagram is true, then above should have no relationship with any back door variable after focusing around the cutoff

Balance

rdd.data <- tibble(test = runif(500)*100) %>%
  mutate(backdoor=rnorm(500)+test/50) %>% mutate(GATE = test + backdoor >= 75) %>%
  mutate(earn = runif(500)*40+10*GATE+5*backdoor+test/2)
bandwidth <- 2
rdd <- rdd.data %>% filter(abs(75-test) < bandwidth) %>%
  #Create a variable indicating we're above the cutoff
  mutate(above = test >= 75) %>%
  #And compare the back door variable just below the cutoff to just above
  group_by(above) %>% summarize(backdoor = mean(backdoor))
rdd

## # A tibble: 2 x 2
##   above backdoor
##   <lgl>    <dbl>
## 1 FALSE     1.22
## 2 TRUE      1.57

#Not a lot of difference!
rdd$backdoor[2] - rdd$backdoor[1]

## [1] 0.3516092

Balance

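Notice there’s no real difference here: a gap of about 0.35 is small next to the noise in backdoor (a standard deviation of about 1), especially with so few observations inside the bandwidth. That indicates we’ve closed that back door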
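Eyeballing the gap in means is one option. A slightly more formal balance check, sketched here as an assumption and not as the lecture's own method, is a two-sample t-test of the back door variable on either side of the cutoff using base R's `t.test()`. The larger sample size, the seed, and the object names are made up for illustration.

```r
library(tidyverse)
set.seed(1000)  # arbitrary seed, only so the sketch is reproducible

# Simulate a back door variable related to test, then keep the window
balance.window <- tibble(test = runif(5000)*100) %>%
  mutate(backdoor = rnorm(5000) + test/50) %>%
  filter(abs(75 - test) < 2) %>%
  mutate(above = test >= 75)

# Does mean(backdoor) differ between just-below and just-above?
balance.test <- t.test(backdoor ~ above, data = balance.window)
balance.test
```

A small estimated difference together with a confidence interval that includes 0 is what we would hope to see if the back door is closed around the cutoff.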

Summing Up

We’ve covered four main methods of making comparisons as close as possible
Controlling and matching both take a set of measured variables and adjust so you’re looking at variation within those variables
Difference-in-difference takes a chosen comparison group and uses it to adjust for changes over time in your treated group of interest
Regression discontinuity uses a cutoff in a running variable to identify a treated and nontreated group that are basically randomly assigned

Summing Up

Next time we’ll be putting some more work into practicing and applying these methods
And thinking carefully about how we can use them to create an appropriate research design so we can figure out our causal effects of interest!

Practice

Does winning help your party stay in power 40 years later?
Install and load the politicaldata package, and load data(house_results)
Create tibbles hr76 and hr16 with only 1976 and 2016
Create repadv76 equal to rep vote minus dem for 1976, and filter only to those with !is.na(repadv76)
Create repwins16 equal to rep > dem for 2016, and filter !is.na(repwins16)
select() only district,repadv76, repwins16, and inner_join() the two data sets
Compare repwins16 mean above and below repadv76=0 with a bandwidth of .04

Practice Answers

#install.packages('politicaldata')
library(tidyverse)
library(politicaldata)
data(house_results)

hr76 <- filter(house_results,year==1976) %>%
  mutate(repadv76 = rep - dem) %>%
  filter(!is.na(repadv76)) %>%
  select(district,repadv76)
hr16 <- filter(house_results,year==2016) %>%
  mutate(repwins16 = rep > dem) %>%
  filter(!is.na(repwins16)) %>%
  select(district,repwins16)

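#Both data sets share the district column, so join on it
fulldata <- inner_join(hr76,hr16,by='district')
bandwidth <- .04

#Keep only races within the bandwidth of the repadv76 = 0 cutoff
fulldata %>% filter(abs(repadv76-0)<=bandwidth) %>%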
  mutate(above = repadv76 > 0) %>%
  group_by(above) %>% summarize(repwins16=mean(repwins16))

## # A tibble: 2 x 2
##   above repwins16
##   <lgl>     <dbl>
## 1 FALSE     0.737
## 2 TRUE      0.889

Nick Huntington-Klein

March 17, 2019
