Lecture 19 Fixed Effects

Nick Huntington-Klein

March 8, 2019

Recap

  • Last time we talked about how controlling is a common way of blocking back doors to identify an effect
  • We can control for a variable W by using W to explain our other variables and then taking the residuals
  • Another form of controlling is using a sample that has only observations with similar values of W
  • Some variables you want to be careful NOT to control for - you don’t want to close front doors, or open back doors by controlling for colliders
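
As a reminder of what "explain and take the residuals" looks like, here is a minimal sketch on simulated data. The variable names and numbers are made up, and using lm() to do the explaining is an assumption for illustration; it may not be the exact method used earlier in the class.

```r
# Simulated data (hypothetical): W is a back door between X and Y,
# and X has NO true effect on Y.
set.seed(1)
n <- 5000
W <- rnorm(n)
X <- W + rnorm(n)
Y <- W + rnorm(n)

# The raw correlation picks up the back door through W
raw_cor <- cor(X, Y)

# Control for W: use W to explain X and Y, then keep the residuals
X_res <- resid(lm(X ~ W))
Y_res <- resid(lm(Y ~ W))
controlled_cor <- cor(X_res, Y_res)

print(c(raw = raw_cor, controlled = controlled_cor))
```

The raw correlation should come out clearly positive, and the controlled one close to zero, since the only link between X and Y in this simulation runs through W.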

Today

  • Today we’ll be starting on our path for the rest of the class, where we’ll be talking about standard methods for performing causal inference
  • Different ways of getting identification once we have a diagram!
  • Our goal here will be to understand these methods conceptually
  • We won’t necessarily be doing best-statistical-practices for these. You’ll learn those in later classes, and best-practices change over time anyway
  • Our goal is to understand these methods and be able to apply a straightforward version of them, not to publish a research paper

Today

  • In particular we’ll be talking about a method that is commonly used to identify causal effects, called fixed effects
  • We’ll be discussing the kind of causal diagram that fixed effects can identify
  • All of the methods we’ll be discussing are like this - they’ll only apply to particular diagrams
  • And so knowing our diagrams will be key to knowing when to use a given method

The Problem

  • One problem we ran into last time is that we can’t really control for things if we can’t measure them
  • And there are lots of things we can’t measure or don’t have data for!
  • So what can we do?

The Solution

  • If we observe each person/firm/country multiple times, then we can forget about controlling for the actual back-door variable we’re interested in
  • And just control for person/firm/country identity instead!
  • This will control for EVERYTHING that is fixed for that individual, whether we can measure it or not!
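
To see why this can work, here is a rough sketch on a simulated panel. Everything in it is hypothetical: the variable `trait` stands in for a back-door variable we never get to measure, and the true effect of `x` on `y` is set to 2 by construction.

```r
# Simulated panel (hypothetical): 200 individuals observed 10 times each
set.seed(2)
n_id <- 200
n_t  <- 10
id   <- rep(seq_len(n_id), each = n_t)

# 'trait' is fixed within individual and is NOT something we get to measure
trait <- rnorm(n_id)[id]
x <- trait + rnorm(n_id * n_t)
y <- 2 * x + 3 * trait + rnorm(n_id * n_t)   # true effect of x on y is 2

# Ignoring identity: the back door through 'trait' is open
pooled_slope <- unname(coef(lm(y ~ x))["x"])

# Controlling for identity: subtract each individual's own mean
x_within <- x - ave(x, id)
y_within <- y - ave(y, id)
within_slope <- unname(coef(lm(y_within ~ x_within))["x_within"])

print(c(pooled = pooled_slope, within = within_slope))
```

The pooled slope should land well above 2 because it mixes in the effect of `trait`, while the within-individual slope should be close to 2 even though `trait` never appears in the calculation.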

In Practice

  • Let’s do this on the data from the “gapminder” package
  • This data tracks life expectancy and GDP per capita in many countries over time
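library(gapminder)
library(dplyr)  # for %>%, group_by(), mutate()
data(gapminder)
# Raw correlation between life expectancy and log GDP per capita
cor(gapminder$lifeExp,log(gapminder$gdpPercap))
## [1] 0.8076179
# Subtract each country's own mean to control for country
gapminder <- gapminder %>% group_by(country) %>%
  mutate(lifeExp.r = lifeExp - mean(lifeExp),
         logGDP.r = log(gdpPercap) - mean(log(gdpPercap))) %>% ungroup()
# Within-country correlation
cor(gapminder$lifeExp.r,gapminder$logGDP.r)
## [1] 0.6404051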
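
Subtracting country means is one way to control for country. Another common way is to put one indicator per country into a regression. The sketch below uses a small simulated stand-in for country-by-year data (the country labels and numbers are hypothetical, and lm() is just one way to do it) to check that both approaches give the same slope.

```r
# Simulated panel (hypothetical stand-in for country-by-year data)
set.seed(3)
country <- factor(rep(c("A", "B", "C", "D"), each = 12))
base    <- unname(c(A = 0, B = 2, C = 4, D = 6)[as.character(country)])
logGDP  <- base + rnorm(length(country))
lifeExp <- 50 + 5 * base + 1.5 * logGDP + rnorm(length(country))

# Approach 1: subtract country means from both variables
lifeExp.r <- lifeExp - ave(lifeExp, country)
logGDP.r  <- logGDP  - ave(logGDP, country)
slope_demeaned <- unname(coef(lm(lifeExp.r ~ logGDP.r))["logGDP.r"])

# Approach 2: include one indicator per country in the regression
slope_dummies <- unname(coef(lm(lifeExp ~ logGDP + country))["logGDP"])

# The two slopes agree up to floating-point error
print(all.equal(slope_demeaned, slope_dummies))
```

The demeaning version is handy when there are many countries, since it avoids estimating a separate coefficient for each one.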

So What?

  • This isn’t any different, mechanically, from any other time we’ve controlled for something
  • So what’s different here?
  • Let’s think about what we’re doing conceptually

What’s the Diagram?

  • Why are we controlling for things in this gapminder analysis?
  • Because there are LOTS of things that might be back doors between GDP per capita and life expectancy
  • War, disease, political institutions, trade relationships, health of the population, economic institutions…

What’s the Diagram?

  • There’s no way we can identify this
  • The list of back doors is very long
  • And likely includes some things we can’t measure!

What’s the Diagram?

  • HOWEVER! If we think that these things are likely to be constant within country…
  • Then we don’t really have a big long list of back doors, we just have one: “country”

What We Get

  • So what we get out of this is that we can identify our effect even if some of our back doors include variables that we can’t actually measure
  • When we do this, we’re basically comparing countries to themselves at different time periods!
  • Pretty good way to do an apples-to-apples comparison!

Graphically

  • The post-fixed-effects dots are basically a bunch of “Raw Country X” pasted together.
  • Imagine taking “Raw Pakistan” and moving it to the center, then taking “Raw Britain” and moving it to the center, etc.
  • Ignoring the baseline differences between Pakistan, Britain, China, etc., in their GDP per capita and life expectancy, and just looking within each country.
  • We are ignoring all differences between countries (since that way back doors lie!) and looking only at differences within countries.
  • Fixed Effects is sometimes also referred to as the “within” estimator
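
The "move each country to the center" step can be sketched in a few lines. The three countries and all of the numbers below are made up for illustration; the point is only that after centering, every country has the same mean, so nothing between countries is left to compare.

```r
# Made-up numbers for three hypothetical countries with different baselines
set.seed(4)
country  <- rep(c("P", "B", "C"), each = 20)
base_gdp <- unname(c(P = 7, B = 10, C = 8)[country])
base_le  <- unname(c(P = 55, B = 75, C = 65)[country])
logGDP   <- base_gdp + rnorm(60, sd = 0.3)
lifeExp  <- base_le + 5 * (logGDP - base_gdp) + rnorm(60)

# "Move each country to the center": subtract its own mean from both variables
logGDP_c  <- logGDP  - ave(logGDP, country)
lifeExp_c <- lifeExp - ave(lifeExp, country)

# After centering, every country's mean is (numerically) zero,
# so only within-country variation is left
gdp_means  <- tapply(logGDP_c, country, mean)
life_means <- tapply(lifeExp_c, country, mean)
print(round(gdp_means, 10))
print(round(life_means, 10))
```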

In Action

Notably

  • This does assume, of course, that all those back door variables CAN be described by country
  • In other words, that these back doors operate through things that are fixed within country
  • If something is a back door and changes over time in that country, fixed effects won’t help!

Varying Over Time

  • For example, earlier we mentioned war… that’s not fixed within country! A given country is at war sometimes and not other times.

Varying Over Time

  • Of course, in this case, we could control for War as well and be good!
  • Time-varying back doors don’t mean that fixed effects doesn’t work; they just mean you need to control for that stuff too
  • It always comes down to thinking carefully about your diagram
  • Fixed effects mainly works as a convenient way of combining together lots of different constant-within-country back doors into something that lets us identify the model even if we can’t measure them all
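
Here is a rough sketch of that idea on simulated data. All names and numbers are hypothetical: `fixed` is an unmeasured country trait that never changes, `war` is a measured back door that switches on and off over time, and the true effect of `x` on `y` is set to 1.

```r
# Simulated panel (hypothetical): 100 countries observed 20 times each
set.seed(5)
n_id <- 100
n_t  <- 20
id   <- rep(seq_len(n_id), each = n_t)
n    <- n_id * n_t

fixed <- rnorm(n_id)[id]     # constant within country, unmeasured
war   <- rbinom(n, 1, 0.3)   # changes over time within country, measured
x <- fixed - war + rnorm(n)
y <- 1 * x + 2 * fixed - 3 * war + rnorm(n)   # true effect of x on y is 1

# Helper: subtract each country's own mean
demean <- function(v) v - ave(v, id)
x_w   <- demean(x)
y_w   <- demean(y)
war_w <- demean(war)

# Fixed effects only: the back door through war is still open
fe_only <- unname(coef(lm(y_w ~ x_w))["x_w"])

# Fixed effects plus a control for war: also take out what war explains
x_res <- resid(lm(x_w ~ war_w))
y_res <- resid(lm(y_w ~ war_w))
fe_plus_war <- unname(coef(lm(y_res ~ x_res))["x_res"])

print(c(fe_only = fe_only, fe_plus_war = fe_plus_war))
```

Fixed effects alone should miss the true value of 1 here, while fixed effects plus the control for `war` should land close to it.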

Example: Sentencing

  • What effect do sentencing reforms have on crime?
  • One purpose of punishment for crime is to deter crime
  • If reform makes sentences clearer and less risky, that may weaken the deterrent to crime and so increase crime
  • Marvell & Moody study this using data on reforms in US states from 1969 to 1989

Example: Sentencing

  • I’ve omitted code reading in the data
  • But in our data we have multiple observations per state
head(mmdata)
## # A tibble: 6 x 6
##   state   year assault robbery pop1000 sentreform
##   <chr>  <dbl>   <dbl>   <dbl>   <dbl>      <dbl>
## 1 "ALA "    70    7413    1731    3450          0
## 2 "ALA "    71    7645    2005    3497          0
## 3 "ALA "    72    7431    2407    3540          0
## 4 "ALA "    73    8362    2809    3581          0
## 5 "ALA "    74    8429    3562    3628          0
## 6 "ALA "    75    8440    4446    3681          0
# Turn counts into rates per 1,000 people (pop1000 looks like population in thousands)
mmdata <- mmdata %>% mutate(assaultper1000 = assault/pop1000,
         robberyper1000 = robbery/pop1000)

Fixed Effects

  • We can see how robbery rates evolve in each state over time as states implement reform

Fixed Effects

  • You can tell that states are more or less likely to implement reform in a way that’s correlated with the level of robbery they already had
  • So SOMETHING about the state is driving both the level of robberies AND the decision to implement reform
  • Who knows what!
  • Our diagram has reform -> robberies and reform <- state -> robberies, which is something we can address with fixed effects.

Fixed Effects

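# Raw correlation between reform and the robbery rate
cor(mmdata$sentreform,mmdata$robberyper1000)
## [1] 0.1351003
# Subtract each state's own mean to control for state
mmdata <- mmdata %>% group_by(state) %>%
  mutate(reform.m = sentreform-mean(sentreform),
         robbery.m = robberyper1000-mean(robberyper1000)) %>% ungroup()
# Within-state correlation
cor(mmdata$reform.m,mmdata$robbery.m)
## [1] 0.2482301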

Example

  • The numbers were different
  • The 0.135 included the fact that different kinds of states tend to institute reform
  • The 0.248 doesn’t!
  • Looks like the deterrent effect was real: within a state, reform goes along with more robbery! But it’s important to consider whether there might be time-varying back doors too, since we don’t account for those in our analysis
  • What things might change within state over time that would be related to robberies and to sentencing reform?

Practice

  • We want to know the effect of teachers on the test scores of high school students
  • Some potential back doors might go through: parents' intelligence, age, demographics, school, last year's teacher
  • Draw a diagram including all these variables, plus maybe some unobservables where appropriate
  • If you used fixed effects for students, what back doors would still be open?

Practice Answers

  • Fixed effects for students would close the back doors through parents' intelligence, demographics, and school (assuming these stay fixed for a given student), but leave open age and last year's teacher, which change over time