Lecture 19 Fixed Effects

Nick Huntington-Klein

March 8, 2019


  • Last time we talked about how controlling is a common way of blocking back doors to identify an effect
  • We can control for a variable W by using W to explain our other variables, then taking the residuals
  • Another form of controlling is using a sample that has only observations with similar values of W
  • Some variables you want to be careful NOT to control for - you don’t want to close front doors, or open back doors by controlling for colliders
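
As a reminder of what that looks like in code, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names, the binary W, and the true effect of 2 are made up, and base R's ave() stands in for whatever grouping tool you prefer.

```r
# Hypothetical simulated data: W causes both X and Y (a back door),
# and the true effect of X on Y is 2
set.seed(1000)
n <- 2000
W <- sample(c(0, 1), n, replace = TRUE)
X <- 3 * W + rnorm(n)
Y <- 2 * X + 5 * W + rnorm(n)

# Raw relationship: biased, because the back door through W is open
raw_slope <- unname(coef(lm(Y ~ X))[2])

# Control for W: use W to explain X and Y (group means), keep the residuals
X_res <- X - ave(X, W)
Y_res <- Y - ave(Y, W)

# Relationship between the residuals: the back door through W is closed
controlled_slope <- unname(coef(lm(Y_res ~ X_res))[2])

raw_slope         # noticeably larger than the true effect of 2
controlled_slope  # close to 2
```

The only thing that changes between the two slopes is that the part of X and Y explained by W has been taken out.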


  • Today we’ll be starting on our path for the rest of the class, where we’ll be talking about standard methods for performing causal inference
  • Different ways of getting identification once we have a diagram!
  • Our goal here will be to understand these methods conceptually
  • We won’t necessarily be following best statistical practices for these. You’ll learn those in later classes, and best practices change over time anyway
  • Our goal is to understand these methods and be able to apply a straightforward version of them, not to publish a research paper


  • In particular we’ll be talking about a method that is commonly used to identify causal effects, called fixed effects
  • We’ll be discussing the kind of causal diagram that fixed effects can identify
  • All of the methods we’ll be discussing are like this - they’ll only apply to particular diagrams
  • And so knowing our diagrams will be key to knowing when to use a given method

The Problem

  • One problem we ran into last time is that we can’t really control for things if we can’t measure them
  • And there are lots of things we can’t measure or don’t have data for!
  • So what can we do?

The Solution

  • If we observe each person/firm/country multiple times, then we can forget about controlling for the actual back-door variable we’re interested in
  • And just control for person/firm/country identity instead!
  • This will control for EVERYTHING about that individual that stays the same over time, whether we can measure it or not!
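
To see why this can work, here is a small sketch on simulated panel data. All of it is assumed for illustration: the names id and ability, the 200 people observed 5 times each, and the true effect of 1 are hypothetical. The key point is that ability is never used when we control; only id is.

```r
# Hypothetical panel: 200 people, each observed 5 times
set.seed(2019)
n_people <- 200
n_periods <- 5
id <- rep(1:n_people, each = n_periods)

# An unmeasured trait that is constant within each person
ability <- rep(rnorm(n_people), each = n_periods)

# ability is a back door: it affects both X and Y. True effect of X on Y is 1
X <- ability + rnorm(n_people * n_periods)
Y <- 1 * X + 3 * ability + rnorm(n_people * n_periods)

# Raw relationship, biased by the open back door
raw_slope <- unname(coef(lm(Y ~ X))[2])

# Control for person identity: subtract each person's own mean
X_within <- X - ave(X, id)
Y_within <- Y - ave(Y, id)
fe_slope <- unname(coef(lm(Y_within ~ X_within))[2])

raw_slope  # well above the true effect of 1
fe_slope   # close to 1, even though ability was never used
```

Because ability does not vary within a person, subtracting each person's mean removes it completely. Anything that did change over time within a person would not be removed this way.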

In Practice

  • Let’s do this on the data from the “gapminder” package
  • This data tracks life expectancy and GDP per capita in many countries over time
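library(dplyr)
library(gapminder)
cor(gapminder$lifeExp, log(gapminder$gdpPercap))  # raw correlation
## [1] 0.8076179
gapminder <- gapminder %>% group_by(country) %>%
  mutate(lifeExp.r = lifeExp - mean(lifeExp),  # subtract each country's own mean
         logGDP.r = log(gdpPercap) - mean(log(gdpPercap))) %>% ungroup()
cor(gapminder$lifeExp.r, gapminder$logGDP.r)  # correlation within country
## [1] 0.6404051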

So What?

  • This isn’t any different, mechanically, from any other time we’ve controlled for something
  • So what’s different here?
  • Let’s think about what we’re doing conceptually
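
One way to check the "mechanically the same" claim is to compare subtracting group means against controlling for the group directly in a regression. This sketch uses simulated data with hypothetical names (country, country_effect), and the regression with factor(country) is a standard equivalent rather than something taken from this lecture.

```r
# Hypothetical panel: 30 countries, 8 observations each
set.seed(19)
country <- rep(paste0("country_", 1:30), each = 8)
country_effect <- rep(rnorm(30, sd = 2), each = 8)
x <- country_effect + rnorm(240)
y <- 0.5 * x + country_effect + rnorm(240)

# Approach 1: subtract country means, then relate the residuals
x_r <- x - ave(x, country)
y_r <- y - ave(y, country)
slope_demeaned <- unname(coef(lm(y_r ~ x_r))["x_r"])

# Approach 2: control for country identity directly in the regression
slope_dummies <- unname(coef(lm(y ~ x + factor(country)))["x"])

# The two slopes agree up to floating-point error
all.equal(slope_demeaned, slope_dummies)
```

Both approaches use only the variation within each country, which is why they give the same slope.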

What’s the Diagram?

  • Why are we controlling for things in this gapminder analysis?
  • Because there are LOTS of things that might be back doors between GDP per capita and life expectancy
  • War, disease, political institutions, trade relationships, health of the population, economic institutions…

What’s the Diagram?