# Lecture 24: Instrumental Variables

## Recap

• We’ve covered quite a few methods for isolating causal effects!
• Controlling for variables to close back doors (explain X and Y with the control, remove what’s explained)
• Matching on variables to close back doors (find treated and non-treated observations with similar values of the matching variables)
• Using a control group to control for time (before/after difference for treated and untreated, then difference them)
• Using a cutoff to construct a very good control group (treated/untreated difference near a cutoff)

## Today

• We’ve got ONE LAST METHOD!
• Today we’ll be covering instrumental variables
• The basic idea is that we have some variable, the instrumental variable, that causes `X` but has no back doors of its own!

## Natural Experiments

• This calls back to our idea of trying to mimic an experiment without having an experiment. In fact, let’s think about an actual randomized experiment.
• We have some random assignment `R` that determines your `X`. So even though we have back doors between `X` and `Y`, we can identify `X -> Y`
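
To see why random assignment helps, here is a minimal simulated sketch in Python. It is not from the lecture: the variable `U` (standing in for whatever sits on the back door between `X` and `Y`), the coefficients, and the sample size are all made-up assumptions. Because `R` is assigned by coin flip, it should be essentially unrelated to `U`, even though `X` is clearly related to it.

```python
import random

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

def simulate_experiment(n=10000, seed=24):
    """Hypothetical data: R is randomized, U is a back-door variable."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]                      # back-door variable
    r = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # coin-flip assignment
    x = [ri + ui + rng.gauss(0, 1) for ri, ui in zip(r, u)]      # X depends on R and U
    return r, x, u

r, x, u = simulate_experiment()
print("corr(R, U):", round(corr(r, u), 3))  # R is randomized, so this should be near zero
print("corr(X, U):", round(corr(x, u), 3))  # X shares U, so this should be clearly positive
```

The variation in `X` that comes from `R` is the part we can trust, because nothing on the back door had any say in it.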

## Natural Experiments

• The idea of instrumental variables is this:
• What if we can find a variable that can take the place of `R` in the diagram despite not actually being something we randomized in an experiment?
• If we can do that, we’ve clearly got a “natural experiment”
• When we find a variable that can do that, we call it an “instrument” or “instrumental variable”
• Let’s call it `Z`

## Instrumental Variable

So, for `Z` to take the place of `R` in the diagram, what do we need?

• `Z` must be related to `X` (typically `Z -> X` but not always)
• There must be no open paths from `Z` to `Y` except for ones that go through `X`

In other words: “`Z` is related to `X`, and all the effect of `Z` on `Y` goes THROUGH `X`”

## Instrumental Variable

How?

• Explain `X` with `Z`, and keep only what is explained, `X'`
• Explain `Y` with `Z`, and keep only what is explained, `Y'`
• [If `Z` is logical/binary] Divide the difference in `Y'` between `Z` values by the difference in `X'` between `Z` values
• [If `Z` is not logical/binary] Get the correlation between `X'` and `Y'`
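
The binary-`Z` version of these steps can be sketched in Python on simulated data. Nothing in this block comes from the lecture: the confounder `u`, the assumed true effect of 2, and the helper names `simulate`, `explained_by`, `iv_binary`, and `naive_slope` are illustrative assumptions. Here "explaining" a variable with a binary `Z` is taken to mean replacing it with its mean within each `Z` group.

```python
import random

def simulate(n=20000, true_effect=2.0, seed=24):
    """Hypothetical data: u opens a back door between x and y; z only affects x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                          # unobserved confounder
        z = rng.random() < 0.5                       # binary instrument
        x = 2.0 * z + u + rng.gauss(0, 1)            # z -> x, u -> x
        y = true_effect * x + 3.0 * u + rng.gauss(0, 1)  # x -> y, u -> y, no z -> y
        rows.append((z, x, y))
    return rows

def explained_by(z_values, values):
    """The part of `values` explained by z: the mean within each z group."""
    groups = {}
    for z, v in zip(z_values, values):
        groups.setdefault(z, []).append(v)
    return {z: sum(vs) / len(vs) for z, vs in groups.items()}

def iv_binary(rows):
    """Difference in Y' between Z values divided by difference in X'."""
    z = [row[0] for row in rows]
    x_prime = explained_by(z, [row[1] for row in rows])
    y_prime = explained_by(z, [row[2] for row in rows])
    return (y_prime[True] - y_prime[False]) / (x_prime[True] - x_prime[False])

def naive_slope(rows):
    """Slope of y on x ignoring the back door, for comparison."""
    x = [row[1] for row in rows]
    y = [row[2] for row in rows]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

rows = simulate()
print("naive slope:", round(naive_slope(rows), 2))
print("IV estimate:", round(iv_binary(rows), 2))
```

In a simulation set up like this, the naive slope picks up the back door through `u` and overshoots, while the ratio of differences should land close to the assumed true effect.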