Lecture 15: Drawing Causal Diagrams
Nick Huntington-Klein
February 27, 2019
Recap
- Last time we covered the concept of “controlling” or “adjusting” for a variable
- And we knew what to control for because of our causal diagram
- A causal diagram is your model of what you think the data-generating process is
- Which you can use to figure out how to identify particular arrows of interest
Today
- Today we’re going to be working through the process of how to make causal diagrams
- This will require us to understand what is going on in the real world
- (especially since we won’t know the right answer, the way we do when we simulate our own data!)
- And then represent our understanding in the diagram
Remember
- Our goal is to represent the underlying data-generating process
- This is going to require some common-sense thinking
- As well as some economic intuition
- In real life, we’re not going to know what the right answer is
- Our models will be wrong. We just need them to be useful
Steps to a Causal Diagram
- Consider all the variables that are likely to be important in the data generating process (this includes variables you can’t observe)
- For simplicity, combine them together or prune the ones least likely to be important
- Consider which variables are likely to affect which other variables and draw arrows from one to the other
- (Bonus: Test some implications of the model to see if you have the right one)
Some notes
- Drawing an arrow requires a direction. You’re making a statement!
- Omitting an arrow is a statement too - you’re saying neither causes the other (directly)
- If two variables are correlated but neither causes the other, that means some other (perhaps unobserved) variable causes both of them - add it!
- There shouldn’t be any cycles - You shouldn’t be able to follow the arrows in one direction and end up where you started
- If there should be a feedback loop, like “rich get richer”, treat the same variable at different points in time as separate variables so the diagram has no cycle
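As a minimal sketch of the no-cycles rule, a diagram can be stored as a mapping from each variable to the variables it points at, and then checked for cycles. The helper `has_cycle` and the variable names (`Wealth`, `Income`, `Wealth_t1`, and so on) are hypothetical, chosen only to illustrate the "rich get richer" case.

```python
def has_cycle(arrows):
    """Return True if following the arrows can lead back to where you started.

    `arrows` maps each variable to the list of variables it directly causes.
    """
    ON_PATH, DONE = 1, 2
    state = {}  # variables missing from `state` have not been visited yet

    def visit(node):
        state[node] = ON_PATH
        for child in arrows.get(node, []):
            if state.get(child) == ON_PATH:
                return True  # we walked back onto our own path: a cycle
            if child not in state and visit(child):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in list(arrows))


# A feedback loop drawn naively: wealth raises income, income raises wealth
feedback = {"Wealth": ["Income"], "Income": ["Wealth"]}

# The same story with each variable split by time period (assumed labels)
unrolled = {
    "Wealth_t1": ["Income_t1"],
    "Income_t1": ["Wealth_t2"],
    "Wealth_t2": [],
}

print(has_cycle(feedback))  # the naive version contains a cycle
print(has_cycle(unrolled))  # the time-indexed version does not
```

Splitting by time keeps the feedback story while leaving every arrow pointing forward, which is what makes the diagram usable.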
So let’s do it!
- Let’s start with an econometrics classic: what is the causal effect of an additional year of education on earnings?
- That is, if we reached in and made someone get one more year of education than they already did, how much more money would they earn?
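To make "reaching in" concrete, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the true effect of 2, the role of ability, and all other coefficients. Because we simulate the data ourselves, we know the right answer and can compare it with what a naive comparison in the observed data would suggest.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 2.0  # assumed earnings gain from one more year of education
N = 20_000

# Assumed data-generating process: ability raises both education and earnings
ability = [random.gauss(0, 1) for _ in range(N)]
noise = [random.gauss(0, 1) for _ in range(N)]
education = [12 + a + random.gauss(0, 1) for a in ability]


def earnings(educ, abil, u):
    return TRUE_EFFECT * educ + 3.0 * abil + u


observed = [earnings(e, a, u) for e, a, u in zip(education, ability, noise)]


def slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


# What the raw data suggest: mixes the effect of education with ability
naive = slope(education, observed)

# "Reaching in": give everyone one more year, holding everything else fixed
intervened = [earnings(e + 1, a, u) for e, a, u in zip(education, ability, noise)]
causal = sum(i - o for i, o in zip(intervened, observed)) / N

print(f"naive slope:   {naive:.2f}")
print(f"causal effect: {causal:.2f}")
```

Under these assumptions the naive slope overstates the effect, because people with more education also have more ability. The intervention recovers the assumed effect because nothing else changes when education does.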
1. Listing Variables
- We can start with our two main variables of interest:
- Education [we call this the “treatment” or “exposure” variable]
- Earnings [the “outcome”]
1. Listing Variables
- Then, we can add other variables likely to be relevant
- Focus on variables that are likely to cause or be caused by treatment
- ESPECIALLY if they’re related both to the treatment and the outcome
- They don’t have to be things you can actually observe/measure
- Variables that affect the outcome but aren’t related to anything else aren’t really important (you’ll see why next week)
1. Listing Variables
- So what can we think of?
- Ability
- Socioeconomic status
- Demographics
- Phys. ed requirements
- Year of birth
- Location
- Compulsory schooling laws
- Job connections
2. Simplify
- There’s a lot going on - in any social science system, there are THOUSANDS of things that could plausibly be in your diagram
- So we simplify. We ignore things that are likely to be of only trivial importance [so Phys. ed is out!]
- And we might try to combine variables that are very similar or might overlap in what we measure [Ability, Socioeconomic status, Demographics -> Background]
- Now: Education, Earnings, Background, Year of birth, Location, Compulsory schooling, and Job Connections
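The prune-and-combine step can be sketched in a few lines. This assumes that Ability, Socioeconomic status, and Demographics are the variables folded into Background, which is an inference from the simplified list above, where Location stays separate. The function `simplify` is a hypothetical helper, not part of any library.

```python
# Candidate variables from the listing step
candidates = [
    "Education", "Earnings", "Ability", "Socioeconomic status",
    "Demographics", "Phys. ed requirements", "Year of birth",
    "Location", "Compulsory schooling laws", "Job connections",
]

# Judged to be of only trivial importance
pruned = {"Phys. ed requirements"}

# Assumed grouping of overlapping variables into one combined variable
merged = {
    "Ability": "Background",
    "Socioeconomic status": "Background",
    "Demographics": "Background",
}


def simplify(variables, pruned, merged):
    """Drop pruned variables and replace merged ones with their group name."""
    result = []
    for v in variables:
        if v in pruned:
            continue
        name = merged.get(v, v)
        if name not in result:  # keep each combined variable only once
            result.append(name)
    return result


simplified = simplify(candidates, pruned, merged)
print(simplified)
```

Writing the choices down like this makes each simplification an explicit decision that someone else can question.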
3. Arrows!
- Consider which variables are likely to cause which others
- And, importantly, which arrows you can leave out
- The arrows you leave out are important to think about (are you sure there’s no effect?) and prized! You need those NON-arrows to be able to causally identify anything.
3. Arrows
- Let’s start with our effect of interest
- Education causes earnings
3. Arrows
- Remaining: Background, Year of birth, Location, Compulsory schooling, and Job Connections
- All of these except Job Connections should cause Education
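The arrows drawn so far can be written down as (cause, effect) pairs, which also makes it easy to list the pairs that have no arrow yet. This is only a sketch of the diagram at this point in the process. Later steps may add more arrows, so the "missing" list shows what is still to be decided, not a final set of non-arrows.

```python
from itertools import combinations

variables = [
    "Education", "Earnings", "Background", "Year of birth",
    "Location", "Compulsory schooling", "Job Connections",
]

# Our effect of interest
arrows = {("Education", "Earnings")}

# Everything remaining except Job Connections is drawn as a cause of Education
for cause in ["Background", "Year of birth", "Location", "Compulsory schooling"]:
    arrows.add((cause, "Education"))

# Pairs of variables with no arrow in either direction (so far)
missing = [
    (a, b)
    for a, b in combinations(variables, 2)
    if (a, b) not in arrows and (b, a) not in arrows
]

for cause, effect in sorted(arrows):
    print(f"{cause} -> {effect}")
print(f"{len(missing)} pairs have no arrow yet")
```

Each pair left in `missing` at the end of the process is a claim that neither variable directly causes the other, so it deserves the same scrutiny as an arrow.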