# Lecture 15: Drawing Causal Diagrams

## Recap

• Last time we covered the concept of “controlling” or “adjusting” for a variable
• And we knew what to control for because of our causal diagram
• A causal diagram is your model of what you think the data-generating process is
• Which you can use to figure out how to identify particular arrows of interest

## Today

• Today we’re going to be working through the process of how to make causal diagrams
• This will require us to understand what is going on in the real world
• (especially since we won’t know the right answer, the way we do when we simulate our own data!)
• And then represent our understanding in the diagram

## Remember

• Our goal is to represent the underlying data-generating process
• This is going to require some common-sense thinking
• As well as some economic intuition
• In real life, we’re not going to know what the right answer is
• Our models will be wrong. We just need them to be useful

## Steps to a Causal Diagram

1. Consider all the variables that are likely to be important in the data generating process (this includes variables you can’t observe)
2. For simplicity, combine them together or prune the ones least likely to be important
3. Consider which variables are likely to affect which other variables and draw arrows from one to the other
4. (Bonus: Test some implications of the model to see if you have the right one)

## Some notes

• Drawing an arrow requires a direction. You’re making a statement!
• Omitting an arrow is a statement too - you’re saying neither causes the other (directly)
• If two variables are correlated but neither causes the other, that means some other (perhaps unobserved) variable causes both of them - add it!
• There shouldn’t be any cycles - you shouldn’t be able to follow the arrows in one direction and end up where you started
• If there should be a feedback loop, like “rich get richer”, distinguish between the same variable at different points in time to avoid it
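The no-cycles rule is easy to check mechanically. Here is a minimal sketch using Python's standard-library `graphlib`; the helper `is_acyclic` and the variable names ("Wealth", "Investment income") are made up for illustration and are not from the lecture. It shows a "rich get richer" loop failing the check, and the same story passing once the variable is split by time period.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(arrows):
    """Return True if the (cause, effect) pairs contain no directed cycle."""
    sorter = TopologicalSorter()
    for cause, effect in arrows:
        # graphlib thinks in terms of "node, predecessors": the cause comes first
        sorter.add(effect, cause)
    try:
        sorter.prepare()  # raises CycleError if the arrows loop back on themselves
    except CycleError:
        return False
    return True


# Hypothetical feedback loop: wealth produces income, income produces wealth
feedback_loop = [
    ("Wealth", "Investment income"),
    ("Investment income", "Wealth"),
]

# Same story with wealth split into two points in time
time_indexed = [
    ("Wealth (t)", "Investment income (t)"),
    ("Investment income (t)", "Wealth (t+1)"),
]

print(is_acyclic(feedback_loop))  # False
print(is_acyclic(time_indexed))   # True
```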

## So let’s do it!

• Let’s start with an econometrics classic: what is the causal effect of an additional year of education on earnings?
• That is, if we reached in and made someone get one more year of education than they already did, how much more money would they earn?
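To make "reaching in" concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the coefficients, the unobserved `trait`, and the function names are invented, not estimates of anything real. Because we wrote the data-generating process ourselves, we can literally give everyone one more year of education while holding everything else about them fixed, and compare that to what a naive look at the data suggests.

```python
import random

TRUE_EFFECT = 2.0  # made-up: each extra year of education adds 2 to earnings


def simulate(n=20000, seed=15):
    """Draw n hypothetical people as (trait, education, luck) tuples."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        trait = rng.gauss(0, 1)  # unobserved; raises both education and earnings
        education = 12 + trait + rng.gauss(0, 1)
        luck = rng.gauss(0, 1)
        people.append((trait, education, luck))
    return people


def earnings(trait, education, luck):
    """Made-up earnings equation: education and the trait both matter."""
    return 10 + TRUE_EFFECT * education + 3 * trait + luck


people = simulate()

# "Reach in": one more year for each person, everything else about them unchanged
reach_in_effect = sum(
    earnings(t, e + 1, l) - earnings(t, e, l) for t, e, l in people
) / len(people)

# Naive comparison: slope of observed earnings on observed education
eds = [e for _, e, _ in people]
earns = [earnings(t, e, l) for t, e, l in people]
mean_ed = sum(eds) / len(eds)
mean_earn = sum(earns) / len(earns)
naive_slope = sum(
    (e - mean_ed) * (y - mean_earn) for e, y in zip(eds, earns)
) / sum((e - mean_ed) ** 2 for e in eds)

print("Effect of reaching in:", round(reach_in_effect, 2))
print("Naive slope in the data:", round(naive_slope, 2))
```

The reach-in effect recovers the 2 we built in. The naive slope comes out noticeably larger (about 3.5 in expectation with these made-up coefficients), because people with more education also tend to have more of the unobserved trait. In real data we can't reach in, which is why we need the diagram.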

## 1. Listing Variables

• Education [we call this the “treatment” or “exposure” variable]
• Earnings [the “outcome”]

## 1. Listing Variables

• Then, we can add other variables likely to be relevant
• Focus on variables that are likely to cause or be caused by treatment
• ESPECIALLY if they’re related both to the treatment and the outcome
• They don’t have to be things you can actually observe/measure
• Variables that affect the outcome but aren’t related to anything else aren’t really important (you’ll see why next week)

## 1. Listing Variables

• So what can we think of?
• Ability
• Socioeconomic status
• Demographics
• Phys. ed requirements
• Year of birth
• Location
• Compulsory schooling laws
• Job connections

## 2. Simplify

• There’s a lot going on - in any social science system, there are THOUSANDS of things that could plausibly be in your diagram
• So we simplify. We ignore things that are likely to be only of trivial importance [so Phys. ed is out!]
• And we might try to combine variables that are very similar or might overlap in what we measure [Socioeconomic status, Demographics -> Background]
• Now: Education, Earnings, Background, Year of birth, Location, Compulsory schooling, and Job Connections
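The prune-and-combine step can be sketched as a small list operation. The helper `simplify` is hypothetical. Two assumptions are baked in so the output matches the list above: Location is kept as its own node, and Ability is folded into Background. The slides don't say where Ability ends up, so treat that choice as a guess.

```python
def simplify(variables, prune, merge):
    """Drop pruned variables and rename merged ones, keeping first-seen order."""
    simplified = []
    for name in variables:
        if name in prune:
            continue
        name = merge.get(name, name)
        if name not in simplified:
            simplified.append(name)
    return simplified


candidates = [
    "Education",
    "Earnings",
    "Ability",
    "Socioeconomic status",
    "Demographics",
    "Phys. ed requirements",
    "Year of birth",
    "Location",
    "Compulsory schooling laws",
    "Job connections",
]

# Likely to be of only trivial importance
prune = {"Phys. ed requirements"}

# Similar or overlapping variables collapse into one node
merge = {
    "Socioeconomic status": "Background",
    "Demographics": "Background",
    "Ability": "Background",  # assumption: not stated in the slides
}

simplified = simplify(candidates, prune, merge)
print(simplified)
# ['Education', 'Earnings', 'Background', 'Year of birth', 'Location',
#  'Compulsory schooling laws', 'Job connections']
```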

## 3. Arrows!

• Consider which variables are likely to cause which others
• And, importantly, which arrows you can leave out
• The arrows you leave out are important to think about (are you sure there’s no effect?) and prized! You need those NON-arrows to be able to causally identify anything.
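One way to take the non-arrows seriously is to list them out. In the sketch below, the `arrows` list is a purely illustrative guess at how the simplified variables might connect. It is not the diagram from the lecture, and you could reasonably draw it differently. The point is the bookkeeping: every pair of variables without an arrow is a claim that neither directly causes the other.

```python
from itertools import combinations

variables = [
    "Education",
    "Earnings",
    "Background",
    "Year of birth",
    "Location",
    "Compulsory schooling",
    "Job connections",
]

# Illustrative (cause, effect) guesses only - not the lecture's diagram
arrows = [
    ("Education", "Earnings"),
    ("Background", "Education"),
    ("Background", "Earnings"),
    ("Compulsory schooling", "Education"),
    ("Location", "Compulsory schooling"),
    ("Year of birth", "Compulsory schooling"),
    ("Education", "Job connections"),
    ("Job connections", "Earnings"),
]

# A pair counts as linked if an arrow runs between it in either direction
linked = {frozenset(pair) for pair in arrows}
non_arrows = [
    pair for pair in combinations(variables, 2) if frozenset(pair) not in linked
]

print(len(arrows), "arrows drawn")                   # 8 arrows drawn
print(len(non_arrows), "pairs with no direct arrow")  # 13 pairs with no direct arrow
for a, b in non_arrows:
    print(f"  no direct effect between {a} and {b}")
```

With 7 variables there are 21 possible pairs, so drawing 8 arrows means making 13 separate "no direct effect" statements. For example, leaving out an arrow between Compulsory schooling and Earnings says that schooling laws only change earnings by changing education. Each of those statements deserves a moment of "am I sure?"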