# Lecture 14 Causal Diagrams

## Recap

• Last time we talked about causality
• The idea that if we could reach in and manipulate `X`, and as a result `Y` changes too, then `X` causes `Y`
• We also talked about how we can identify causality in data
• Part of that will necessarily require us to have a model

## Models

• We have to have a model to get at causality
• A model is our way of understanding the world. It’s our idea of what we think the data-generating process is
• Models can be informal or formal: “The sun rises every day because the earth spins” vs. super-complex astronomical models of the galaxy with thousands of equations
• All models are wrong. Even quantum mechanics. But as long as models are right enough to be useful, we’re good to go!

## Models

• Once we do have a model, though, that model will tell us exactly how we can find a causal effect
• (if it’s possible; it’s not always possible)
• Sort of like how, last time, we knew how `X` was assigned, and using that information we were able to get a good estimate of the true treatment effect

## Example

• Let’s work through a familiar example from before, where we know the data generating process
```r
library(tidyverse)

# Is your company in tech? Let's say 30% of firms are
df <- tibble(tech = sample(c(0,1),500,replace=TRUE,prob=c(.7,.3))) %>%
  # Tech firms on average spend $3mil more defending IP lawsuits
  mutate(IP.spend = 3*tech+runif(500,min=0,max=4)) %>%
  # Tech firms also have higher profits. But IP lawsuits lower profits
  mutate(log.profit = 2*tech - .3*IP.spend + rnorm(500,mean=2))
# Now let's check for how profit and IP.spend are correlated!
cor(df$log.profit,df$IP.spend)
```
```
## [1] 0.1609575
```
• Uh-oh! The true effect of `IP.spend` on profit is negative, but the raw correlation in the data is positive!

## Example

• How do we suspect the data was generated? (ignoring for a moment that we know already)
• We know that being a tech company leads you to have to spend more money on IP lawsuits
• We know that being a tech company leads you to have higher profits
• We know that IP lawsuits lower your profits

## Example

• From this, we realize that part of what we get when we calculate `cor(df$log.profit,df$IP.spend)` is the influence of being a tech company
• Meaning that if we remove that influence, what’s left over should be the actual, negative, effect of IP lawsuits
• Now, we can get to this intuitively, but it would be much more useful if we had a more formal model that could tell us what to do in lots of situations
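
The "remove the influence of tech" idea above can be sketched in base R. This is one possible way to do it, not necessarily the method the course uses later: it re-simulates the same data-generating process, then compares the raw correlation with the correlation of what is left over after the part explained by `tech` is taken out of both variables. The seed and the names `raw_cor`, `adjusted_cor`, and `adjusted_slope` are assumptions made for illustration.

```r
# Sketch: same simulated data-generating process as the example, in base R
set.seed(14)  # arbitrary seed, chosen only so the run is reproducible
n <- 500
tech <- sample(c(0, 1), n, replace = TRUE, prob = c(.7, .3))
IP.spend <- 3 * tech + runif(n, min = 0, max = 4)
log.profit <- 2 * tech - .3 * IP.spend + rnorm(n, mean = 2)

# The raw correlation mixes the effect of IP.spend with the influence of tech
raw_cor <- cor(log.profit, IP.spend)

# Take out the part of each variable explained by tech, keep what is left over
profit_resid <- resid(lm(log.profit ~ tech))
spend_resid <- resid(lm(IP.spend ~ tech))
adjusted_cor <- cor(profit_resid, spend_resid)

# Related check: the regression slope on IP.spend while holding tech fixed
adjusted_slope <- unname(coef(lm(log.profit ~ IP.spend + tech))["IP.spend"])

print(c(raw = raw_cor, adjusted = adjusted_cor, slope = adjusted_slope))
```

If the reasoning on this slide holds, the adjusted correlation and slope should come out negative, in line with the `-.3` used to generate the data, while the raw correlation is pulled upward by `tech`.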

## Causal Diagrams

• Enter the causal diagram!
• A causal diagram (aka a Directed Acyclic Graph) is a way of writing down your model that lets you figure out what you need to do to find your causal effect of interest
• All you need to do to make a causal diagram is write down all the important features of the data generating process, and also write down what you think causes what!

## Example

• We know that being a tech company leads you to have to spend more money on IP lawsuits
• We know that being a tech company leads you to have higher profits
• We know that IP lawsuits lower your profits
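
The three statements above can be written down as arrows, which is all a causal diagram is. A minimal sketch in base R, assuming we store the diagram as a plain table of arrows; `dag_edges` and the helper `parents_of` are hypothetical names introduced here, not part of any package.

```r
# Each row is one arrow "from -> to", one per bullet above
dag_edges <- data.frame(
  from = c("tech", "tech", "IP.spend"),
  to   = c("IP.spend", "log.profit", "log.profit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: variables with an arrow pointing directly into `node`
parents_of <- function(node, edges) edges$from[edges$to == node]

# A variable that points into both the treatment and the outcome is a
# common cause of the two
treatment <- "IP.spend"
outcome <- "log.profit"
common_causes <- intersect(parents_of(treatment, dag_edges),
                           parents_of(outcome, dag_edges))
print(common_causes)
```

This only looks at direct arrows, which is enough for a diagram this small. Larger diagrams need a more careful search over paths.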
