Lecture 14 Causal Diagrams

Nick Huntington-Klein

February 25, 2019

Recap

  • Last time we talked about causality
  • The idea that if we could reach in and manipulate X, and as a result Y changes too, then X causes Y
  • We also talked about how we can identify causality in data
  • Part of that will necessarily require us to have a model

Models

  • We have to have a model to get at causality
  • A model is our way of understanding the world. It’s our idea of what we think the data-generating process is
  • Models can be informal or formal - “The sun rises every day because the earth spins” vs. super-complex astronomical models of the galaxy with thousands of equations
  • All models are wrong. Even quantum mechanics. But as long as models are right enough to be useful, we’re good to go!

Models

  • Once we do have a model, though, that model will tell us exactly how we can find a causal effect
  • (if it’s possible; it’s not always possible)
  • Sort of like how, last time, we knew how X was assigned, and using that information we were able to get a good estimate of the true treatment effect

Example

  • Let’s work through a familiar example from before, where we know the data generating process
# tibble(), mutate(), and %>% all come from the tidyverse
library(tidyverse)
# Is your company in tech? Let's say 30% of firms are
df <- tibble(tech = sample(c(0,1),500,replace=TRUE,prob=c(.7,.3))) %>%
  # Tech firms on average spend $3mil more defending IP lawsuits
  mutate(IP.spend = 3*tech+runif(500,min=0,max=4)) %>%
  # Tech firms also have higher profits. But IP lawsuits lower profits
  mutate(log.profit = 2*tech - .3*IP.spend + rnorm(500,mean=2))
# Now let's check how profit and IP.spend are correlated!
cor(df$log.profit,df$IP.spend)
## [1] 0.1609575
  • Uh-oh! The true relationship is negative, but the data says it's positive!

Example

  • Now we can ask: what do we know about this situation?
  • How do we suspect the data was generated? (ignoring for a moment that we know already)
    • We know that being a tech company leads you to have to spend more money on IP lawsuits
    • We know that being a tech company leads you to have higher profits
    • We know that IP lawsuits lower your profits

Example

  • From this, we realize that part of what we get when we calculate cor(df$log.profit,df$IP.spend) is the influence of being a tech company
  • Meaning that if we remove that influence, what’s left over should be the actual, negative, effect of IP lawsuits
  • Now, we can get to this intuitively, but it would be much more useful if we had a more formal model that could tell us what to do in lots of situations
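
One way to "remove that influence" can be sketched as follows. This is a minimal illustration, not necessarily the method used later in the course: it subtracts the tech and non-tech group means from both variables and correlates what is left over. The seed and the names df_adj, log.profit.resid, IP.spend.resid, raw_cor, and adj_cor are assumptions introduced here.

```r
library(tidyverse)

# Seed chosen arbitrarily so the sketch is reproducible
set.seed(1000)

# Same data generating process as in the example above
df <- tibble(tech = sample(c(0,1),500,replace=TRUE,prob=c(.7,.3))) %>%
  mutate(IP.spend = 3*tech+runif(500,min=0,max=4)) %>%
  mutate(log.profit = 2*tech - .3*IP.spend + rnorm(500,mean=2))

# Within each value of tech, subtract the group mean from both variables.
# What remains is the variation that tech does not explain.
df_adj <- df %>%
  group_by(tech) %>%
  mutate(log.profit.resid = log.profit - mean(log.profit),
         IP.spend.resid = IP.spend - mean(IP.spend)) %>%
  ungroup()

# Raw correlation mixes the effect of IP.spend with the influence of tech
raw_cor <- cor(df$log.profit, df$IP.spend)
# Correlation of the leftovers should come out negative, like the true effect
adj_cor <- cor(df_adj$log.profit.resid, df_adj$IP.spend.resid)

raw_cor
adj_cor
```

Comparing raw_cor to adj_cor shows the sign of the relationship changing once tech is accounted for.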

Causal Diagrams

  • Enter the causal diagram!
  • A causal diagram (aka a Directed Acyclic Graph) is a way of writing down your model that lets you figure out what you need to do to find your causal effect of interest
  • All you need to do to make a causal diagram is write down all the important features of the data generating process, and also write down what you think causes what!

Example

  • We know that being a tech company leads you to have to spend more money on IP lawsuits
  • We know that being a tech company leads you to have higher profits
  • We know that IP lawsuits lower your profits
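
The three statements above can also be written down as a list of arrows. The sketch below uses plain base R rather than a dedicated diagram package. The edges table and the parents() helper are hypothetical names introduced only for illustration, and looking for direct common causes is a simplification of what a full causal diagram analysis does.

```r
# Each row is one arrow in the diagram: `from` causes `to`
edges <- data.frame(
  from = c("tech", "tech", "IP.spend"),
  to   = c("IP.spend", "log.profit", "log.profit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the direct causes of a variable
parents <- function(v) edges$from[edges$to == v]

treatment <- "IP.spend"
outcome <- "log.profit"

# Variables that directly cause both the treatment and the outcome.
# These are the influences we would want to remove.
common_causes <- intersect(parents(treatment), parents(outcome))
common_causes
```

Here the only variable pointing at both IP.spend and log.profit is tech, which matches the intuition from earlier.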
