Lecture 18 Closing Back Doors: Controlling

Nick Huntington-Klein

March 6, 2019

Recap

  • We discussed how to draw a causal diagram
  • How to identify the front and back door paths
  • And how we can close those back door paths by controlling/adjusting in order to identify the front-door paths we want!
  • And so we get our causal effect

Today

  • Today we’re going to go a little deeper into what it means to actually control/adjust for things
  • And we’re also going to talk about times when controlling/adjusting makes things WORSE - collider bias!
  • I’m going to just start saying “controlling”, by the way - “adjusting” is a little more accurate, but “controlling” is more common

Controlling

  • Up to now, here’s how we’ve been getting the relationship between X and Y while controlling for W:
  1. See what part of X is explained by W, and subtract it out. Call the result the residual part of X.
  2. See what part of Y is explained by W, and subtract it out. Call the result the residual part of Y.
  3. Get the relationship between the residual part of X and the residual part of Y.
  • With the last step including things like getting the correlation, plotting the relationship, calculating the variance explained, or comparing mean Y across values of X

In code

library(tidyverse)

# w causes both x and y, so x <- w -> y is an open back door
df <- tibble(w = rnorm(100)) %>%
  mutate(x = 2*w + rnorm(100)) %>%
  mutate(y = 1*x + 4*w + rnorm(100))
cor(df$x,df$y)
## [1] 0.9479742
# Residualize: within five bins of w, subtract each bin's mean of x and y
df <- df %>% group_by(cut(w,breaks=5)) %>%
  mutate(x.resid = x - mean(x),
         y.resid = y - mean(y)) %>%
  ungroup()
cor(df$x.resid,df$y.resid)
## [1] 0.7367752

In Diagrams

  • The relationship between X and Y reflects both X->Y and X<-W->Y
  • We remove the part of X and Y that W explains to get rid of X<-W and W->Y, blocking X<-W->Y and leaving X->Y
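  • One way to check this path logic in R (not something the slides use) is the dagitty package, which lists the control sets that close every back door:

library(dagitty)

# The diagram from above: X -> Y is the front door, X <- W -> Y the back door
g <- dagitty("dag {
  W -> X
  W -> Y
  X -> Y
}")
# Which controls identify the effect of X on Y? This should report { W }
adjustmentSets(g, exposure = "X", outcome = "Y")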

More than One Variable

  • It’s quite possible to control for more than one variable at a time
  • Although we won’t be doing it much in this class
  • A common way to do this is called multiple regression
  • You can do it with our method too, but it gets tedious pretty quickly

More than One Variable

# Now there are two back doors: one through w and one through v
df <- tibble(w = rnorm(100),v=rnorm(100)) %>%
  mutate(x = 2*w + 3*v + rnorm(100)) %>%
  mutate(y = 1*x + 4*w + 1.5*v + rnorm(100))
cor(df$x,df$y)
## [1] 0.9340934
# Residualize on w first, then residualize those residuals on v
df <- df %>% group_by(cut(w,breaks=5)) %>%
  mutate(x.resid = x - mean(x),
         y.resid = y - mean(y)) %>%
  group_by(cut(v,breaks=5)) %>%
  mutate(x.resid2 = x.resid - mean(x.resid),
         y.resid2 = y.resid - mean(y.resid)) %>%
  ungroup()
cor(df$x.resid2,df$y.resid2)
## [1] 0.7419072
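  • For comparison, the multiple regression mentioned earlier does both controls in one step; with the df we just built, the coefficient on x should land near its true value of 1 (no seed is set, so exact numbers will vary):

# Multiple regression controls for w and v simultaneously
lm(y ~ x + w + v, data = df)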

Graphically

(figure from the original slides omitted)

Intuitively

  • So what does this actually mean? Why do we do it this way?
  • As mentioned before, the goal here is to remove X<-W and W->Y so as to close the back door
  • But the way we actually do this is by removing differences that are predicted by W
  • In other words, we are comparing people as though they had the same value of W

Intuitively

  • That’s why you hear some people refer to controlling as “holding W constant” - we literally remove the variation in W, leaving it “constant”
  • Another way of thinking of it is that you’re looking for variation in X and Y within values of W (shown as an animation in the original slides)
  • Comparing apples to apples

Intuitively

  • Thinking about it this way also makes it clear that there are other ways to control for things besides the method we’ve outlined
  • Anything that ensures that we’re looking at observations with the same (or at least very very similar) values of W is in effect controlling for W
  • A common way this happens is by selecting a sample
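  • Here’s a quick sketch of that idea (not from the slides): redo our earlier data and compare the raw correlation to the correlation in a sample restricted to a narrow slice of w (the .25 width is arbitrary):

# Same data-generating process as before, with more observations
df2 <- tibble(w = rnorm(10000)) %>%
  mutate(x = 2*w + rnorm(10000),
         y = 1*x + 4*w + rnorm(10000))
# Back door open: the raw correlation is inflated
cor(df2$x, df2$y)
# Keeping only near-identical values of w is, in effect, controlling for w
with(filter(df2, abs(w) < .25), cor(x, y))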

An Example

  • We’ll borrow an example from the Wooldridge econometrics textbook (data available in the wooldridge package)
  • LaLonde (1986) is a study of whether a job training program improves earnings in 1978 (re78)
  • Specifically, it has data on an experiment of assigning people to a job training program (data jtrain2)
  • And also data on people who chose to participate in that program, or didn’t (data jtrain3)
  • The goal of causal inference - do something to jtrain3 so it gives us the “correct” result from jtrain2

LaLonde

library(wooldridge)
#EXPERIMENT
data(jtrain2)
jtrain2 %>% group_by(train) %>% summarize(wage = mean(re78))
## # A tibble: 2 x 2
##   train  wage
##   <int> <dbl>
## 1     0  4.55
## 2     1  6.35
#BY CHOICE
data(jtrain3)
jtrain3 %>% group_by(train) %>% summarize(wage = mean(re78))
## # A tibble: 2 x 2
##   train  wage
##   <int> <dbl>
## 1     0 21.6 
## 2     1  6.35

Hmm…

  • What back doors might the jtrain3 analysis be facing?
  • People who need training want to get it but are likely to get lower wages anyway!

Apples to Apples

  • The two data sets are looking at very different groups of people!
library(stargazer)
stargazer(select(jtrain2,re75,re78),type='text')
stargazer(select(jtrain3,re75,re78),type='text')
## 
## ===========================================================
## Statistic  N  Mean  St. Dev.  Min  Pctl(25) Pctl(75)  Max  
## -----------------------------------------------------------
## re75      445 1.377  3.151     0      0       1.2      25  
## re78      445 5.301  6.631   0.000  0.000    8.125   60.308
## -----------------------------------------------------------
## 
## ===============================================================
## Statistic   N    Mean  St. Dev.  Min  Pctl(25) Pctl(75)   Max  
## ---------------------------------------------------------------
## re75      2,675 17.851  13.878    0     7.6      25.6     157  
## re78      2,675 20.502  15.633  0.000  9.243    28.816  121.174
## ---------------------------------------------------------------

Controlling

  • We can’t measure “needs training” directly, but we can sort of control for it by limiting ourselves solely to the kind of people who need it - those who had low earnings in 1975!
## # A tibble: 2 x 2
##   train  wage
##   <int> <dbl>
## 1     0  4.55
## 2     1  6.35
## # A tibble: 2 x 2
##   train  wage
##   <int> <dbl>
## 1     0  5.62
## 2     1  6.00
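  • The code behind that second table is hidden on the slide; here’s a guess at it - the need.tr definition and the 1.2 cutoff (the 75th percentile of re75 in the experimental data) are assumptions, not the slide’s actual code:

# Flag people with low 1975 earnings as "needing training" (cutoff assumed)
jtrain3 <- jtrain3 %>% mutate(need.tr = re75 <= 1.2)
#EXPERIMENT
jtrain2 %>% group_by(train) %>% summarize(wage = mean(re78))
#BY CHOICE, restricted to low 1975 earners
jtrain3 %>% filter(need.tr) %>%
  group_by(train) %>% summarize(wage = mean(re78))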

Controlling

  • Not exactly the same (not surprising - we were pretty arbitrary in how we controlled for need.tr, and we never closed train <- U -> wage, oh and we left out plenty of other back doors: race, age, etc.) but an improvement
  • This goes to show that choosing a sample is a form of controlling
  • ANYTHING that ensures you’re looking at observations with similar values of W is a form of controlling for W

Bad Controls

  • So far so good - we have the concept of what it means to control and some ways we can do it, so we can get apples-to-apples comparisons
  • But what should we control for?
  • Everything, right? We want to make sure our comparison is as apple-y as possible!
  • Well, no, not actually

Bad Controls

  • Some controls can take you away from showing you the front door
  • We already discussed how it’s not a good idea to block a front-door path.
  • An increase in the price of cigarettes might improve your health, but not if we control for the number of cigarettes you smoke!

Bad Controls

  • There is another kind of bad control - a collider
  • Basically, if you’re listing out paths, and you see a path where the arrows collide by both pointing at the same variable, that path is already blocked
  • Like this: X <- W -> C <- Z -> Y
  • Note the -> C <-. Those arrows are colliding!
  • If we control for the collider C, that path opens back up!
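  • To see it concretely, here’s that path in dagitty (the X -> Y front door is added for concreteness; it isn’t part of the path above):

library(dagitty)
g2 <- dagitty("dag {
  W -> X
  W -> C
  Z -> C
  Z -> Y
  X -> Y
}")
# With no controls, X <- W -> C <- Z -> Y shows up as closed
paths(g2, from = "X", to = "Y")
# Condition on the collider C and that path opens back up
paths(g2, from = "X", to = "Y", Z = "C")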

Colliders

  • One kind of diagram (of many) where this might pop up: one where x and y are linked by the path x <- a -> m <- b -> y

Colliders

  • How could this be?
  • Because even if two variables cause the same thing (a -> m, b -> m), that doesn’t make them related. Your parents both caused your genetic makeup, but that doesn’t make their genetics related. Knowing dad’s eye color tells you nothing about mom’s.
  • But within given values of the collider, they ARE related. If you’re brown-eyed, then observing that your dad has blue eyes tells you that your mom is brown-eyed

Colliders

  • So here, x <- a -> m <- b -> y is pre-blocked, no problem. a and b are unrelated, so no back door issue!
  • Control for m and now a and b are related, back door path open.

Example

  • You want to know if programming skills reduce your social skills
  • So you go to a tech company and test all their employees on programming and social skills
  • Let’s imagine that the truth is that programming skills and social skills are unrelated
  • But you find a negative relationship! What gives?

Example

  • Oops! By surveying only the tech company, you controlled for “works in a tech company”
  • To get hired there, you need programming skills, social skills, or both - so “works in a tech company” is a collider!

Example

set.seed(14233)
# Programming and social skills are unrelated, but hiring selects on both
survey <- tibble(prog=rnorm(1000),social=rnorm(1000)) %>%
  mutate(hired = (prog + social > .25))
#Truth
cor(survey$prog,survey$social)
## [1] 0.03710333
#Controlling by just surveying those hired
cor(filter(survey,hired==1)$prog,filter(survey,hired==1)$social)
## [1] -0.4789209
#Surveying everyone and controlling with our normal method
survey <- survey %>%
  group_by(hired) %>%
  mutate(p.resid = prog - mean(prog),
         s.resid = social - mean(social)) %>%
  ungroup()
cor(survey$p.resid,survey$s.resid)
## [1] -0.4268598

Graphically

(figure from the original slides omitted)

Colliders

  • This doesn’t just create correlations from nothing, it can also distort causal effects that ARE there
  • For example, did you know that height is UNrelated to basketball skill… among NBA players?
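  • A quick simulation of that claim (all numbers invented): height genuinely raises skill, but selecting on making the league - which rewards both - flattens or even flips the relationship:

nba.sim <- tibble(height = rnorm(10000)) %>%
  mutate(skill = .5*height + rnorm(10000),
         in.nba = height + skill > 2.5)
# The true positive relationship between height and skill...
cor(nba.sim$height, nba.sim$skill)
# ...weakens or reverses among those who made the NBA
with(filter(nba.sim, in.nba), cor(height, skill))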

Colliders

  • Sometimes, things can get real tricky
  • In some cases, the same variable NEEDS to be controlled for to close a back door path, but it’s a collider on ANOTHER back door path!
  • In those cases you just can’t identify the effect, at least not easily
  • This pops up in estimates of the gender wage gap - example from Cunningham’s Mixtape: should you control for occupation when looking at gender discrimination in the labor market?

Colliders in the Gender Wage Gap

  • We are interested in gender -> discrim -> wage; our treatment is gender -> discrim, the discrimination caused by your gender

Colliders in the Gender Wage Gap

  • Front doors:
      gender -> discrim -> wage
      gender -> discrim -> occup -> wage
  • Open back door:
      discrim <- gender -> occup -> wage
  • Closed back doors (blocked by the collider at occup):
      discrim <- gender -> occup <- abil -> wage
      gender -> discrim -> occup <- abil -> wage

Colliders in the Gender Wage Gap

  • No occup control? We leave the back door discrim <- gender -> occup -> wage open - nondiscriminatory reasons to choose different occupations by gender
  • Control for occup? Open both back doors, create a correlation between abil and discrim where there wasn’t one
  • And also close a FRONT door, gender -> discrim -> occup -> wage: discriminatory reasons for gender diffs in occup
  • We actually can’t identify the effect we want in this diagram by controlling. It happens!
  • Suggests this question goes beyond just controlling for stuff. Real research on this topic gets clever.
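  • As a check (not on the slides), we can hand this diagram to dagitty and list the discrim-to-wage paths with and without controlling for occup; the paths through abil flip from closed to open:

library(dagitty)
wagedag <- dagitty("dag {
  gender -> discrim
  gender -> occup
  discrim -> occup
  discrim -> wage
  occup -> wage
  abil -> occup
  abil -> wage
}")
# No controls: the paths through abil are blocked at the occup collider
paths(wagedag, from = "discrim", to = "wage")
# Controlling for occup closes the occup front door but opens the abil paths
paths(wagedag, from = "discrim", to = "wage", Z = "occup")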

Next Time

  • Get ready! Next time we’ll begin our trek down the list of common causal inference methods as they actually get used!
  • Many of them apply controlling for stuff in interesting ways
  • Others use methods other than controlling!
  • This is what economists and many data scientists actually do with their time
  • We will begin with “fixed effects”