X and Y while controlling for W:X is explained by W, and subtract it out. Call the result the residual part of X.Y is explained by W, and subtract it out. Call the result the residual part of Y.X and the residual part of Y.Y across values of Xdf <- tibble(w = rnorm(100)) %>%
mutate(x = 2*w + rnorm(100)) %>%
mutate(y = 1*x + 4*w + rnorm(100))
cor(df$x,df$y)## [1] 0.9479742
df <- df %>% group_by(cut(w,breaks=5)) %>%
mutate(x.resid = x - mean(x),
y.resid = y - mean(y))
cor(df$x.resid,df$y.resid)## [1] 0.7367752
X and Y reflects both X->Y and X<-W->YX and Y that W explains to get rid of X<-W and W->Y, blocking X<-W->Y and leaving X->Y df <- tibble(w = rnorm(100),v=rnorm(100)) %>%
mutate(x = 2*w + 3*v + rnorm(100)) %>%
mutate(y = 1*x + 4*w + 1.5*v + rnorm(100))
cor(df$x,df$y)## [1] 0.9340934
df <- df %>% group_by(cut(w,breaks=5)) %>%
mutate(x.resid = x - mean(x),
y.resid = y - mean(y)) %>%
group_by(cut(v,breaks=5)) %>%
mutate(x.resid2 = x.resid - mean(x.resid),
y.resid2 = y.resid - mean(y.resid))
cor(df$x.resid2,df$y.resid2)## [1] 0.7419072
X<-W and W->Y so as to close the back doorWWW constant” - we literally remove the variation in W, leaving it “constant”X and Y within values of W - this is made clear in the animationW is in effect controlling for Wwooldridge package)re78)jtrain2)jtrain3)jtrain3 so it gives us the “correct” result from jtrain2library(wooldridge)
#EXPERIMENT
data(jtrain2)
jtrain2 %>% group_by(train) %>% summarize(wage = mean(re78))## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 4.55
## 2 1 6.35
#BY CHOICE
data(jtrain3)
jtrain3 %>% group_by(train) %>% summarize(wage = mean(re78))## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 21.6
## 2 1 6.35
jtrain3 analysis be facing?library(stargazer)
stargazer(select(jtrain2,re75,re78),type='text')
stargazer(select(jtrain3,re75,re78),type='text')##
## ===========================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## -----------------------------------------------------------
## re75 445 1.377 3.151 0 0 1.2 25
## re78 445 5.301 6.631 0.000 0.000 8.125 60.308
## -----------------------------------------------------------
##
## ===============================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## ---------------------------------------------------------------
## re75 2,675 17.851 13.878 0 7.6 25.6 157
## re78 2,675 20.502 15.633 0.000 9.243 28.816 121.174
## ---------------------------------------------------------------
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 4.55
## 2 1 6.35
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 5.62
## 2 1 6.00
need.tr, and we never closed train <- U -> wage, oh and we left out plenty of other back doors: race, age, etc.) but an improvementW is a form of controlling for WX <- W -> C <- Z -> Y-> C <-. Those arrow are colliding!C, that path opens back up!a -> m, b -> m), that doesn’t make them related. Your parents both caused your genetic makeup, that doesn’t make their genetics related. Knowing dad’s eye color tells you nothing about mom’s.x <- a -> m <- b -> y is pre-blocked, no problem. a and b are unrelated, so no back door issue!m and now a and b are related, back door path open.set.seed(14233)
survey <- tibble(prog=rnorm(1000),social=rnorm(1000)) %>%
mutate(hired = (prog + social > .25))
#Truth
cor(survey$prog,survey$social)## [1] 0.03710333
#Controlling by just surveying those hired
cor(filter(survey,hired==1)$prog,filter(survey,hired==1)$social)## [1] -0.4789209
#Surveying everyone and controlling with our normal method
survey <- survey %>% group_by(hired) %>% mutate(p.resid = prog - mean(prog),
s.resid = social - mean(social)) %>% ungroup()
cor(survey$p.resid,survey$s.resid)## [1] -0.4268598
gender -> discrim -> wage; our treatment is gender -> discrim, the discrimination caused by your gendergender -> discrim -> wagegender -> discrim -> occup -> wagediscrim <- gender -> occup -> wagediscrim <- gender -> occup <- abil -> wagegender -> discrim -> occup <- abil -> wageoccup control? Ignore nondiscriminatory reasons to choose different occupations by genderoccup? Open both back doors, create a correlation between abil and discrim where there wasn’t onegender -> discrim -> occup -> wage: discriminatory reasons for gender diffs in occup