X
and Y
while controlling for W
:X
is explained by W
, and subtract it out. Call the result the residual part of X
.Y
is explained by W
, and subtract it out. Call the result the residual part of Y
.X
and the residual part of Y
.Y
across values of X
df <- tibble(w = rnorm(100)) %>%
mutate(x = 2*w + rnorm(100)) %>%
mutate(y = 1*x + 4*w + rnorm(100))
cor(df$x,df$y)
## [1] 0.9479742
df <- df %>% group_by(cut(w,breaks=5)) %>%
mutate(x.resid = x - mean(x),
y.resid = y - mean(y))
cor(df$x.resid,df$y.resid)
## [1] 0.7367752
X
and Y
reflects both X->Y
and X<-W->Y
X
and Y
that W
explains to get rid of X<-W
and W->Y
, blocking X<-W->Y
and leaving X->Y
df <- tibble(w = rnorm(100),v=rnorm(100)) %>%
mutate(x = 2*w + 3*v + rnorm(100)) %>%
mutate(y = 1*x + 4*w + 1.5*v + rnorm(100))
cor(df$x,df$y)
## [1] 0.9340934
df <- df %>% group_by(cut(w,breaks=5)) %>%
mutate(x.resid = x - mean(x),
y.resid = y - mean(y)) %>%
group_by(cut(v,breaks=5)) %>%
mutate(x.resid2 = x.resid - mean(x.resid),
y.resid2 = y.resid - mean(y.resid))
cor(df$x.resid2,df$y.resid2)
## [1] 0.7419072
X<-W
and W->Y
so as to close the back doorW
W
W
constant” - we literally remove the variation in W
, leaving it “constant”X
and Y
within values of W
- this is made clear in the animationW
is in effect controlling for W
wooldridge
package)re78
)jtrain2
)jtrain3
)jtrain3
so it gives us the “correct” result from jtrain2
library(wooldridge)
#EXPERIMENT
data(jtrain2)
jtrain2 %>% group_by(train) %>% summarize(wage = mean(re78))
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 4.55
## 2 1 6.35
#BY CHOICE
data(jtrain3)
jtrain3 %>% group_by(train) %>% summarize(wage = mean(re78))
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 21.6
## 2 1 6.35
jtrain3
analysis be facing?library(stargazer)
stargazer(select(jtrain2,re75,re78),type='text')
stargazer(select(jtrain3,re75,re78),type='text')
##
## ===========================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## -----------------------------------------------------------
## re75 445 1.377 3.151 0 0 1.2 25
## re78 445 5.301 6.631 0.000 0.000 8.125 60.308
## -----------------------------------------------------------
##
## ===============================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## ---------------------------------------------------------------
## re75 2,675 17.851 13.878 0 7.6 25.6 157
## re78 2,675 20.502 15.633 0.000 9.243 28.816 121.174
## ---------------------------------------------------------------
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 4.55
## 2 1 6.35
## # A tibble: 2 x 2
## train wage
## <int> <dbl>
## 1 0 5.62
## 2 1 6.00
need.tr
, and we never closed train <- U -> wage
, oh and we left out plenty of other back doors: race, age, etc.) but an improvementW
is a form of controlling for W
X <- W -> C <- Z -> Y
-> C <-
. Those arrow are colliding!C
, that path opens back up!a -> m
, b -> m
), that doesn’t make them related. Your parents both caused your genetic makeup, that doesn’t make their genetics related. Knowing dad’s eye color tells you nothing about mom’s.x <- a -> m <- b -> y
is pre-blocked, no problem. a
and b
are unrelated, so no back door issue!m
and now a
and b
are related, back door path open.set.seed(14233)
survey <- tibble(prog=rnorm(1000),social=rnorm(1000)) %>%
mutate(hired = (prog + social > .25))
#Truth
cor(survey$prog,survey$social)
## [1] 0.03710333
#Controlling by just surveying those hired
cor(filter(survey,hired==1)$prog,filter(survey,hired==1)$social)
## [1] -0.4789209
#Surveying everyone and controlling with our normal method
survey <- survey %>% group_by(hired) %>% mutate(p.resid = prog - mean(prog),
s.resid = social - mean(social)) %>% ungroup()
cor(survey$p.resid,survey$s.resid)
## [1] -0.4268598
gender -> discrim -> wage
; our treatment is gender -> discrim
, the discrimination caused by your gendergender -> discrim -> wage
gender -> discrim -> occup -> wage
discrim <- gender -> occup -> wage
discrim <- gender -> occup <- abil -> wage
gender -> discrim -> occup <- abil -> wage
occup
control? Ignore nondiscriminatory reasons to choose different occupations by genderoccup
? Open both back doors, create a correlation between abil
and discrim
where there wasn’t onegender -> discrim -> occup -> wage
: discriminatory reasons for gender diffs in occup