If we change X, and as a result Y changes too, then X causes Y.
We knew how X was assigned, and using that information we were able to get a good estimate of the true treatment effect.
library(tidyverse)
# Is your company in tech? Let's say 30% of firms are
df <- tibble(tech = sample(c(0,1),500,replace=TRUE,prob=c(.7,.3))) %>%
  # Tech firms on average spend $3mil more defending IP lawsuits
  mutate(IP.spend = 3*tech+runif(500,min=0,max=4)) %>%
  # Tech firms also have higher profits. But IP lawsuits lower profits
  mutate(log.profit = 2*tech - .3*IP.spend + rnorm(500,mean=2))
# Now let's check how profit and IP.spend are correlated!
cor(df$log.profit,df$IP.spend)
## [1] 0.1609575
Part of what cor(df$log.profit,df$IP.spend) picks up is the influence of being a tech company.
Being a tech company affects IP.spend, which then affects profit.
If we want the effect of IP.spend on profit, we can figure that out too, by isolating IP.spend -> profit and seeing what the effect is on that arrow!
Part of the relationship between IP.spend and profit can be explained by how tech links the two.
Since some of the correlation is explained by tech, but we want to identify the part of the correlation that ISN’T explained by tech (the causal part), we will want to just use what isn’t explained by tech!
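One way to see "what isn't explained by tech" is to compute the correlation separately within each level of tech. The sketch below is illustrative only: it re-creates the same simulation in base R, with an arbitrary seed and a larger sample size (2000 rather than 500) chosen here purely to reduce sampling noise, so the numbers will not match the output shown elsewhere.

```r
set.seed(1000)  # arbitrary seed, only so this sketch is reproducible
n <- 2000       # assumption: larger than the 500 used in the simulation

# Same data-generating process as the simulation, written in base R
tech <- sample(c(0, 1), n, replace = TRUE, prob = c(.7, .3))
IP.spend <- 3 * tech + runif(n, min = 0, max = 4)
log.profit <- 2 * tech - .3 * IP.spend + rnorm(n, mean = 2)

# Raw correlation mixes the IP.spend -> profit arrow with the influence of tech
raw_cor <- cor(log.profit, IP.spend)

# Within a single level of tech, tech cannot explain any of the relationship
cor_nontech <- cor(log.profit[tech == 0], IP.spend[tech == 0])
cor_tech <- cor(log.profit[tech == 1], IP.spend[tech == 1])

c(raw = raw_cor, nontech = cor_nontech, tech = cor_tech)
```

The raw correlation should come out positive while both within-group correlations come out negative, which is the sign of the effect built into the simulation.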
Use tech to explain profit, and take the residual
Use tech to explain IP.spend, and take the residual
The relationship between the residuals of IP.spend and profit is just comparing firms that have the same level of tech.
df <- df %>% group_by(tech) %>%
  mutate(log.profit.resid = log.profit - mean(log.profit),
         IP.spend.resid = IP.spend - mean(IP.spend)) %>% ungroup()
cor(df$log.profit.resid,df$IP.spend.resid)
## [1] -0.3018621
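As a cross-check on the residual approach, the sketch below compares the slope from regressing residual on residual with the IP.spend coefficient from a regression that includes tech as a control. Regression is not used in the slides; it is swapped in here only as a comparison. The data are re-simulated in base R with an arbitrary seed and an assumed sample size of 2000, and ave() stands in for group_by() plus mutate().

```r
set.seed(1000)  # arbitrary seed, only so this sketch is reproducible
n <- 2000       # assumption: larger than the 500 used in the simulation

tech <- sample(c(0, 1), n, replace = TRUE, prob = c(.7, .3))
IP.spend <- 3 * tech + runif(n, min = 0, max = 4)
log.profit <- 2 * tech - .3 * IP.spend + rnorm(n, mean = 2)

# ave() returns each observation's group mean, so subtracting it
# leaves the part of each variable that tech does not explain
log.profit.resid <- log.profit - ave(log.profit, tech)
IP.spend.resid <- IP.spend - ave(IP.spend, tech)

# Slope from regressing the profit residual on the IP.spend residual
slope_resid <- unname(coef(lm(log.profit.resid ~ IP.spend.resid))[2])

# Coefficient on IP.spend when tech is included as a control
slope_control <- unname(coef(lm(log.profit ~ IP.spend + tech))["IP.spend"])

c(resid = slope_resid, control = slope_control)
```

Because tech takes only two values, subtracting group means is the same as taking residuals from a regression on tech, so the two slopes agree. Both should land near the -.3 written into the simulation.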
By controlling for tech (“holding it constant”) we got rid of the part of the IP.spend/profit relationship that was explained by tech, and so managed to identify the IP.spend -> profit arrow, the causal effect we’re interested in!
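profit (y), IP.spend (x), and tech (z) are all related… which is it?
It seems unlikely that IP.spend causes companies to be tech companies (in 2, 3, 6)
Load data(swiss) and use help to look at it
Get the correlation between Fertility and Education, then control for Agriculture split into bins (use cut with breaks=3)
data(swiss)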
help(swiss)
cor(swiss$Fertility,swiss$Education)
## [1] -0.6637889
swiss <- swiss %>%
  group_by(cut(Agriculture,breaks=3)) %>%
  mutate(Fert.resid = Fertility - mean(Fertility),
         Ed.resid = Education - mean(Education))
cor(swiss$Fert.resid,swiss$Ed.resid)
## [1] -0.5560316
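To make the binning step easier to inspect, here is a minimal base R sketch of the same calculation. The names ag_bin, fert_resid, ed_resid, r_raw and r_resid are introduced here for illustration, and ave() is used in place of group_by() plus mutate().

```r
data(swiss)

# cut() with breaks = 3 splits Agriculture into three equal-width bins
ag_bin <- cut(swiss$Agriculture, breaks = 3)
table(ag_bin)  # how many provinces fall in each bin

# Subtract each bin's mean to keep what Agriculture does not explain
fert_resid <- swiss$Fertility - ave(swiss$Fertility, ag_bin)
ed_resid <- swiss$Education - ave(swiss$Education, ag_bin)

r_raw <- cor(swiss$Fertility, swiss$Education)  # before controlling
r_resid <- cor(fert_resid, ed_resid)            # after controlling
c(raw = r_raw, controlled = r_resid)
```

Looking at table(ag_bin) first is a quick way to check that no bin is nearly empty, since a bin with very few provinces contributes residuals that are close to zero by construction.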