Exam Review

Describing Variables

Describing Relationships

Identification

Causal Diagrams

  1. Consider all the variables that are likely to be important in the data generating process (this includes variables you can’t observe)
  2. For simplicity, combine them together or prune the ones least likely to be important
  3. Consider which variables are likely to affect which other variables and draw arrows from one to the other
  4. (Bonus: Test some implications of the model to see if you have the right one)
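
As a rough illustration of the bonus step, here is a minimal sketch of testing one implication of a diagram with simulated data. The diagram (z -> x -> y), the variable names, and the effect sizes are all assumptions made up for this example. If the diagram is right, z and y should be related on their own, but z should stop predicting y once x is controlled for.

```r
# Hypothetical diagram (an assumption for illustration): z -> x -> y
set.seed(1)
n <- 10000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)   # z affects x
y <- 1.5 * x + rnorm(n)   # x affects y; z has no direct arrow to y

# Implication 1: z and y are related when nothing is controlled for
raw_cor <- cor(z, y)

# Implication 2: once x is controlled for, z should no longer predict y
z_given_x <- unname(coef(lm(y ~ x + z))["z"])

raw_cor     # clearly nonzero
z_given_x   # close to zero
```

If the coefficient on z stayed far from zero in real data, that would suggest the diagram is missing an arrow, for example a direct path from z to y.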

Causal Diagrams

Identifying X -> Y by closing back doors:

  1. Find all the paths from X to Y on the diagram
  2. Determine which are “front doors” (start with X ->) and which are “back doors” (start with X <-)
  3. Determine which are already closed by colliders (X -> C <- Y)
  4. Then, identify the effect by finding which variables you need to control for to close all remaining back doors (careful: don’t close the front doors, and don’t reopen a closed path by controlling for a collider!)
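
The steps above can be sketched with simulated data. This is only an illustration under assumed names and numbers: w is a made-up confounder (x <- w -> y), coll is a made-up collider (x -> coll <- y), and the true effect of x on y is set to 2. Controlling for w should recover the effect, while also controlling for the collider should break it again.

```r
# Hypothetical diagram: x -> y (true effect 2), x <- w -> y, x -> coll <- y
set.seed(1)
n <- 10000
w    <- rnorm(n)                 # confounder on the back door x <- w -> y
x    <- w + rnorm(n)
y    <- 2 * x + 3 * w + rnorm(n)
coll <- x + y + rnorm(n)         # collider on the path x -> coll <- y

# Back door open: the estimate mixes the effect of x with the effect of w
b_naive    <- unname(coef(lm(y ~ x))["x"])

# Back door closed by controlling for w
b_control  <- unname(coef(lm(y ~ x + w))["x"])

# Controlling for the collider opens the path that was already closed
b_collider <- unname(coef(lm(y ~ x + w + coll))["x"])

c(naive = b_naive, control = b_control, collider = b_collider)
```

Only the middle estimate should land near 2. The first is too large because the back door is open, and the third is off because the collider path has been opened.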

Controlling

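library(Ecdat)  # provides the BudgetFood data
library(dplyr)  # provides %>% and mutate()
data(BudgetFood)
# Rescale total expenditure (divide by 1,000,000) so the coefficients are readable
BudgetFood <- BudgetFood %>% mutate(totexp = totexp/1000000)
# Model 1: no controls
m1 <- lm(wfood ~ totexp, data = BudgetFood)
# Model 2: remove the part of each variable explained by age, then regress the residuals
m2 <- lm(wfood_r ~ totexp_r, data = BudgetFood %>% mutate(wfood_r = resid(lm(wfood~age)),
                                                          totexp_r = resid(lm(totexp~age))))
# Model 3: control for age by adding it to the regression
m3 <- lm(wfood ~ totexp + age, data = BudgetFood)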

Controlling

              Model 1      Model 2      Model 3
(Intercept)   0.495***     0.000        0.406***
              (0.002)      (0.001)      (0.004)
totexp        -0.135***                 -0.125***
              (0.001)                   (0.001)
totexp_r                   -0.125***
                           (0.001)
age                                     0.002***
                                        (0.000)
Num.Obs.      23972        23972        23972
R2            0.263        0.228        0.282
* p < 0.1, ** p < 0.05, *** p < 0.01
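
In the table, the totexp_r coefficient in Model 2 matches the totexp coefficient in Model 3. A minimal sketch of why, using simulated data rather than BudgetFood (the variables a, x, y and all coefficients are assumptions for illustration): regressing the residuals on each other gives the same slope as adding the control directly.

```r
# Simulated stand-ins: a plays the role of age, x of totexp, y of wfood
set.seed(1)
n <- 5000
a <- rnorm(n)
x <- 0.5 * a + rnorm(n)
y <- -0.3 * x + 0.2 * a + rnorm(n)

# Approach 1: take out the part of x and y explained by a, then regress residuals
x_r <- resid(lm(x ~ a))
y_r <- resid(lm(y ~ a))
b_resid <- unname(coef(lm(y_r ~ x_r))["x_r"])

# Approach 2: include a as a control in the regression
b_ctrl <- unname(coef(lm(y ~ x + a))["x"])

# The two slopes agree up to floating-point error
all.equal(b_resid, b_ctrl)
```

The slopes agree, but the rest of the output need not: the residualized regression has an intercept of zero and a different R2, as in the table.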