- We discussed how to draw a causal diagram
- How to identify the front-door and back-door paths
- And how we can close those back-door paths by controlling/adjusting in order to identify the front-door paths we want!
- And so we get our causal effect

- Today we’re going to be going a little deeper into what it means to actually control/adjust for things
- And we’re also going to talk about times when controlling/adjusting makes things *WORSE*: collider bias!
- I’m going to just start saying “controlling”, by the way. “Adjusting” is a little more accurate, but “controlling” is more common

- Up to now, here’s how we’ve been getting the relationship between `X` and `Y` while controlling for `W`:

- See what part of `X` is explained by `W`, and subtract it out. Call the result the residual part of `X`.
- See what part of `Y` is explained by `W`, and subtract it out. Call the result the residual part of `Y`.
- Get the relationship between the residual part of `X` and the residual part of `Y`.

- That last step can be things like getting the correlation, plotting the relationship, calculating the variance explained, or comparing mean `Y` across values of `X`

```
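library(tidyverse)
# w is a confounder: it causes both x and y. The true effect of x on y is 1
df <- tibble(w = rnorm(100)) %>%
  mutate(x = 2*w + rnorm(100)) %>%
  mutate(y = 1*x + 4*w + rnorm(100))
# Raw correlation mixes X->Y with the back door X<-W->Y
cor(df$x, df$y)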
```

`## [1] 0.9479742`

```
# Control for w: split w into 5 bins, then subtract the within-bin means
df <- df %>% group_by(cut(w, breaks=5)) %>%
  mutate(x.resid = x - mean(x),
         y.resid = y - mean(y)) %>%
  ungroup()
cor(df$x.resid, df$y.resid)
```

`## [1] 0.7367752`

- The relationship between `X` and `Y` reflects both `X->Y` and `X<-W->Y`
- We remove the part of `X` and `Y` that `W` explains to get rid of `X<-W` and `W->Y`, blocking `X<-W->Y` and leaving `X->Y`
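
As a rough sketch of why this works, the example below swaps in a linear fit (`lm()`) for the binned means used on these slides, and looks at slopes instead of correlations. The seed, the sample size of 10,000, and the variable names are assumptions chosen for illustration. Since the data are built with a true `X->Y` effect of 1, the slope between the residual parts should land near 1, while the raw slope also picks up `X<-W->Y`.

```r
set.seed(1000)  # arbitrary seed, assumed here only for reproducibility
n <- 10000      # larger than the 100 used on the slides, to cut down on noise

# Same data-generating process as the example above: w causes both x and y
w <- rnorm(n)
x <- 2*w + rnorm(n)
y <- 1*x + 4*w + rnorm(n)

# Raw slope of y on x: reflects both X->Y and the back door X<-W->Y
raw_slope <- unname(coef(lm(y ~ x))["x"])

# Remove the part of x and y that w explains (linear fit instead of bins)
x_resid <- resid(lm(x ~ w))
y_resid <- resid(lm(y ~ w))

# Slope between the residual parts: only X->Y should be left
resid_slope <- unname(coef(lm(y_resid ~ x_resid))["x_resid"])

raw_slope    # well above the true effect of 1
resid_slope  # close to the true effect of 1
```

Binning `W` as on the slides makes no assumption about the shape of the relationship, while `lm()` assumes it is a straight line. Here the data really are linear, so the linear fit removes `W` more completely than five bins can.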

- It’s quite possible to control for more than one variable at a time
- Although we won’t be doing it much in this class
- A common way to do this is called multiple regression
- You can do it with our method too, but it gets tedious pretty quickly

```
# Two confounders now: w and v both cause x and y
df <- tibble(w = rnorm(100), v = rnorm(100)) %>%
  mutate(x = 2*w + 3*v + rnorm(100)) %>%
  mutate(y = 1*x + 4*w + 1.5*v + rnorm(100))
cor(df$x, df$y)
```

`## [1] 0.9340934`

```
# Remove what w explains, then remove what v explains from those residuals
df <- df %>% group_by(cut(w, breaks=5)) %>%
  mutate(x.resid = x - mean(x),
         y.resid = y - mean(y)) %>%
  group_by(cut(v, breaks=5)) %>%
  mutate(x.resid2 = x.resid - mean(x.resid),
         y.resid2 = y.resid - mean(y.resid)) %>%
  ungroup()
cor(df$x.resid2, df$y.resid2)
```

`## [1] 0.7419072`
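
The multiple regression route mentioned earlier can be sketched with base R's `lm()`, which controls for `w` and `v` in one step by including them in the formula. This is a linear adjustment rather than the binning used above, and the seed and sample size are assumptions for illustration. The coefficient on `x` should land close to the true effect of 1 that the data are built with.

```r
set.seed(1000)  # arbitrary seed, assumed here only for reproducibility
n <- 10000      # larger than the 100 used on the slides, to cut down on noise

# Same data-generating process as the two-confounder example above
w <- rnorm(n)
v <- rnorm(n)
x <- 2*w + 3*v + rnorm(n)
y <- 1*x + 4*w + 1.5*v + rnorm(n)

# Without controls, the slope on x also carries the back-door paths through w and v
unadjusted <- unname(coef(lm(y ~ x))["x"])

# Multiple regression: adding w and v to the formula controls for both at once
fit <- lm(y ~ x + w + v)
x_effect <- unname(coef(fit)["x"])

unadjusted  # noticeably above 1
x_effect    # close to the true effect of 1
```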