X causes Y if, were we to change X without changing anything else, Y would also change as a result.
Examples of causal relationships!
Some obvious:
Some less obvious:
Examples of non-zero correlations that are not causal (or may be causal in the wrong direction!)
Some obvious:
Some less obvious:
*This case of mistaken causality is the basis of the film Rock-a-Doodle, which I remember being very entertaining when I was six.
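X causes Y: if we changed X, then Y would change as a result.
Say X is either 1 or 0, like “got a medical treatment” or “didn’t”.
How can we know what would happen if we changed X?
Check what Angela’s Y is when we make X=0, and then check what Angela’s Y is again when we make X=1.
Are those two Ys different? If so, X causes Y!
Do the same for everyone, and you know what the effect of X on Y is.
But we can't observe Angela both with X=0 and with X=1. She either got that medical treatment or she didn’t.
Say she did, so X=1 and, let’s say, Y=10.
The other outcome, what Y would have been if we made X=0, is missing. We don’t know what it is! Could also be Y=10. Could be Y=9. Could be Y=1000!
Why not just take someone who actually has X=0 and compare their Y?
Because there are plenty of reasons their Y could be different BESIDES X.
If Gareth has X=0 and they have Y=9, is that because X increases Y, or is that just because Angela and Gareth would have had different Ys anyway?
So the task is to figure out what Y would have been if X had been different.
That means comparing people who are otherwise alike, except that one has X=0 and one has X=1.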
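The Angela and Gareth comparison can be written out as a tiny table of potential outcomes. This is only a sketch: the observed values (Angela with X=1 and Y=10, Gareth with X=0 and Y=9) come from the example above, while the two unobserved values and the column names are made up for illustration, since in real data those cells are exactly what is missing.

```r
# Hypothetical potential outcomes. In real data we never see both columns
# for the same person; the unobserved values here are invented.
people <- data.frame(
  name = c("Angela", "Gareth"),
  Y.without.X = c(9.5, 9),  # what Y would be with X=0 (Angela's is invented)
  Y.with.X = c(10, 9.5),    # what Y would be with X=1 (Gareth's is invented)
  X = c(1, 0)               # what actually happened
)

# We only get to observe the potential outcome that matches each person's X
people$Observed.Y <- ifelse(people$X == 1, people$Y.with.X, people$Y.without.X)

# The true effect needs both columns, so it can't be computed from real data
people$True.Effect <- people$Y.with.X - people$Y.without.X

# Naive comparison: Angela's observed Y minus Gareth's observed Y
naive <- people$Observed.Y[people$name == "Angela"] -
  people$Observed.Y[people$name == "Gareth"]

print(people)
print(naive)
```

With these invented numbers the naive comparison says 1 while the true effect for each person is 0.5. The gap is there because Angela and Gareth would have had different Ys anyway.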
If you randomly assign X, then you know that the people with X=0 are, on average, exactly the same as the people with X=1.
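One way to see the "same on average" claim is to check a background trait in both groups. The sketch below assumes the point is about random assignment. The trait Z, the 0.7 cutoff, and the variable names are illustrative choices, not part of the original example.

```r
set.seed(1)
n <- 10000
Z <- runif(n)  # some background trait that exists before X is assigned

# Assignment by coin flip, unrelated to Z
X.random <- sample(c(0, 1), n, replace = TRUE)
# Assignment that depends on Z (hypothetical rule: only high-Z people get X)
X.chosen <- as.numeric(Z > .7)

# Difference in average Z between the X=1 and X=0 groups
balance.random <- mean(Z[X.random == 1]) - mean(Z[X.random == 0])
balance.chosen <- mean(Z[X.chosen == 1]) - mean(Z[X.chosen == 0])

print(c(random = balance.random, chosen = balance.chosen))
```

Under the coin flip the difference in Z should be close to zero. Under the Z-based rule it should be about 0.5, so the two groups differ before X has done anything.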
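Suppose X causes Y to increase by 1:
library(dplyr)
#Y.without.X is each person's Y without X; X itself is assigned at random
df <- data.frame(Y.without.X = rnorm(1000), X = sample(c(0,1), 1000, replace = TRUE)) %>%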
  #X adds exactly 1 to everyone's Y
  mutate(Y.with.X = Y.without.X + 1) %>%
  #We only observe the outcome that matches the X each person actually got
  mutate(Observed.Y = ifelse(X==1,Y.with.X,Y.without.X))
#And see what effect our experiment suggests X has on Y
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
## X Y
## <dbl> <dbl>
## 1 0 0.0749
## 2 1 1.06
#Now Z raises Y on its own and also determines who gets X
df <- data.frame(Z = runif(10000)) %>%
  mutate(Y.without.X = rnorm(10000) + Z, Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(X = Z > .7, Observed.Y = ifelse(X, Y.with.X, Y.without.X))
#The raw comparison mixes the effect of X with the effect of Z
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
## X Y
## <lgl> <dbl>
## 1 FALSE 0.346
## 2 TRUE 1.85
#But if we properly model the process and compare apples to apples...
df %>% filter(abs(Z-.7)<.01) %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
## X Y
## <lgl> <dbl>
## 1 FALSE 0.612
## 2 TRUE 1.71
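Filtering to a narrow band around Z = .7 is one way to compare like with like. A different approach, sketched here as an assumption rather than as part of the original example, is to keep all the data and adjust for Z in a regression. This only works because Y is linear in Z in this simulation, which is something we happen to know here and usually would not.

```r
set.seed(1)
n <- 10000
Z <- runif(n)
Y.without.X <- rnorm(n) + Z   # Z raises Y on its own
X <- Z > .7                   # and Z decides who gets X
Observed.Y <- ifelse(X, Y.without.X + 1, Y.without.X)  # true effect of X is 1

# Raw difference in means: picks up the effect of X plus the effect of Z
naive <- mean(Observed.Y[X]) - mean(Observed.Y[!X])

# Regression that holds Z fixed; the coefficient for a logical X is named "XTRUE"
adjusted <- unname(coef(lm(Observed.Y ~ X + Z))["XTRUE"])

print(c(naive = naive, adjusted = adjusted))
```

The naive difference should land near 1.5 and the adjusted estimate near the true effect of 1, up to sampling noise.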
So causality is about what would happen if we changed X, and how Y would change as a result.