X causes Y if… we could change X without changing anything else… and Y would also change as a result
Examples of causal relationships!
Some obvious:
Some less obvious:
Examples of non-zero correlations that are not causal (or may be causal in the wrong direction!)
Some obvious:
Some less obvious:
*This case of mistaken causality is the basis of the film Rock-a-Doodle, which I remember being very entertaining when I was six.
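X causes Y means that if we changed X, then Y would change as a result
X is either 1 or 0, like “got a medical treatment” or “didn’t”
How can we know what would happen if we changed X?
Take one person, Angela. Check what Angela’s Y is when we make X=0, and then check what Angela’s Y is again when we make X=1.
Are those two Ys different? If so, X causes Y!
Do that for everyone and we know what the effect of X on Y is
The problem: Angela can’t exist both with X=0 and with X=1. She either got that medical treatment or she didn’t.
Say she did. So for Angela, X=1 and, let’s say, Y=10.
The other outcome, what Y would have been if we made X=0, is missing. We don’t know what it is! Could also be Y=10. Could be Y=9. Could be Y=1000!
Why not just take someone else who has X=0 and compare their Y?
Because there are reasons their Y could be different BESIDES X.
If we find someone, Gareth, with X=0 and they have Y=9, is that because X increases Y, or is that just because Angela and Gareth would have had different Ys anyway?
So we need a good guess at what Y would have been if X had been different
… X=0 and one has X=1
If you can randomly assign X, then you know that the people with X=0 are, on average, exactly the same as the people with X=1
A simulation where X causes Y to increase by 1:
library(dplyr)
df <- data.frame(Y.without.X = rnorm(1000), X = sample(c(0,1), 1000, replace=TRUE)) %>%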
  mutate(Y.with.X = Y.without.X + 1) %>%
  #Now record the Y we actually observe, given who got X
  mutate(Observed.Y = ifelse(X==1, Y.with.X, Y.without.X))
#And see what effect our experiment suggests X has on Y
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##       X      Y
##   <dbl>  <dbl>
## 1     0 0.0749
## 2     1 1.06
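The experiment above can also be checked in base R. This is a minimal sketch, not the original slide code: the names `n`, `true_effect`, `y0`, `y1`, `x`, `y_obs`, `estimate`, and `baseline_gap` are illustrative assumptions, and the seed is fixed only so the run is repeatable. It computes the difference in group means and, because this is a simulation where both potential outcomes are visible, also checks that the two groups started out alike.

```r
# Illustrative sketch; all variable names here are assumptions, not from the slides.
set.seed(1)
n <- 10000
true_effect <- 1

y0 <- rnorm(n)                            # Y each person would have without X
y1 <- y0 + true_effect                    # Y each person would have with X
x  <- sample(c(0, 1), n, replace = TRUE)  # random assignment, unrelated to y0

y_obs <- ifelse(x == 1, y1, y0)           # only one potential outcome is observed

# Experimental estimate: difference in observed group means
estimate <- mean(y_obs[x == 1]) - mean(y_obs[x == 0])

# Only possible in a simulation: compare the groups' Y-without-X directly
baseline_gap <- mean(y0[x == 1]) - mean(y0[x == 0])

print(c(estimate = estimate, baseline_gap = baseline_gap))
```

With random assignment the baseline gap should sit near zero, so the difference in means should land near the true effect of 1.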
df <- data.frame(Z = runif(10000)) %>%
  mutate(Y.without.X = rnorm(10000) + Z,
         Y.with.X = Y.without.X + 1) %>%
  #Now assign who actually gets X
  mutate(X = Z > .7,
         Observed.Y = ifelse(X==1, Y.with.X, Y.without.X))
df %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##   X         Y
##   <lgl> <dbl>
## 1 FALSE 0.346
## 2 TRUE  1.85
#But if we properly model the process and compare apples to apples...
df %>% filter(abs(Z-.7)<.01) %>% group_by(X) %>% summarize(Y = mean(Observed.Y))
## # A tibble: 2 x 2
##   X         Y
##   <lgl> <dbl>
## 1 FALSE 0.612
## 2 TRUE  1.71
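To see why the raw comparison overshoots when X depends on Z, the gap can be split into two pieces. This is a hedged base R sketch under the same setup as the simulation above. The names `naive`, `selection_bias`, and `effect_on_treated` are illustrative assumptions, and the split is only computable here because the simulation exposes both potential outcomes for everyone.

```r
# Illustrative sketch; variable names are assumptions, not from the slides.
set.seed(1)
n  <- 10000
z  <- runif(n)
y0 <- rnorm(n) + z          # Y without X depends on Z
y1 <- y0 + 1                # X raises Y by exactly 1 for everyone
x  <- z > 0.7               # assignment depends on Z, so it is NOT random
y_obs <- ifelse(x, y1, y0)

# Raw comparison of the two observed groups
naive <- mean(y_obs[x]) - mean(y_obs[!x])

# How different the groups would have been even if nobody got X
selection_bias <- mean(y0[x]) - mean(y0[!x])

# Actual effect of X among those who got it
effect_on_treated <- mean(y1[x] - y0[x])

print(c(naive = naive,
        selection_bias = selection_bias,
        effect_on_treated = effect_on_treated))
```

The naive gap equals the effect on the treated plus the selection bias. Here the bias comes entirely from Z, which is why restricting to people with nearly the same Z brings the comparison back toward 1.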
… what would happen if we changed X, and how Y would change as a result.