- The treatment is `D`, which is binary (0 or 1)
- We want to identify `D -> Y` in this (simplified) diagram:
- Matching on `W` is one way of controlling for `W`
- Pick the treated observations (`D=1`), look at their `W`s, and pick non-treated observations with similar (or identical) values of `W`
Every matching estimator follows the same basic concept:
- Pick a set of variables `W1`, `W2`, etc., to match on
- Compare the average treated `Y` vs. the average untreated `Y`, counting untreated obs more heavily the closer they are

Many many many ways to do 3 and 4. Here’s one…
To match on `W`, what you do is:
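- Pick a maximum distance `a`. The smaller it is, the closer the match, but the smaller your eventual sample is
- For each treated observation `i`, find all untreated observations for which their `W` is within `a` of `W[i]` (e.g. if `a=.1` and the treated observation has `W = 2`, find the untreated observations with `W >= 1.9 & W <= 2.1`)
- Compare `Y` across treatment

- For exact matching on continuous variables, coarsen (`cut()`) them first

library(dplyr)  # for %>%, mutate(), filter(), group_by(), summarize(), and the joins
library(Ecdat)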
data(Wages)
# Coarsen education and experience into 3 bins each
Wages <- Wages %>%
  mutate(ed.coarse = cut(ed, breaks = 3),
         exp.coarse = cut(exp, breaks = 3))
# Split up the treated and untreated
union <- Wages %>% filter(union == 'yes')
nonunion <- Wages %>%
  filter(union == 'no') %>%
  # For every potential complete-match, let's get the average Y
  group_by(ed.coarse, exp.coarse, bluecol,
           ind, south, smsa, married, sex, black) %>%
  summarize(untreated.lwage = mean(lwage))
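To see what the coarsening step does on its own, here is a minimal sketch using made-up numbers (the vector `x` is hypothetical and has nothing to do with the Wages data). When `breaks` is a single number, `cut()` splits the range of the variable into that many equal-width intervals and returns a factor, which is what makes an exact match possible afterwards.

```r
# Hypothetical toy values, not from the Wages data
x <- c(4, 8, 9, 12, 16, 17)

# A single number for breaks means "this many equal-width bins"
x.coarse <- cut(x, breaks = 3)

# The result is a factor, so it can be used in group_by() or as a join key
is.factor(x.coarse)

# How many observations land in each bin
table(x.coarse)
```

More bins give closer matches within each cell, but fewer observations will find an exact match.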
- A `join`, aka merging, is how you can link up two data sets when they match on a list of variables, i.e. “exact matches”!
- There are several kinds of `join` (see `help(join)`). The one we want is `inner_join()`, which only keeps successful matches, both treated and untreated

union %>%
  inner_join(nonunion) %>%
  summarize(union.mean = mean(lwage), nonunion.mean = mean(untreated.lwage))
## union.mean nonunion.mean
## 1 6.687606 6.571178
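The "only keeps successful matches" behavior is easier to see on a tiny example. The sketch below uses two hypothetical data frames (`treated` and `untreated.means`, with made-up values) and contrasts `inner_join()` with `left_join()`; it is only an illustration of the join behavior, not part of the Wages analysis.

```r
library(dplyr)

# Hypothetical toy data: treated observations and untreated cell averages
treated <- data.frame(id = 1:3,
                      group = c("a", "b", "c"),
                      y = c(10, 12, 14))
untreated.means <- data.frame(group = c("a", "b", "d"),
                              untreated.y = c(9, 11, 13))

# inner_join() keeps only rows whose key appears in BOTH data sets:
# group "c" has no untreated match and group "d" has no treated match, so both drop out
matched <- inner_join(treated, untreated.means, by = "group")

# left_join() would instead keep every treated row, with NA where there is no match
all.treated <- left_join(treated, untreated.means, by = "group")

matched
all.treated
```

This is why the matched union count can come out smaller than the original union count: treated observations with no untreated observation in the same cell are dropped.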
#Original union and nonunion counts, and matched union count
c(sum(Wages$union=='yes'),sum(Wages$union=='no'),nrow(union %>% inner_join(nonunion)))
## [1] 1516 2649 1274
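The example above uses exact matches on coarsened bins. For comparison, the within-`a` procedure described earlier could be sketched roughly as follows. Everything here is simulated and hypothetical: the data, the variable names, the value of `a`, and the effect of `D` (set to 1 in the simulation) are assumptions chosen for illustration.

```r
library(dplyr)

set.seed(1)  # simulated data, purely illustrative

# Hypothetical data: W affects both treatment D and outcome Y;
# the effect of D on Y is set to 1 in this simulation
n <- 500
W <- runif(n)
D <- as.numeric(runif(n) < .2 + .6 * W)
Y <- 1 * D + 2 * W + rnorm(n, sd = .1)
dat <- data.frame(W, D, Y)

a <- .05  # how close an untreated W has to be to count as a match

treated <- dat %>% filter(D == 1)
untreated <- dat %>% filter(D == 0)

# For each treated observation, average Y over untreated obs with W within a of it
treated$matched.Y <- sapply(treated$W, function(w) {
  mean(untreated$Y[abs(untreated$W - w) <= a])
})

# A treated observation with no untreated obs within a gets NaN and is dropped
matched <- treated %>% filter(!is.nan(matched.Y))

# Compare Y across treatment among the matched observations
result <- matched %>%
  summarize(treated.mean = mean(Y),
            untreated.mean = mean(matched.Y),
            difference = mean(Y) - mean(matched.Y))
result
```

Shrinking `a` tightens the matches but leaves more treated observations without any match, which is the tradeoff described for picking `a`.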
- Use data from the `atus` package, from the American Time Use Survey. Load the `atusresp` and `atusact` data sets.
- Filter `atusact` to `tiercode==110101` (eating and drinking). Then `inner_join` it with `atusresp`. Call the result `eating` and `ungroup()` it
- Use `eating <- na.omit(eating)` to nuke missing data
- Compare `dur` by `hh_child`, matching on everything else, using `cut(,breaks=5)` for everything that’s not a factor.

library(atus)
data(atusresp)
data(atusact)
# Keep eating and drinking, attach respondent info, drop missing data, then coarsen
eating <- atusact %>%
  filter(tiercode == 110101) %>%
  inner_join(atusresp) %>%
  ungroup() %>%
  select(dur, hh_child, labor_status, student_status,
         work_hrs_week, partner_hh, weekly_earn, tuyear) %>%
  na.omit() %>%
  mutate(hrs.c = cut(work_hrs_week, breaks = 5),
         earn.c = cut(weekly_earn, breaks = 5),
         year.c = cut(tuyear, breaks = 5))
# Split up the treated (kids) and untreated (no kids)
kids <- filter(eating, hh_child == 'yes')
nokids <- eating %>%
  filter(hh_child == 'no') %>%
  group_by(hrs.c, earn.c, year.c, labor_status, student_status, partner_hh) %>%
  summarize(nokids.dur = mean(dur))
# Keep only the successful matches and compare average durations
kids %>% inner_join(nokids) %>%
  summarize(kids.dur = mean(dur), nokids.dur = mean(nokids.dur))
## # A tibble: 1 x 2
## kids.dur nokids.dur
## <dbl> <dbl>
## 1 68.0 71.9