This function looks for a list of values (usually, just NA) in a variable .var and overwrites those values with the most recent (or next-coming) values that are not from that list ("last observation carried forward").
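The basic idea can be sketched in base R on a plain vector, ignoring any panel structure. This is only an illustration of carrying the last acceptable value forward (or the next one backward); `locf_sketch()` and its arguments are hypothetical names invented for this example, not part of pmdplyr and not how panel_locf() is implemented.

```r
# Hypothetical helper (not part of pmdplyr): overwrite every element of x
# that appears in `fill` with the nearest earlier element that does not.
locf_sketch <- function(x, fill = NA, backwards = FALSE) {
  # Carrying the next value backward is the same as carrying forward on the reversed vector
  if (backwards) return(rev(locf_sketch(rev(x), fill = fill)))
  # %in% matches NA against NA, so NA can be listed in `fill`
  bad <- x %in% fill
  # For each position, the index of the most recent acceptable element (0 if none yet)
  idx <- cummax(ifelse(bad, 0L, seq_along(x)))
  out <- x
  replace_at <- bad & idx > 0
  out[replace_at] <- x[idx[replace_at]]
  out
}

x <- c(NA, 3, NA, 0, 5, NA)
locf_sketch(x)                   # leading NA stays: nothing earlier to copy from
locf_sketch(x, fill = c(NA, 0))  # the 0 is treated as a value to overwrite too
locf_sketch(x, backwards = TRUE) # trailing NA stays: nothing later to copy from
```

Values at the start of the series (or at the end, when filling backwards) remain unchanged because there is no acceptable value to copy from.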
panel_locf(.var, .df = get(".", envir = parent.frame()), .fill = NA, .backwards = FALSE, .resolve = "error", .group_i = TRUE, .i = NULL, .t = NULL, .d = 1, .uniqcheck = FALSE)
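.var: Vector to be modified.
.df: Data frame, pibble, or tibble (usually the one containing .var) that contains the panel structure variables either listed in .i and .t or declared earlier with as_pibble().
.fill: Vector of values to be overwritten. Just NA by default.
.backwards: By default, overwritten values are copied from the most recently available period. Set .backwards = TRUE to copy from the closest following period instead.
.resolve: If there is more than one observation per individual/period, and the value of .var is identical for all of them, that is no problem. If the values differ, the default .resolve = "error" throws an error. Alternatively, set .resolve to a function, such as function(x) mean(x, na.rm = TRUE), that picks a single value per individual/period.
.group_i: By default, if .i is specified or found in the data, the data is grouped by .i, ignoring any grouping already implemented. Set .group_i = FALSE to avoid this.
.i: Quoted or unquoted variables that identify the individual cases. Note that setting any one of .i, .t, or .d will override all three already applied to the data.
.t: Quoted or unquoted variable indicating the time.
.d: Number indicating the gap in .t between one period and the next. Set .d = 0 to ignore gap length and use the most recent prior observation. By default, .d = 1.
.uniqcheck: Logical parameter. Set to TRUE to always check whether .i and .t uniquely identify observations in the data.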
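To see why the individual and time identifiers matter, here is a minimal base-R sketch of carrying values forward within each individual after ordering by time. The data frame `panel` and the helper `carry_forward()` are made up for illustration; this is not the package's internal code, only the general idea of filling within groups.

```r
# Hypothetical toy panel: two individuals observed over three periods
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(1, 2, 3, 1, 2, 3),
  y    = c(10, NA, NA, NA, 20, NA)
)

# Hypothetical helper: carry the last non-missing value forward within one vector
carry_forward <- function(v) {
  # Index of the most recent non-missing element (0 if none so far)
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  # Prepend NA so that index 0 maps to NA
  c(NA, v)[idx + 1L]
}

# Order by individual and time, then fill separately within each individual
panel <- panel[order(panel$id, panel$time), ]
panel$y_filled <- ave(panel$y, panel$id, FUN = carry_forward)
panel
```

Individual 2's first period stays NA: the fill never borrows from a different individual, so there is nothing earlier to copy.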
panel_locf() is unusual among last-observation-carried-forward functions (like zoo::na.locf()) in that it is usable even if observations are not uniquely identified by .t (and .i, if defined).
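One way to picture filling when several rows share the same individual and period is to first collapse each individual/period to a single value and then carry that value forward. The base-R sketch below does this under stated assumptions: `obs`, `resolve()`, and `carry_forward()` are hypothetical names for this example, and the steps are an illustration of the idea rather than a description of how panel_locf() works internally.

```r
# Hypothetical data: one individual, with two rows in periods 1 and 2
obs <- data.frame(
  id    = c(1, 1, 1, 1, 1),
  time  = c(1, 1, 2, 2, 3),
  price = c(10, 20, NA, 30, NA)
)

# Hypothetical resolver: one value per individual/period, NA if nothing is observed
resolve <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

# Hypothetical helper: carry the last non-missing value forward
carry_forward <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  c(NA, v)[idx + 1L]
}

# Step 1: collapse to one resolved value per id/time (keep all-NA periods)
per_period <- aggregate(price ~ id + time, data = obs, FUN = resolve,
                        na.action = na.pass)
per_period <- per_period[order(per_period$id, per_period$time), ]

# Step 2: carry the resolved values forward within each individual
per_period$filled <- ave(per_period$price, per_period$id, FUN = carry_forward)

# Step 3: fill only the missing rows of the original data
merged <- merge(obs, per_period[, c("id", "time", "filled")], by = c("id", "time"))
merged$price <- ifelse(is.na(merged$price), merged$filled, merged$price)
merged
```

In this sketch the missing row in period 2 is filled from the same period's observed value before any earlier period is consulted, and period 3 then inherits that value.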
# Attach the package that provides panel_locf() and the example data
library(pmdplyr)

# The SPrail data has some missing price values.
# Let's fill them in!
# Note .d=0 tells it to ignore how big the gaps are
# between one period and the next, just look for the most recent insert_date
# .resolve tells it what value to pick if there are multiple
# observed prices for that route/insert_date
# (.resolve is not necessary if .i and .t uniquely identify obs,
# or if .var is either NA or constant within them)
# Also note - this will fill in using CURRENT-period
# data first (if available) before looking for lagged data.
data(SPrail)
sum(is.na(SPrail$price))
#> [1] 249
SPrail <- SPrail %>%
  dplyr::mutate(price = panel_locf(price,
    .i = c(origin, destination), .t = insert_date, .d = 0,
    .resolve = function(x) mean(x, na.rm = TRUE)
  ))

# The spec is a little easier with data like Scorecard where
# .i and .t uniquely identify observations
# so .resolve isn't needed.
data(Scorecard)
sum(is.na(Scorecard$earnings_med))
#> [1] 15706
Scorecard <- Scorecard %>%
  # Let's speed this up by just doing four-year colleges in Colorado
  dplyr::filter(
    pred_degree_awarded_ipeds == 3,
    state_abbr == "CO"
  ) %>%
  # Now let's fill in NAs and also in case there are any erroneous 0s
  dplyr::mutate(earnings_med = panel_locf(earnings_med,
    .fill = c(NA, 0),
    .i = unitid, .t = year
  ))
# Note that there are still some missings - these are missings that come before the first
# non-missing value in that unitid, so there's nothing to pull from.
sum(is.na(Scorecard$earnings_med))
#> [1] 17