mutate_cascade (help file mutate_cascade.Rd, source R/major_mutate_variations.R)
This function is a wrapper for dplyr::mutate() that performs the mutation one time period at a time, letting each period's calculation finish before moving on to the next. Changes made in one period can therefore 'cascade down' to later periods. Because mutate() is run once per period, mutate_cascade() is roughly (number of time periods) times slower than regular mutate(). It is generally only used for mutations in which an existing variable is defined in terms of its own lag() or tlag(). This is similar in concept to (and also slower than) cumsum(), but it is much more flexible, and when used with tlag() it works with data that has multiple observations per individual-period. For example, it could calculate the current value of a savings account from variables holding each period's deposits, withdrawals, and interest, or it could calculate the cumulative number of credits a student has taken across all classes.
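To show why evaluating one period at a time matters, here is a minimal base R sketch of the savings-account case for a single account. It does not call mutate_cascade() or pmdplyr; the column names deposits, withdrawals and interest are hypothetical, and the first period is simply assumed to keep its own net flow because it has no previous balance. A single vectorized pass (what a plain mutate() with a lag does) reads the previous row's original value, while the loop reads the previous row's already-updated value, so updates cascade.

```r
# Hypothetical single savings account, one row per period, already sorted by period
acct <- data.frame(
  period      = 1:4,
  deposits    = c(100, 50, 0, 25),
  withdrawals = c(0, 0, 30, 0),
  interest    = c(0, 0.10, 0.10, 0.10)
)

# Net flow in each period: the starting value that gets updated
net <- acct$deposits - acct$withdrawals

# One vectorized pass: every row sees the ORIGINAL net flow of the previous
# row (0 for the first row), so earlier updates never propagate
one_pass <- net + c(0, net[-length(net)]) * (1 + acct$interest)

# Cascading version: update one period at a time, so period t uses the
# already-updated balance from period t - 1
cascade <- net
for (t in seq_along(cascade)[-1]) {
  cascade[t] <- net[t] + cascade[t - 1] * (1 + acct$interest[t])
}

acct$one_pass <- one_pass
acct$balance  <- cascade
print(acct)
```

The two columns agree in the first two periods and then diverge: only the cascading column carries period 2's updated balance into period 3 and beyond.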
mutate_cascade( .df, ..., .skip = TRUE, .backwards = FALSE, .group_i = TRUE, .i = NULL, .t = NULL, .d = NA, .uniqcheck = FALSE, .setpanel = TRUE )
Argument | Description |
---|---|
.df | Data frame or tibble. |
... | Specification to be passed to mutate(). |
.skip | Set to |
.backwards | Set to |
.group_i | By default, if |
.i | Quoted or unquoted variables that identify the individual cases. Note that setting any one of |
.t | Quoted or unquoted variables indicating the time. |
.d | Number indicating the gap in |
.uniqcheck | Logical parameter. Set to TRUE to always check whether |
.setpanel | Logical parameter. |
To apply mutate_cascade() to non-panel data without any grouping (perhaps to mimic Stata's replace functionality), add a variable to your data indicating the order in which you'd like mutate performed (perhaps using dplyr::row_number()) and set .t to that new variable.
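As a base R illustration of the row-order idea and of the comparison with cumsum() (a sketch of the concept, not of mutate_cascade()'s internals): cascading x = x + lag(x) one row at a time, in the order given by a row-number variable, reproduces cumsum(x), much like Stata's replace x = x + x[_n-1]. The column name row_ord is hypothetical and stands in for the variable you would build with dplyr::row_number().

```r
# Hypothetical non-panel data with no natural time variable
df <- data.frame(x = c(3, 1, 4, 1, 5))

# Ordering variable, as dplyr::row_number() would produce
df$row_ord <- seq_len(nrow(df))

# Cascade x = x + lag(x) one row at a time, following row_ord
idx <- order(df$row_ord)
cascaded <- df$x[idx]
for (i in seq_along(cascaded)[-1]) {
  cascaded[i] <- cascaded[i] + cascaded[i - 1]
}

# Write results back in the original row positions
df$x_cascade <- NA_real_
df$x_cascade[idx] <- cascaded
df$x_cumsum <- cumsum(df$x)
print(df)
```

The flexibility mentioned above comes from the update line: replacing cascaded[i - 1] with, say, 0.5 * cascaded[i - 1] gives a decaying sum that cumsum() alone cannot express.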
library(pmdplyr)
library(dplyr)
data(Scorecard)
# I'd like to build a decaying function that remembers previous earnings but at a declining rate
# Let's only use nonmissing earnings
# And let's say we're only interested in four-year colleges in Colorado
# (mutate_cascade + tlag can be very slow, so we're working with a smaller sample)
Scorecard <- Scorecard %>%
  dplyr::filter(
    !is.na(earnings_med),
    pred_degree_awarded_ipeds == 3,
    state_abbr == "CO"
  ) %>%
  # And declare the panel structure
  as_pibble(.i = unitid, .t = year)
Scorecard <- Scorecard %>%
  # Almost all uses involve a variable being set to a function of a lag of itself.
  # We don't want to overwrite earnings_med, so let's make a new variable.
  # Note that earnings_med is an integer, but we're about to make a
  # non-integer decay function, so convert it to a double!
  dplyr::mutate(decay_earnings = as.double(earnings_med)) %>%
  # Now we can cascade
  mutate_cascade(
    decay_earnings = decay_earnings + .5 * tlag(decay_earnings, .quick = TRUE)
  )
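The recursion in the example above can be written out by hand: within each college, decay_t = earnings_t + 0.5 * decay_(t-1), where decay_(t-1) is the already-updated value. The base R sketch below uses a small made-up panel instead of Scorecard, assumes consecutive years, and assumes each unitid's first year keeps its own value because there is nothing to lag. It shows the arithmetic only and does not reproduce how mutate_cascade() or tlag() treat the first period or gaps in time.

```r
# Hypothetical toy panel standing in for Scorecard: two colleges, three years each
toy <- data.frame(
  unitid       = c(1, 1, 1, 2, 2, 2),
  year         = c(2010, 2011, 2012, 2010, 2011, 2012),
  earnings_med = c(40000L, 42000L, 44000L, 30000L, 30000L, 33000L)
)

# Sort by individual, then time, so each group's rows run in period order
toy <- toy[order(toy$unitid, toy$year), ]

# Cascading decay within one individual: each period adds half of the
# already-updated previous value; the first period keeps its own value
decay_one <- function(e) {
  out <- e
  for (t in seq_along(out)[-1]) {
    out[t] <- out[t] + 0.5 * out[t - 1]
  }
  out
}

# Apply separately to each unitid (the grouping role played by the individual ID)
toy$decay_earnings <- ave(as.double(toy$earnings_med), toy$unitid, FUN = decay_one)
print(toy)
```

Converting to double before the recursion mirrors the as.double() step in the example, since the decayed values need not be whole numbers in general.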