Perform mutate one time period at a time ('Cascading mutate')

This function is a wrapper for dplyr::mutate() that performs the mutate one time period at a time, allowing each period's calculation to complete before moving on to the next, so that changes in one period can 'cascade down' to later periods. Because the calculation is repeated for each period, it is slower than regular mutate() by a factor of roughly the number of time periods, and it is generally only used for mutations where an existing variable is being defined in terms of its own lag() or tlag(). It is similar in concept to cumsum() (and also slower), but is much more flexible and, using tlag(), works with data that has multiple observations per individual-period. For example, this could be used to calculate the current value of a savings account given a variable with each period's deposits, withdrawals, and interest, or could calculate the cumulative number of credits a student has taken across all classes.
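To make the 'cascade' idea concrete, here is a minimal base-R sketch of the savings account case. It is a conceptual illustration only, not the package's implementation: the vector `net_deposit`, the 5% `rate`, and the explicit loop are all assumptions made up for this example.

```r
# Hypothetical data: net deposits in four consecutive periods
net_deposit <- c(100, 50, -30, 20)
rate <- 0.05  # assumed per-period interest rate

# One-shot (vectorized) calculation: every lag refers to the ORIGINAL values,
# so interest earned in one period never carries forward to the next
lagged <- c(NA, head(net_deposit, -1))
one_shot <- net_deposit + (1 + rate) * lagged

# Cascading calculation: each period is finished before the next one starts,
# so period p sees the already-updated balance from period p - 1
balance <- net_deposit
for (p in 2:length(balance)) {
  balance[p] <- net_deposit[p] + (1 + rate) * balance[p - 1]
}

one_shot  # NA 155.0 22.5 -11.5
balance   # 100.0000 155.0000 132.7500 159.3875
```

The two results agree only in the second period; from the third period on, the cascading version reflects the accumulated balance while the one-shot version does not.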

mutate_cascade(
  .df,
  ...,
  .skip = TRUE,
  .backwards = FALSE,
  .group_i = TRUE,
  .i = NULL,
  .t = NULL,
  .d = NA,
  .uniqcheck = FALSE,
  .setpanel = TRUE
)

Arguments

.df	Data frame or tibble.
...	Specification to be passed to `mutate()`.
.skip	Set to `TRUE` to skip the first period present in the data (or present within each group for grouped data) when applying `mutate()`. Since most uses of `mutate_cascade()` will involve a `lag()` or `tlag()`, this avoids creating an `NA` in the first period that then cascades down. By default this is TRUE. If you set this to FALSE you should probably have some method for avoiding a first-period `NA` in your `...` entry, perhaps using the `default` option in `dplyr::lag` or the `.default` option in `tlag`.
.backwards	Set to `TRUE` to run `mutate_cascade()` from the last period to the first, rather than from the first to the last.
.group_i	By default, if `.i` is specified or found in the data, `mutate_cascade` will group the data by `.i`, ignoring any grouping already implemented (although the original grouping structure will be returned at the end). Set `.group_i = FALSE` to avoid this.
.i	Quoted or unquoted variables that identify the individual cases. Note that setting any one of `.i`, `.t`, or `.d` will override all three already applied to the data, and will return data that is `as_pibble()`d with all three, unless `.setpanel=FALSE`.
.t	Quoted or unquoted variables indicating the time. `pmdplyr` accepts two kinds of time variables: numeric variables where a fixed distance `.d` will take you from one observation to the next, or, if `.d=0`, any standard variable type with an order. Consider using the `time_variable()` function to create the necessary variable if your data uses a `Date` variable for time.
.d	Number indicating the gap in `.t` between one period and the next. For example, if `.t` indicates a single day but data is collected once a week, you might set `.d=7`. To ignore gap length and assume that "one period ago" is always the most recent prior observation in the data, set `.d=0`. The default `.d = NA` here will become `.d = 1` if either `.i` or `.t` is declared.
.uniqcheck	Logical parameter. Set to TRUE to always check whether `.i` and `.t` uniquely identify observations in the data. By default this is set to FALSE and the check is only performed once per session, and only if at least one of `.i`, `.t`, or `.d` is set.
.setpanel	Logical parameter. `TRUE` by default, and so if `.i`, `.t`, and/or `.d` are declared, will return a `pibble` set in that way.
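The reasoning behind `.skip` can be sketched in base R. The helpers `lag1()` and `cascade_sketch()` below are hypothetical stand-ins written for this illustration; they are not part of `pmdplyr`, and they only mimic the idea of a lag with an optional default and a period-by-period update.

```r
# Hypothetical helper: shift a vector down by one, filling the gap with `default`
lag1 <- function(v, default = NA) c(default, head(v, -1))

# Hypothetical helper: running total built one period at a time
cascade_sketch <- function(v, skip = TRUE, default = NA) {
  out <- v
  start <- if (skip) 2 else 1
  for (p in seq(start, length(out))) {
    # Period p uses the values already computed for earlier periods
    out[p] <- v[p] + lag1(out, default)[p]
  }
  out
}

x <- c(10, 20, 30)

cascade_sketch(x, skip = TRUE)                # 10 30 60
cascade_sketch(x, skip = FALSE)               # NA NA NA
cascade_sketch(x, skip = FALSE, default = 0)  # 10 30 60
```

Without skipping, the first period's lag is `NA`, and that `NA` is carried into every later period. Skipping the first period or supplying a default for the lag both avoid this.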

Details

To apply mutate_cascade() to non-panel data and without any grouping (perhaps to mimic standard Stata replace functionality), add a variable to your data indicating the order you'd like mutate performed in (perhaps using dplyr::row_number()) and set .t to that new variable.
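A minimal sketch of that approach, assuming `pmdplyr` and `dplyr` are installed. The data frame `df` and the column names `row_order` and `running` are made up for illustration, and the call relies only on the `.t` argument described above.

```r
library(pmdplyr)
library(dplyr)

# Hypothetical non-panel data with no individual identifier
df <- data.frame(x = c(1, 2, 3, 4))

result <- df %>%
  mutate(
    row_order = row_number(),  # the order in which the mutate should be applied
    running = x                # working copy that will be updated row by row
  ) %>%
  mutate_cascade(
    # Each row adds its own x to the already-updated value from the previous row
    running = x + dplyr::lag(running),
    .t = row_order
  )

result
```

If the call behaves as described above, `running` should match `cumsum(x)`. Because `.setpanel = TRUE` by default, the result comes back as a `pibble` with `.t` set to `row_order`.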

Examples


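# Load pmdplyr, which provides the Scorecard data, as_pibble(), tlag(), and mutate_cascade()
library(pmdplyr)
data(Scorecard)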
# I'd like to build a decaying function that remembers previous earnings but at a declining rate
# Let's only use nonmissing earnings
# And let's say we're only interested in four-year colleges in Colorado
# (mutate_cascade + tlag can be very slow so we're working with a smaller sample)
Scorecard <- Scorecard %>%
  dplyr::filter(
    !is.na(earnings_med),
    pred_degree_awarded_ipeds == 3,
    state_abbr == "CO"
  ) %>%
  # And declare the panel structure
  as_pibble(.i = unitid, .t = year)
Scorecard <- Scorecard %>%
  # Almost all uses involve setting a variable to a function of its own lag.
  # We don't want to overwrite earnings_med, so let's make a copy to work on.
  # Note that earnings_med is an integer,
  # but the decay function will produce non-integer values, so store the copy as a double!
  dplyr::mutate(decay_earnings = as.double(earnings_med)) %>%
  # Now we can cascade
  mutate_cascade(
    decay_earnings = decay_earnings +
      .5 * tlag(decay_earnings, .quick = TRUE)
  )