Propagate a calculation performed on a subset of data to the rest of the data

This function runs dplyr::summarize() on the subset of the data selected by .filter. It then applies the result to all observations (or to all observations in the group, if applied to grouped data), filling in columns of the data with the summarize results, as though dplyr::mutate() had been run.

mutate_subset(
  .df,
  ...,
  .filter,
  .group_i = TRUE,
  .i = NULL,
  .t = NULL,
  .d = NA,
  .uniqcheck = FALSE,
  .setpanel = TRUE
)

Arguments

.df	Data frame or tibble.
...	Specification to be passed to `dplyr::summarize()`.
.filter	Unquoted logical condition selecting the observations on which the `dplyr::summarize()` operations are run.
.group_i	By default, if `.i` is specified or found in the data, `mutate_subset` will group the data by `.i`, overwriting any grouping already implemented. Set `.group_i = FALSE` to avoid this.
.i	Quoted or unquoted variables that identify the individual cases. Note that setting any one of `.i`, `.t`, or `.d` will override all three already applied to the data, and will return data that is `as_pibble()`d with all three, unless `.setpanel=FALSE`.
.t	Quoted or unquoted variable indicating the time. `pmdplyr` accepts two kinds of time variables: numeric variables where a fixed distance `.d` will take you from one observation to the next, or, if `.d=0`, any standard variable type with an order. Consider using the `time_variable()` function to create the necessary variable if your data uses a `Date` variable for time.
.d	Number indicating the gap in `.t` between one period and the next. For example, if `.t` indicates a single day but data is collected once a week, you might set `.d=7`. To ignore gap length and assume that "one period ago" is always the most recent prior observation in the data, set `.d=0`. The default `.d = NA` here will become `.d = 1` if either `.i` or `.t` is declared.
.uniqcheck	Logical parameter. Set to TRUE to always check whether `.i` and `.t` uniquely identify observations in the data. By default this is set to FALSE and the check is only performed once per session, and only if at least one of `.i`, `.t`, or `.d` is set.
.setpanel	Logical parameter. `TRUE` by default, and so if `.i`, `.t`, and/or `.d` are declared, will return a `pibble` set in that way.

Details

One application of this is to partially widen data. For example, if your analysis uses childhood height as a control variable in all years, mutate_subset() could be used to easily generate a height_age10 variable from a height variable.
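
The idea behind the height example can be sketched with plain dplyr verbs. This is only an illustration of the filter, summarize, and propagate pattern described above, not the package's actual implementation; the `kids` data and its column names (`id`, `age`, `height`) are hypothetical.

```r
library(dplyr)

# Hypothetical toy panel: two children observed at ages 9, 10, and 11
kids <- tibble(
  id     = c(1, 1, 1, 2, 2, 2),
  age    = c(9, 10, 11, 9, 10, 11),
  height = c(130, 136, 142, 128, 133, 139)
)

# Step 1: summarize only the rows that pass the filter, one row per child
age10 <- kids %>%
  filter(age == 10) %>%
  group_by(id) %>%
  summarize(height_age10 = mean(height))

# Step 2: propagate the summary to every row belonging to the same child
kids_wide <- kids %>%
  left_join(age10, by = "id")
```

With pmdplyr loaded, a call such as `mutate_subset(kids, height_age10 = mean(height), .filter = age == 10, .i = id)` should produce the same `height_age10` column in a single step, and by default would also return the data as a `pibble` with `.i` set.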

Examples


library(pmdplyr)
data(SPrail)
# In preparation for fitting a choice model of how people choose a ticket type,
# I'd like to know the average price of a "Promo" ticket on each route,
# so that I can compare the price of every other ticket type to it.
SPrail <- SPrail %>%
  mutate_subset(
    promo_price = mean(price, na.rm = TRUE),
    .filter = fare == "Promo",
    .i = c(origin, destination)
  )