These functions modify the standard dplyr join functions so that a variable of an ordered type (such as date or numeric) in x can be matched inexactly to variables in y.

inexact_inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x",
  ".y"), ..., var = NULL, jvar = NULL, method, exact = TRUE)

inexact_left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x",
  ".y"), ..., var = NULL, jvar = NULL, method, exact = TRUE)

inexact_right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x",
  ".y"), ..., var = NULL, jvar = NULL, method, exact = TRUE)

inexact_full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x",
  ".y"), ..., var = NULL, jvar = NULL, method, exact = TRUE)

inexact_semi_join(x, y, by = NULL, copy = FALSE, ..., var = NULL,
  jvar = NULL, method, exact = TRUE)

inexact_nest_join(x, y, by = NULL, copy = FALSE, keep = FALSE,
  name = NULL, ..., var = NULL, jvar = NULL, method, exact = TRUE)

inexact_anti_join(x, y, by = NULL, copy = FALSE, ..., var = NULL,
  jvar = NULL, method, exact = TRUE)

Arguments

x, y, by, copy, suffix, keep, name, ...

Arguments to be passed to the relevant join function.

var

Quoted or unquoted variable from the x data frame which is to be inexactly matched.

jvar

Quoted or unquoted variable(s) from the y data frame which are to be inexactly matched. Their names must not also appear in x, and must not be the same as var.

method

The approach to take when performing the inexact matching: one of "last", "next", "closest", or "between". See Details.

exact

A logical, where TRUE indicates that exact matches are acceptable. For example, if method = 'last', x contains var = 2, and y contains jvar = 1 and jvar = 2, then exact = TRUE will match with the jvar = 2 observation, and exact = FALSE will match with the jvar = 1 observation. If jvar contains two variables and you want them treated differently, set exact to c(TRUE,FALSE) or c(FALSE,TRUE).
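The effect of exact can be illustrated with a small base R sketch. The helper match_last() below is hypothetical and is not part of the package. It only mimics the rule described above for method = "last", using findInterval(), and is not a description of how the package implements it.

```r
# Hypothetical helper (not a package function): for each element of var,
# return the jvar value that a "last"-style match would pick.
match_last <- function(var, jvar, exact = TRUE) {
  jvar <- sort(unique(jvar))
  # With left.open = TRUE the intervals are (a, b], so a var equal to a
  # jvar value falls back to the previous jvar value (no exact matches).
  idx <- findInterval(var, jvar, left.open = !exact)
  idx[idx == 0] <- NA # no jvar value qualifies
  jvar[idx]
}

match_last(2, c(1, 2), exact = TRUE)  # picks jvar = 2
match_last(2, c(1, 2), exact = FALSE) # picks jvar = 1
```

In this sketch, a var with no qualifying jvar value comes back as NA.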

Details

This allows matching, for example, if one data set contains data from multiple days in the week, while the other data set is weekly. Another example might be matching an observation in one data set to the *most recent* previous observation in the other.

The available methods for matching are:

  • method = "last" matches var to the closest value of jvar that is *lower* (or equal, if exact = TRUE).

  • method = "next" matches var to the closest value of jvar that is *higher* (or equal, if exact = TRUE).

  • method = "closest" matches var to the closest value of jvar, above or below. If equidistant between two values, picks the lower of the two.

  • method = "between" requires two variables in jvar which constitute the beginning and end of a range, and matches var to the range it is in. Make sure that the ranges are non-overlapping within the joining variables, or else you will get strange results (specifically, it should join to the earliest-starting range). If the end of one range is the exact start of another, exact = c(TRUE,FALSE) or exact = c(FALSE,TRUE) is recommended to avoid overlaps. For this method, exact defaults to c(TRUE,FALSE).
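The four rules can be sketched in base R as follows. All helper names (pick_last, pick_next, pick_closest, pick_between) and the toy data are hypothetical and are not part of the package. The sketch assumes exact = TRUE for "last" and "next", and exact = c(TRUE,FALSE) for "between". It illustrates the matching rules only, not the package's internals.

```r
# Toy data, assumed for illustration only
jvar <- c(2006, 2008, 2010)
ranges <- data.frame(start = c(2006, 2008), end = c(2008, 2010))

# "last": closest jvar value at or below v
pick_last <- function(v) {
  cand <- jvar[jvar <= v]
  if (length(cand) == 0) NA_real_ else max(cand)
}

# "next": closest jvar value at or above v
pick_next <- function(v) {
  cand <- jvar[jvar >= v]
  if (length(cand) == 0) NA_real_ else min(cand)
}

# "closest": smallest absolute distance. which.min() returns the first
# minimum and jvar is sorted ascending, so ties go to the lower value.
pick_closest <- function(v) jvar[which.min(abs(jvar - v))]

# "between": row index of the range containing v, with the start
# inclusive and the end exclusive, as with exact = c(TRUE, FALSE)
pick_between <- function(v) {
  hit <- which(ranges$start <= v & v < ranges$end)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

pick_last(2007)    # 2006
pick_next(2007)    # 2008
pick_closest(2007) # 2006, equidistant from 2006 and 2008
pick_between(2008) # 2, because 2008 starts the second range
```

Treating the start as inclusive and the end as exclusive is what keeps 2008 from belonging to both ranges in this sketch.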

Note that if var finds no proper match under the chosen method, it will be merged with any observations in y where the first jvar variable is missing, i.e. where is.na(jvar[1]).
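This behavior can be emulated in base R, as a rough sketch. The data frames and the 0.999 placeholder value are made up for illustration. The sketch relies on the default behavior of base merge(), which treats NA keys as matching each other. It is not the package's own code.

```r
# Toy data, assumed for illustration only
x <- data.frame(id = 1:3, year = c(2005, 2007, 2009))
y <- data.frame(
  unemp_year = c(2006, 2008, NA),
  unemp = c(0.017, 0.036, 0.999) # 0.999 marks the NA row of y
)

# Emulate a "last" match: sort() drops the NA, and findInterval()
# returns 0 when no value of unemp_year is at or below year.
sorted <- sort(y$unemp_year)
idx <- findInterval(x$year, sorted)
idx[idx == 0] <- NA
x$unemp_year <- sorted[idx]

# merge() matches NA keys to NA keys by default, so the unmatched
# row of x (year 2005) picks up the NA row of y.
merged <- merge(x, y, by = "unemp_year", all.x = TRUE)
merged[order(merged$id), ]
```

If y may contain rows with a missing jvar, it may be worth filtering them out before joining so that unmatched observations are not silently given their values.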

Examples

data(Scorecard)

# We also have this data on the December unemployment rate for US college grads nationally
# but only every other year
unemp_data <- data.frame(
  unemp_year = c(2006, 2008, 2010, 2012, 2014, 2016, 2018),
  unemp = c(.017, .036, .048, .040, .028, .025, .020)
)

# I want to match the most recent unemployment data I have to each college
Scorecard <- Scorecard %>%
  inexact_left_join(unemp_data,
    method = "last",
    var = year,
    jvar = unemp_year
  )
#> Joining, by = "unemp_year"

# Or perhaps I want to find the most recent lagged value (i.e. no exact matches, only recent ones)
data(Scorecard)
Scorecard <- Scorecard %>%
  inexact_left_join(unemp_data,
    method = "last",
    var = year,
    jvar = unemp_year,
    exact = FALSE
  )
#> Error in inexact_join_prep(x = x, y = y, by = by, copy = copy, suffix = suffix, var = varcall, jvar = jvarcall, method = method, exact = exact): The variable names in jvar should not be in x

# Another way to do the same thing would be to specify the range of unemp_years I want exactly
data(Scorecard)
unemp_data$unemp_year2 <- unemp_data$unemp_year + 2
Scorecard <- Scorecard %>%
  inexact_left_join(unemp_data,
    method = "between",
    var = year,
    jvar = c(unemp_year, unemp_year2)
  )
#> Error in inexact_join_prep(x = x, y = y, by = by, copy = copy, suffix = suffix, var = varcall, jvar = jvarcall, method = method, exact = exact): The variable names in jvar should not be in x