This function takes in two variables of equal length, the first of which is a categorical variable, and performs a test of independence between them. It returns a character string with the results of that test for putting in a table.
A categorical variable.
A variable to test for independence with x. This can be a factor or numeric variable. If you want a numeric variable treated as categorical, convert it to a factor first.
A vector of weights to pass to the appropriate test.
Used when y is a factor. A function that takes x and y as its first arguments and returns a list with three elements: (1) the name of the test for printing, (2) the test statistic, and (3) the p-value. Defaults to a Chi-squared test if there are no weights, or a design-based F statistic (Rao & Scott adjustment, see survey::svychisq) with weights, which requires that the survey package be installed. WARNING: the Chi-squared test's assumptions fail with small sample sizes. This function will be attempted for all non-numeric y.
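As an illustration of the three-element contract described above, here is a minimal sketch of a custom test for a factor y. The function name g.test.it is hypothetical, the choice of a likelihood-ratio G-test is only an example, and accepting a third w argument (ignored here) is an assumption about how weights may be passed along.

```r
# Hypothetical custom test for a categorical y: a likelihood-ratio G-test.
# Returns list(name, statistic, p-value), the shape described above.
# The w argument is accepted but ignored (an assumption, not documented behavior).
g.test.it <- function(x, y, w = NULL, ...) {
  observed <- table(x, y)
  # Expected counts under independence
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Empty cells contribute nothing to the statistic
  nonzero <- observed > 0
  g <- 2 * sum(observed[nonzero] * log(observed[nonzero] / expected[nonzero]))
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list('G', unname(g), stats::pchisq(g, df, lower.tail = FALSE))
}

# Standalone check of the return shape
str(g.test.it(mtcars$cyl, as.factor(mtcars$am)))
```

Assuming the argument is named factor.test, as in the vtable package, this could then be supplied as independence.test(x, y, factor.test = g.test.it).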
Used when y is numeric. A function that takes x and y as its first arguments and returns a list with three elements: (1) the name of the test for printing, (2) the test statistic, and (3) the p-value. Defaults to a group differences F test. If you only have two groups and would prefer an absolute t-statistic to an F-statistic, pass vtable:::groupt.it.
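A custom test for numeric y follows the same contract. The sketch below wraps stats::kruskal.test as a rank-based alternative to the default F test. The name kruskal.it and the 'KW' label are hypothetical, and the ignored w argument is again an assumption.

```r
# Hypothetical custom test for a numeric y: Kruskal-Wallis rank sum test.
# Returns list(name, statistic, p-value).
kruskal.it <- function(x, y, w = NULL, ...) {
  result <- stats::kruskal.test(y, g = as.factor(x))
  list('KW', unname(result$statistic), result$p.value)
}

# Standalone check of the return shape
str(kruskal.it(mtcars$cyl, mtcars$mpg))
```

Assuming the argument is named numeric.test, as in the vtable package, this could be supplied as independence.test(x, y, numeric.test = kruskal.it).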
A numeric vector indicating the p-value cutoffs to use for reporting significance stars. Defaults to c(.01,.05,.1). If you don't want stars, remove them from the format argument.
A character vector indicating the symbols used to mark the significance cutoffs given in star.cutoffs. Defaults to c('***','**','*'). If you don't want stars, remove them from the format argument.
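The pairing of cutoffs and markers can be sketched as follows. This is an illustration only, not the package's internal code: the helper stars.for is hypothetical, it assumes the cutoffs are sorted in ascending order, and the strict < comparison is an assumption.

```r
# Hypothetical helper: pick the marker for the smallest cutoff the p-value falls under.
# Assumes cutoffs are ascending and markers are given in the matching order.
stars.for <- function(pval,
                      cutoffs = c(.01, .05, .1),
                      markers = c('***', '**', '*')) {
  hit <- which(pval < cutoffs)
  if (length(hit) == 0) '' else markers[min(hit)]
}

stars.for(0.001)  # "***"
stars.for(0.03)   # "**"
stars.for(0.5)    # ""
```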
Number of digits after the decimal to round the test statistic and p-value to.
FALSE will cut off trailing 0s when rounding. TRUE retains them. Defaults to FALSE.
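The difference between the two settings can be seen with base R alone. This is only a sketch of the two behaviors, not necessarily how the function formats its numbers internally.

```r
stat <- 2.5
digits <- 3

# Trailing zeros dropped (the FALSE behavior described above)
trimmed <- as.character(round(stat, digits))
trimmed  # "2.5"

# Trailing zeros retained (the TRUE behavior described above)
padded <- formatC(stat, digits = digits, format = 'f')
padded   # "2.500"
```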
The way in which the four elements returned by (or calculated after) the test - {name}, {stat}, {pval}, and {stars} - will be arranged in the string output. Note that the default '{name}={stat}{stars}' does not contain the p-value, and also does not put the stars in superscript, since the function doesn't know which markup language you're targeting. For LaTeX you may prefer '{name}$={stat}^{{stars}}$', and for HTML '{name}={stat}<sup>{stars}</sup>'.
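To see how these templates expand, here is a minimal sketch using plain string substitution. The helper fill.template is hypothetical and is not claimed to be what the package does internally. It does show why the LaTeX version uses doubled braces: the inner {stars} is replaced and the outer braces remain as LaTeX grouping.

```r
# Hypothetical helper: replace each {placeholder} with its value, literally.
fill.template <- function(format, name, stat, pval, stars) {
  values <- list(name = name, stat = stat, pval = pval, stars = stars)
  out <- format
  for (key in names(values)) {
    out <- gsub(paste0('{', key, '}'), values[[key]], out, fixed = TRUE)
  }
  out
}

fill.template('{name}={stat}{stars}', 'F', '39.698', '0', '***')
# "F=39.698***"
fill.template('{name}$={stat}^{{stars}}$', 'F', '39.698', '0', '***')
# "F$=39.698^{***}$"
fill.template('{name}={stat}<sup>{stars}</sup>', 'F', '39.698', '0', '***')
# "F=39.698<sup>***</sup>"
```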
The options listed above, entered in named-list format.
To allow (and perhaps encourage) using this function in weird ways, and because it's not really expected to be used directly, input is not sanitized. Have fun!
data(mtcars)
independence.test(mtcars$cyl,mtcars$mpg)
#> [1] "F=39.698***"