Contingency Tables for Two or More Categorical Variables
contingency_table.Rd
contingency_table() compiles a contingency table for two or more categorical variables, the first of which is
typically an outcome (dependent) variable to be used for the column headings, while the remainder are typically
explanatory (independent) variables that will appear in the contingency table either as factors or optionally as row
headings.
xcontingency_table() compiles a contingency table for a categorical outcome variable and multiple categorical
explanatory variables that are “crossed” to obtain a single explanatory factor.
Usage
contingency_table(.data, .dep_var, ..., .wt = NULL, .rownames = FALSE)
xcontingency_table(
  .data,
  .dep_var,
  ...,
  .crossname = NULL,
  .wt = NULL,
  .rownames = FALSE
)
Arguments
- .data
a data frame, or a data frame extension (e.g. a tibble).
- .dep_var
<data-masking> quoted name of the dependent variable, which may be a character vector, factor, or numeric.
- ...
<tidy-select> quoted name(s) of one or more factors or character vectors in .data, to be included in (or excluded from) the output.
- .wt
<data-masking> quoted name of a numeric column in .data containing frequency weights; default NULL.
- .rownames
logical. If TRUE, value is a data frame with the levels of the first (or crossed) independent variable as row names, rather than a tibble; default FALSE.
- .crossname
a character string to be used as the name of the column for the crossed variables. If omitted, the names of the crossed variables are used, combined in “snake case”.
Value
For contingency_table(), an object of class "contingency_table", "announce", inheriting from tibble, or a data.frame, depending on whether .rownames = FALSE (default) or TRUE.
Similarly for xcontingency_table(), an object of class "xcontingency_table", "announce", inheriting from tibble, or a data.frame, again depending on the value of .rownames.
Details
Categorical variables (i.e. factors or character vectors) in .data required as factors in the resulting
contingency table may be selected for inclusion or exclusion using the ... argument and the
<tidy-select> syntax from package dplyr, including use of “selection helpers”. If no ... arguments are
supplied, all categorical variables in .data (other than .dep_var) will be used.
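As an illustration of the default described above, the following base R sketch picks out every factor or character column other than the dependent variable. It is not the package's implementation; the data frame `d` and its column names are hypothetical.

```r
# Hypothetical data: one dependent variable, two categorical columns, one numeric
d <- data.frame(
  dv  = c("Success", "Fail"),
  iv1 = factor(c("a", "b")),
  iv2 = c("x", "y"),
  n   = c(3L, 5L),
  stringsAsFactors = FALSE
)

# A column counts as categorical here if it is a factor or a character vector
is_categorical <- vapply(
  d,
  function(x) is.factor(x) || is.character(x),
  logical(1)
)

# Candidate explanatory variables: all categorical columns except the dependent one
expl_vars <- setdiff(names(d)[is_categorical], "dv")
print(expl_vars)
```

The numeric column `n` is left out because it is neither a factor nor a character vector.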
A list of defused R expressions, as for instance created by expl_fcts(), may be used as the ...
arguments and should be injected using the splice-operator, !!! (see examples).
If .wt = NULL, the number of rows for each unique combination of the dependent and independent variables is
counted. If .wt is the quoted name of a numeric variable representing frequency weights, these are summed for
each unique combination of the dependent and independent variables.
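The difference between counting rows and summing frequency weights can be sketched with base R's xtabs(). This is only an analogy for the two .wt settings, not the package's own code, and the data frame `d` with its values is made up for illustration.

```r
# Hypothetical long-format data: note that (b, Success) occurs on two rows
d <- data.frame(
  iv = c("a", "a", "b", "b", "b"),
  dv = c("Fail", "Success", "Fail", "Success", "Success"),
  n  = c(8L, 13L, 9L, 7L, 2L)
)

# No weights: count the rows for each iv/dv combination (analogous to .wt = NULL)
row_counts <- xtabs(~ iv + dv, data = d)

# Frequency weights: sum n for each iv/dv combination (analogous to .wt = n)
weighted <- xtabs(n ~ iv + dv, data = d)

print(row_counts)
print(weighted)
```

In this sketch the combination (b, Success) has a row count of 2 but a weighted total of 9.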
If .rownames = TRUE, the resulting contingency table will be a conventional data.frame rather than a
tibble and the first categorical variable (other than .dep_var) will be used for row headings rather than
as a factor. Having row headings allows the result to be passed as an argument to chisq.test(),
fisher.test() or chsqfish(), e.g., conveniently using |> in a piped sequence (see examples). However,
using .rownames = TRUE for a contingency table with more than one explanatory (independent) variable will
most likely result in the error message “duplicate 'row.names' are not allowed”, in which case
xcontingency_table() should be used instead.
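The duplicate row names error mentioned above can be reproduced in base R without the package. The sketch below uses a small hypothetical table `tab` with two explanatory columns, and shows that pasting the two columns together yields unique row names. It illustrates the cause of the error only and makes no claim about how the package sets row names internally.

```r
# Hypothetical table with two explanatory variables; "Protestant" repeats
tab <- data.frame(
  relig = c("Protestant", "Protestant", "None"),
  denom = c("Baptist", "Lutheran", "Not applicable"),
  White = c(10L, 20L, 30L),
  Black = c(5L, 1L, 4L)
)

# Using only the first explanatory variable as row names fails on the duplicates
res <- suppressWarnings(
  try(`row.names<-`(tab, value = tab$relig), silent = TRUE)
)
print(inherits(res, "try-error"))

# Combining both variables gives one unique label per row
rn <- paste(tab$relig, tab$denom, sep = ":")
m <- tab[c("White", "Black")]
row.names(m) <- rn
print(m)
```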
Multiple categorical explanatory variables in a contingency table compiled by xcontingency_table() are
“crossed” using fct_cross().
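For readers unfamiliar with crossing, here is a minimal sketch of what fct_cross() from the forcats package does to two factors. The vectors `relig` and `denom` are hypothetical stand-ins, and the sketch assumes forcats is installed.

```r
library(forcats)

# Two hypothetical explanatory factors of equal length
relig <- factor(c("Protestant", "Protestant", "None"))
denom <- factor(c("Baptist", "Lutheran", "Not applicable"))

# Cross them into a single factor; by default the levels are joined with ":"
crossed <- fct_cross(relig, denom)

print(as.character(crossed))
print(levels(crossed))
```

Each element of the crossed factor combines the corresponding elements of the inputs, which is why the crossed tables in the examples show labels such as Protestant:Southern baptist.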
See also
defused R expressions, fct_cross(), splice-operator and tibble;
Print_Methods for S3 method for printing objects of class "contingency_table".
Other contingency_table: binom_contingency(), expl_fcts()
Examples
(d <- tibble(
  iv = letters[1:4] |> sample(10, replace = TRUE),
  dv = c(0L:3L) |> sample(10, replace = TRUE)
))
#> # A tibble: 10 × 2
#> iv dv
#> <chr> <int>
#> 1 b 2
#> 2 a 1
#> 3 b 0
#> 4 b 2
#> 5 d 2
#> 6 b 0
#> 7 d 2
#> 8 c 3
#> 9 a 3
#> 10 d 1
d |> contingency_table(dv)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 4 × 5
#> iv `2` `1` `0` `3`
#> * <chr> <int> <int> <int> <int>
#> 1 b 2 0 2 0
#> 2 a 0 1 0 1
#> 3 d 2 1 0 0
#> 4 c 0 0 0 1
d |> contingency_table(dv, .rownames = TRUE)
#> 2 1 0 3
#> b 2 0 2 0
#> a 0 1 0 1
#> d 2 1 0 0
#> c 0 0 0 1
## Use .data pronoun for more informative error messages
d |> contingency_table(.data$dv)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 4 × 5
#> iv `2` `1` `0` `3`
#> * <chr> <int> <int> <int> <int>
#> 1 b 2 0 2 0
#> 2 a 0 1 0 1
#> 3 d 2 1 0 0
#> 4 c 0 0 0 1
try(d |> contingency_table(dx))
#> Error : object 'dx' not found
try(d |> contingency_table(.data$dx))
#> Error in .data$dx : Column `dx` not found in `.data`.
(d <- tibble(
  iv = letters[1:4] |> sample(10, replace = TRUE) |> as.factor(),
  dv = c("Success", "Fail", "Borderline") |> sample(10, replace = TRUE)
))
#> # A tibble: 10 × 2
#> iv dv
#> <fct> <chr>
#> 1 d Borderline
#> 2 b Borderline
#> 3 d Borderline
#> 4 d Fail
#> 5 d Fail
#> 6 d Borderline
#> 7 b Fail
#> 8 a Success
#> 9 c Success
#> 10 c Borderline
d |> contingency_table(dv)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 4 × 4
#> iv Borderline Fail Success
#> * <fct> <int> <int> <int>
#> 1 d 3 2 0
#> 2 b 1 1 0
#> 3 a 0 0 1
#> 4 c 1 0 1
d |> contingency_table(dv, .rownames = TRUE)
#> Borderline Fail Success
#> d 3 2 0
#> b 1 1 0
#> a 0 0 1
#> c 1 0 1
(d <- tibble(
  iv = letters[1:4] |> sample(100, replace = TRUE),
  dv = c("Success", "Fail", "Borderline") |> sample(100, replace = TRUE)
) |> count(iv, dv))
#> # A tibble: 12 × 3
#> iv dv n
#> <chr> <chr> <int>
#> 1 a Borderline 14
#> 2 a Fail 8
#> 3 a Success 13
#> 4 b Borderline 7
#> 5 b Fail 9
#> 6 b Success 7
#> 7 c Borderline 7
#> 8 c Fail 1
#> 9 c Success 8
#> 10 d Borderline 9
#> 11 d Fail 5
#> 12 d Success 12
d |> contingency_table(dv, .wt = n)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 4 × 4
#> iv Borderline Fail Success
#> * <chr> <int> <int> <int>
#> 1 a 14 8 13
#> 2 b 7 9 7
#> 3 c 7 1 8
#> 4 d 9 5 12
d |> contingency_table(dv, .wt = n, .rownames = TRUE) |>
  print_lf() |>
  chisq.test()
#> Borderline Fail Success
#> a 14 8 13
#> b 7 9 7
#> c 7 1 8
#> d 9 5 12
#>
#> Warning: Chi-squared approximation may be incorrect
#>
#> Pearson's Chi-squared test
#>
#> data: print_lf(contingency_table(d, dv, .wt = n, .rownames = TRUE))
#> X-squared = 6.5483, df = 6, p-value = 0.3646
#>
## Use .data pronoun for more informative error messages
d |> contingency_table(dv, .wt = .data$n)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 4 × 4
#> iv Borderline Fail Success
#> * <chr> <int> <int> <int>
#> 1 a 14 8 13
#> 2 b 7 9 7
#> 3 c 7 1 8
#> 4 d 9 5 12
try(d |> contingency_table(dv, .wt = .data$x))
#> Error in .data$x : Column `x` not found in `.data`.
rm(d)
## Using gss_cat dataset from {forcats} package
gss_cat |> contingency_table(race, relig, denom)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 47 × 5
#> relig denom White Black Other
#> * <fct> <fct> <int> <int> <int>
#> 1 Protestant Southern baptist 1151 355 30
#> 2 Protestant Baptist-dk which 723 697 37
#> 3 Protestant No denomination 1020 149 55
#> 4 Orthodox-christian Not applicable 92 2 1
#> 5 None Not applicable 2816 384 323
#> 6 Christian Not applicable 147 49 28
#> 7 Protestant Lutheran-mo synod 208 2 2
#> 8 Protestant Other 1886 468 180
#> 9 Protestant United methodist 1007 49 11
#> 10 Jewish Not applicable 370 10 8
#> # ℹ 37 more rows
gss_cat |> contingency_table(race, !c(marital, rincome:partyid))
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 47 × 5
#> relig denom White Black Other
#> * <fct> <fct> <int> <int> <int>
#> 1 Protestant Southern baptist 1151 355 30
#> 2 Protestant Baptist-dk which 723 697 37
#> 3 Protestant No denomination 1020 149 55
#> 4 Orthodox-christian Not applicable 92 2 1
#> 5 None Not applicable 2816 384 323
#> 6 Christian Not applicable 147 49 28
#> 7 Protestant Lutheran-mo synod 208 2 2
#> 8 Protestant Other 1886 468 180
#> 9 Protestant United methodist 1007 49 11
#> 10 Jewish Not applicable 370 10 8
#> # ℹ 37 more rows
## Invokes warning and error message about duplicate 'row.names'
try(gss_cat |> contingency_table(race, relig, denom, .rownames = TRUE))
#> Warning: non-unique values when setting 'row.names': ‘Christian’, ‘Other’, ‘Protestant’
#> Error in `.rowNamesDF<-`(x, value = value) :
#> duplicate 'row.names' are not allowed
## Using xcontingency_table() avoids warning and error
gss_cat |> xcontingency_table(race, relig, denom)
#> ____________________________
#> Crossed Contingency Table: -
#>
#> # A tibble: 47 × 4
#> relig_denom White Black Other
#> * <fct> <int> <int> <int>
#> 1 Protestant:Southern baptist 1151 355 30
#> 2 Protestant:Baptist-dk which 723 697 37
#> 3 Protestant:No denomination 1020 149 55
#> 4 Orthodox-christian:Not applicable 92 2 1
#> 5 None:Not applicable 2816 384 323
#> 6 Christian:Not applicable 147 49 28
#> 7 Protestant:Lutheran-mo synod 208 2 2
#> 8 Protestant:Other 1886 468 180
#> 9 Protestant:United methodist 1007 49 11
#> 10 Jewish:Not applicable 370 10 8
#> # ℹ 37 more rows
gss_cat |> xcontingency_table(race, !c(marital, rincome:partyid))
#> ____________________________
#> Crossed Contingency Table: -
#>
#> # A tibble: 47 × 4
#> relig_denom White Black Other
#> * <fct> <int> <int> <int>
#> 1 Protestant:Southern baptist 1151 355 30
#> 2 Protestant:Baptist-dk which 723 697 37
#> 3 Protestant:No denomination 1020 149 55
#> 4 Orthodox-christian:Not applicable 92 2 1
#> 5 None:Not applicable 2816 384 323
#> 6 Christian:Not applicable 147 49 28
#> 7 Protestant:Lutheran-mo synod 208 2 2
#> 8 Protestant:Other 1886 468 180
#> 9 Protestant:United methodist 1007 49 11
#> 10 Jewish:Not applicable 370 10 8
#> # ℹ 37 more rows
gss_cat |> xcontingency_table(race, relig, denom, .crossname = "Denomination")
#> ____________________________
#> Crossed Contingency Table: -
#>
#> # A tibble: 47 × 4
#> Denomination White Black Other
#> * <fct> <int> <int> <int>
#> 1 Protestant:Southern baptist 1151 355 30
#> 2 Protestant:Baptist-dk which 723 697 37
#> 3 Protestant:No denomination 1020 149 55
#> 4 Orthodox-christian:Not applicable 92 2 1
#> 5 None:Not applicable 2816 384 323
#> 6 Christian:Not applicable 147 49 28
#> 7 Protestant:Lutheran-mo synod 208 2 2
#> 8 Protestant:Other 1886 468 180
#> 9 Protestant:United methodist 1007 49 11
#> 10 Jewish:Not applicable 370 10 8
#> # ℹ 37 more rows
gss_cat |>
  xcontingency_table(race, relig, denom, .rownames = TRUE) |>
  head(10)
#> White Black Other
#> Protestant:Southern baptist 1151 355 30
#> Protestant:Baptist-dk which 723 697 37
#> Protestant:No denomination 1020 149 55
#> Orthodox-christian:Not applicable 92 2 1
#> None:Not applicable 2816 384 323
#> Christian:Not applicable 147 49 28
#> Protestant:Lutheran-mo synod 208 2 2
#> Protestant:Other 1886 468 180
#> Protestant:United methodist 1007 49 11
#> Jewish:Not applicable 370 10 8
## Two more esoteric examples
ivars <- exprs(relig, denom)
gss_cat |> contingency_table(race, !!!ivars)
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 47 × 5
#> relig denom White Black Other
#> * <fct> <fct> <int> <int> <int>
#> 1 Protestant Southern baptist 1151 355 30
#> 2 Protestant Baptist-dk which 723 697 37
#> 3 Protestant No denomination 1020 149 55
#> 4 Orthodox-christian Not applicable 92 2 1
#> 5 None Not applicable 2816 384 323
#> 6 Christian Not applicable 147 49 28
#> 7 Protestant Lutheran-mo synod 208 2 2
#> 8 Protestant Other 1886 468 180
#> 9 Protestant United methodist 1007 49 11
#> 10 Jewish Not applicable 370 10 8
#> # ℹ 37 more rows
ivars <- c("relig", "denom")
gss_cat |> contingency_table(race, any_of(ivars))
#> ____________________
#> Contingency Table: -
#>
#> # A tibble: 47 × 5
#> relig denom White Black Other
#> * <fct> <fct> <int> <int> <int>
#> 1 Protestant Southern baptist 1151 355 30
#> 2 Protestant Baptist-dk which 723 697 37
#> 3 Protestant No denomination 1020 149 55
#> 4 Orthodox-christian Not applicable 92 2 1
#> 5 None Not applicable 2816 384 323
#> 6 Christian Not applicable 147 49 28
#> 7 Protestant Lutheran-mo synod 208 2 2
#> 8 Protestant Other 1886 468 180
#> 9 Protestant United methodist 1007 49 11
#> 10 Jewish Not applicable 370 10 8
#> # ℹ 37 more rows
rm(ivars)