Contingency Tables for Two or More Categorical Variables
contingency_table() compiles a contingency table for two or more categorical variables, the first of which is
typically an outcome (dependent) variable to be used for the column headings, while the remainder are typically
explanatory (independent) variables that will appear in the contingency table either as factors or optionally as row
headings.
xcontingency_table() compiles a contingency table for a categorical outcome variable and multiple categorical
explanatory variables that are “crossed” to obtain a single explanatory factor.
Usage
contingency_table(.data, .dep_var, ..., .wt = NULL, .rownames = FALSE)
xcontingency_table(
  .data,
  .dep_var,
  ...,
  .crossname = NULL,
  .wt = NULL,
  .rownames = FALSE
)
Arguments
- .data
 a data frame, or a data frame extension (e.g. a tibble).
- .dep_var
 <data-masking> quoted name of the dependent variable, which may be a character vector, factor, or numeric.
- ...
 <tidy-select> quoted name(s) of one or more factors or character vectors in .data, to be included in (or excluded from) the output.
- .wt
 <data-masking> quoted name of a numeric column in .data containing frequency weights; default NULL.
- .rownames
 logical. If TRUE, value is a data frame with the levels of the first (or crossed) independent variable as row names, rather than a tibble; default FALSE.
- .crossname
 a character string to be used as the name of the column for the crossed variables. If omitted, the names of the crossed variables are used, combined in “snake case”.
Value
For contingency_table(), an object of class "contingency_table", "announce", inheriting from
tibble, or a data.frame, depending on whether
.rownames = FALSE (default) or TRUE.
Similarly for xcontingency_table(), an object of class "xcontingency_table", "announce", inheriting from
tibble, or a data.frame, again depending on the value of .rownames.
Details
Categorical variables (i.e. factors or character vectors) in .data required as factors in the resulting
contingency table may be selected for inclusion or exclusion using the ... argument and the
<tidy-select> syntax from package dplyr, including use of
“selection helpers”. If no ... arguments are supplied, all categorical variables in .data
(other than .dep_var) will be used.
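The selection rules described above can be illustrated with plain dplyr. This is only a sketch of equivalent tidy-select expressions, not the package's own code; the toy data frame and its column names (dv, iv1, iv2, n) are hypothetical, and the "default" selection shown is an assumption about how "all categorical variables other than .dep_var" might be expressed.

```r
library(dplyr)

# Hypothetical toy data: one dependent variable, two categorical
# explanatory variables and a numeric weight column
d <- tibble(
  dv  = c("Success", "Fail", "Success"),
  iv1 = factor(c("a", "b", "a")),
  iv2 = c("x", "y", "y"),
  n   = c(3L, 1L, 2L)
)

# Assumed equivalent of the default: every factor or character
# column except the dependent variable
default_ivs <- d |>
  select(where(\(x) is.factor(x) || is.character(x)) & !dv) |>
  names()

# Selection by exclusion, as in the `!c(...)` form
excluded_ivs <- d |>
  select(!c(dv, n)) |>
  names()

# Selection by a character vector of names, as in the any_of() form
named_ivs <- d |>
  select(any_of(c("iv1", "iv2"))) |>
  names()
```

All three expressions pick out the same two explanatory columns here, which is why they can be used interchangeably in the ... argument.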
A list of defused R expressions, as for instance created by
expl_fcts(), may be used as the ... arguments and should be injected using the
splice-operator, !!!, see examples.
If .wt = NULL, the number of rows for each unique combination of the dependent and independent variables is
counted. If .wt is the quoted name of a numeric variable representing frequency weights, these are summed for
each unique combination of the dependent and independent variables.
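The difference between counting rows and summing frequency weights can be sketched with base R's xtabs(). This is an analogy only, not how contingency_table() is implemented; the small data frame is made up for illustration.

```r
# Hypothetical pre-aggregated data: one row per iv/dv combination,
# with the frequency held in column n
d <- data.frame(
  iv = c("a", "a", "b", "b"),
  dv = c("Fail", "Success", "Fail", "Success"),
  n  = c(12L, 11L, 4L, 7L)
)

# No weights: each combination is counted once per row
unweighted <- xtabs(~ iv + dv, data = d)

# Frequency weights: n is summed within each combination
weighted <- xtabs(n ~ iv + dv, data = d)
```

With pre-aggregated data such as this, the unweighted table contains only ones, whereas the weighted table recovers the original frequencies.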
If .rownames = TRUE, the resulting contingency table will be a conventional data.frame rather than a
tibble and the first categorical variable (other than .dep_var) will be used for row headings rather than
as a factor. Having row headings allows the result to be passed as an argument to chisq.test(),
fisher.test() or chsqfish(), e.g., conveniently using |> in a
piped sequence (see examples). However, using .rownames = TRUE for a contingency table with more than
one explanatory (independent) variable will most likely result in the error message
“duplicate 'row.names' are not allowed”, in which case xcontingency_table() should be used instead.
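Both points in this paragraph can be reproduced in base R without the package. The sketch below uses made-up counts: it shows that a data frame whose labels are row names converts to a numeric matrix (which is what chisq.test() needs), and that repeated labels cannot be used as row names.

```r
# Labels held as row names: every column is numeric
tab <- data.frame(
  Success = c(11L, 7L),
  Fail    = c(12L, 4L),
  row.names = c("a", "b")
)
numeric_ok <- is.numeric(as.matrix(tab))

# Labels held as an ordinary column: the matrix becomes character
tab_col <- data.frame(iv = c("a", "b"), Success = c(11L, 7L), Fail = c(12L, 4L))
numeric_col <- is.numeric(as.matrix(tab_col))

# chisq.test() accepts the row-named data frame directly
res <- suppressWarnings(chisq.test(tab))

# Repeated labels (as arise with two or more explanatory variables)
# cannot serve as row names
dup <- data.frame(relig = c("Protestant", "Protestant"), White = c(1151L, 723L))
err <- tryCatch(
  suppressWarnings({
    rownames(dup) <- dup$relig
    NULL
  }),
  error = function(e) conditionMessage(e)
)
```

Here err holds the base R error message about duplicate row names, the same failure that crossing the explanatory variables into a single factor avoids.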
Multiple categorical explanatory variables in a contingency table compiled by xcontingency_table() are
“crossed” using fct_cross().
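As a brief illustration of crossing, the sketch below calls forcats::fct_cross() directly on two small made-up factors. The snake-case column name built with paste() is an assumption about how the default .crossname could be derived, not the package's confirmed code.

```r
library(forcats)

# Two hypothetical explanatory factors
relig <- factor(c("Protestant", "Protestant", "Jewish"))
denom <- factor(c("Other", "Baptist", "Not applicable"))

# fct_cross() combines the factors element-wise; the default separator is ":"
crossed <- fct_cross(relig, denom)

# Assumed default column name: variable names joined in snake case
crossname <- paste(c("relig", "denom"), collapse = "_")
```

Each value of crossed is unique to one relig/denom combination, so the crossed factor can safely be used for row headings.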
See also
defused R expressions, fct_cross(),
splice-operator and tibble;
Print_Methods for the S3 method for printing objects of class "contingency_table".
Other contingency_table:
binom_contingency(),
expl_fcts()
Examples
(d <- tibble(
    iv = letters[1:4] |> sample(10, replace = TRUE),
    dv = c(0L:3L) |> sample(10, replace = TRUE)
))
#> # A tibble: 10 × 2
#>    iv       dv
#>    <chr> <int>
#>  1 d         2
#>  2 b         2
#>  3 d         2
#>  4 b         2
#>  5 d         2
#>  6 d         1
#>  7 d         3
#>  8 d         1
#>  9 b         2
#> 10 a         1
d |> contingency_table(dv)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 3 × 4
#>   iv      `2`   `1`   `3`
#> * <chr> <int> <int> <int>
#> 1 d         3     2     1
#> 2 b         3     0     0
#> 3 a         0     1     0
d |> contingency_table(dv, .rownames = TRUE)
#>   2 1 3
#> d 3 2 1
#> b 3 0 0
#> a 0 1 0
## Use .data pronoun for more informative error messages
d |> contingency_table(.data$dv)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 3 × 4
#>   iv      `2`   `1`   `3`
#> * <chr> <int> <int> <int>
#> 1 d         3     2     1
#> 2 b         3     0     0
#> 3 a         0     1     0
try(d |> contingency_table(dx))
#> Error : object 'dx' not found
try(d |> contingency_table(.data$dx))
#> Error in .data$dx : Column `dx` not found in `.data`.
(d <- tibble(
    iv = letters[1:4] |> sample(10, replace = TRUE) |> as.factor(),
    dv = c("Success", "Fail", "Borderline")  |> sample(10, replace = TRUE)
  ))
#> # A tibble: 10 × 2
#>    iv    dv        
#>    <fct> <chr>     
#>  1 d     Success   
#>  2 a     Borderline
#>  3 a     Fail      
#>  4 c     Borderline
#>  5 b     Fail      
#>  6 a     Success   
#>  7 c     Success   
#>  8 a     Success   
#>  9 d     Success   
#> 10 d     Success   
d |> contingency_table(dv)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 4 × 4
#>   iv    Success Borderline  Fail
#> * <fct>   <int>      <int> <int>
#> 1 d           3          0     0
#> 2 a           2          1     1
#> 3 c           1          1     0
#> 4 b           0          0     1
d |> contingency_table(dv, .rownames = TRUE)
#>   Success Borderline Fail
#> d       3          0    0
#> a       2          1    1
#> c       1          1    0
#> b       0          0    1
(d <- tibble(
    iv = letters[1:4] |> sample(100, replace = TRUE),
    dv = c("Success", "Fail", "Borderline")  |> sample(100, replace = TRUE)
  ) |> count(iv, dv))
#> # A tibble: 12 × 3
#>    iv    dv             n
#>    <chr> <chr>      <int>
#>  1 a     Borderline    14
#>  2 a     Fail          12
#>  3 a     Success       11
#>  4 b     Borderline    13
#>  5 b     Fail           4
#>  6 b     Success        7
#>  7 c     Borderline     6
#>  8 c     Fail           3
#>  9 c     Success        7
#> 10 d     Borderline     7
#> 11 d     Fail           7
#> 12 d     Success        9
d |> contingency_table(dv, .wt = n)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 4 × 4
#>   iv    Borderline  Fail Success
#> * <chr>      <int> <int>   <int>
#> 1 a             14    12      11
#> 2 b             13     4       7
#> 3 c              6     3       7
#> 4 d              7     7       9
d |> contingency_table(dv, .wt = n, .rownames = TRUE) |>
    print_lf() |>
    chisq.test()
#>   Borderline Fail Success
#> a         14   12      11
#> b         13    4       7
#> c          6    3       7
#> d          7    7       9
#> 
#> Warning: Chi-squared approximation may be incorrect
#> 
#> 	Pearson's Chi-squared test
#> 
#> data:  print_lf(contingency_table(d, dv, .wt = n, .rownames = TRUE))
#> X-squared = 4.6776, df = 6, p-value = 0.5858
#> 
## Use .data pronoun for more informative error messages
d |> contingency_table(dv, .wt = .data$n)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 4 × 4
#>   iv    Borderline  Fail Success
#> * <chr>      <int> <int>   <int>
#> 1 a             14    12      11
#> 2 b             13     4       7
#> 3 c              6     3       7
#> 4 d              7     7       9
try(d |> contingency_table(dv, .wt = .data$x))
#> Error in .data$x : Column `x` not found in `.data`.
rm(d)
## Using gss_cat dataset from {forcats} package
gss_cat |> contingency_table(race, relig, denom)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 47 × 5
#>    relig              denom             White Black Other
#>  * <fct>              <fct>             <int> <int> <int>
#>  1 Protestant         Southern baptist   1151   355    30
#>  2 Protestant         Baptist-dk which    723   697    37
#>  3 Protestant         No denomination    1020   149    55
#>  4 Orthodox-christian Not applicable       92     2     1
#>  5 None               Not applicable     2816   384   323
#>  6 Christian          Not applicable      147    49    28
#>  7 Protestant         Lutheran-mo synod   208     2     2
#>  8 Protestant         Other              1886   468   180
#>  9 Protestant         United methodist   1007    49    11
#> 10 Jewish             Not applicable      370    10     8
#> # ℹ 37 more rows
gss_cat |> contingency_table(race, !c(marital, rincome:partyid))
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 47 × 5
#>    relig              denom             White Black Other
#>  * <fct>              <fct>             <int> <int> <int>
#>  1 Protestant         Southern baptist   1151   355    30
#>  2 Protestant         Baptist-dk which    723   697    37
#>  3 Protestant         No denomination    1020   149    55
#>  4 Orthodox-christian Not applicable       92     2     1
#>  5 None               Not applicable     2816   384   323
#>  6 Christian          Not applicable      147    49    28
#>  7 Protestant         Lutheran-mo synod   208     2     2
#>  8 Protestant         Other              1886   468   180
#>  9 Protestant         United methodist   1007    49    11
#> 10 Jewish             Not applicable      370    10     8
#> # ℹ 37 more rows
## Invokes warning and error message about duplicate 'row.names'
try(gss_cat |> contingency_table(race, relig, denom, .rownames = TRUE)) 
#> Warning: non-unique values when setting 'row.names': ‘Christian’, ‘Other’, ‘Protestant’
#> Error in `.rowNamesDF<-`(x, value = value) : 
#>   duplicate 'row.names' are not allowed
## Using xcontingency_table() avoids warning and error
gss_cat |> xcontingency_table(race, relig, denom)
#> ____________________________
#> Crossed Contingency Table: -
#> 
#> # A tibble: 47 × 4
#>    relig_denom                       White Black Other
#>  * <fct>                             <int> <int> <int>
#>  1 Protestant:Southern baptist        1151   355    30
#>  2 Protestant:Baptist-dk which         723   697    37
#>  3 Protestant:No denomination         1020   149    55
#>  4 Orthodox-christian:Not applicable    92     2     1
#>  5 None:Not applicable                2816   384   323
#>  6 Christian:Not applicable            147    49    28
#>  7 Protestant:Lutheran-mo synod        208     2     2
#>  8 Protestant:Other                   1886   468   180
#>  9 Protestant:United methodist        1007    49    11
#> 10 Jewish:Not applicable               370    10     8
#> # ℹ 37 more rows
gss_cat |> xcontingency_table(race, !c(marital, rincome:partyid))
#> ____________________________
#> Crossed Contingency Table: -
#> 
#> # A tibble: 47 × 4
#>    relig_denom                       White Black Other
#>  * <fct>                             <int> <int> <int>
#>  1 Protestant:Southern baptist        1151   355    30
#>  2 Protestant:Baptist-dk which         723   697    37
#>  3 Protestant:No denomination         1020   149    55
#>  4 Orthodox-christian:Not applicable    92     2     1
#>  5 None:Not applicable                2816   384   323
#>  6 Christian:Not applicable            147    49    28
#>  7 Protestant:Lutheran-mo synod        208     2     2
#>  8 Protestant:Other                   1886   468   180
#>  9 Protestant:United methodist        1007    49    11
#> 10 Jewish:Not applicable               370    10     8
#> # ℹ 37 more rows
gss_cat |> xcontingency_table(race, relig, denom, .crossname = "Denomination")
#> ____________________________
#> Crossed Contingency Table: -
#> 
#> # A tibble: 47 × 4
#>    Denomination                      White Black Other
#>  * <fct>                             <int> <int> <int>
#>  1 Protestant:Southern baptist        1151   355    30
#>  2 Protestant:Baptist-dk which         723   697    37
#>  3 Protestant:No denomination         1020   149    55
#>  4 Orthodox-christian:Not applicable    92     2     1
#>  5 None:Not applicable                2816   384   323
#>  6 Christian:Not applicable            147    49    28
#>  7 Protestant:Lutheran-mo synod        208     2     2
#>  8 Protestant:Other                   1886   468   180
#>  9 Protestant:United methodist        1007    49    11
#> 10 Jewish:Not applicable               370    10     8
#> # ℹ 37 more rows
gss_cat |>
    xcontingency_table(race, relig, denom, .rownames = TRUE) |>
    head(10)
#>                                   White Black Other
#> Protestant:Southern baptist        1151   355    30
#> Protestant:Baptist-dk which         723   697    37
#> Protestant:No denomination         1020   149    55
#> Orthodox-christian:Not applicable    92     2     1
#> None:Not applicable                2816   384   323
#> Christian:Not applicable            147    49    28
#> Protestant:Lutheran-mo synod        208     2     2
#> Protestant:Other                   1886   468   180
#> Protestant:United methodist        1007    49    11
#> Jewish:Not applicable               370    10     8
## Two more esoteric examples
ivars <- exprs(relig, denom)
gss_cat |> contingency_table(race, !!!ivars)
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 47 × 5
#>    relig              denom             White Black Other
#>  * <fct>              <fct>             <int> <int> <int>
#>  1 Protestant         Southern baptist   1151   355    30
#>  2 Protestant         Baptist-dk which    723   697    37
#>  3 Protestant         No denomination    1020   149    55
#>  4 Orthodox-christian Not applicable       92     2     1
#>  5 None               Not applicable     2816   384   323
#>  6 Christian          Not applicable      147    49    28
#>  7 Protestant         Lutheran-mo synod   208     2     2
#>  8 Protestant         Other              1886   468   180
#>  9 Protestant         United methodist   1007    49    11
#> 10 Jewish             Not applicable      370    10     8
#> # ℹ 37 more rows
ivars <- c("relig", "denom")
gss_cat |> contingency_table(race, any_of(ivars))
#> ____________________
#> Contingency Table: -
#> 
#> # A tibble: 47 × 5
#>    relig              denom             White Black Other
#>  * <fct>              <fct>             <int> <int> <int>
#>  1 Protestant         Southern baptist   1151   355    30
#>  2 Protestant         Baptist-dk which    723   697    37
#>  3 Protestant         No denomination    1020   149    55
#>  4 Orthodox-christian Not applicable       92     2     1
#>  5 None               Not applicable     2816   384   323
#>  6 Christian          Not applicable      147    49    28
#>  7 Protestant         Lutheran-mo synod   208     2     2
#>  8 Protestant         Other              1886   468   180
#>  9 Protestant         United methodist   1007    49    11
#> 10 Jewish             Not applicable      370    10     8
#> # ℹ 37 more rows
rm(ivars)