Binomial Contingency Table for Data with a Binary Outcome
binom_contingency.Rd
binom_contingency()
creates a binomial contingency table for data with a binary dependent variable and
one or more categorical independent variables, optionally including totals, proportions and confidence intervals.
Usage
binom_contingency(
.data,
.dep_var,
...,
.drop_zero = FALSE,
.propci = FALSE,
.level = 0.95
)
as_binom_contingency(object, ...)
# S3 method for class 'data.frame'
as_binom_contingency(
object,
...,
.pn = NULL,
.qn = NULL,
.drop_zero = FALSE,
.propci = FALSE,
.level = 0.95
)
Arguments
- .data
a data frame, or a data frame extension (e.g. a tibble).
- .dep_var
<data-masking> quoted name of a binary dependent variable, which should be numeric with values of 0 and 1.
- ...
for binom_contingency(): <tidy-select> quoted name(s) of one or more factor or character vector columns in .data, to be included in (or excluded from) the output.
for as_binom_contingency(): further arguments passed to or from other methods.
- .drop_zero
logical. If TRUE, levels of explanatory factors for which values of .dep_var are either all zero or all one are dropped from the output; default FALSE.
- .propci
logical. If TRUE, each row of the output "binom_contingency" object includes totals, proportions and confidence intervals; default FALSE.
- .level
the confidence level required; default 0.95.
- object
a data frame, or a data frame extension (e.g. a tibble), to be coerced to a "binom_contingency" object.
- .pn, .qn
<data-masking> quoted names of columns in object representing numbers of successes and failures in Bernoulli trials; default NULL.
Value
An object of class "binom_contingency", "announce", inheriting from tibble, with columns pn and qn representing the number of "successes" and "failures" respectively, and further columns for independent (explanatory) variables. If .propci = TRUE, additional columns are output representing totals, proportions and confidence intervals.
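Conceptually, the pn and qn columns summarise a binary dependent variable within each level of the explanatory factors. A minimal base-R sketch of that aggregation (illustration only, not the package implementation; the column names iv and dv match the simulated data used in the examples below):

```r
## Toy binary data: two levels of one explanatory factor
d <- data.frame(
  iv = rep(c("a", "b"), each = 4),
  dv = c(0, 1, 1, 0, 1, 0, 0, 0)
)

## pn = number of 1s ("successes"), qn = number of 0s ("failures") per level
tab <- aggregate(dv ~ iv, data = d,
                 FUN = function(x) c(pn = sum(x), qn = sum(1 - x)))
tab
#>   iv dv.pn dv.qn
#> 1  a     2     2
#> 2  b     1     3
```

binom_contingency() performs this tabulation (and more) and returns a classed tibble rather than a data frame with a matrix column.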
Details
Categorical variables (i.e. factors or character vectors) in .data required as factors in the resulting contingency table may be selected for inclusion or exclusion using the ... argument and the <tidy-select> syntax from package dplyr, including use of "selection helpers". If no ... arguments are supplied, all categorical variables in .data (other than .dep_var) will be used.
A list of defused R expressions, as for instance created by expl_fcts(), may be used as the ... arguments and should be injected using the splice operator, !!!; see examples.
Use .drop_zero = TRUE to drop levels of explanatory factors for which values of .dep_var are either all zero or all one, to prevent warning messages that 'fitted probabilities numerically 0 or 1 occurred' when fitting generalized linear models using glm() or calculating odds ratios using odds_ratio(); see examples and Venables & Ripley (2002, pp. 197–8).
as_binom_contingency() attempts to coerce an object to class "binom_contingency". If the .pn or .qn arguments are not provided, these are assumed to be columns "pn" and "qn" respectively.
Note
Confidence intervals are calculated using prop.test()
, and are based on Wilson's score method
without continuity correction (Newcombe, 1998).
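The intervals shown in the examples can be reproduced directly in base R. For instance, for level a in the first .propci example (pn = 28, n = 66), a minimal check with stats::prop.test():

```r
## Wilson score interval without continuity correction, as used for .propci
## (level a in the examples: pn = 28 successes out of n = 66 trials)
ci <- prop.test(28, 66, conf.level = 0.95, correct = FALSE)$conf.int
round(as.numeric(ci), 3)
#> [1] 0.312 0.544
```

These bounds match the lower and upper columns printed for level a in the examples below.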
References
Confidence interval from R's prop.test()
differs from hand calculation and result from SAS.
Stack Exchange.
Newcombe R.G. (1998). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872. doi:10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E .
Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S. New York: Springer. doi:10.1007/978-0-387-21706-2 .
Yates' continuity correction in confidence interval returned by prop.test
.
Stack Exchange.
See also
drop_zero()
, glm()
, odds_ratio()
,
prop.test()
and tibble
;
Print_Methods
for S3 method for printing objects of class "binom_contingency"
.
Other contingency_table:
contingency_table()
,
expl_fcts()
Examples
## Bernoulli data with a single explanatory variable
(d <- bernoulli_data())
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 330 × 2
#> iv dv
#> * <fct> <int>
#> 1 a 0
#> 2 a 1
#> 3 a 1
#> 4 a 1
#> 5 a 0
#> 6 a 1
#> 7 a 1
#> 8 a 0
#> 9 a 0
#> 10 a 1
#> # ℹ 320 more rows
d |> binom_contingency(dv)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 28 38
#> 2 b 26 40
#> 3 c 19 47
#> 4 d 18 48
#> 5 e 8 58
d |> binom_contingency(dv, .propci = TRUE)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 7
#> iv pn qn n p lower upper
#> * <fct> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 a 28 38 66 0.424 0.312 0.544
#> 2 b 26 40 66 0.394 0.285 0.515
#> 3 c 19 47 66 0.288 0.193 0.406
#> 4 d 18 48 66 0.273 0.180 0.390
#> 5 e 8 58 66 0.121 0.0627 0.221
#> Confidence level 0.95
## Use .data pronoun for more informative error messages
d |> binom_contingency(.data$dv)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 28 38
#> 2 b 26 40
#> 3 c 19 47
#> 4 d 18 48
#> 5 e 8 58
try(d |> binom_contingency(dx))
#> Error : object 'dx' not found
try(d |> binom_contingency(.data$dx))
#> Error in .data$dx : Column `dx` not found in `.data`.
## NB this section is intended to be pasted in, rather than run by example()
if (FALSE) { # \dontrun{
oldopt <- options(warn = 0, nwarnings = 50)
## Bernoulli data with identical responses for
## the last level of the explanatory variable
d <- bernoulli_data(probs = seq(0.4, 0, length.out = 5))
d |> binom_contingency(dv)
## Elicits multiple warnings in glm.fit()
## 'fitted probabilities numerically 0 or 1 occurred'
d |> binom_contingency(dv) |>
glm(cbind(pn, qn) ~ iv, binomial, data = _) |>
confint()
summary(warnings())
## Argument .drop_zero = TRUE in binom_contingency()
## prevents these warnings
d |> binom_contingency(dv, .drop_zero = TRUE)
d |> binom_contingency(dv, .drop_zero = TRUE) |>
glm(cbind(pn, qn) ~ iv, binomial, data = _) |>
confint()
options(oldopt)
} # }
## Bernoulli data with multiple explanatory variables
(d <- list(
iv2 = list(i = c("a", "c", "e", "g"), j = c("b", "d", "f", "h")),
iv3 = list(k = c("a", "b", "c", "d"), l = c("e", "f", "g", "h")),
iv4 = list(k = c("a", "b"), l = c("c", "d"), m = c("e", "f"))
) |> add_grps(bernoulli_data(levels = 8), iv, .key = _))
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 528 × 5
#> iv iv2 iv3 iv4 dv
#> <fct> <fct> <fct> <fct> <int>
#> 1 a i k k 1
#> 2 a i k k 1
#> 3 a i k k 0
#> 4 a i k k 0
#> 5 a i k k 0
#> 6 a i k k 0
#> 7 a i k k 0
#> 8 a i k k 1
#> 9 a i k k 0
#> 10 a i k k 0
#> # ℹ 518 more rows
d |> binom_contingency(dv)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 6
#> iv iv2 iv3 iv4 pn qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a i k k 24 42
#> 2 b j k k 28 38
#> 3 c i k l 26 40
#> 4 d j k l 25 41
#> 5 e i l m 18 48
#> 6 f j l m 12 54
#> 7 g i l g 8 58
#> 8 h j l h 7 59
d |> binom_contingency(dv, iv, iv3)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 4
#> iv iv3 pn qn
#> * <fct> <fct> <int> <int>
#> 1 a k 24 42
#> 2 b k 28 38
#> 3 c k 26 40
#> 4 d k 25 41
#> 5 e l 18 48
#> 6 f l 12 54
#> 7 g l 8 58
#> 8 h l 7 59
d |> binom_contingency(dv, !c(iv2, iv4))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 4
#> iv iv3 pn qn
#> * <fct> <fct> <int> <int>
#> 1 a k 24 42
#> 2 b k 28 38
#> 3 c k 26 40
#> 4 d k 25 41
#> 5 e l 18 48
#> 6 f l 12 54
#> 7 g l 8 58
#> 8 h l 7 59
d |> binom_contingency(dv, !!!expl_fcts(d))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 6
#> iv iv2 iv3 iv4 pn qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a i k k 24 42
#> 2 b j k k 28 38
#> 3 c i k l 26 40
#> 4 d j k l 25 41
#> 5 e i l m 18 48
#> 6 f j l m 12 54
#> 7 g i l g 8 58
#> 8 h j l h 7 59
d |> binom_contingency(dv, .propci = TRUE)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 10
#> iv iv2 iv3 iv4 pn qn n p lower upper
#> * <fct> <fct> <fct> <fct> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 a i k k 24 42 66 0.364 0.258 0.484
#> 2 b j k k 28 38 66 0.424 0.312 0.544
#> 3 c i k l 26 40 66 0.394 0.285 0.515
#> 4 d j k l 25 41 66 0.379 0.271 0.499
#> 5 e i l m 18 48 66 0.273 0.180 0.390
#> 6 f j l m 12 54 66 0.182 0.107 0.291
#> 7 g i l g 8 58 66 0.121 0.0627 0.221
#> 8 h j l h 7 59 66 0.106 0.0523 0.203
#> Confidence level 0.95
d |> binom_contingency(dv, .drop_zero = TRUE)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 6
#> iv iv2 iv3 iv4 pn qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a i k k 24 42
#> 2 b j k k 28 38
#> 3 c i k l 26 40
#> 4 d j k l 25 41
#> 5 e i l m 18 48
#> 6 f j l m 12 54
#> 7 g i l g 8 58
#> 8 h j l h 7 59
d |>
binom_contingency(dv, iv2, iv3, .drop_zero = TRUE) |>
glm(cbind(pn, qn) ~ ., binomial, data = _) |>
summary()
#>
#> Call:
#> glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d,
#> dv, iv2, iv3, .drop_zero = TRUE))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4069 0.1605 -2.536 0.0112 *
#> iv2j -0.0799 0.1999 -0.400 0.6894
#> iv3l -1.1361 0.2067 -5.496 3.88e-08 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 33.6983 on 3 degrees of freedom
#> Residual deviance: 1.3006 on 1 degrees of freedom
#> AIC: 27.397
#>
#> Number of Fisher Scoring iterations: 3
#>
d |>
binom_contingency(dv, iv2, iv3, .drop_zero = TRUE) |>
glm(cbind(pn, qn) ~ ., binomial, data = _) |>
odds_ratio()
#>
#> Call: glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d, dv, iv2, iv3, .drop_zero = TRUE))
#>
#> Waiting for profiling to be done...
#> ____________________________
#> Estimates and Odds Ratios: -
#>
#> # A tibble: 3 × 7
#> parameter estimate se p_val odds_ratio ci[,"2.5%"] [,"97.5%"] sig
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 (Intercept) -0.407 0.160 0.0112 1 NA NA *
#> 2 iv2j -0.0799 0.200 0.689 0.923 0.623 1.37 NS
#> 3 iv3l -1.14 0.207 0 0.321 0.213 0.479 ***
## Use {dplyr} selection helpers e.g., last_col(), num_range() and starts_with()
d |> binom_contingency(dv, last_col(1L)) ## Offset of 1L used, since last column of d is dv
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv4 pn qn
#> * <fct> <int> <int>
#> 1 k 52 80
#> 2 l 51 81
#> 3 m 30 102
#> 4 g 8 58
#> 5 h 7 59
d |> binom_contingency(dv, !last_col(1L))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 5
#> iv iv2 iv3 pn qn
#> * <fct> <fct> <fct> <int> <int>
#> 1 a i k 24 42
#> 2 b j k 28 38
#> 3 c i k 26 40
#> 4 d j k 25 41
#> 5 e i l 18 48
#> 6 f j l 12 54
#> 7 g i l 8 58
#> 8 h j l 7 59
d |> binom_contingency(dv, num_range("iv", 2:3))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 4 × 4
#> iv2 iv3 pn qn
#> * <fct> <fct> <int> <int>
#> 1 i k 50 82
#> 2 j k 53 79
#> 3 i l 26 106
#> 4 j l 19 113
d |> binom_contingency(dv, !num_range("iv", 2:3))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 4
#> iv iv4 pn qn
#> * <fct> <fct> <int> <int>
#> 1 a k 24 42
#> 2 b k 28 38
#> 3 c l 26 40
#> 4 d l 25 41
#> 5 e m 18 48
#> 6 f m 12 54
#> 7 g g 8 58
#> 8 h h 7 59
d |> binom_contingency(dv, starts_with("iv"))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 8 × 6
#> iv iv2 iv3 iv4 pn qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a i k k 24 42
#> 2 b j k k 28 38
#> 3 c i k l 26 40
#> 4 d j k l 25 41
#> 5 e i l m 18 48
#> 6 f j l m 12 54
#> 7 g i l g 8 58
#> 8 h j l h 7 59
d |> binom_contingency(dv, !starts_with("iv")) ## Here, negation excludes all explanatory factors
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 1 × 2
#> pn qn
#> * <int> <int>
#> 1 148 380
## as_binom_contingency()
(d <- data.frame(
iv = letters[1:5],
success = c(34, 31, 16, 0, 10),
failure = c(32, 35, 50, 66, 56)
))
#> iv success failure
#> 1 a 34 32
#> 2 b 31 35
#> 3 c 16 50
#> 4 d 0 66
#> 5 e 10 56
d |> as_binom_contingency(.pn = success, .qn = failure)
#> Coercing `.pn` and/or `.qn` to integer
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <chr> <int> <int>
#> 1 a 34 32
#> 2 b 31 35
#> 3 c 16 50
#> 4 d 0 66
#> 5 e 10 56
d |> as_binom_contingency(.pn = success, .qn = failure, .drop_zero = TRUE)
#> Coercing `.pn` and/or `.qn` to integer
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 4 × 3
#> iv pn qn
#> * <chr> <int> <int>
#> 1 a 34 32
#> 2 b 31 35
#> 3 c 16 50
#> 4 e 10 56
(d <- binom_data())
#> __________________________
#> Simulated Binomial Data: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 35 31
#> 2 b 26 40
#> 3 c 17 49
#> 4 d 17 49
#> 5 e 10 56
d |> as_binom_contingency()
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 35 31
#> 2 b 26 40
#> 3 c 17 49
#> 4 d 17 49
#> 5 e 10 56
d |> as_binom_contingency(.propci = TRUE)
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 7
#> iv pn qn n p lower upper
#> * <fct> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 a 35 31 66 0.530 0.412 0.646
#> 2 b 26 40 66 0.394 0.285 0.515
#> 3 c 17 49 66 0.258 0.167 0.374
#> 4 d 17 49 66 0.258 0.167 0.374
#> 5 e 10 56 66 0.152 0.0844 0.257
#> Confidence level 0.95
rm(d)