Levels of Independent Variable where a Bernoulli Dependent Variable is Neither All Success Nor All Failure
good_levels() identifies levels of an independent variable for which values of a Bernoulli dependent
variable are neither all one (success) nor all zero (failure).
drop_zero() drops data with levels of an independent variable for which values of a Bernoulli dependent
variable are either all one (success) or all zero (failure).
Usage
good_levels(object, ...)
# S3 method for class 'data.frame'
good_levels(object, .dep_var, .ind_var, ...)
# S3 method for class 'binom_contingency'
good_levels(object, .ind_var, ...)
drop_zero(object, ...)
# S3 method for class 'data.frame'
drop_zero(object, .dep_var, .ind_var, ...)
# S3 method for class 'binom_contingency'
drop_zero(object, ...)
drop_null(object, ...)
Arguments
- object
a data frame, or a data frame extension (e.g. a tibble), or an object of class "binom_contingency".
- ...
further arguments passed to or from other methods.
- .dep_var
<data-masking> quoted name of a Bernoulli dependent variable that should be numeric with values of 0 and 1.
- .ind_var
<data-masking> quoted name of the independent variable, which may be a factor, or a character vector.
Value
good_levels() returns a character vector comprising levels of .ind_var for which the corresponding values of .dep_var are neither all one (success) nor all zero (failure).
drop_zero() returns an object of the same class as that provided by argument object: either a data frame (or a data frame extension, e.g. a tibble) comprising only rows with levels of the independent variable for which values of the Bernoulli dependent variable are neither all zero nor all one; or a "binom_contingency" object excluding any rows for which values of the Bernoulli dependent variable are either all one (success) or all zero (failure).
Details
For a Bernoulli trial dataset with a numeric dependent variable coded as 0 or 1, the generic function
good_levels() identifies levels of an independent variable for which values of the dependent
variable are neither all zero nor all one, i.e., levels whose proportion of successes \(p\) satisfies \(0 < p < 1\).
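The condition \(0 < p < 1\) can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy data frame `d` and the helper objects `p` and `good` are hypothetical, and the column names `iv` and `dv` simply mirror those used in the Examples.

```r
# Hypothetical toy data: level "a" is mixed, "b" is all failure, "c" is all success
d <- data.frame(
  iv = factor(rep(c("a", "b", "c"), each = 4)),
  dv = c(1, 0, 1, 1,  0, 0, 0, 0,  1, 1, 1, 1)
)

# Proportion of successes within each level of the independent variable
p <- tapply(d$dv, d$iv, mean)

# Names of the levels whose proportion lies strictly between 0 and 1
good <- names(p)[p > 0 & p < 1]
good
#> [1] "a"
```

Only level "a" qualifies here, because its values of `dv` include both zeros and ones.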
For a similar Bernoulli trial dataset, the generic function drop_zero() drops all rows of data other than those
with levels of the independent variable identified by good_levels(). Unused factor levels are dropped from the
independent variable and from any other factors in the data.
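The row filtering and the dropping of unused factor levels described above might look roughly like this in base R. It is a sketch under stated assumptions rather than the package's code: the data frame `d`, the extra factor `site`, and the objects `p`, `keep` and `d_kept` are all hypothetical.

```r
# Hypothetical toy data: level "b" of iv is all failure, and level "z" of
# the second factor, site, occurs only in rows belonging to level "b"
d <- data.frame(
  iv   = factor(c("a", "a", "a", "b", "b", "b")),
  site = factor(c("x", "y", "x", "z", "z", "z")),
  dv   = c(1, 0, 1, 0, 0, 0)
)

# Levels of iv with a proportion of successes strictly between 0 and 1
p <- tapply(d$dv, d$iv, mean)
keep <- names(p)[p > 0 & p < 1]

# Keep only rows with those levels, then drop unused levels of every factor
d_kept <- droplevels(d[d$iv %in% keep, ])

levels(d_kept$iv)
#> [1] "a"
levels(d_kept$site)
#> [1] "x" "y"
```

Calling `droplevels()` on the whole data frame, rather than on `iv` alone, is what removes the now-unused level "z" from `site`.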
The drop_zero() S3 method for objects of class "binom_contingency" returns a binomial contingency table
equivalent to the original having been created using binom_contingency() with argument
.drop_zero = TRUE.
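For a table of counts, the equivalent filter can be sketched in base R as follows. The counts are copied from the contingency table shown in the Examples; treating `pn` as the number of successes and `qn` as the number of failures is an assumption inferred from that output, and the plain data frame `tab` is only a stand-in for a "binom_contingency" object.

```r
# Stand-in for a binomial contingency table (counts taken from the Examples);
# pn is assumed to count successes and qn to count failures
tab <- data.frame(
  iv = factor(c("a", "b", "c", "d", "e")),
  pn = c(51L, 23L, 0L, 15L, 36L),
  qn = c(15L, 43L, 66L, 51L, 30L)
)

# Keep rows with at least one success and at least one failure,
# then drop the unused factor level
tab_kept <- droplevels(tab[tab$pn > 0 & tab$qn > 0, ])

as.character(tab_kept$iv)
#> [1] "a" "b" "d" "e"
```

Row "c" is removed because it has no successes, which matches the levels reported by good_levels() in the Examples.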
drop_null() is deprecated; please use drop_zero() instead.
Note
Dropping levels of explanatory factors for which values of a Bernoulli dependent variable are either all zero
or all one prevents the warning message ‘fitted probabilities numerically 0 or 1 occurred’ when fitting
generalized linear models using glm() or calculating odds ratios using odds_ratio();
see binom_contingency().
See also
binom_contingency and levels.
Other levels_data:
levels_data()
Examples
(d_bern <- bernoulli_data(probs = c(0.8, 0.4, 0, 0.3, 0.6)))
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 330 × 2
#> iv dv
#> * <fct> <int>
#> 1 a 1
#> 2 a 1
#> 3 a 0
#> 4 a 0
#> 5 a 1
#> 6 a 1
#> 7 a 1
#> 8 a 1
#> 9 a 1
#> 10 a 1
#> # ℹ 320 more rows
d_bern |> levels_data()
#> $iv
#> [1] "a" "b" "c" "d" "e"
#>
## S3 methods for class 'data.frame'
d_bern |> good_levels(dv, iv)
#> [1] "a" "b" "d" "e"
d_bern |> drop_zero(dv, iv)
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 264 × 2
#> iv dv
#> <fct> <int>
#> 1 a 1
#> 2 a 1
#> 3 a 0
#> 4 a 0
#> 5 a 1
#> 6 a 1
#> 7 a 1
#> 8 a 1
#> 9 a 1
#> 10 a 1
#> # ℹ 254 more rows
d_bern |> drop_zero(dv, iv) |> levels_data()
#> $iv
#> [1] "a" "b" "d" "e"
#>
(d_bin <- d_bern |> binom_contingency(dv, iv))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 51 15
#> 2 b 23 43
#> 3 c 0 66
#> 4 d 15 51
#> 5 e 36 30
## S3 methods for class 'binom_contingency'
d_bin |> good_levels(iv)
#> [1] "a" "b" "d" "e"
d_bin |> drop_zero()
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 4 × 3
#> iv pn qn
#> <fct> <int> <int>
#> 1 a 51 15
#> 2 b 23 43
#> 3 d 15 51
#> 4 e 36 30
## Results identical whether drop_zero() used
## before or after binom_contingency()
identical(
d_bern |> drop_zero(dv, iv) |> binom_contingency(dv, iv),
d_bin |> drop_zero()
)
#> [1] TRUE
## Results identical whether drop_zero() used
## during or after binom_contingency()
identical(
d_bern |> binom_contingency(dv, iv, .drop_zero = TRUE),
d_bin |> drop_zero()
)
#> [1] TRUE
rm(d_bern, d_bin)