Levels of Independent Variable where a Bernoulli Dependent Variable is Neither All Success Nor All Failure
good_levels.Rd
good_levels() identifies levels of an independent variable for which values of a Bernoulli dependent
variable are neither all one (success) nor all zero (failure).
drop_zero() drops data with levels of an independent variable for which values of a Bernoulli dependent
variable are either all one (success) or all zero (failure).
Usage
good_levels(object, ...)
# S3 method for class 'data.frame'
good_levels(object, .dep_var, .ind_var, ...)
# S3 method for class 'binom_contingency'
good_levels(object, .ind_var, ...)
drop_zero(object, ...)
# S3 method for class 'data.frame'
drop_zero(object, .dep_var, .ind_var, ...)
# S3 method for class 'binom_contingency'
drop_zero(object, ...)
drop_null(object, ...)
Arguments
- object
a data frame, or a data frame extension (e.g. a tibble), or an object of class "binom_contingency".
- ...
further arguments passed to or from other methods.
- .dep_var
<data-masking> quoted name of a Bernoulli dependent variable that should be numeric with values of 0 and 1.
- .ind_var
<data-masking> quoted name of the independent variable, which may be a factor, or a character vector.
Value
good_levels() returns a character vector comprising levels of .ind_var for which the corresponding values of .dep_var are neither all one (success) nor all zero (failure).
drop_zero() returns an object of the same class as that provided by argument object: either a data frame (or a data frame extension e.g., a tibble) comprising only rows with levels of the independent variable for which values of the Bernoulli dependent variable are neither all zero nor all one; or a "binom_contingency" object excluding any rows for which values of the Bernoulli dependent variable are either all one (success) or all zero (failure).
Details
For a Bernoulli trial dataset with a numeric dependent variable coded as 0 or 1, the generic function
good_levels() identifies levels of an independent variable for which values of the dependent
variable are neither all zero nor all one, i.e., levels whose proportion of successes \(p\) satisfies \(0 < p < 1\).
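The condition \(0 < p < 1\) can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the toy data frame `d` and the names `p` and `good` are made up for illustration.

```r
# Toy Bernoulli data (hypothetical): level "c" is all failure, "d" all success
d <- data.frame(
  iv = factor(rep(c("a", "b", "c", "d"), each = 5)),
  dv = c(1, 0, 1, 1, 0,   0, 0, 1, 0, 0,   0, 0, 0, 0, 0,   1, 1, 1, 1, 1)
)

# Proportion of successes within each level of the independent variable
p <- tapply(d$dv, d$iv, mean)

# Keep only the levels with 0 < p < 1
good <- names(p)[p > 0 & p < 1]
good
#> [1] "a" "b"
```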
For a similar Bernoulli trial dataset, the generic function drop_zero() drops all rows of data other than those
with levels of the independent variable identified by good_levels(). Unused factor levels are dropped from the
independent variable and from any other factors in the data.
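The row filtering and level dropping described here can be approximated in base R. This is a sketch under stated assumptions rather than the code behind drop_zero(): the data frame `d`, its second factor `grp` and the names `keep` and `d_kept` are hypothetical.

```r
# Toy data (hypothetical) with a second factor, grp, alongside iv
d <- data.frame(
  iv  = factor(rep(c("a", "b", "c", "d"), each = 5)),
  grp = factor(rep(c("x", "y", "z", "x"), each = 5)),
  dv  = c(1, 0, 1, 1, 0,   0, 0, 1, 0, 0,   0, 0, 0, 0, 0,   1, 1, 1, 1, 1)
)

# Levels of iv whose outcomes are neither all zero nor all one
p <- tapply(d$dv, d$iv, mean)
keep <- names(p)[p > 0 & p < 1]

# Keep the matching rows, then drop unused levels from every factor column
d_kept <- droplevels(d[d$iv %in% keep, , drop = FALSE])

levels(d_kept$iv)   # "c" and "d" are no longer levels
levels(d_kept$grp)  # "z" occurred only in dropped rows, so it goes too
nrow(d_kept)
```

Calling droplevels() on the whole data frame, rather than on `iv` alone, is what removes unused levels from the other factor as well.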
The drop_zero() S3 method for objects of class "binom_contingency" returns a binomial contingency table
equivalent to the original having been created using binom_contingency() with argument .drop_zero = TRUE.
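For a table of counts, the same idea amounts to removing rows where either count is zero. The sketch below uses plain data frames, not "binom_contingency" objects. The column names `pn` and `qn` are borrowed from the printed output in the Examples and are assumed here to hold success and failure counts respectively.

```r
# Toy Bernoulli data (hypothetical)
d <- data.frame(
  iv = factor(rep(c("a", "b", "c", "d"), each = 5)),
  dv = c(1, 0, 1, 1, 0,   0, 0, 1, 0, 0,   0, 0, 0, 0, 0,   1, 1, 1, 1, 1)
)

# Success (pn) and failure (qn) counts per level; naming is an assumption
pn <- tapply(d$dv, d$iv, sum)
qn <- tapply(1 - d$dv, d$iv, sum)
tab <- data.frame(
  iv = factor(names(pn)),
  pn = as.integer(pn),
  qn = as.integer(qn)
)

# Drop rows that are all failure (pn == 0) or all success (qn == 0)
tab_kept <- droplevels(tab[tab$pn > 0 & tab$qn > 0, , drop = FALSE])
tab_kept
```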
drop_null() is deprecated; please use drop_zero() instead.
Note
Dropping levels of explanatory factors for which values of a Bernoulli dependent variable are either all zero
or all one prevents warning messages that ‘fitted probabilities numerically 0 or 1 occurred’ when fitting
generalized linear models using glm() or calculating odds ratios using odds_ratio(); see binom_contingency().
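The underlying problem can be seen with base R's glm() on made-up data. This is a hedged sketch: the data frame `d` and all object names are hypothetical, and warnings from the first fit are suppressed rather than relied upon, since whether a particular warning fires depends on how the fit converges. The point is that the coefficient for an all-failure level has no finite estimate, whereas the fit on the retained levels is well behaved.

```r
# Toy data (hypothetical): level "c" has no successes at all
d <- data.frame(
  iv = factor(rep(c("a", "b", "c"), each = 50)),
  dv = c(rep(c(1, 1, 1, 1, 0), 10), rep(c(1, 0, 0, 1, 0), 10), rep(0, 50))
)

# Fit with every level; the estimate for "c" runs off towards -Inf
fit_all <- suppressWarnings(glm(dv ~ iv, family = binomial, data = d))

# Keep only levels with 0 < p < 1 and refit
p <- tapply(d$dv, d$iv, mean)
keep <- names(p)[p > 0 & p < 1]
d_kept <- droplevels(d[d$iv %in% keep, , drop = FALSE])
fit_kept <- glm(dv ~ iv, family = binomial, data = d_kept)

coef(fit_all)   # "ivc" is very large in magnitude
coef(fit_kept)  # moderate, interpretable log-odds
```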
See also
binom_contingency and levels.
Other levels_data: levels_data()
Examples
(d_bern <- bernoulli_data(probs = c(0.8, 0.4, 0, 0.3, 0.6)))
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 330 × 2
#> iv dv
#> * <fct> <int>
#> 1 a 1
#> 2 a 0
#> 3 a 1
#> 4 a 1
#> 5 a 0
#> 6 a 1
#> 7 a 1
#> 8 a 1
#> 9 a 1
#> 10 a 0
#> # ℹ 320 more rows
d_bern |> levels_data()
#> $iv
#> [1] "a" "b" "c" "d" "e"
#>
## S3 methods for class 'data.frame'
d_bern |> good_levels(dv, iv)
#> [1] "a" "b" "d" "e"
d_bern |> drop_zero(dv, iv)
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 264 × 2
#> iv dv
#> <fct> <int>
#> 1 a 1
#> 2 a 0
#> 3 a 1
#> 4 a 1
#> 5 a 0
#> 6 a 1
#> 7 a 1
#> 8 a 1
#> 9 a 1
#> 10 a 0
#> # ℹ 254 more rows
d_bern |> drop_zero(dv, iv) |> levels_data()
#> $iv
#> [1] "a" "b" "d" "e"
#>
(d_bin <- d_bern |> binom_contingency(dv, iv))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 50 16
#> 2 b 24 42
#> 3 c 0 66
#> 4 d 20 46
#> 5 e 50 16
## S3 methods for class 'binom_contingency'
d_bin |> good_levels(iv)
#> [1] "a" "b" "d" "e"
d_bin |> drop_zero()
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 4 × 3
#> iv pn qn
#> <fct> <int> <int>
#> 1 a 50 16
#> 2 b 24 42
#> 3 d 20 46
#> 4 e 50 16
## Results identical whether drop_zero() used
## before or after binom_contingency()
identical(
d_bern |> drop_zero(dv, iv) |> binom_contingency(dv, iv),
d_bin |> drop_zero()
)
#> [1] TRUE
## Results identical whether drop_zero() used
## during or after binom_contingency()
identical(
d_bern |> binom_contingency(dv, iv, .drop_zero = TRUE),
d_bin |> drop_zero()
)
#> [1] TRUE
rm(d_bern, d_bin)