Explanatory Factors in Data as List of Expressions

Create a list of defused expressions representing the names of all or a selection of explanatory factors or character vectors in a dataset.

Usage

expl_fcts(
  .data,
  ...,
  .named = FALSE,
  .val = c("syms", "data_syms", "character")
)

Arguments

.data: a data frame, or a data frame extension (e.g. a tibble).
...: <tidy-select> quoted name(s) of one or more factors or character vectors in .data, to be included in (or excluded from) the output.
.named: logical, whether to name the elements of the list. If TRUE, unnamed inputs are automatically named with set_names(); default FALSE.
.val: the type of output required. The default "syms" returns a list of symbols; the alternative "data_syms" returns a list of symbols prefixed with the .data pronoun. The "character" option returns a character vector.

Value

A list of symbols representing the names of the selected explanatory factors or character vectors in .data. If .val = "data_syms", the symbols are prefixed with the .data pronoun; if .val = "character", the selected names are returned as a character vector instead.

Details

By default, expl_fcts() creates a list of symbols, i.e. defused R expressions, representing the names of all or a selection of explanatory factors (or character vectors) in .data, using syms() from package rlang. Alternatively, if .val = "data_syms", a list of symbols prefixed with the .data pronoun is returned instead. Finally, if .val = "character", expl_fcts() returns a character vector of the names of the explanatory factors (or character vectors) in .data.
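
As a rough illustration of the three output types, the following sketch calls the rlang functions syms() and data_syms() directly on a vector of names. The vector nms is made up for the example, and the use of data_syms() for the "data_syms" option is an assumption: the text above only confirms that syms() is used.

```r
library(rlang)

# Hypothetical column names, standing in for the factors found in .data
nms <- c("iv", "iv2")

# Comparable to .val = "syms": a list of symbols
sym_list <- syms(nms)

# Comparable to .val = "data_syms" (assumed): calls of the form .data$iv
data_sym_list <- data_syms(nms)

# Comparable to .val = "character": the names themselves
chr <- nms

# Comparable to .named = TRUE: name each element after its column
named_list <- set_names(sym_list, nms)
```

Here sym_list[[1]] is the symbol iv, whereas data_sym_list[[1]] is the call .data$iv, so the latter can only be evaluated in a data-masking context.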

Variables in .data may be selected for inclusion or exclusion using the ... argument and the <tidy-select> syntax from package dplyr (implemented by package tidyselect), including use of “selection helpers”. If no ... arguments are supplied, all factors and character vectors in .data will be included in the list.
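
The selection behaviour described above might be sketched as follows. The helper cat_names() and the data frame df are hypothetical and are not part of the package; the sketch only assumes that the tidy-select expressions are resolved against every column of the data before non-categorical columns are dropped, which is consistent with the last_col(1L) example further down.

```r
library(dplyr)

# Made-up data: one factor, one character vector, one integer response
df <- data.frame(
    iv  = factor(c("a", "b")),
    iv2 = c("g", "h"),
    dv  = c(0L, 1L),
    stringsAsFactors = FALSE
)

# Hypothetical helper, not the package's implementation
cat_names <- function(data, ...) {
    # Resolve the tidy-select expressions against all columns of data
    picked <- if (...length() == 0) names(data) else names(select(data, ...))
    # Keep only factors and character vectors
    is_cat <- vapply(data[picked], \(x) is.factor(x) || is.character(x), logical(1))
    picked[is_cat]
}

cat_names(df)               # "iv" "iv2"
cat_names(df, !iv)          # "iv2"
cat_names(df, last_col(1L)) # "iv2"
```

Filtering after selection means that a helper such as last_col() counts positions over all columns, including the response.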

A list of symbols returned by expl_fcts() may be “injected” into the ... arguments of contingency_table(), xcontingency_table(), binom_contingency() and other similar functions, using the splice operator !!!. If .val = "character", the resulting character vector of names should instead be wrapped in all_of() or any_of(), rather than spliced with !!!. A list of symbols returned by expl_fcts() may also be passed as the list argument to lapply() (or the purrr package map() functions), with each element injected into a call that supports injection using the injection operator !! (see examples).
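
The three idioms can be tried without this package. In the sketch below, dplyr's select() and count() stand in for the contingency-table functions, and the data frame df and the list fcts are made up for the illustration; fcts plays the role of a list returned by expl_fcts().

```r
library(dplyr)
library(rlang)

# Made-up data and a list of symbols like that returned by expl_fcts()
df <- data.frame(
    iv  = factor(c("a", "a", "b")),
    iv2 = factor(c("g", "h", "g")),
    dv  = c(0L, 1L, 1L)
)
nms <- c("iv", "iv2")
fcts <- syms(nms)

# Splice the whole list into the ... arguments with !!!
spliced <- df |> select(dv, !!!fcts)

# With a character vector, wrap the names in all_of() instead
wrapped <- df |> select(dv, all_of(nms))

# Inject one symbol at a time with !! inside lapply()
counts <- set_names(fcts, nms) |>
    lapply(\(x) count(df, !!x))
```

Both spliced and wrapped contain the columns dv, iv and iv2, while counts is a named list holding one table of counts per factor.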

Examples

(d <- list(
    iv2 = list(g = c("a", "c", "e"), h = c("b", "d", "f")),
    iv3 = list(i = c("a", "b", "c"), j = c("d", "e", "f")),
    iv4 = list(k = c("a", "b"), l = c("c", "d"), m = c("e", "f"))
) |> add_grps(bernoulli_data(levels = 6), iv, .key = _))
#> ___________________________
#> Simulated Bernoulli Data: -
#> 
#> # A tibble: 396 × 5
#>    iv    iv2   iv3   iv4      dv
#>    <fct> <fct> <fct> <fct> <int>
#>  1 a     g     i     k         0
#>  2 a     g     i     k         0
#>  3 a     g     i     k         0
#>  4 a     g     i     k         0
#>  5 a     g     i     k         0
#>  6 a     g     i     k         0
#>  7 a     g     i     k         1
#>  8 a     g     i     k         1
#>  9 a     g     i     k         0
#> 10 a     g     i     k         1
#> # ℹ 386 more rows

d |> expl_fcts()
#> [[1]]
#> iv
#> 
#> [[2]]
#> iv2
#> 
#> [[3]]
#> iv3
#> 
#> [[4]]
#> iv4
#> 

d |> expl_fcts(.named = TRUE)
#> $iv
#> iv
#> 
#> $iv2
#> iv2
#> 
#> $iv3
#> iv3
#> 
#> $iv4
#> iv4
#> 

d |> expl_fcts(.val = "data_syms")
#> [[1]]
#> .data$iv
#> 
#> [[2]]
#> .data$iv2
#> 
#> [[3]]
#> .data$iv3
#> 
#> [[4]]
#> .data$iv4
#> 

d |> expl_fcts(.named = TRUE, .val = "data_syms")
#> $iv
#> .data$iv
#> 
#> $iv2
#> .data$iv2
#> 
#> $iv3
#> .data$iv3
#> 
#> $iv4
#> .data$iv4
#> 

d |> expl_fcts(.val = "character")
#> [1] "iv"  "iv2" "iv3" "iv4"

d |> expl_fcts(.named = TRUE, .val = "character")
#>    iv   iv2   iv3   iv4 
#>  "iv" "iv2" "iv3" "iv4" 

## Select or exclude factors
d |> expl_fcts(iv, iv3)
#> [[1]]
#> iv
#> 
#> [[2]]
#> iv3
#> 

d |> expl_fcts(!c(iv, iv3))
#> [[1]]
#> iv2
#> 
#> [[2]]
#> iv4
#> 

## Use {dplyr} selection helpers e.g., last_col(), num_range() and starts_with()
d |> expl_fcts(last_col(1L))  ## Offset of 1L used, since last column of d is dv
#> [[1]]
#> iv4
#> 

d |> expl_fcts(!last_col())
#> [[1]]
#> iv
#> 
#> [[2]]
#> iv2
#> 
#> [[3]]
#> iv3
#> 
#> [[4]]
#> iv4
#> 

d |> expl_fcts(num_range("iv", 2:3))
#> [[1]]
#> iv2
#> 
#> [[2]]
#> iv3
#> 

d |> expl_fcts(!num_range("iv", 2:3))
#> [[1]]
#> iv
#> 
#> [[2]]
#> iv4
#> 

d |> expl_fcts(starts_with("iv"))
#> [[1]]
#> iv
#> 
#> [[2]]
#> iv2
#> 
#> [[3]]
#> iv3
#> 
#> [[4]]
#> iv4
#> 

## Negation of selection helper excludes all explanatory factors
d |> expl_fcts(!starts_with("iv"))
#> list()

## In the following three examples, each triplet of calls should give identical results
## Include all explanatory factors
d |> binom_contingency(dv)
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 6
#>   iv    iv2   iv3   iv4      pn    qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a     g     i     k        30    36
#> 2 b     h     i     k        29    37
#> 3 c     g     i     l        24    42
#> 4 d     h     j     l        18    48
#> 5 e     g     j     m        10    56
#> 6 f     h     j     m         5    61

d |> binom_contingency(dv, !!!expl_fcts(d))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 6
#>   iv    iv2   iv3   iv4      pn    qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a     g     i     k        30    36
#> 2 b     h     i     k        29    37
#> 3 c     g     i     l        24    42
#> 4 d     h     j     l        18    48
#> 5 e     g     j     m        10    56
#> 6 f     h     j     m         5    61

d |> binom_contingency(dv, all_of(expl_fcts(d, .val = "character")))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 6
#>   iv    iv2   iv3   iv4      pn    qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a     g     i     k        30    36
#> 2 b     h     i     k        29    37
#> 3 c     g     i     l        24    42
#> 4 d     h     j     l        18    48
#> 5 e     g     j     m        10    56
#> 6 f     h     j     m         5    61

## Include only iv and iv3
d |> binom_contingency(dv, iv, iv3)
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv    iv3      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 a     i        30    36
#> 2 b     i        29    37
#> 3 c     i        24    42
#> 4 d     j        18    48
#> 5 e     j        10    56
#> 6 f     j         5    61

d |> binom_contingency(dv, !!!expl_fcts(d, iv, iv3))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv    iv3      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 a     i        30    36
#> 2 b     i        29    37
#> 3 c     i        24    42
#> 4 d     j        18    48
#> 5 e     j        10    56
#> 6 f     j         5    61

d |> binom_contingency(dv, all_of(expl_fcts(d, iv, iv3, .val = "character")))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv    iv3      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 a     i        30    36
#> 2 b     i        29    37
#> 3 c     i        24    42
#> 4 d     j        18    48
#> 5 e     j        10    56
#> 6 f     j         5    61

## Exclude iv and iv3
d |> binom_contingency(dv, !c(iv, iv3))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv2   iv4      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 g     k        30    36
#> 2 h     k        29    37
#> 3 g     l        24    42
#> 4 h     l        18    48
#> 5 g     m        10    56
#> 6 h     m         5    61

d |> binom_contingency(dv, !!!expl_fcts(d, !c(iv, iv3)))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv2   iv4      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 g     k        30    36
#> 2 h     k        29    37
#> 3 g     l        24    42
#> 4 h     l        18    48
#> 5 g     m        10    56
#> 6 h     m         5    61

d |> binom_contingency(dv, all_of(expl_fcts(d, !c(iv, iv3), .val = "character")))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 4
#>   iv2   iv4      pn    qn
#> * <fct> <fct> <int> <int>
#> 1 g     k        30    36
#> 2 h     k        29    37
#> 3 g     l        24    42
#> 4 h     l        18    48
#> 5 g     m        10    56
#> 6 h     m         5    61

## Use with lapply(), binom_contingency(), glm() and odds_ratio()
expl_fcts(d, .named = TRUE) |>
    lapply(\(x) binom_contingency(d, dv, !!x))
#> $iv
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 3
#>   iv       pn    qn
#> * <fct> <int> <int>
#> 1 a        30    36
#> 2 b        29    37
#> 3 c        24    42
#> 4 d        18    48
#> 5 e        10    56
#> 6 f         5    61
#> 
#> $iv2
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 2 × 3
#>   iv2      pn    qn
#> * <fct> <int> <int>
#> 1 g        64   134
#> 2 h        52   146
#> 
#> $iv3
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 2 × 3
#>   iv3      pn    qn
#> * <fct> <int> <int>
#> 1 i        83   115
#> 2 j        33   165
#> 
#> $iv4
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 3 × 3
#>   iv4      pn    qn
#> * <fct> <int> <int>
#> 1 k        59    73
#> 2 l        42    90
#> 3 m        15   117
#> 

expl_fcts(d, .named = TRUE) |>
    lapply(\(x)
        binom_contingency(d, dv, !!x) |>
        glm(cbind(pn, qn) ~ ., binomial, data = _)
    )
#> $iv
#> 
#> Call:  glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d, 
#>     dv, !!x))
#> 
#> Coefficients:
#> (Intercept)          ivb          ivc          ivd          ive          ivf  
#>     -0.1823      -0.0613      -0.3773      -0.7985      -1.5404      -2.3191  
#> 
#> Degrees of Freedom: 5 Total (i.e. Null);  0 Residual
#> Null Deviance:	    42.07 
#> Residual Deviance: 2.531e-14 	AIC: 37.66
#> 
#> $iv2
#> 
#> Call:  glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d, 
#>     dv, !!x))
#> 
#> Coefficients:
#> (Intercept)         iv2h  
#>     -0.7390      -0.2934  
#> 
#> Degrees of Freedom: 1 Total (i.e. Null);  0 Residual
#> Null Deviance:	    1.758 
#> Residual Deviance: -4.174e-14 	AIC: 15.1
#> 
#> $iv3
#> 
#> Call:  glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d, 
#>     dv, !!x))
#> 
#> Coefficients:
#> (Intercept)         iv3j  
#>     -0.3261      -1.2833  
#> 
#> Degrees of Freedom: 1 Total (i.e. Null);  0 Residual
#> Null Deviance:	    31.25 
#> Residual Deviance: 9.859e-14 	AIC: 14.87
#> 
#> $iv4
#> 
#> Call:  glm(formula = cbind(pn, qn) ~ ., family = binomial, data = binom_contingency(d, 
#>     dv, !!x))
#> 
#> Coefficients:
#> (Intercept)         iv4l         iv4m  
#>     -0.2129      -0.5492      -1.8412  
#> 
#> Degrees of Freedom: 2 Total (i.e. Null);  0 Residual
#> Null Deviance:	    38.86 
#> Residual Deviance: -4.441e-15 	AIC: 20.96
#> 

expl_fcts(d, .named = TRUE) |>
    lapply(\(x)
        binom_contingency(d, dv, !!x, .drop_zero = TRUE) |>
        odds_ratio(.ind_var = !!x)
    )
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> $iv
#> ____________________________
#> Estimates and Odds Ratios: -
#> 
#> # A tibble: 6 × 7
#>   parameter   estimate    se     p_val odds_ratio ci[,"2.5%"] [,"97.5%"] sig  
#>   <chr>          <dbl> <dbl>     <dbl>      <dbl>       <dbl>      <dbl> <fct>
#> 1 (Intercept)  -0.182  0.247 0.461         1          NA          NA     NS   
#> 2 ivb          -0.0613 0.350 0.861         0.941       0.472       1.87  NS   
#> 3 ivc          -0.377  0.356 0.289         0.686       0.339       1.37  NS   
#> 4 ivd          -0.799  0.371 0.0313        0.45        0.215       0.923 *    
#> 5 ive          -1.54   0.423 0.000271      0.214       0.0898      0.478 ***  
#> 6 ivf          -2.32   0.527 0.0000107     0.0984      0.0313      0.256 ***  
#> 
#> $iv2
#> ____________________________
#> Estimates and Odds Ratios: -
#> 
#> # A tibble: 2 × 7
#>   parameter   estimate    se     p_val odds_ratio ci[,"2.5%"] [,"97.5%"] sig  
#>   <chr>          <dbl> <dbl>     <dbl>      <dbl>       <dbl>      <dbl> <fct>
#> 1 (Intercept)   -0.739 0.152 0.0000012      1          NA          NA    ***  
#> 2 iv2h          -0.293 0.222 0.186          0.746       0.482       1.15 NS   
#> 
#> $iv3
#> ____________________________
#> Estimates and Odds Ratios: -
#> 
#> # A tibble: 2 × 7
#>   parameter   estimate    se     p_val odds_ratio ci[,"2.5%"] [,"97.5%"] sig  
#>   <chr>          <dbl> <dbl>     <dbl>      <dbl>       <dbl>      <dbl> <fct>
#> 1 (Intercept)   -0.326 0.144 0.0236         1          NA         NA     *    
#> 2 iv3j          -1.28  0.239 0.0000001      0.277       0.172      0.439 ***  
#> 
#> $iv4
#> ____________________________
#> Estimates and Odds Ratios: -
#> 
#> # A tibble: 3 × 7
#>   parameter   estimate    se  p_val odds_ratio ci[,"2.5%"] [,"97.5%"] sig  
#>   <chr>          <dbl> <dbl>  <dbl>      <dbl>       <dbl>      <dbl> <fct>
#> 1 (Intercept)   -0.213 0.175 0.224       1         NA          NA     NS   
#> 2 iv4l          -0.549 0.256 0.0320      0.577      0.348       0.951 *    
#> 3 iv4m          -1.84  0.325 0           0.159      0.0814      0.293 ***  
#> 

rm(d)