Skip to contents

Compare a series of nested Bernoulli or binomial GLMs by supplying data, a dependent variable and a list of terms representing the right-hand side of each of a series of model formulae.

Usage

comp_glm(.data, .dep_var, ..., .family = binomial, .arrange_by = desc(AIC))

Arguments

.data

a data frame, or a data frame extension (e.g. a tibble).

.dep_var

<data-masking> quoted name of the binary dependent variable to be used as the LHS of the model formula; should be numeric with values of 0 and 1, or a two-column matrix with the columns giving the numbers of successes and failures e.g., cbind(pn, qn).

...

<dynamic-dots> the RHS of any number of model formulae to be compared, based on independent variables in .data.

.family

a description of the error distribution and link function to be used in the model. Can be a character string naming a family function, a family function or the result of a call to a family function; default binomial.

.arrange_by

<data-masking> quoted name of a column for ordering results. Use desc to sort by a variable in descending order; default desc(AIC).

Value

An object of classes "comp_glm", "announce", inheriting from tibble and hence "data.frame", with columns including the RHS of the model formula, the glm object and eight further goodness-of-fit measures output by glance.glm(), as follows: -

f_rhs

The right-hand side of the formula as supplied in the ... argument.

.glm

A list-column containing the glm model objects.

AIC

Akaike's Information Criterion.

BIC

Bayesian Information Criterion.

deviance

Deviance of the model.

df.null

Degrees of freedom for the null model.

df.residual

Residual degrees of freedom.

logLik

The log-likelihood of the model.

nobs

Number of observations.

null.deviance

Deviance of the null model.

Details

comp_glm() builds the formulas from the dependent variable and the formula right-hand side, calls glm, and saves the resulting objects of class "glm" as a list column in a tibble, together with model fit information obtained using glance.glm from the broom package. The output may be ordered by a selected column, otherwise by default in descending order of Akaike's Information Criterion (AIC).

comp_glm() may be used conveniently to compare a series of related binomial GLMs based on different groupings of factors added to .data using add_grps.

Note

It is the user's responsibility to check models are suitably nested to ensure meaningful comparisons.

See also

add_grps, formula, glance.glm and glm.

Other comp_glm: anova_tbl(), summanov(), univ_anova()

Examples

## Simulate Bernoulli data

(d <- list(
        iv2 = list(g = c("a", "c", "e"), h = c("b", "d", "f")),
        iv3 = list(i = c("a", "b", "c"), j = c("d", "e", "f")),
        iv4 = list(k = c("a", "b"), l = c("c", "d"), m = c("e", "f"))
    ) |> add_grps(bernoulli_data(levels = 6), iv, .key = _))
#> ___________________________
#> Simulated Bernoulli Data: -
#> 
#> # A tibble: 396 × 5
#>    iv    iv2   iv3   iv4      dv
#>    <fct> <fct> <fct> <fct> <int>
#>  1 a     g     i     k         0
#>  2 a     g     i     k         0
#>  3 a     g     i     k         1
#>  4 a     g     i     k         0
#>  5 a     g     i     k         1
#>  6 a     g     i     k         1
#>  7 a     g     i     k         0
#>  8 a     g     i     k         0
#>  9 a     g     i     k         1
#> 10 a     g     i     k         1
#> # ℹ 386 more rows

## Models arranged in descending order of AIC by default.
d |> comp_glm(dv, iv, iv2, iv3, iv4)
#> ______________________
#> Compare Nested GLMs: -
#> 
#> # A tibble: 4 × 10
#>   f_rhs .glm       null.deviance df.null logLik   AIC   BIC deviance df.residual
#> * <chr> <named li>         <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
#> 1 iv2   <glm>               479.     395  -236.  476.  484.     472.         394
#> 2 iv3   <glm>               479.     395  -229.  461.  469.     457.         394
#> 3 iv4   <glm>               479.     395  -227.  461.  473.     455.         393
#> 4 iv    <glm>               479.     395  -223.  459.  483.     447.         390
#> # ℹ 1 more variable: nobs <int>

## Arrange models by formula right-hand side 
(comps <- d |> comp_glm(dv, iv, iv2, iv3, iv4, .arrange_by = f_rhs))
#> ______________________
#> Compare Nested GLMs: -
#> 
#> # A tibble: 4 × 10
#>   f_rhs .glm       null.deviance df.null logLik   AIC   BIC deviance df.residual
#> * <chr> <named li>         <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
#> 1 iv    <glm>               479.     395  -223.  459.  483.     447.         390
#> 2 iv2   <glm>               479.     395  -236.  476.  484.     472.         394
#> 3 iv3   <glm>               479.     395  -229.  461.  469.     457.         394
#> 4 iv4   <glm>               479.     395  -227.  461.  473.     455.         393
#> # ℹ 1 more variable: nobs <int>

## Inspect components of .glm list-column
lapply(comps$.glm, formula)
#> $iv
#> dv ~ iv
#> <environment: 0x5634ad6ca480>
#> 
#> $iv2
#> dv ~ iv2
#> <environment: 0x5634ad737818>
#> 
#> $iv3
#> dv ~ iv3
#> <environment: 0x5634ad7a1c18>
#> 
#> $iv4
#> dv ~ iv4
#> <environment: 0x5634ad80c488>
#> 

lapply(comps$.glm, summary)
#> $iv
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.06062    0.24630   0.246 0.805570    
#> ivb         -0.68633    0.35693  -1.923 0.054493 .  
#> ivc         -0.75377    0.35895  -2.100 0.035733 *  
#> ivd         -1.20006    0.37837  -3.172 0.001516 ** 
#> ive         -1.37281    0.38900  -3.529 0.000417 ***
#> ivf         -2.19225    0.46953  -4.669 3.03e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 478.96  on 395  degrees of freedom
#> Residual deviance: 446.76  on 390  degrees of freedom
#> AIC: 458.76
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv2
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.6035     0.1487  -4.060 4.91e-05 ***
#> iv2h         -0.5917     0.2245  -2.635  0.00841 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 478.96  on 395  degrees of freedom
#> Residual deviance: 471.90  on 394  degrees of freedom
#> AIC: 475.9
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv3
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.4097     0.1451  -2.823  0.00476 ** 
#> iv3j         -1.0608     0.2330  -4.552  5.3e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 478.96  on 395  degrees of freedom
#> Residual deviance: 457.08  on 394  degrees of freedom
#> AIC: 461.08
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv4
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.2744     0.1757  -1.562   0.1183    
#> iv4l         -0.6313     0.2604  -2.424   0.0154 *  
#> iv4m         -1.3906     0.2958  -4.701 2.59e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 478.96  on 395  degrees of freedom
#> Residual deviance: 454.67  on 393  degrees of freedom
#> AIC: 460.67
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 

## Convert to binomial data using binom_contingency()
(d <- d |> binom_contingency(dv))
#> _____________________________
#> Binomial Contingency Table: -
#> 
#> # A tibble: 6 × 6
#>   iv    iv2   iv3   iv4      pn    qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a     g     i     k        34    32
#> 2 b     h     i     k        23    43
#> 3 c     g     i     l        22    44
#> 4 d     h     j     l        16    50
#> 5 e     g     j     m        14    52
#> 6 f     h     j     m         7    59

(comps <- d |> comp_glm(cbind(pn, qn), iv, iv2, iv3, iv4))
#> ______________________
#> Compare Nested GLMs: -
#> 
#> # A tibble: 4 × 10
#>   f_rhs .glm       null.deviance df.null logLik   AIC   BIC deviance df.residual
#> * <chr> <named li>         <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
#> 1 iv2   <glm>               32.2       5  -25.6  55.2  54.8 2.51e+ 1           4
#> 2 iv3   <glm>               32.2       5  -18.2  40.4  39.9 1.03e+ 1           4
#> 3 iv4   <glm>               32.2       5  -17.0  39.9  39.3 7.91e+ 0           3
#> 4 iv    <glm>               32.2       5  -13.0  38.0  36.8 2.26e-14           0
#> # ℹ 1 more variable: nobs <int>

## Inspect components of .glm list-column
lapply(comps$.glm, formula)
#> $iv2
#> cbind(pn, qn) ~ iv2
#> <environment: 0x5634afea96c0>
#> 
#> $iv3
#> cbind(pn, qn) ~ iv3
#> <environment: 0x5634aff1b718>
#> 
#> $iv4
#> cbind(pn, qn) ~ iv4
#> <environment: 0x5634aff8d808>
#> 
#> $iv
#> cbind(pn, qn) ~ iv
#> <environment: 0x5634afe2ec80>
#> 

lapply(comps$.glm, summary)
#> $iv2
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.6035     0.1487  -4.060 4.91e-05 ***
#> iv2h         -0.5917     0.2245  -2.635  0.00841 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 32.209  on 5  degrees of freedom
#> Residual deviance: 25.148  on 4  degrees of freedom
#> AIC: 55.172
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv3
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.4097     0.1451  -2.823  0.00476 ** 
#> iv3j         -1.0608     0.2330  -4.552  5.3e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 32.209  on 5  degrees of freedom
#> Residual deviance: 10.327  on 4  degrees of freedom
#> AIC: 40.352
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv4
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.2744     0.1757  -1.562   0.1183    
#> iv4l         -0.6313     0.2604  -2.424   0.0154 *  
#> iv4m         -1.3906     0.2958  -4.701 2.59e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 32.2088  on 5  degrees of freedom
#> Residual deviance:  7.9103  on 3  degrees of freedom
#> AIC: 39.935
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> $iv
#> 
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.06062    0.24630   0.246 0.805570    
#> ivb         -0.68633    0.35693  -1.923 0.054493 .  
#> ivc         -0.75377    0.35895  -2.100 0.035733 *  
#> ivd         -1.20006    0.37837  -3.172 0.001516 ** 
#> ive         -1.37281    0.38900  -3.529 0.000417 ***
#> ivf         -2.19225    0.46954  -4.669 3.03e-06 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3.2209e+01  on 5  degrees of freedom
#> Residual deviance: 2.2649e-14  on 0  degrees of freedom
#> AIC: 38.025
#> 
#> Number of Fisher Scoring iterations: 3
#> 
#> 

rm(comps, d)