Compare Series of Nested GLMs
comp_glm.Rd
Compare a series of nested Bernoulli or binomial GLMs by supplying data, a dependent variable and a list of terms representing the right-hand side of each of a series of model formulae.
Arguments
- .data
a data frame, or a data frame extension (e.g. a
tibble
).- .dep_var
<
data-masking
> quoted name of the binary dependent variable to be used as the LHS of the model formula; should be numeric with values of 0 and 1, or a two-column matrix with the columns giving the numbers of successes and failures e.g.,cbind(pn, qn)
.- ...
<
dynamic-dots
> the RHS of any number of model formulae to be compared, based on independent variables in.data
.- .family
a description of the error distribution and link function to be used in the model. Can be a character string naming a family function, a family function or the result of a call to a family function; default
binomial
.- .arrange_by
<
data-masking
> quoted name of a column for ordering results. Usedesc
to sort by a variable in descending order; defaultdesc(AIC)
.
Value
An object of classes "comp_glm"
, "announce"
, inheriting from tibble
and hence
"data.frame"
, with columns including the RHS of the model formula
, the glm
object and
eight further goodness-of-fit measures output by glance.glm()
, as follows: -
- f_rhs
The right-hand side of the formula as supplied in the
...
argument.- .glm
A list-column containing the glm model objects.
- AIC
Akaike's Information Criterion.
- BIC
Bayesian Information Criterion.
- deviance
Deviance of the model.
- df.null
Degrees of freedom for the null model.
- df.residual
Residual degrees of freedom.
- logLik
The log-likelihood of the model.
- nobs
Number of observations.
- null.deviance
Deviance of the null model.
Details
comp_glm()
builds the formulas from the dependent variable and the formula right-hand side, calls
glm
, and saves the resulting objects of class "glm"
as a list column in a
tibble
, together with model fit information obtained using
glance.glm
from the broom package. The output may be ordered by a
selected column, otherwise by default in descending order of Akaike's Information Criterion (AIC).
comp_glm()
may be used conveniently to compare a series of related binomial GLMs based on different
groupings of factors added to .data
using add_grps
.
Note
It is the user's responsibility to check models are suitably nested to ensure meaningful comparisons.
See also
add_grps
, formula
, glance.glm
and
glm
.
Other comp_glm:
anova_tbl()
,
summanov()
,
univ_anova()
Examples
## Simulate Bernoulli data
(d <- list(
iv2 = list(g = c("a", "c", "e"), h = c("b", "d", "f")),
iv3 = list(i = c("a", "b", "c"), j = c("d", "e", "f")),
iv4 = list(k = c("a", "b"), l = c("c", "d"), m = c("e", "f"))
) |> add_grps(bernoulli_data(levels = 6), iv, .key = _))
#> ___________________________
#> Simulated Bernoulli Data: -
#>
#> # A tibble: 396 × 5
#> iv iv2 iv3 iv4 dv
#> <fct> <fct> <fct> <fct> <int>
#> 1 a g i k 0
#> 2 a g i k 1
#> 3 a g i k 0
#> 4 a g i k 0
#> 5 a g i k 0
#> 6 a g i k 0
#> 7 a g i k 0
#> 8 a g i k 0
#> 9 a g i k 0
#> 10 a g i k 1
#> # ℹ 386 more rows
## Models arranged in descending order of AIC by default.
d |> comp_glm(dv, iv, iv2, iv3, iv4)
#> ______________________
#> Compare Nested GLMs: -
#>
#> # A tibble: 4 × 10
#> f_rhs .glm null.deviance df.null logLik AIC BIC deviance df.residual
#> * <chr> <named li> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 iv2 <glm> 486. 395 -241. 486. 494. 482. 394
#> 2 iv3 <glm> 486. 395 -231. 466. 474. 462. 394
#> 3 iv4 <glm> 486. 395 -227. 461. 473. 455. 393
#> 4 iv <glm> 486. 395 -224. 461. 485. 449. 390
#> # ℹ 1 more variable: nobs <int>
## Arrange models by formula right-hand side
(comps <- d |> comp_glm(dv, iv, iv2, iv3, iv4, .arrange_by = f_rhs))
#> ______________________
#> Compare Nested GLMs: -
#>
#> # A tibble: 4 × 10
#> f_rhs .glm null.deviance df.null logLik AIC BIC deviance df.residual
#> * <chr> <named li> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 iv <glm> 486. 395 -224. 461. 485. 449. 390
#> 2 iv2 <glm> 486. 395 -241. 486. 494. 482. 394
#> 3 iv3 <glm> 486. 395 -231. 466. 474. 462. 394
#> 4 iv4 <glm> 486. 395 -227. 461. 473. 455. 393
#> # ℹ 1 more variable: nobs <int>
## Inspect components of .glm list-column
lapply(comps$.glm, formula)
#> $iv
#> dv ~ iv
#> <environment: 0x561cdda2f5e0>
#>
#> $iv2
#> dv ~ iv2
#> <environment: 0x561cdda9f428>
#>
#> $iv3
#> dv ~ iv3
#> <environment: 0x561cddb01b00>
#>
#> $iv4
#> dv ~ iv4
#> <environment: 0x561cddb6a9e0>
#>
lapply(comps$.glm, summary)
#> $iv
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.06062 0.24630 0.246 0.805570
#> ivb -0.36601 0.35027 -1.045 0.296063
#> ivc -0.89353 0.36387 -2.456 0.014063 *
#> ivd -1.11923 0.37402 -2.992 0.002768 **
#> ive -1.28440 0.38332 -3.351 0.000806 ***
#> ivf -2.36321 0.49388 -4.785 1.71e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 485.82 on 395 degrees of freedom
#> Residual deviance: 448.64 on 390 degrees of freedom
#> AIC: 460.64
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv2
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.6257 0.1491 -4.195 2.73e-05 ***
#> iv2h -0.4329 0.2206 -1.963 0.0497 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 485.82 on 395 degrees of freedom
#> Residual deviance: 481.94 on 394 degrees of freedom
#> AIC: 485.94
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv3
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.3469 0.1443 -2.404 0.0162 *
#> iv3j -1.0907 0.2310 -4.721 2.35e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 485.82 on 395 degrees of freedom
#> Residual deviance: 462.26 on 394 degrees of freedom
#> AIC: 466.26
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv4
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.1214 0.1744 -0.696 0.48650
#> iv4l -0.8216 0.2607 -3.151 0.00162 **
#> iv4m -1.5436 0.2950 -5.232 1.68e-07 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 485.82 on 395 degrees of freedom
#> Residual deviance: 454.79 on 393 degrees of freedom
#> AIC: 460.79
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
## Convert to binomial data using binom_contingency()
(d <- d |> binom_contingency(dv))
#> _____________________________
#> Binomial Contingency Table: -
#>
#> # A tibble: 6 × 6
#> iv iv2 iv3 iv4 pn qn
#> * <fct> <fct> <fct> <fct> <int> <int>
#> 1 a g i k 34 32
#> 2 b h i k 28 38
#> 3 c g i l 20 46
#> 4 d h j l 17 49
#> 5 e g j m 15 51
#> 6 f h j m 6 60
(comps <- d |> comp_glm(cbind(pn, qn), iv, iv2, iv3, iv4))
#> ______________________
#> Compare Nested GLMs: -
#>
#> # A tibble: 4 × 10
#> f_rhs .glm null.deviance df.null logLik AIC BIC deviance df.residual
#> * <chr> <named l> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 iv2 <glm> 37.2 5 -29.6 63.3 62.9 3.33e+ 1 4
#> 2 iv3 <glm> 37.2 5 -19.8 43.6 43.2 1.36e+ 1 4
#> 3 iv4 <glm> 37.2 5 -16.1 38.2 37.5 6.15e+ 0 3
#> 4 iv <glm> 37.2 5 -13.0 38.0 36.8 -7.11e-15 0
#> # ℹ 1 more variable: nobs <int>
## Inspect components of .glm list-column
lapply(comps$.glm, formula)
#> $iv2
#> cbind(pn, qn) ~ iv2
#> <environment: 0x561cdb3e31f8>
#>
#> $iv3
#> cbind(pn, qn) ~ iv3
#> <environment: 0x561cdb301468>
#>
#> $iv4
#> cbind(pn, qn) ~ iv4
#> <environment: 0x561cdb26c210>
#>
#> $iv
#> cbind(pn, qn) ~ iv
#> <environment: 0x561cdb45efd8>
#>
lapply(comps$.glm, summary)
#> $iv2
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.6257 0.1491 -4.195 2.73e-05 ***
#> iv2h -0.4329 0.2206 -1.963 0.0497 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 37.176 on 5 degrees of freedom
#> Residual deviance: 33.291 on 4 degrees of freedom
#> AIC: 63.293
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv3
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.3469 0.1443 -2.404 0.0162 *
#> iv3j -1.0907 0.2310 -4.721 2.35e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 37.176 on 5 degrees of freedom
#> Residual deviance: 13.616 on 4 degrees of freedom
#> AIC: 43.618
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv4
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.1214 0.1744 -0.696 0.48650
#> iv4l -0.8216 0.2607 -3.151 0.00162 **
#> iv4m -1.5436 0.2950 -5.232 1.68e-07 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 37.1759 on 5 degrees of freedom
#> Residual deviance: 6.1501 on 3 degrees of freedom
#> AIC: 38.152
#>
#> Number of Fisher Scoring iterations: 4
#>
#>
#> $iv
#>
#> Call:
#> glm(formula = inject(!!.dep_var ~ !!x), family = .family, data = .data)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.06062 0.24630 0.246 0.805570
#> ivb -0.36601 0.35027 -1.045 0.296063
#> ivc -0.89353 0.36387 -2.456 0.014063 *
#> ivd -1.11923 0.37402 -2.992 0.002768 **
#> ive -1.28440 0.38332 -3.351 0.000806 ***
#> ivf -2.36321 0.49396 -4.784 1.72e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 3.7176e+01 on 5 degrees of freedom
#> Residual deviance: -7.1054e-15 on 0 degrees of freedom
#> AIC: 38.002
#>
#> Number of Fisher Scoring iterations: 3
#>
#>
rm(comps, d)