Collate Data for Plotting Univariable GLM Predictions with Error Bars
glm_plotdata.Rd
glm_plotdata() outputs data based on predictions from univariable generalised linear models (GLMs), suitably
collated for easy creation of standardised plots with error bars representing confidence intervals or standard
errors.
Usage
glm_plotdata(object, ...)
# S3 method for class 'binom_contingency'
glm_plotdata(
  object,
  ...,
  .ind_var,
  .ungroup = NULL,
  conf_level = 0.95,
  type = c("link", "response")
)
# S3 method for class 'data.frame'
glm_plotdata(
  object,
  ...,
  .dep_var,
  .ind_var,
  .ungroup = NULL,
  conf_level = 0.95,
  type = c("link", "response")
)
# S3 method for class 'formula'
glm_plotdata(
  object,
  ...,
  .family = binomial,
  .data,
  .ungroup = NULL,
  conf_level = 0.95,
  type = c("link", "response")
)
# S3 method for class 'glm'
glm_plotdata(object, ..., conf_level = 0.95, type = c("link", "response"))
Arguments
- object
an object from which the plot data are to be derived, which may be a binom_contingency table, a data frame (or a data frame extension, e.g., a tibble), a formula or a glm.
- ...
further arguments passed to or from other methods. Not currently used.
- .ind_var
<data-masking> quoted name of the independent variable.
- .ungroup
<data-masking> quoted name of the column containing the ungrouped levels of .ind_var, see Details; default NULL.
- conf_level
the confidence level required for the error bars; default 0.95. If NA, the error bars show the standard error.
- type
the type of prediction required. The default, "link", is on the scale of the linear predictors; the alternative, "response", is on the scale of the response variable.
- .dep_var
quoted name of the response variable in the data representing the number of successes and failures respectively, see Details; default cbind(pn, qn).
- .family
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)
- .data
a data frame, or a data frame extension (e.g., a tibble).
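The three forms described for .family are the same three that base R's glm() accepts for its family argument. As a minimal sketch, the following fits the same model three ways with glm() itself (not glm_plotdata()), using the simulated counts from the Examples; the list name fits is purely illustrative.

```r
# Simulated binomial counts, copied from the Examples section.
d <- data.frame(
  iv = factor(c("a", "b", "c", "d", "e")),
  pn = c(32L, 33L, 25L, 13L, 6L),
  qn = c(34L, 33L, 41L, 53L, 60L)
)

# The same family given as a character string, as a family function,
# and as the result of a call to a family function.
fits <- list(
  string = glm(cbind(pn, qn) ~ iv, family = "binomial", data = d),
  func   = glm(cbind(pn, qn) ~ iv, family = binomial, data = d),
  call   = glm(cbind(pn, qn) ~ iv, family = binomial(link = "logit"), data = d)
)

# All three specifications lead to identical coefficient estimates.
sapply(fits, coef)
```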
Value
An object of class "glm_plotdata", "announce", inheriting from tibble, with values on the linear predictor or response scale (depending on type) in columns as follows:
- level
Level of the independent variable.
- grouped
Grouped levels of the independent variable (or NA if ungrouped).
- n
Number of observations.
- obs
Observed values.
- pred
Values predicted by the model.
- lower
Lower extent of error bar.
- upper
Upper extent of error bar.
It also has attributes "conf_level", signifying the confidence level, "ind_var", the name of the independent variable, and "type" (see argument type).
Details
This function works with univariable binomial GLMs having a numeric dependent variable of ones and zeros representing successes and failures, or a two-column matrix with the columns giving the numbers of successes and failures, see glm(), and an independent variable with multiple levels. Its output may be plotted conveniently using ggplot() in package ggplot2; ParaAnita provides a suitable S3 method ggplot.glm_plotdata() for this purpose.
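As a rough illustration of the kind of plot this output is intended for, the sketch below builds a plain data frame by hand, with the same columns and values as the first response-scale table in the Examples, and draws it with ggplot2 directly. It does not call ggplot.glm_plotdata(), whose appearance may well differ; the object name pd and the choice of geoms are assumptions made for illustration only.

```r
library(ggplot2)

# Values hand-copied from the response-scale example output (95% confidence level);
# `pd` is a hypothetical stand-in for a glm_plotdata object.
pd <- data.frame(
  level = factor(c("a", "b", "c", "d", "e")),
  obs   = c(0.485, 0.5, 0.379, 0.197, 0.0909),
  pred  = c(0.485, 0.5, 0.379, 0.197, 0.0909),
  lower = c(0.367, 0.381, 0.270, 0.118, 0.0413),
  upper = c(0.604, 0.619, 0.501, 0.311, 0.188)
)

# Predictions as points, with error bars spanning lower to upper.
p <- ggplot(pd, aes(x = level, y = pred)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  geom_point(size = 2) +
  labs(x = "iv", y = "Predicted proportion")
p
```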
glm_plotdata() allows exploration of proposed groupings of the levels of the independent variable, such as obtained using add_grps() or fct_collapse(), and will include both the grouped and ungrouped levels in its output. In such cases, .ind_var should contain the groupings and the .ungroup argument should name a column in object's data containing the ungrouped levels, see Examples. The grouped levels are used as the independent variable in the GLM and are shown within the output object in the column grouped, while the ungrouped levels are shown in the column level. If .ungroup is NULL (the default), levels of .ind_var will appear in the column level and the grouped column in the output will contain NA.
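To make the relationship between grouped and ungrouped levels concrete, here is a minimal base R sketch using the simulated counts from the Examples. It does not use glm_plotdata(), add_grps() or fct_collapse(): the grouping is built with levels<-, and the names iv2, obs and pred simply mirror the example output. It illustrates that levels sharing a group share one model prediction, while the observed log-odds remain specific to each ungrouped level.

```r
# Simulated binomial counts, copied from the Examples section.
d <- data.frame(
  iv = factor(c("a", "b", "c", "d", "e")),
  pn = c(32L, 33L, 25L, 13L, 6L),
  qn = c(34L, 33L, 41L, 53L, 60L)
)

# Proposed grouping: merge a with b and c with d (base R stand-in for add_grps()).
d$iv2 <- d$iv
levels(d$iv2) <- list(ab = c("a", "b"), cd = c("c", "d"), e = "e")

# Observed log-odds for each ungrouped level.
obs <- with(d, log(pn / qn))

# The GLM uses the grouped factor as its independent variable.
fit <- glm(cbind(pn, qn) ~ iv2, family = binomial, data = d)
pred <- unname(predict(fit, type = "link"))

data.frame(level = d$iv, grouped = d$iv2, obs = round(obs, 3), pred = round(pred, 3))
```

The pred values for a and b coincide (the pooled log-odds, log(65/67)), as do those for c and d, which matches the pattern in the grouped example output.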
If conf_level is 0.95 (the default) or a similar value, the lower and upper columns in the output delimit the confidence intervals at that confidence level. If conf_level is NA, then lower and upper are the model predictions ±standard error.
If type = "link", then the linear predictors and their confidence intervals or ±standard errors are obtained. If type = "response", then the linear predictors and their confidence intervals or ±standard errors are transformed back to the response scale using the inverse link function.
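The back-transformation can be sketched with base R alone. The following is an approximation of the idea rather than the package's own code: it fits the ungrouped model from the Examples with glm(), forms prediction ±standard error on the link scale, and applies the family's linkinv function. The object names p, linkinv and resp are illustrative.

```r
# Simulated binomial counts, copied from the Examples section.
d <- data.frame(
  iv = factor(c("a", "b", "c", "d", "e")),
  pn = c(32L, 33L, 25L, 13L, 6L),
  qn = c(34L, 33L, 41L, 53L, 60L)
)

fit <- glm(cbind(pn, qn) ~ iv, family = binomial, data = d)

# Predictions and their standard errors on the linear predictor (link) scale.
p <- predict(fit, type = "link", se.fit = TRUE)

# Inverse link function of the fitted family (the inverse logit here).
linkinv <- family(fit)$linkinv

# Error bars are formed on the link scale first, then transformed.
resp <- data.frame(
  level = d$iv,
  pred  = linkinv(p$fit),
  lower = linkinv(p$fit - p$se.fit),
  upper = linkinv(p$fit + p$se.fit)
)
resp
```

Because the transformation is nonlinear, the resulting error bars are generally not symmetric about pred on the response scale, most visibly for level e.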
Note
Confidence intervals are calculated from the standard errors of the parameter estimates using the quantiles of the t distribution with n - 1 degrees of freedom, at the probability given by conf_level. These confidence intervals are generally more conservative, i.e., a little wider, than those obtained by "profiling" (e.g., using confint.glm). If the conf_level argument is NA, standard error is shown rather than a confidence interval.
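A minimal sketch of this calculation in base R follows, using the simulated counts from the Examples. It assumes that n is the total number of binomial observations (330 here); that reading is an assumption, adopted because it gives limits agreeing with the example output to the precision shown. The names tq and ci are illustrative.

```r
# Simulated binomial counts, copied from the Examples section.
d <- data.frame(
  iv = factor(c("a", "b", "c", "d", "e")),
  pn = c(32L, 33L, 25L, 13L, 6L),
  qn = c(34L, 33L, 41L, 53L, 60L)
)

fit <- glm(cbind(pn, qn) ~ iv, family = binomial, data = d)
p <- predict(fit, type = "link", se.fit = TRUE)

conf_level <- 0.95

# Assumption: n is the total number of observations across all levels.
n <- sum(d$pn + d$qn)

# Two-sided t quantile with n - 1 degrees of freedom.
tq <- qt(1 - (1 - conf_level) / 2, df = n - 1)

ci <- data.frame(
  level = d$iv,
  pred  = unname(p$fit),
  lower = unname(p$fit - tq * p$se.fit),
  upper = unname(p$fit + tq * p$se.fit)
)
ci
```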
See also
add_grps, binom_contingency, formula, glm, and tibble.
Other plot_model: Plot_Model, glm_plotlist(), var_labs()
Examples
(d <- binom_data())
#> __________________________
#> Simulated Binomial Data: -
#>
#> # A tibble: 5 × 3
#> iv pn qn
#> * <fct> <int> <int>
#> 1 a 32 34
#> 2 b 33 33
#> 3 c 25 41
#> 4 d 13 53
#> 5 e 6 60
## ___________________________________________________
## Ungrouped data, 95% Confidence interval (default)
## On linear predictor scale (default)
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv)
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a NA 66 -0.0606 -0.0606 -0.545 0.424
#> 2 b NA 66 0 0 -0.484 0.484
#> 3 c NA 66 -0.495 -0.495 -0.994 0.00449
#> 4 d NA 66 -1.41 -1.41 -2.01 -0.796
#> 5 e NA 66 -2.30 -2.30 -3.14 -1.46
## On response scale
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv, type = "response")
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a NA 66 0.485 0.485 0.367 0.604
#> 2 b NA 66 0.5 0.5 0.381 0.619
#> 3 c NA 66 0.379 0.379 0.270 0.501
#> 4 d NA 66 0.197 0.197 0.118 0.311
#> 5 e NA 66 0.0909 0.0909 0.0413 0.188
## ________________________________
## Ungrouped data, standard error
## On linear predictor scale (default)
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv, conf_level = NA)
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a NA 66 -0.0606 -0.0606 -0.307 0.186
#> 2 b NA 66 0 0 -0.246 0.246
#> 3 c NA 66 -0.495 -0.495 -0.748 -0.241
#> 4 d NA 66 -1.41 -1.41 -1.71 -1.10
#> 5 e NA 66 -2.30 -2.30 -2.73 -1.87
## On response scale
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv, conf_level = NA, type = "response")
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a NA 66 0.485 0.485 0.424 0.546
#> 2 b NA 66 0.5 0.5 0.439 0.561
#> 3 c NA 66 0.379 0.379 0.321 0.440
#> 4 d NA 66 0.197 0.197 0.153 0.251
#> 5 e NA 66 0.0909 0.0909 0.0612 0.133
(d <- list(iv2 = list(ab = c("a", "b"), cd = c("c", "d"))) |>
  add_grps(d, iv, .key = _))
#> __________________________
#> Simulated Binomial Data: -
#>
#> # A tibble: 5 × 4
#> iv iv2 pn qn
#> <fct> <fct> <int> <int>
#> 1 a ab 32 34
#> 2 b ab 33 33
#> 3 c cd 25 41
#> 4 d cd 13 53
#> 5 e e 6 60
## _________________________________________________
## Grouped data, 95% Confidence interval (default)
## On linear predictor scale (default)
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv2, .ungroup = iv)
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a ab 66 -0.0606 -0.0303 -0.373 0.312
#> 2 b ab 66 0 -0.0303 -0.373 0.312
#> 3 c cd 66 -0.495 -0.906 -1.28 -0.528
#> 4 d cd 66 -1.41 -0.906 -1.28 -0.528
#> 5 e e 66 -2.30 -2.30 -3.14 -1.46
## On response scale
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv2, .ungroup = iv, type = "response")
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a ab 66 0.485 0.492 0.408 0.577
#> 2 b ab 66 0.5 0.492 0.408 0.577
#> 3 c cd 66 0.379 0.288 0.217 0.371
#> 4 d cd 66 0.197 0.288 0.217 0.371
#> 5 e e 66 0.0909 0.0909 0.0413 0.188
## ______________________________
## Grouped data, standard error
## On linear predictor scale (default)
d |> glm_plotdata(.dep_var = cbind(pn, qn), .ind_var = iv2, .ungroup = iv, conf_level = NA)
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a ab 66 -0.0606 -0.0303 -0.204 0.144
#> 2 b ab 66 0 -0.0303 -0.204 0.144
#> 3 c cd 66 -0.495 -0.906 -1.10 -0.713
#> 4 d cd 66 -1.41 -0.906 -1.10 -0.713
#> 5 e e 66 -2.30 -2.30 -2.73 -1.87
## On response scale
d |> glm_plotdata(
  .dep_var = cbind(pn, qn), .ind_var = iv2,
  .ungroup = iv, conf_level = NA, type = "response"
)
#> ________________
#> GLM Plot Data: -
#>
#> # A tibble: 5 × 7
#> level grouped n obs pred lower upper
#> * <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 a ab 66 0.485 0.492 0.449 0.536
#> 2 b ab 66 0.5 0.492 0.449 0.536
#> 3 c cd 66 0.379 0.288 0.250 0.329
#> 4 d cd 66 0.197 0.288 0.250 0.329
#> 5 e e 66 0.0909 0.0909 0.0612 0.133
rm(d)