'Function Factory' for Box-Cox Transformation of Data
boxcox3.Rd
Function factory to create functions that take \(\lambda\) as an argument for performing the Box-Cox transformation on a given dataset. Adapted from Wickham (2019) Section 10.4.4 Exercises.
Value
Returns a function taking a single argument \(\lambda\) that performs the Box-Cox transformation on data x.
Details
A numeric vector containing the data to be transformed is supplied to this 'function factory', which returns a function that performs the Box-Cox transformation on those data for any given value of \(\lambda\). The Box-Cox transformation takes the following form:
if \(\lambda \ne 0\) $$y(\lambda) = \displaystyle \frac{y^\lambda - 1}{\lambda}$$
if \(\lambda = 0\) $$y(\lambda) = \log(y)$$
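The two cases above can be sketched as a small function factory in base R. This is a minimal illustration only, not the implementation of boxcox3(): the name make_boxcox() is hypothetical, and the check for strictly positive data is an assumption of the sketch.

```r
## Hypothetical stand-in for boxcox3(); not the package's implementation
make_boxcox <- function(data) {
  ## Box-Cox is defined for positive values; this check also evaluates
  ## `data` now, so the closure keeps the values supplied at this call
  stopifnot(is.numeric(data), all(data > 0, na.rm = TRUE))
  function(lambda) {
    if (lambda == 0) {
      log(data)                     # case lambda = 0
    } else {
      (data^lambda - 1) / lambda    # case lambda != 0
    }
  }
}

x <- c(0.5, 1, 2, 4)
bc <- make_boxcox(x)
bc(0)   # same values as log(x)
bc(1)   # same values as x - 1
bc(2)   # same values as (x^2 - 1) / 2
```

The returned closure keeps the data in its enclosing environment, so only \(\lambda\) needs to be supplied on each call.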
If labile_data is TRUE, data are represented in the boxcox3() function environment as a quosure, and functions returned by boxcox3() will automatically refer to the current version of data in its original environment, usually the calling environment (typically, but not necessarily, the global environment). If labile_data is FALSE, returned functions refer to a copy of data saved in the function environment at the time boxcox3() was executed, and will not reflect any subsequent changes to the original data.
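The difference between the two settings of labile_data can be illustrated with a base R sketch. A quosure is, roughly, an expression bundled with the environment it came from; the sketch below emulates that with substitute() and parent.frame() rather than using a real quosure, and make_boxcox_labile() is a hypothetical name, not part of the package.

```r
## Hypothetical sketch; emulates the quosure with an expression plus its environment
make_boxcox_labile <- function(data, labile_data = TRUE) {
  data_expr <- substitute(data)   # unevaluated expression supplied by the caller
  data_env <- parent.frame()      # environment in which that expression was written
  frozen <- data                  # copy of the values at creation time
  force(labile_data)
  function(lambda) {
    ## Labile: re-evaluate the expression on every call; otherwise use the copy
    y <- if (labile_data) eval(data_expr, data_env) else frozen
    if (lambda == 0) log(y) else (y^lambda - 1) / lambda
  }
}

d <- c(1, 2, 4)
live <- make_boxcox_labile(d, labile_data = TRUE)
fixed <- make_boxcox_labile(d, labile_data = FALSE)

d <- c(10, 20, 40)   # change the original data afterwards
live(0)    # log of the new values of d
fixed(0)   # log of the values d had when fixed() was created
```

The labile variant trades reproducibility for convenience: its result depends on whatever the original variable holds at call time, and it fails if that variable has been removed.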
References
Wickham, Hadley (2019) Advanced R 2nd edition. CRC Press. adv-r.hadley.nz
See also
Other boxcox:
opt_bc()
Examples
## Create skewed data
(d <- rlnorm(20))
#> [1] 0.24658623 1.29087083 0.08739968 0.99444420 1.86181668 3.15318044
#> [7] 0.16173151 0.78088663 0.78333126 0.75374177 0.57481939 1.87570022
#> [13] 7.88549421 0.19573582 1.66933768 0.15520453 0.59332527 0.94875763
#> [19] 1.72115632 0.40088734
## Calculate skewness using BitsnBobs::skew()
d |> skew()
#> [1] 3.16996
## Box-Cox function for these data
bc_func <- boxcox3(d)
## Box-Cox transform data with various values of lambda
bc_func(-1)
#> [1] -3.055376440 0.225329154 -10.441688958 -0.005586835 0.462890192
#> [6] 0.682859887 -5.183086998 -0.280595625 -0.276599123 -0.326714319
#> [11] -0.739676861 0.466865767 0.873184867 -4.108927002 0.400960027
#> [16] -5.443110962 -0.685416164 -0.054009971 0.418995248 -1.494466372
bc_func(0)
#> [1] -1.400043517 0.255317055 -2.437263611 -0.005571287 0.621552721
#> [6] 1.148411606 -1.821817661 -0.247325302 -0.244199607 -0.282705449
#> [11] -0.553699384 0.628982042 2.065024895 -1.630989402 0.512426950
#> [16] -1.863011492 -0.522012515 -0.052601910 0.542996343 -0.914074827
bc_func(1)
#> [1] -0.753413767 0.290870833 -0.912600316 -0.005555796 0.861816681
#> [6] 2.153180437 -0.838268489 -0.219113372 -0.216668740 -0.246258229
#> [11] -0.425180605 0.875700223 6.885494207 -0.804264183 0.669337681
#> [16] -0.844795471 -0.406674730 -0.051242372 0.721156318 -0.599112655
bc_func(2)
#> [1] -0.469597615 0.333173754 -0.496180648 -0.005540362 1.233180677
#> [6] 4.471273434 -0.486921459 -0.195108037 -0.193196068 -0.215936672
#> [11] -0.334791332 1.259125664 30.590509448 -0.480843745 0.893344147
#> [16] -0.487955777 -0.323982562 -0.049929481 0.981189535 -0.419644668
## bc_func(0) same as log(d)
identical(bc_func(0), log(d))
#> [1] TRUE
seq(-3, 3, 1) |>                          ## Create a sequence from -3 to 3
  set_names(\(x) paste("lambda", x)) |>   ## Name sequence vector using rlang::set_names()
  print_lf() |>                           ## Print with line feed
  lapply(bc_func) |>                      ## Box-Cox transform data using each lambda value
  print_lf() |>                           ## in sequence and print the named list
  map_dbl(skewness) |>                    ## Calculate skewness for each element of the list
  print_lf() |>                           ## and print the numeric vector
  abs() |>                                ## Absolute skewness...
  which.min()                             ## ...which lambda gives minimum?
#> lambda -3 lambda -2 lambda -1 lambda 0 lambda 1 lambda 2 lambda 3
#> -3 -2 -1 0 1 2 3
#>
#> $`lambda -3`
#> [1] -2.189835e+01 1.783695e-01 -4.989524e+02 -5.618106e-03 2.816836e-01
#> [6] 3.227009e-01 -7.846097e+01 -3.666937e-01 -3.601601e-01 -4.450813e-01
#> [11] -1.421697e+00 2.828220e-01 3.326535e-01 -4.411626e+01 2.616784e-01
#> [16] -8.882575e+01 -1.262547e+00 -5.697956e-02 2.679574e-01 -4.840491e+00
#>
#> $`lambda -2`
#> [1] -7.723039034 0.199942540 -64.956123103 -0.005602442 0.355756527
#> [6] 0.449711074 -18.615282412 -0.319962578 -0.314852660 -0.380085442
#> [11] -1.013237790 0.357883945 0.491958961 -12.550567555 0.320575555
#> [16] -20.256839433 -0.920313822 -0.055468509 0.331216739 -2.611181240
#>
#> $`lambda -1`
#> [1] -3.055376440 0.225329154 -10.441688958 -0.005586835 0.462890192
#> [6] 0.682859887 -5.183086998 -0.280595625 -0.276599123 -0.326714319
#> [11] -0.739676861 0.466865767 0.873184867 -4.108927002 0.400960027
#> [16] -5.443110962 -0.685416164 -0.054009971 0.418995248 -1.494466372
#>
#> $`lambda 0`
#> [1] -1.400043517 0.255317055 -2.437263611 -0.005571287 0.621552721
#> [6] 1.148411606 -1.821817661 -0.247325302 -0.244199607 -0.282705449
#> [11] -0.553699384 0.628982042 2.065024895 -1.630989402 0.512426950
#> [16] -1.863011492 -0.522012515 -0.052601910 0.542996343 -0.914074827
#>
#> $`lambda 1`
#> [1] -0.753413767 0.290870833 -0.912600316 -0.005555796 0.861816681
#> [6] 2.153180437 -0.838268489 -0.219113372 -0.216668740 -0.246258229
#> [11] -0.425180605 0.875700223 6.885494207 -0.804264183 0.669337681
#> [16] -0.844795471 -0.406674730 -0.051242372 0.721156318 -0.599112655
#>
#> $`lambda 2`
#> [1] -0.469597615 0.333173754 -0.496180648 -0.005540362 1.233180677
#> [6] 4.471273434 -0.486921459 -0.195108037 -0.193196068 -0.215936672
#> [11] -0.334791332 1.259125664 30.590509448 -0.480843745 0.893344147
#> [16] -0.487955777 -0.323982562 -0.049929481 0.981189535 -0.419644668
#>
#> $`lambda 3`
#> [1] -0.328335460 0.383679798 -0.333110793 -0.005524986 1.817909798
#> [6] 10.116881427 -0.331923192 -0.174609295 -0.173113926 -0.190593069
#> [11] -0.270023236 1.866394933 163.109354770 -0.330833623 1.217307926
#> [16] -0.332087121 -0.263709604 -0.048661441 1.366239150 -0.311857710
#>
#>
#> lambda -3 lambda -2 lambda -1 lambda 0 lambda 1 lambda 2
#> -4.08779986 -3.36358910 -2.00565188 -0.04650773 3.16995968 4.27696463
#> lambda 3
#> 4.44244689
#>
#> lambda 0
#> 4
## Usually, lambda 0 gives the least absolute skewness, because the data were sampled from a lognormal distribution
rm(d, bc_func)