
The phi correlation coefficient (also called the mean square contingency coefficient, and denoted by \(\phi\) or \(r_\phi\)) is a measure of association between two naturally dichotomous variables.
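One way to see what \(\phi\) measures: it equals the Pearson correlation of the two variables once each is coded as 0/1. The minimal sketch below expands a two-by-two table of counts back into individual paired observations and calls `cor()`. The helper `expand_counts()` is hypothetical and not part of this package; it assumes row 1 and column 1 are coded as 1.

```r
# Hypothetical helper (not part of this package): expand a 2x2 table of counts
# into one row per observation, coding row 1 / column 1 as 1 and row 2 / column 2 as 0.
expand_counts <- function(x) {
  stopifnot(is.matrix(x), all(dim(x) == 2))
  # expand.grid varies `row` fastest, matching the column-major order of as.vector(x)
  cells <- expand.grid(row = 1:2, col = 1:2)
  n <- as.vector(x)
  data.frame(
    a = rep(as.integer(cells$row == 1), n),  # 0/1 code for the row variable
    b = rep(as.integer(cells$col == 1), n)   # 0/1 code for the column variable
  )
}

counts <- matrix(c(6, 1, 2, 3), nrow = 2)   # same counts as the Wikipedia example
obs <- expand_counts(counts)
cor(obs$a, obs$b)                           # Pearson correlation of the 0/1 codes
```

For the Wikipedia table used in the Examples this gives approximately 0.478, the value `phi_coef()` reports for the same table.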

Usage

phi_coef(x)

Arguments

x

a square matrix containing the observations of two binary variables as a two-by-two table of counts.
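If the data are stored as two binary vectors rather than as counts, base R's `table()` builds a suitable two-by-two table. The vectors below are made-up illustration data; the level order is fixed with `factor()` so that the first level always lands in row 1 and column 1.

```r
# Made-up illustration data: one entry per observation.
smoker  <- c("yes", "no", "no", "yes", "no", "yes", "no", "no")
disease <- c("yes", "no", "yes", "yes", "no", "no", "no", "no")

# Fixing the level order keeps the cell layout predictable (first level = row/column 1).
tab <- table(
  Smoker  = factor(smoker,  levels = c("yes", "no")),
  Disease = factor(disease, levels = c("yes", "no"))
)
tab
is.matrix(tab)  # a two-way table carries a length-2 dim attribute
```

Without explicit levels, `table()` sorts them alphabetically ("no" before "yes"), which swaps rows and columns together and so leaves the sign of \(\phi\) unchanged, but it is easy to misread.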

Value

A numeric value between -1 and 1: the \(\phi\) correlation coefficient.

Details

For a two-by-two contingency table

$$\displaystyle \begin{pmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \end{pmatrix}$$

the \(\phi\) correlation coefficient is given by:

$$\displaystyle \phi = \frac{n_{11}n_{22} - n_{12}n_{21}} {\sqrt{(n_{11} + n_{21})(n_{12} + n_{22})(n_{11} + n_{12})(n_{21} + n_{22})}}$$

or equivalently, the determinant of the matrix divided by the (principal) square root of the product of its four marginal sums.
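Both forms of the formula can be sketched in a few lines of base R. The functions `phi_sketch()` and `phi_cells()` are hypothetical illustrations, not the package's implementation of `phi_coef()`; they assume a two-by-two matrix of non-negative counts.

```r
# Hypothetical sketch (not the package's code): determinant over the square root
# of the product of the four marginal sums.
phi_sketch <- function(x) {
  stopifnot(is.matrix(x), all(dim(x) == 2), all(x >= 0))
  margins <- c(rowSums(x), colSums(x))   # the four marginal sums
  det(x) / sqrt(prod(margins))
}

# The same quantity written cell by cell, following the displayed formula.
phi_cells <- function(x) {
  n11 <- x[1, 1]; n12 <- x[1, 2]; n21 <- x[2, 1]; n22 <- x[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n21) * (n12 + n22) * (n11 + n12) * (n21 + n22))
}

wiki <- matrix(c(6, 1, 2, 3), nrow = 2)
phi_sketch(wiki)
phi_cells(wiki)

# Perfect positive and negative association give the bounds 1 and -1.
phi_sketch(diag(5, 2))
phi_sketch(matrix(c(0, 5, 5, 0), nrow = 2))
```

`det()` works through an LU decomposition, so its result can differ from the cell-by-cell numerator by rounding error; compare the two with `all.equal()` rather than `==`. If any marginal sum is zero, the denominator is zero and neither sketch returns a finite value.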

References

Yule, G.U. (1912). On the Methods of Measuring Association Between Two Attributes. Journal of the Royal Statistical Society, 75(6): 579–652. doi:10.2307/2340126.


Examples

## Example from Wikipedia
twobytwo <- matrix(c(6, 1, 2, 3), nrow = 2, dimnames = rep(list(c("Cat", "Dog")), 2) |>
              setNames(c("Actual", "Predicted")))
addmargins(twobytwo)
#>       Predicted
#> Actual Cat Dog Sum
#>    Cat   6   2   8
#>    Dog   1   3   4
#>    Sum   7   5  12

phi_coef(twobytwo)
#> [1] 0.4780914

## Example from Statology
twobytwo <- matrix(c(4, 8, 9, 4), nrow = 2, dimnames =
              list(Gender = c("Male", "Female"), Party = c("Dem", "Rep")))
addmargins(twobytwo)
#>         Party
#> Gender   Dem Rep Sum
#>   Male     4   9  13
#>   Female   8   4  12
#>   Sum     12  13  25

phi_coef(twobytwo)
#> [1] -0.3589744

rm(twobytwo)