Cohen's kappa measures the agreement between two raters (or diagnostic tests) who each classify N items into C mutually exclusive categories, typically two categories in the case of diagnostic tests.

print_all() for class "cohens_kappa" first prints the standard htest output, then the matrices of observed and expected numbers with their marginal sums, and lastly the numbers and proportions of observations agreeing.

Usage

cohens_kappa(x, se_method = c("Fleiss", "Cohen"), conf.level = 0.95)

# S3 method for class 'cohens_kappa'
print_all(x, ...)

Arguments

x

a square matrix containing the observations of two raters or results of two diagnostic tests.

se_method

a character string specifying the method of calculating the standard error; must be one of "Fleiss" (default), or "Cohen". You can specify just the initial letter.

conf.level

the confidence level required; default 0.95.

...

further arguments passed to or from other methods.

Value

A list with classes "cohens_kappa" and "htest", containing the following components: -

statistic

the standard error of \(\kappa\) (also returned as stderr), as shown in the printed output.

parameter

the total number of observations.

conf.int

confidence interval of \(\kappa\) (95% or other specified level).

estimate

\(\kappa\), the kappa coefficient, shown under sample estimates in the printed output.

stderr

the standard error of \(\kappa\).

observed

the observed counts.

expected

the expected counts under the null hypothesis of zero agreement.

data.name

a character string giving the name of the data.

method

the character string "Cohen's kappa coefficient of agreement" and the standard error method used.

Details

Cohen's kappa coefficient is given by: -

$$\displaystyle \kappa = \frac{p_{o}-p_{e}}{1-p_{e}}$$

where \(p_{o}\) is the proportion of observations in agreement and \(p_{e}\) is the proportion of observations expected to agree by chance.
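
To make these quantities concrete, \(\kappa\) can be computed directly from a square contingency table in a few lines of base R; a minimal sketch (not the package's internal code), using the two-by-two table from the first example below: -

 tab <- matrix(c(31, 12, 4, 58), nrow = 2)
 n   <- sum(tab)
 p_o <- sum(diag(tab)) / n                        # proportion observed to agree
 p_e <- sum(rowSums(tab) * colSums(tab)) / n^2    # proportion expected to agree by chance
 (p_o - p_e) / (1 - p_e)                          # kappa = 0.6756757, as in the first example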

Cohen's (1960) original approximation to the standard error of \(\kappa\) is given by: -

$$\displaystyle se = \sqrt{\frac{p_{o}(1-p_{o})}{n(1-p_{e})^{2}}}$$
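
A Wald-type confidence interval can then be formed as \(\kappa \pm z_{1-\alpha/2}\,se\); a minimal sketch continuing the base R illustration above, here using the Altman et al. table from the second example below (the interval construction is an assumption of this sketch, although it reproduces the se_method = "Cohen" output shown there): -

 tab <- matrix(c(32, 3, 6, 42), nrow = 2)
 n   <- sum(tab)
 p_o <- sum(diag(tab)) / n
 p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
 k   <- (p_o - p_e) / (1 - p_e)
 se  <- sqrt(p_o * (1 - p_o) / (n * (1 - p_e)^2))    # 0.069171
 k + c(-1, 1) * qnorm(0.975) * se                    # 0.6446565 0.9158024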

Output returned by cohens_kappa() may be printed using print_all() to provide additional information; see the examples.

Note

Professor Joseph Fleiss et al. (1979) observed: "Many human endeavors have been cursed with repeated failures before final success is achieved. The scaling of Mount Everest is one example. The discovery of the Northwest Passage is a second. The derivation of a correct standard error for kappa is a third." Coding the Fleiss et al. (1979) standard error method in R was an endeavour similarly cursed!

References

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educ Psychol Meas, 20, 37–46. doi:10.1177/001316446002000104.

Fleiss, J.L., Nee, J.C., & Landis, J.R. (1979). Large sample variance of kappa in the case of different sets of raters. Psychol Bull, 86(5), 974–977. doi:10.1037/0033-2909.86.5.974.

Examples

 ## Two-by-two table for diagnostic test comparison
 (twobytwo <- matrix(c(31, 12, 4, 58), nrow = 2, dimnames = rep(list(c("+ve", "-ve")), 2) |>
               setNames(c("Test1", "Test2"))
             ))
#>      Test2
#> Test1 +ve -ve
#>   +ve  31   4
#>   -ve  12  58

 (ck <- cohens_kappa(twobytwo))
#> 
#> 	Cohen's kappa coefficient of agreement (Fleiss stderr)
#> 
#> data:  twobytwo
#> stderr = 0.073448, number of observations = 105
#> 95 percent confidence interval:
#>  0.5317210 0.8196303
#> sample estimates:
#>     kappa 
#> 0.6756757 
#> 

 ck |> print_all()   
#> 
#> 	Cohen's kappa coefficient of agreement (Fleiss stderr)
#> 
#> data:  twobytwo
#> stderr = 0.073448, number of observations = 105
#> 95 percent confidence interval:
#>  0.5317210 0.8196303
#> sample estimates:
#>     kappa 
#> 0.6756757 
#> 
#> Observed: -
#>      Test2
#> Test1 +ve -ve Sum
#>   +ve  31   4  35
#>   -ve  12  58  70
#>   Sum  43  62 105
#> 
#> Expected: -
#>      Test2
#> Test1      +ve      -ve Sum
#>   +ve 14.33333 20.66667  35
#>   -ve 28.66667 41.33333  70
#>   Sum 43.00000 62.00000 105
#> 
#> Number of agreements: -
#> observed expected 
#> 89.00000 55.66667 
#> 
#> Proportion in agreement: -
#>  observed  expected 
#> 0.8476190 0.5301587 
#> 

 ## Example from Altman et al. (Statistics with Confidence 2nd Edn. 2008. ISBN:978-0-727-91375-3,
 ## p.117), using, as they did, Cohen's approximation to the standard error. 
 (twobytwo <- matrix(c(32, 3, 6, 42), nrow = 2, dimnames = rep(list(c("Yes", "No")), 2) |>
               setNames(c("Parent", "Paediatrician"))
             ))
#>       Paediatrician
#> Parent Yes No
#>    Yes  32  6
#>    No    3 42

 cohens_kappa(twobytwo, se_method = "Cohen") |> print_all()   
#> 
#> 	Cohen's kappa coefficient of agreement (Cohen stderr)
#> 
#> data:  twobytwo
#> stderr = 0.069171, number of observations = 83
#> 95 percent confidence interval:
#>  0.6446565 0.9158024
#> sample estimates:
#>     kappa 
#> 0.7802295 
#> 
#> Observed: -
#>       Paediatrician
#> Parent Yes No Sum
#>    Yes  32  6  38
#>    No    3 42  45
#>    Sum  35 48  83
#> 
#> Expected: -
#>       Paediatrician
#> Parent     Yes      No Sum
#>    Yes 16.0241 21.9759  38
#>    No  18.9759 26.0241  45
#>    Sum 35.0000 48.0000  83
#> 
#> Number of agreements: -
#> observed expected 
#> 74.00000 42.04819 
#> 
#> Proportion in agreement: -
#>  observed  expected 
#> 0.8915663 0.5066047 
#> 

 ## Confidence interval using Fleiss et al.'s standard error for comparison
 cohens_kappa(twobytwo, se_method = "Fleiss") |> _$conf.int
#> [1] 0.6450640 0.9153949
#> attr(,"conf.level")
#> [1] 0.95

 ## Example with three categories from Cohen (1960).
 (threebythree <- matrix(c(88, 10,  2, 14, 40,  6, 18, 10, 12), nrow = 3,
                   dimnames = rep(list(c("Cat1", "Cat2", "Cat3")), 2) |>
                       setNames(c("Judge_B", "Judge_A"))
                 ))
#>        Judge_A
#> Judge_B Cat1 Cat2 Cat3
#>    Cat1   88   14   18
#>    Cat2   10   40   10
#>    Cat3    2    6   12

 cohens_kappa(threebythree, se_method = "Cohen") |> print_all()
#> 
#> 	Cohen's kappa coefficient of agreement (Cohen stderr)
#> 
#> data:  threebythree
#> stderr = 0.054922, number of observations = 200
#> 95 percent confidence interval:
#>  0.3838812 0.5991696
#> sample estimates:
#>     kappa 
#> 0.4915254 
#> 
#> Observed: -
#>        Judge_A
#> Judge_B Cat1 Cat2 Cat3 Sum
#>    Cat1   88   14   18 120
#>    Cat2   10   40   10  60
#>    Cat3    2    6   12  20
#>    Sum   100   60   40 200
#> 
#> Expected: -
#>        Judge_A
#> Judge_B Cat1 Cat2 Cat3 Sum
#>    Cat1   60   36   24 120
#>    Cat2   30   18   12  60
#>    Cat3   10    6    4  20
#>    Sum   100   60   40 200
#> 
#> Number of agreements: -
#> observed expected 
#>      140       82 
#> 
#> Proportion in agreement: -
#> observed expected 
#>     0.70     0.41 
#> 

 ## Using Fleiss et al.'s standard error for comparison
 cohens_kappa(threebythree, se_method = "Fleiss") |> _$conf.int
#> [1] 0.3899498 0.5931011
#> attr(,"conf.level")
#> [1] 0.95

 rm(ck, threebythree, twobytwo)