Association tests can be conducted for contingency tables in cases a) where both variables are ordinal and b) where one variable is ordinal and one variable is nominal.
Ordinal × ordinal tables
The linear-by-linear test can be used to test the association among variables in a contingency table with ordered categories (Agresti, 2007). This test or a test with a similar function is sometimes called “ordinal chi-square” test. In Agresti, the method used is called the linear-by-linear association model.
In R, the test can be performed by permutation test with the coin package.
Association in ordinal tables can also be tested with
Kendall’s tau correlation. tau-b can be used for square tables
(with the same number of rows and columns). And tau-c can be used for
non-square tables.
Nominal × ordinal tables
An association test can also be performed on a contingency table with one ordered categorical variable and one non-ordered categorical variable.
The Cochran–Armitage test is a special case of this when the nominal variable has only two categories. The Cochran–Armitage test, however, can be “extended” in some software implementations to cases where the nominal variable has more than two categories.
Higher-dimensional tables
Most of the examples in this chapter use two-dimensional tables, although the coin package can handle three-dimensional tables. For three-dimensional analyses, it may be easier to use data in the long format, as is shown in the final example in this chapter.
Packages used in this chapter
The packages used in this chapter include:
• coin
• DescTools
• vcdExtra
• PMCMRplus
• rcompanion
The following commands will install these packages if they are not already installed:
if(!require(coin)){install.packages("coin")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(vcdExtra)){install.packages("vcdExtra")}
if(!require(PMCMRplus)){install.packages("PMCMRplus")}
if(!require(rcompanion)){install.packages("rcompanion")}
Tests for ordered contingency tables
Linear-by-linear test
The lbl_test function in the coin package will automatically treat the variables as ordered. By default, the levels are equally spaced, but the scores option can be used to specify the distance between the levels of each variable.
The null hypothesis for the linear-by-linear test is that there is no association among the variables in the table. A significant p-value suggests that there is an association. This is similar to a chi-square test, except that the categories are ordered in nature.
Example of linear-by-linear test
For this hypothetical example, farmers were surveyed about how often they use some best management practice. Responses are organized according to the size of the operation. Both variables in the contingency table are ordered categories.
Note the placement of the initial quote mark in the Input function. And that the read.ftable function is used in a second step.
Input =(
"Adopt Always Sometimes Never
Size
Hobbiest 0 1 5
Mom-and-pop 2 3 4
Small 4 4 4
Medium 3 2 0
Large 2 0 0
")
Tabla = as.table(read.ftable(textConnection(Input)))
Tabla
sum (Tabla)
prop.table(Tabla,
margin = NULL) ### proportion in the
table
Size Always Sometimes Never
Hobbiest 0.00000000 0.02941176 0.14705882
Mom-and-pop 0.05882353 0.08823529 0.11764706
Small 0.11764706 0.11764706 0.11764706
Medium 0.08823529 0.05882353 0.00000000
Large 0.05882353 0.00000000 0.00000000
spineplot(Tabla)
Spine plot for each level of Size showing the proportion of Always
(darkest gray), Sometimes (medium gray), and Never (lightest gray). The width
of the Size bars indicates the total number of counts in that bar.
library(coin)
LxL = lbl_test(Tabla)
LxL
Asymptotic Linear-by-Linear Association Test
data: Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small <
Medium < Large)
Z = -3.3276, p-value = 0.0008761
### Chi-square value, df = 1
statistic(LxL)^2
11.07262
The linear-by-linear test can also be conducted with the CMHtest function in the vcdExtra package. It is reported as cor in the output.
library(vcdExtra)
CMHtest(Tabla)
Cochran-Mantel-Haenszel Statistics for Size by Adopt
AltHypothesis Chisq Df Prob
cor Nonzero correlation 11.073 1 0.00087612
Compare to chi-square test without ordered categories
ChiSq = chisq_test(Tabla)
ChiSq
Asymptotic Pearson Chi-Squared Test
data: Adopt by Size (Hobbiest, Mom-and-pop, Small, Medium, Large)
chi-squared = 13.495, df = 8, p-value = 0.09593
Kendall’s tau test for ordinal tables
Kendall’s tau can be used as test of association for ordinal tables. If the table has an unequal number of rows and columns, Kendall’s tau-c should be used rather than Kendall’s tau-b.
Kendall’s tau-b
library(DescTools)
KendallTauB(Tabla)
[1] -0.4960301
KendallTauB(Tabla, conf.level=0.95)
tau_b lwr.ci upr.ci
-0.4960301 -0.6967707 -0.2952895
A p-value for the association statistic can derived from the cor.test function. However, the table needs to be converted to an appropriate numeric format first. Note that the results are similar to those from the linear-by-linear test.
library(vcdExtra)
Long = expand.table(Tabla)
Long$Size = factor(Long$Size, ordered=TRUE,
levels = c("Hobbiest", "Mom-and-pop",
"Small", "Medium",
"Large"))
Long$Adopt = factor(Long$Adopt, ordered=TRUE,
levels = c("Always", "Sometimes",
"Never"))
cor.test(~ as.numeric(Long$Size) + as.numeric(Long$Adopt),
method="kendall")
Kendall's rank correlation tau
z = -3.3196, p-value = 0.0009015
tau
-0.4960301
A p-value for the association statistic can also be derived from the confidence interval reported by KendallTauB. The code is adopted from: stackoverflow.com/questions/51918648/p-value-for-kendalls-tau-c-in-r . Note that for this example, the results are somewhat different from that from cor.test, but in my experience in most cases they tend to be similar.
ConfLevel = 0.95
TAUB = KendallTauB(Tabla, conf.level=ConfLevel)
Q = abs(qnorm((1-ConfLevel)/2))
SE = abs((TAUB[3] - TAUB[1])) / Q
pValue = 2*pnorm(abs(TAUB[1]), mean=0, sd=SE, lower.tail=FALSE)
Z = sign(TAUB[1]) * qnorm(pValue/2, lower.tail=FALSE)
data.frame(tau = TAUB[1], SE=SE, lower=TAUB[2], upper=TAUB[3], z=Z,
pValue=pValue)
tau SE lower upper z pValue
tau_b -0.4960301 0.1024206 -0.6967707 -0.2952895 -4.843072 1.27847e-06
Kendall’s tau-c
library(DescTools)
StuartTauC(Tabla)
[1] -0.5242215
StuartTauC(Tabla, conf.level=0.95)
tauc lwr.ci upr.ci
-0.5242215 -0.7477650 -0.3006779
A p-value for the association statistic can be derived from the confidence interval reported by StuartTauC. The code is adopted from: stackoverflow.com/questions/51918648/p-value-for-kendalls-tau-c-in-r .
ConfLevel = 0.95
TAUC = StuartTauC(Tabla, conf.level=ConfLevel)
Q = abs(qnorm((1-ConfLevel)/2))
SE = abs((TAUC[3] - TAUC[1])) / Q
pValue = 2*pnorm(abs(TAUC[1]), mean=0, sd=SE, lower.tail=FALSE)
Z = sign(TAUC[1]) * qnorm(pValue/2, lower.tail=FALSE)
data.frame(tau = TAUC[1], SE=SE, lower=TAUC[2], upper=TAUC[3], z=Z,
pValue=pValue)
tau SE lower upper z pValue
tauc -0.5242215 0.1140549 -0.747765 -0.3006779 -4.596219 4.302267e-06
Jonckheere–Terpstra and Cuzick tests
The Jonckheere–Terpstra test is usually used in cases where there is an ordinal independent variable and a continuous dependent variable. However, because it can handle ties in the dependent variable, it can be used when both variables are ordinal.
The implementation of the Cuzick test used here does not account for ties when determining the p-value.
Note that both of these tests explicitly treat one variable as an independent variable and one variable as a dependent variable.
Jonckheere–Terpstra test
Note that for this example, the results are similar to those from the Kendall tau-b test.
library(vcdExtra)
Long = expand.table(Tabla)
Long$Size = factor(Long$Size, ordered=TRUE,
levels = c("Hobbiest", "Mom-and-pop",
"Small", "Medium",
"Large"))
Long$Adopt = factor(Long$Adopt, ordered=TRUE,
levels = c("Always", "Sometimes",
"Never"))library(PMCMRplus)
library(PMCMRplus)
jonckheereTest(Long$Size, Long$Adopt)
Jonckheere-Terpstra test
z = -3.3196, p-value = 0.0009015
Cuzick test
library(PMCMRplus)
cuzickTest(Long$Size, Long$Adopt)
Cuzick trend test
z = -3.1336, p-value = 0.001727
Warning message:
Ties are present. 'cuzickTest()' does not correct for ties
Extended Cochran–Armitage test
The chisq_test function in the coin package can be used to conduct a test of association for a contingency table with one ordered categorical variable and one nominal categorical variable. The Cochran–Armitage test is a special case of this where the non-ordered variable has only two categories.
The scores option is used to indicate which variable should be treated as ordered, and the spacing of the levels of this variable.
Example of extended Cochran–Armitage test 1
For this hypothetical example, students were surveyed about how often they eat breakfast and how they travel to school. Breakfast is treated as ordered, and Travel is treated as nominal.
Note the placement of the initial quote mark in the Input function.
Input =(
"Breakfast Never Rarely Sometimes Often Always
Travel
Walk 6 9 6 5 2
Bus 2 5 8 5 3
Drive 2 4 6 8 8
")
Tabla = as.table(read.ftable(textConnection(Input)))
Tabla
sum (Tabla)
prop.table(Tabla,
margin = 1) ### proportion in each row
Breakfast
Travel Never Rarely Sometimes
Often Always
Walk 0.21428571 0.32142857 0.21428571 0.17857143 0.07142857
Bus 0.08695652 0.21739130 0.34782609 0.21739130 0.13043478
Drive 0.07142857 0.14285714 0.21428571 0.28571429 0.28571429
library(coin)
spineplot(Tabla)
Spine plot for each category of Travel showing the proportion of Breakfast
(where the darkest gray is Never and the lightest gray is Always).
library(coin)
Test = chisq_test(Tabla,
scores = list("Breakfast" = c(-2, -1, 0, 1, 2)))
Test
Asymptotic Generalized Pearson Chi-Squared Test
data: Breakfast (ordered) by Travel (Walk, Bus, Drive)
chi-squared = 8.6739, df = 2, p-value = 0.01308
Post-hoc analysis
For a contingency table with one ordered variable and one nominal variable, it makes sense to analyze the component tables with pairwise comparisons of the levels of the nominal variable.
Results of the compact letter display will be easier to interpret if the table is ordered so that the first row, in this case, ranks highest or lowest in the ordered variable, and so on.
library(rcompanion)
PT = pairwiseOrdinalIndependence(Tabla,
compare="row")
PT
Comparison p.value p.adjust
1 Walk : Bus 0.12800 0.1570
2 Walk : Drive 0.00462 0.0139
3 Bus : Drive 0.15700 0.1570
library(rcompanion)
cldList(p.value ~ Comparison,
data = PT,
threshold = 0.05)
Group Letter MonoLetter
1 Walk a a
2 Bus ab ab
3 Drive b b
Example of extended Cochran–Armitage test 2
This example revisits the example from Kruskal–Wallis chapter. Likert is treated as ordered, and Speaker is treated as non-ordered. The results are relatively similar to those from the Kruskal–Wallis and Dunn tests.
Note the placement of the initial quote mark in the Input function.
Input =(
"Likert 1 2 3 4 5
Speaker
Pooh 0 0 1 6 3
Piglet 1 6 2 1 0
Tigger 0 0 2 6 2
")
Tabla = as.table(read.ftable(textConnection(Input)))
Tabla
sum(Tabla)
prop.table(Tabla,
margin = 1) ### proportion in each row
Likert
Speaker 1 2 3 4 5
Pooh 0.0 0.0 0.1 0.6 0.3
Piglet 0.1 0.6 0.2 0.1 0.0
Tigger 0.0 0.0 0.2 0.6 0.2
library(coin)
Test = chisq_test(Tabla,
scores = list("Likert" = c(1, 2, 3, 4, 5)))
Test
Asymptotic Generalized Pearson Chi-Squared Test
data: Likert (ordered) by Speaker (Pooh, Piglet, Tigger)
chi-squared = 18.423, df = 2, p-value = 9.991e-05
library(rcompanion)
PT = pairwiseOrdinalIndependence(Tabla,
compare="row")
PT
Comparison p.value p.adjust
1 Pooh : Piglet 0.000310 0.000902
2 Pooh : Tigger 0.474000 0.474000
3 Piglet : Tigger 0.000601 0.000902
library(rcompanion)
cldList(p.value ~ Comparison,
data =
PT,
threshold = 0.05)
Group Letter MonoLetter
1 Pooh a a
2 Piglet b b
3 Tigger a a
Long-format data and three-dimensional contingency tables
The tests in this chapter can also be performed on data in long format rather than in table format. Each row of data can represent a single observation, or one variable can hold counts of each category, as the Count variable does below.
For the independence_test function in the coin package to handle ordered and non-ordered variables properly, the user need only specify which variables are to be considered ordered.
The independence_test function can also handle a stratification variable. That is, a two-dimensional table within each level of the stratification variable.
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Crop Size Adopt Count
Nursery Hobbiest Always 0
Nursery Hobbiest Sometimes 1
Nursery Hobbiest Never 3
Nursery Mom-and-pop Always 1
Nursery Mom-and-pop Sometimes 1
Nursery Mom-and-pop Never 2
Nursery Small Always 2
Nursery Small Sometimes 3
Nursery Small Never 2
Nursery Medium Always 2
Nursery Medium Sometimes 1
Nursery Medium Never 0
Nursery Large Always 1
Nursery Large Sometimes 0
Nursery Large Never 0
Vegetable Hobbiest Always 0
Vegetable Hobbiest Sometimes 0
Vegetable Hobbiest Never 2
Vegetable Mom-and-pop Always 1
Vegetable Mom-and-pop Sometimes 2
Vegetable Mom-and-pop Never 2
Vegetable Small Always 2
Vegetable Small Sometimes 1
Vegetable Small Never 2
Vegetable Medium Always 1
Vegetable Medium Sometimes 1
Vegetable Medium Never 0
Vegetable Large Always 1
Vegetable Large Sometimes 0
Vegetable Large Never 0
")
### Tell R which variables are ordered
Data$Size = factor(Data$Size,
ordered = TRUE,
levels=unique(Data$Size))
Data$Adopt = factor(Data$Adopt,
ordered = TRUE,
levels=unique(Data$Adopt))
### Make sure the ordered levels are in the proper
order
str(Data$Adopt)
Ord.factor w/ 3 levels "Always"<"Sometimes"<..: 1 2 3
1 2 3 1 2 3 1 ...
levels(Data$Adopt)
[1] "Always" "Sometimes" "Never"
str(Data$Size)
Ord.factor w/ 5 levels "Hobbiest"<"Mom-and-pop"<..: 1
1 1 2 2 2 3 3 3 4 ...
levels(Data$Size)
[1] "Hobbiest" "Mom-and-pop" "Small"
"Medium" "Large"
Two-dimensional table example
XT = xtabs(Count ~ Size + Adopt,
data = Data)
XT
Adopt
Size Always Sometimes Never
Hobbiest 0 1 5
Mom-and-pop 2 3 4
Small 4 4 4
Medium 3 2 0
Large 2 0 0
spineplot(XT)
library(coin)
independence_test(Adopt ~ Size,
data = Data,
weights = ~ Count)
Asymptotic General Independence Test
data: Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small <
Medium < Large)
Z = -3.3276, p-value = 0.0008761
Three-dimensional table example
The following example adds Crop as a stratification variable.
ftable(xtabs(Count ~ Crop + Size + Adopt,
data = Data))
Adopt Always Sometimes Never
Crop Size
Nursery Hobbiest 0 1 3
Mom-and-pop 1 1 2
Small 2 3 2
Medium 2 1 0
Large 1 0 0
Vegetable Hobbiest 0 0 2
Mom-and-pop 1 2 2
Small 2 1 2
Medium 1 1 0
Large 1 0 0
library(coin)
independence_test(Adopt ~ Size | Crop,
data = Data,
weights = ~ Count)
Asymptotic General Independence Test
data: Adopt (ordered) by Size (Hobbiest <
Mom-and-pop < Small < Medium < Large)
stratified by Crop
Z = -3.281, p-value = 0.001034
Spine plot for two-dimensional tables
XT1 = xtabs(Count ~ Size + Adopt,
data = Data,
subset = Data$Crop=="Nursery")
XT1
spineplot(XT1,
main = "Nursery")
XT2 = xtabs(Count ~ Size + Adopt,
data = Data,
subset = Data$Crop=="Vegetable")
XT2
spineplot(XT2,
main = "Vegetable")
References
Agresti, A. 2007. An Introduction to Categorical Data Analysis 2nd Edition. Wiley-Interscience.