[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Association Tests for Ordinal Tables

Association tests can be conducted for contingency tables in cases a) where both variables are ordinal and b) where one variable is ordinal and one variable is nominal.

 

Ordinal × ordinal tables

 

The linear-by-linear test can be used to test the association among variables in a contingency table with ordered categories (Agresti, 2007).   This test or a test with a similar function is sometimes called “ordinal chi-square” test.  In Agresti, the method used is called the linear-by-linear association model.

 

In R, the test can be performed by permutation test with the coin package.

 

Association in ordinal tables can also be tested with Kendall’s tau correlation.  tau-b can be used for square tables (with the same number of rows and columns).  And tau-c can be used for non-square tables.

Nominal × ordinal tables

 

An association test can also be performed on a contingency table with one ordered categorical variable and one non-ordered categorical variable. 

 

The Cochran–Armitage test is a special case of this when the nominal variable has only two categories.  The Cochran–Armitage test, however, can be “extended” in some software implementations to cases where the nominal variable has more than two categories.

 

Higher-dimensional tables

 

Most of the examples in this chapter use two-dimensional tables, although the coin package can handle three-dimensional tables.  For three-dimensional analyses, it may be easier to use data in the long format, as is shown in the final example in this chapter.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  coin

•  DescTools

•  vcdExtra

•  PMCMRplus

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(coin)){install.packages("coin")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(vcdExtra)){install.packages("vcdExtra")}
if(!require(PMCMRplus)){install.packages("PMCMRplus")}
if(!require(rcompanion)){install.packages("rcompanion")}


Tests for ordered contingency tables

 

Linear-by-linear test

The lbl_test function in the coin package will automatically treat the variables as ordered.  By default, the levels are equally spaced, but the scores option can be used to specify the distance between the levels of each variable.

 

The null hypothesis for the linear-by-linear test is that there is no association among the variables in the table.  A significant p-value suggests that there is an association.  This is similar to a chi-square test, except that the categories are ordered in nature.

 

Example of linear-by-linear test

For this hypothetical example, farmers were surveyed about how often they use some best management practice.  Responses are organized according to the size of the operation.  Both variables in the contingency table are ordered categories.

 

Note the placement of the initial quote mark in the Input function.  And that the read.ftable function is used in a second step.


Input =(
"Adopt      Always  Sometimes  Never
Size
Hobbiest         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum (Tabla)

prop.table(Tabla,
           margin = NULL)   ### proportion in the table


Size              Always  Sometimes      Never
  Hobbiest    0.00000000 0.02941176 0.14705882
  Mom-and-pop 0.05882353 0.08823529 0.11764706
  Small       0.11764706 0.11764706 0.11764706
  Medium      0.08823529 0.05882353 0.00000000
  Large       0.05882353 0.00000000 0.00000000


spineplot(Tabla)


Chart

Description automatically generated

Spine plot for each level of Size showing the proportion of Always (darkest gray), Sometimes (medium gray), and Never (lightest gray).  The width of the Size bars indicates the total number of counts in that bar.


library(coin)

LxL = lbl_test(Tabla)

LxL


Asymptotic Linear-by-Linear Association Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761


### Chi-square value, df = 1

statistic(LxL)^2 


11.07262


The linear-by-linear test can also be conducted with the CMHtest function in the vcdExtra package.  It is reported as cor in the output.


library(vcdExtra)

CMHtest(Tabla)


Cochran-Mantel-Haenszel Statistics for Size by Adopt

                 AltHypothesis  Chisq Df       Prob
cor        Nonzero correlation 11.073  1 0.00087612


Compare to chi-square test without ordered categories


ChiSq = chisq_test(Tabla)

ChiSq


Asymptotic Pearson Chi-Squared Test

data:  Adopt by Size (Hobbiest, Mom-and-pop, Small, Medium, Large)

chi-squared = 13.495, df = 8, p-value = 0.09593


Kendall’s tau test for ordinal tables

Kendall’s tau can be used as test of association for ordinal tables.  If the table has an unequal number of rows and columns, Kendall’s tau-c should be used rather than Kendall’s tau-b.

 

Kendall’s tau-b


library(DescTools)

KendallTauB(Tabla)


[1] -0.4960301


KendallTauB(Tabla, conf.level=0.95)


     tau_b     lwr.ci     upr.ci
-0.4960301 -0.6967707 -0.2952895


A p-value for the association statistic can derived from the cor.test function.  However, the table needs to be converted to an appropriate numeric format first.  Note that the results are similar to those from the linear-by-linear test.


library(vcdExtra)

Long = expand.table(Tabla)

Long$Size = factor(Long$Size, ordered=TRUE,
                   levels = c("Hobbiest", "Mom-and-pop",
                              "Small", "Medium", "Large"))

Long$Adopt = factor(Long$Adopt, ordered=TRUE,
                   levels = c("Always", "Sometimes", "Never"))

cor.test(~ as.numeric(Long$Size) + as.numeric(Long$Adopt), method="kendall")


Kendall's rank correlation tau

z = -3.3196, p-value = 0.0009015

       tau
-0.4960301


A p-value for the association statistic can also be derived from the confidence interval reported by KendallTauB.  The code is adopted from: stackoverflow.com/questions/51918648/p-value-for-kendalls-tau-c-in-r .  Note that for this example, the results are somewhat different from that from cor.test, but in my experience in most cases they tend to be similar.


ConfLevel = 0.95

TAUB = KendallTauB(Tabla, conf.level=ConfLevel)

Q = abs(qnorm((1-ConfLevel)/2))

SE        =  abs((TAUB[3] - TAUB[1])) / Q

pValue = 2*pnorm(abs(TAUB[1]), mean=0, sd=SE, lower.tail=FALSE)

Z = sign(TAUB[1]) * qnorm(pValue/2, lower.tail=FALSE)

data.frame(tau = TAUB[1], SE=SE, lower=TAUB[2], upper=TAUB[3], z=Z, pValue=pValue)


             tau        SE      lower      upper         z      pValue
tau_b -0.4960301 0.1024206 -0.6967707 -0.2952895 -4.843072 1.27847e-06


Kendall’s tau-c


library(DescTools)

StuartTauC(Tabla)


[1] -0.5242215


StuartTauC(Tabla, conf.level=0.95)


       tauc     lwr.ci     upr.ci
-0.5242215 -0.7477650 -0.3006779


A p-value for the association statistic can be derived from the confidence interval reported by StuartTauC.  The code is adopted from: stackoverflow.com/questions/51918648/p-value-for-kendalls-tau-c-in-r .


ConfLevel = 0.95

TAUC = StuartTauC(Tabla, conf.level=ConfLevel)

Q = abs(qnorm((1-ConfLevel)/2))

SE        =  abs((TAUC[3] - TAUC[1])) / Q

pValue = 2*pnorm(abs(TAUC[1]), mean=0, sd=SE, lower.tail=FALSE)

Z = sign(TAUC[1]) * qnorm(pValue/2, lower.tail=FALSE)

data.frame(tau = TAUC[1], SE=SE, lower=TAUC[2], upper=TAUC[3], z=Z, pValue=pValue)


            tau        SE     lower      upper         z       pValue
tauc -0.5242215 0.1140549 -0.747765 -0.3006779 -4.596219 4.302267e-06


Jonckheere–Terpstra and Cuzick tests

 

The Jonckheere–Terpstra test is usually used in cases where there is an ordinal independent variable and a continuous dependent variable.  However, because it can handle ties in the dependent variable, it can be used when both variables are ordinal.

 

The implementation of the Cuzick test used here does not account for ties when determining the p-value.

 

Note that both of these tests explicitly treat one variable as an independent variable and one variable as a dependent variable.

 

Jonckheere–Terpstra test

Note that for this example, the results are similar to those from the Kendall tau-b test.


library(vcdExtra)

Long = expand.table(Tabla)

Long$Size = factor(Long$Size, ordered=TRUE,
                   levels = c("Hobbiest", "Mom-and-pop",
                              "Small", "Medium", "Large"))

Long$Adopt = factor(Long$Adopt, ordered=TRUE,
                   levels = c("Always", "Sometimes", "Never"))library(PMCMRplus)


library(PMCMRplus)

jonckheereTest(Long$Size, Long$Adopt)


Jonckheere-Terpstra test

z = -3.3196, p-value = 0.0009015


Cuzick test


library(PMCMRplus)

cuzickTest(Long$Size, Long$Adopt)


Cuzick trend test

z = -3.1336, p-value = 0.001727


Warning message:
  Ties are present. 'cuzickTest()' does not correct for ties


Extended Cochran–Armitage test

 

The chisq_test function in the coin package can be used to conduct a test of association for a contingency table with one ordered categorical variable and one nominal categorical variable.  The Cochran–Armitage test is a special case of this where the non-ordered variable has only two categories.

 

The scores option is used to indicate which variable should be treated as ordered, and the spacing of the levels of this variable.

 

Example of extended Cochran–Armitage test 1

For this hypothetical example, students were surveyed about how often they eat breakfast and how they travel to school.  Breakfast is treated as ordered, and Travel is treated as nominal.

 

Note the placement of the initial quote mark in the Input function.


Input =(
"Breakfast  Never  Rarely  Sometimes Often  Always
Travel
Walk         6      9       6         5      2
Bus          2      5       8         5      3
Drive        2      4       6         8      8
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla

sum (Tabla)

prop.table(Tabla,
           margin = 1)   ### proportion in each row


       Breakfast
Travel       Never     Rarely  Sometimes      Often     Always
  Walk  0.21428571 0.32142857 0.21428571 0.17857143 0.07142857
  Bus   0.08695652 0.21739130 0.34782609 0.21739130 0.13043478
  Drive 0.07142857 0.14285714 0.21428571 0.28571429 0.28571429



library(coin)

spineplot(Tabla)


Chart

Description automatically generated with medium confidence

Spine plot for each category of Travel showing the proportion of Breakfast (where the darkest gray is Never and the lightest gray is Always).



library(coin)

Test = chisq_test(Tabla,
                  scores = list("Breakfast" = c(-2, -1, 0, 1, 2)))

Test


Asymptotic Generalized Pearson Chi-Squared Test

data:  Breakfast (ordered) by Travel (Walk, Bus, Drive)

chi-squared = 8.6739, df = 2, p-value = 0.01308


Post-hoc analysis

For a contingency table with one ordered variable and one nominal variable, it makes sense to analyze the component tables with pairwise comparisons of the levels of the nominal variable.

 

Results of the compact letter display will be easier to interpret if the table is ordered so that the first row, in this case, ranks highest or lowest in the ordered variable, and so on.


library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
                                 compare="row")

PT


    Comparison p.value p.adjust
1   Walk : Bus 0.12800   0.1570
2 Walk : Drive 0.00462   0.0139
3  Bus : Drive 0.15700   0.1570


library(rcompanion)

cldList(p.value ~ Comparison,
        data       = PT,
        threshold  = 0.05)


  Group Letter MonoLetter
1  Walk      a         a
2   Bus     ab         ab
3 Drive      b          b


Example of extended Cochran–Armitage test 2

This example revisits the example from Kruskal–Wallis chapter.  Likert is treated as ordered, and Speaker is treated as non-ordered.  The results are relatively similar to those from the Kruskal–Wallis and Dunn tests.

 

Note the placement of the initial quote mark in the Input function.


Input =(
"Likert 1 2 3 4 5
Speaker
Pooh   0 0 1 6 3
Piglet 1 6 2 1 0
Tigger 0 0 2 6 2
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum(Tabla)

prop.table(Tabla,
           margin = 1)   ### proportion in each row


        Likert
Speaker    1   2   3   4   5
  Pooh   0.0 0.0 0.1 0.6 0.3
  Piglet 0.1 0.6 0.2 0.1 0.0
  Tigger 0.0 0.0 0.2 0.6 0.2


library(coin)

Test = chisq_test(Tabla,
                  scores = list("Likert" = c(1, 2, 3, 4, 5)))

Test


Asymptotic Generalized Pearson Chi-Squared Test

data:  Likert (ordered) by Speaker (Pooh, Piglet, Tigger)

chi-squared = 18.423, df = 2, p-value = 9.991e-05


library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
                                 compare="row")

PT


       Comparison  p.value p.adjust
1   Pooh : Piglet 0.000310 0.000902
2   Pooh : Tigger 0.474000 0.474000
3 Piglet : Tigger 0.000601 0.000902


library(rcompanion)

cldList(p.value ~ Comparison,
        data      = PT,
        threshold = 0.05)


   Group Letter MonoLetter
1   Pooh      a         a
2 Piglet      b          b
3 Tigger      a         a


Long-format data and three-dimensional contingency tables

 

The tests in this chapter can also be performed on data in long format rather than in table format.  Each row of data can represent a single observation, or one variable can hold counts of each category, as the Count variable does below.

 

For the independence_test function in the coin package to handle ordered and non-ordered variables properly, the user need only specify which variables are to be considered ordered.

 

The independence_test function can also handle a stratification variable.  That is, a two-dimensional table within each level of the stratification variable.


Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Crop       Size         Adopt      Count
Nursery    Hobbiest     Always         0
Nursery    Hobbiest     Sometimes      1
Nursery    Hobbiest     Never          3
Nursery    Mom-and-pop  Always         1
Nursery    Mom-and-pop  Sometimes      1
Nursery    Mom-and-pop  Never          2
Nursery    Small        Always         2
Nursery    Small        Sometimes      3
Nursery    Small        Never          2
Nursery    Medium       Always         2
Nursery    Medium       Sometimes      1
Nursery    Medium       Never          0
Nursery    Large        Always         1
Nursery    Large        Sometimes      0
Nursery    Large        Never          0
Vegetable  Hobbiest     Always         0
Vegetable  Hobbiest     Sometimes      0
Vegetable  Hobbiest     Never          2
Vegetable  Mom-and-pop  Always         1
Vegetable  Mom-and-pop  Sometimes      2
Vegetable  Mom-and-pop  Never          2
Vegetable  Small        Always         2
Vegetable  Small        Sometimes      1
Vegetable  Small        Never          2
Vegetable  Medium       Always         1
Vegetable  Medium       Sometimes      1
Vegetable  Medium       Never          0
Vegetable  Large        Always         1
Vegetable  Large        Sometimes      0
Vegetable  Large        Never          0
")


### Tell R which variables are ordered

Data$Size = factor(Data$Size,
                   ordered = TRUE,
                   levels=unique(Data$Size))

Data$Adopt = factor(Data$Adopt,
                   ordered = TRUE,
                   levels=unique(Data$Adopt))

### Make sure the ordered levels are in the proper order

str(Data$Adopt)


Ord.factor w/ 3 levels "Always"<"Sometimes"<..: 1 2 3 1 2 3 1 2 3 1 ...


levels(Data$Adopt)


 [1] "Always"    "Sometimes" "Never"


str(Data$Size)


Ord.factor w/ 5 levels "Hobbiest"<"Mom-and-pop"<..: 1 1 1 2 2 2 3 3 3 4 ...


levels(Data$Size)


[1] "Hobbiest"    "Mom-and-pop" "Small"       "Medium"      "Large"


Two-dimensional table example


XT = xtabs(Count ~ Size + Adopt,
           data = Data)

XT


             Adopt
Size          Always Sometimes Never
  Hobbiest         0         1     5
  Mom-and-pop      2         3     4
  Small            4         4     4
  Medium           3         2     0
  Large            2         0     0


spineplot(XT)

library(coin)

independence_test(Adopt ~ Size,
           data = Data,
           weights = ~ Count)


Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761


Three-dimensional table example

The following example adds Crop as a stratification variable.


ftable(xtabs(Count ~ Crop + Size + Adopt,
             data = Data))


                      Adopt Always Sometimes Never
Crop      Size                                   
Nursery   Hobbiest               0         1     3
          Mom-and-pop            1         1     2
          Small                  2         3     2
          Medium                 2         1     0
          Large                  1         0     0
Vegetable Hobbiest               0         0     2
          Mom-and-pop            1         2     2
          Small                  2         1     2
          Medium                 1         1     0
          Large                  1         0     0


library(coin)

independence_test(Adopt ~ Size | Crop,
                  data = Data,
                  weights = ~ Count)


Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

       stratified by Crop

Z = -3.281, p-value = 0.001034


Spine plot for two-dimensional tables


XT1 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Nursery")

XT1

spineplot(XT1,
          main = "Nursery")

XT2 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Vegetable")

XT2

spineplot(XT2,
          main = "Vegetable")


References

 

Agresti, A. 2007. An Introduction to Categorical Data Analysis 2nd Edition.  Wiley-Interscience.