Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Association Tests for Ordinal Tables

The linear-by-linear test can be used to test for association among variables in a contingency table with ordered categories (Agresti, 2007).  This test, or a test with a similar function, is sometimes called an “ordinal chi-square” test.

Agresti calls the underlying model the linear-by-linear association model.  In R, the test can be conducted within the permutation-test framework of the coin package, using its lbl_test function.

An association test can also be performed on a contingency table with one ordered (ordinal) variable and one non-ordered (nominal) variable.  The Cochran–Armitage test is a special case of this in which the nominal variable has only two categories.

Most of the examples in this chapter use two-dimensional tables, although the coin package can also handle three-dimensional tables.  For three-dimensional analyses, it may be easier to use data in long format, as shown in the final example in this chapter.

Packages used in this chapter

The packages used in this chapter include:

•  coin

•  rcompanion

The following commands will install these packages if they are not already installed:


if(!require(coin)){install.packages("coin")}
if(!require(rcompanion)){install.packages("rcompanion")}

Linear-by-linear test for ordered contingency tables

The lbl_test function in the coin package will automatically treat both variables as ordered, taking the levels from smallest to largest in the order in which they appear in the table.  By default, the levels are treated as equally spaced, but the scores option can be used to specify the spacing between the levels of each variable.

The null hypothesis for the linear-by-linear test is that there is no association among the variables in the table.  A significant p-value suggests that there is an association.  This is similar to a chi-square test of association, except that the test takes the order of the categories into account.
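
Agresti (2007) writes the linear-by-linear statistic in terms of scores assigned to the levels: row scores u_i, column scores v_j, cell counts n_ij, row and column totals n_i+ and n_+j, and total n.  Here r is the correlation between the row and column scores over all n observations, and under the null hypothesis M² is approximately chi-squared with 1 degree of freedom.  For the examples in this chapter, the Z that lbl_test reports is the signed square root, r√(n − 1).

```latex
\bar{u} = \frac{1}{n}\sum_{i} u_i\, n_{i+}, \qquad
\bar{v} = \frac{1}{n}\sum_{j} v_j\, n_{+j}

r = \frac{\sum_{i}\sum_{j} (u_i - \bar{u})(v_j - \bar{v})\, n_{ij}}
         {\sqrt{\sum_{i} (u_i - \bar{u})^2\, n_{i+} \; \sum_{j} (v_j - \bar{v})^2\, n_{+j}}}

M^2 = (n - 1)\, r^2 \;\sim\; \chi^2_1 \ \text{(approximately, under } H_0\text{)}, \qquad
Z = r\sqrt{n - 1}
```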

Example of linear-by-linear test 1

For this hypothetical example, farmers were surveyed about how often they use a particular best management practice.  Responses are organized according to the size of the operation.  Both variables in the contingency table are ordered categories.

Note the placement of the opening quotation mark in the Input string.


Input =(
"Adopt      Always  Sometimes  Never
Size
Hobbiest         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum (Tabla)

prop.table(Tabla,
           margin = NULL)   ### proportion in the table


Size              Always  Sometimes      Never
  Hobbiest    0.00000000 0.02941176 0.14705882
  Mom-and-pop 0.05882353 0.08823529 0.11764706
  Small       0.11764706 0.11764706 0.11764706
  Medium      0.08823529 0.05882353 0.00000000
  Large       0.05882353 0.00000000 0.00000000


library(coin)

spineplot(Tabla)




Spine plot for each Size showing the proportion of Always (dark gray), Sometimes (medium gray), and Never (light gray).


library(coin)

LxL = lbl_test(Tabla)

LxL


Asymptotic Linear-by-Linear Association Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761


statistic(LxL)^2     ### Z squared is the chi-squared statistic with 1 df


11.07262
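
As a check on what lbl_test computes, the following base-R sketch (no packages needed) rebuilds the table from this example and computes the statistic Agresti (2007) gives as M² = (n − 1)r², where r is the correlation between the row and column scores.  The scores 1 to 5 for Size and 1 to 3 for Adopt are an assumption mirroring lbl_test's default equal spacing; for these data the sketch reproduces Z = -3.3276 and Z² = 11.07262.

```r
# Example 1 counts: rows = Size, columns = Adopt
Tabla <- matrix(c(0, 1, 5,
                  2, 3, 4,
                  4, 4, 4,
                  3, 2, 0,
                  2, 0, 0),
                nrow = 5, byrow = TRUE,
                dimnames = list(Size  = c("Hobbiest", "Mom-and-pop", "Small",
                                          "Medium", "Large"),
                                Adopt = c("Always", "Sometimes", "Never")))

# Equally spaced scores (1, 2, ...), repeated once per farmer in each cell
u <- rep(as.vector(row(Tabla)), times = as.vector(Tabla))   # Size score
v <- rep(as.vector(col(Tabla)), times = as.vector(Tabla))   # Adopt score

n  <- sum(Tabla)
r  <- cor(u, v)                     # correlation between the two sets of scores
Z  <- r * sqrt(n - 1)               # signed version of the statistic
M2 <- (n - 1) * r^2                 # linear-by-linear statistic, 1 df
p  <- pchisq(M2, df = 1, lower.tail = FALSE)

round(c(r = r, Z = Z, M2 = M2, p.value = p), 7)
```

The negative sign of Z reflects the order of the levels: as Size moves from Hobbiest to Large, responses tend to move toward Always.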


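Compare to a chi-square test that ignores the order of the categories.  This test has (5 - 1) × (3 - 1) = 8 degrees of freedom and responds to any pattern of association, so it has less power than the linear-by-linear test to detect a consistent trend across ordered categories.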


ChiSq = chisq_test(Tabla)

ChiSq


Asymptotic Pearson Chi-Squared Test

data:  Adopt by Size (Hobbiest, Mom-and-pop, Small, Medium, Large)

chi-squared = 13.495, df = 8, p-value = 0.09593


Example of linear-by-linear test 2

The following example revisits the data from the Converting Numeric Data to Categories chapter.  It tests whether there is an association between Tired and Happy.  Both variables in the contingency table are ordered categories on a scale of 1 to 5; Tired has no observations at level 4, so that column does not appear in the table.


Input =(
"Tired 1 2 3 5
Happy
    1  0 0 0 3
    2  0 0 0 2
    3  0 0 3 0
    4  2 0 0 0
    5  3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla

sum (Tabla)

prop.table(Tabla,
           margin = NULL)   ### proportion in the table


     Tired
Happy    1    2    3    5
    1 0.00 0.00 0.00 0.15
    2 0.00 0.00 0.00 0.10
    3 0.00 0.00 0.15 0.00
    4 0.10 0.00 0.00 0.00
    5 0.15 0.10 0.00 0.25


library(coin)

LxL = lbl_test(Tabla)

LxL


Asymptotic Linear-by-Linear Association Test

data:  Tired (ordered) by Happy (1 < 2 < 3 < 4 < 5)

Z = -1.8878, p-value = 0.05906
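
Because Tired has no observations at level 4, lbl_test's default equally spaced scores treat the four observed levels as 1, 2, 3, 4 rather than 1, 2, 3, 5.  The base-R sketch below (an illustration, not coin's code; lbl_Z is a hypothetical helper) computes the linear-by-linear Z under both scorings, and the equally spaced version reproduces Z = -1.8878 above.  In coin, the actual values could presumably be supplied with a call such as lbl_test(Tabla, scores = list(Tired = c(1, 2, 3, 5))); that call is an assumption modeled on the scores syntax used with chisq_test later in this chapter and is not run here.

```r
# Example 2 counts: rows = Happy (1 to 5), columns = observed Tired levels 1, 2, 3, 5
Tabla <- matrix(c(0, 0, 0, 3,
                  0, 0, 0, 2,
                  0, 0, 3, 0,
                  2, 0, 0, 0,
                  3, 2, 0, 5),
                nrow = 5, byrow = TRUE)

# Hypothetical helper: linear-by-linear Z for given row and column scores,
# using the correlation of the scores over cells weighted by the cell counts
lbl_Z <- function(tab, row.scores, col.scores) {
  x <- cbind(row.scores[as.vector(row(tab))],
             col.scores[as.vector(col(tab))])
  r <- cov.wt(x, wt = as.vector(tab), cor = TRUE)$cor[1, 2]
  r * sqrt(sum(tab) - 1)
}

Z.equal  <- lbl_Z(Tabla, row.scores = 1:5, col.scores = 1:4)           # default spacing
Z.actual <- lbl_Z(Tabla, row.scores = 1:5, col.scores = c(1, 2, 3, 5))  # observed values

round(c(Z.equal = Z.equal, Z.actual = Z.actual), 4)
```

The two scorings give different values of Z, so it is worth stating which scores were used when reporting the test.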


Extended Cochran–Armitage test

The chisq_test function in the coin package can be used to conduct a test of association for a contingency table with one ordinal variable and one nominal variable.  The Cochran–Armitage test is a special case of this in which the nominal variable has only two categories.

The scores option indicates which variable should be treated as ordered and sets the spacing of its levels.

Example of extended Cochran–Armitage test 1

For this hypothetical example, students were surveyed about how often they eat breakfast and how they travel to school.  Breakfast is treated as ordered, and Travel is treated as nominal.

Note the placement of the opening quotation mark in the Input string.


Input =(
"Breakfast  Never  Rarely  Sometimes Often  Always
Travel
Walk         6      9       6         5      2
Bus          2      5       8         5      3
Drive        2      4       6         8      8
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla

sum (Tabla)

prop.table(Tabla,
           margin = 1)   ### proportion in each row


       Breakfast
Travel       Never     Rarely  Sometimes      Often     Always
  Walk  0.21428571 0.32142857 0.21428571 0.17857143 0.07142857
  Bus   0.08695652 0.21739130 0.34782609 0.21739130 0.13043478
  Drive 0.07142857 0.14285714 0.21428571 0.28571429 0.28571429



library(coin)

spineplot(Tabla)





library(coin)

Test = chisq_test(Tabla,
                  scores = list("Breakfast" = c(-2, -1, 0, 1, 2)))

Test


Asymptotic Generalized Pearson Chi-Squared Test

data:  Breakfast (ordered) by Travel (Walk, Bus, Drive)

chi-squared = 8.6739, df = 2, p-value = 0.01308
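
For one scored variable and one nominal variable, the generalized Pearson statistic can be reproduced in base R as n × SSB / SST, where SSB and SST are the between-group and total sums of squares of the scores, with df equal to the number of groups minus one.  This closed form is an equivalent expression worked out for illustration, not code taken from coin, and scored_chisq is a hypothetical helper; for these data it gives chi-squared = 8.6739 with df = 2, and it gives 18.423 for the Likert data in the next example.

```r
# Breakfast (ordered, scored) by Travel (nominal)
Tabla <- matrix(c(6, 9, 6, 5, 2,
                  2, 5, 8, 5, 3,
                  2, 4, 6, 8, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(Travel    = c("Walk", "Bus", "Drive"),
                                Breakfast = c("Never", "Rarely", "Sometimes",
                                              "Often", "Always")))

# Hypothetical helper: scores go on the column variable, groups are the rows
scored_chisq <- function(tab, scores) {
  v <- rep(scores[as.vector(col(tab))], times = as.vector(tab))  # score per student
  g <- rep(as.vector(row(tab)), times = as.vector(tab))          # group per student
  n <- length(v)
  SST <- sum((v - mean(v))^2)                                    # total sum of squares
  SSB <- sum(tapply(v, g, function(x) length(x) * (mean(x) - mean(v))^2))  # between groups
  stat <- n * SSB / SST
  df   <- nrow(tab) - 1
  c(chi.squared = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

scored_chisq(Tabla, scores = c(-2, -1, 0, 1, 2))
```

Because only the relative spacing of the scores enters the statistic, shifting or rescaling them (for example, 1 to 5 instead of -2 to 2) gives the same result.  With only two groups, the expression reduces to the chi-squared test for trend computed by base R's prop.trend.test.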


statistic(Test)     ### already the chi-squared value; unlike Z from lbl_test, it should not be squared


Post-hoc analysis

For a contingency table with one ordered variable and one nominal variable, a sensible post-hoc analysis is to compare the levels of the nominal variable pairwise, testing the two-row subtable for each pair of levels.

The compact letter display will be easier to interpret if the table is arranged so that the first row ranks highest (or lowest) on the ordered variable, the second row next, and so on.


library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
                                 compare="row")

PT


    Comparison p.value p.adjust
1   Walk : Bus 0.12800   0.1570
2 Walk : Drive 0.00462   0.0139
3  Bus : Drive 0.15700   0.1570


library(rcompanion)

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)


  Group Letter MonoLetter
1  Walk      a         a
2   Bus     ab         ab
3 Drive      b          b
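
The idea behind these pairwise comparisons can be sketched in base R.  For each pair of Travel levels, the sketch below takes the two-row subtable and applies prop.trend.test, base R's chi-squared test for trend in proportions (a Cochran–Armitage-type test), with the Breakfast scores -2 to 2, then adjusts the p-values with the false discovery rate method.  This is not necessarily how pairwiseOrdinalIndependence is implemented, but for these data it gives the same p-values (0.128, 0.00462, and 0.157) and adjusted p-values as the table above.

```r
# Breakfast (scored) by Travel (nominal)
Tabla <- matrix(c(6, 9, 6, 5, 2,
                  2, 5, 8, 5, 3,
                  2, 4, 6, 8, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(Travel    = c("Walk", "Bus", "Drive"),
                                Breakfast = c("Never", "Rarely", "Sometimes",
                                              "Often", "Always")))
scores <- c(-2, -1, 0, 1, 2)

# Every pair of Travel levels: Walk-Bus, Walk-Drive, Bus-Drive
pairs <- combn(rownames(Tabla), 2)

p.value <- apply(pairs, 2, function(pr) {
  # Two-row subtable: first row counted as "successes" out of both rows,
  # tested for a linear trend across the scored Breakfast levels
  prop.trend.test(x     = Tabla[pr[1], ],
                  n     = Tabla[pr[1], ] + Tabla[pr[2], ],
                  score = scores)$p.value
})

data.frame(Comparison = paste(pairs[1, ], ":", pairs[2, ]),
           p.value    = signif(p.value, 3),
           p.adjust   = signif(p.adjust(p.value, method = "fdr"), 3))
```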


Example of extended Cochran–Armitage test 2

This example revisits the data from the Kruskal–Wallis chapter.  Likert is treated as ordered, and Speaker is treated as nominal.

Note the placement of the opening quotation mark in the Input string.


Input =(
"Likert 1 2 3 4 5
Speaker
Pooh   0 0 1 6 3
Piglet 1 6 2 1 0
Tigger 0 0 2 6 2
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla

sum(Tabla)

prop.table(Tabla,
           margin = 1)   ### proportion in each row


        Likert
Speaker    1   2   3   4   5
  Pooh   0.0 0.0 0.1 0.6 0.3
  Piglet 0.1 0.6 0.2 0.1 0.0
  Tigger 0.0 0.0 0.2 0.6 0.2


library(coin)

Test = chisq_test(Tabla,
                  scores = list("Likert" = c(1, 2, 3, 4, 5)))

Test


Asymptotic Generalized Pearson Chi-Squared Test

data:  Likert (ordered) by Speaker (Pooh, Piglet, Tigger)

chi-squared = 18.423, df = 2, p-value = 9.991e-05


library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
                                 compare="row")

PT


       Comparison  p.value p.adjust
1   Pooh : Piglet 0.000310 0.000902
2   Pooh : Tigger 0.474000 0.474000
3 Piglet : Tigger 0.000601 0.000902


library(rcompanion)

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)


   Group Letter MonoLetter
1   Pooh      a         a
2 Piglet      b          b
3 Tigger      a         a


Long-form data and three-dimensional contingency tables

The tests in this chapter can also be performed on data in long format rather than in table format.  Each row of data can represent a single observation, or one variable can hold the count for each combination of categories, as the Count variable does below.

For the independence_test function in the coin package to handle ordered and nominal variables properly, the user need only declare which variables are ordered factors.

The independence_test function can also handle a stratification variable, in which case the test combines the two-dimensional tables formed within each level of the stratification variable.
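
Data that are already in a table can be moved to long format, and back, in base R.  The following sketch uses the two-dimensional table from the first linear-by-linear example; the object names Long and OneRowEach are illustrative.  as.data.frame puts the cell counts in a Count column, rep indexing expands that to one row per observation, and xtabs rebuilds the table.

```r
# Two-dimensional table from the first linear-by-linear example
Tabla <- as.table(matrix(c(0, 1, 5,
                           2, 3, 4,
                           4, 4, 4,
                           3, 2, 0,
                           2, 0, 0),
                         nrow = 5, byrow = TRUE,
                         dimnames = list(Size  = c("Hobbiest", "Mom-and-pop", "Small",
                                                   "Medium", "Large"),
                                         Adopt = c("Always", "Sometimes", "Never"))))

# Table -> long format: one row per cell, count stored in Count
Long <- as.data.frame(Tabla, responseName = "Count")

# Keep the table's level order and declare both variables as ordered
Long$Size  <- factor(Long$Size,  levels = rownames(Tabla), ordered = TRUE)
Long$Adopt <- factor(Long$Adopt, levels = colnames(Tabla), ordered = TRUE)

# Long format with counts -> one row per observation
OneRowEach <- Long[rep(seq_len(nrow(Long)), times = Long$Count), c("Size", "Adopt")]

# Long format -> table again, weighting each row by Count
XT <- xtabs(Count ~ Size + Adopt, data = Long)
XT
```

With the Count column, the weights = ~ Count argument tells independence_test how many observations each row represents; the one-row-per-observation version would not need it.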


Input =("
Crop       Size         Adopt      Count
Nursery    Hobbiest     Always         0
Nursery    Hobbiest     Sometimes      1
Nursery    Hobbiest     Never          3
Nursery    Mom-and-pop  Always         1
Nursery    Mom-and-pop  Sometimes      1
Nursery    Mom-and-pop  Never          2
Nursery    Small        Always         2
Nursery    Small        Sometimes      3
Nursery    Small        Never          2
Nursery    Medium       Always         2
Nursery    Medium       Sometimes      1
Nursery    Medium       Never          0
Nursery    Large        Always         1
Nursery    Large        Sometimes      0
Nursery    Large        Never          0
Vegetable  Hobbiest     Always         0
Vegetable  Hobbiest     Sometimes      0
Vegetable  Hobbiest     Never          2
Vegetable  Mom-and-pop  Always         1
Vegetable  Mom-and-pop  Sometimes      2
Vegetable  Mom-and-pop  Never          2
Vegetable  Small        Always         2
Vegetable  Small        Sometimes      1
Vegetable  Small        Never          2
Vegetable  Medium       Always         1
Vegetable  Medium       Sometimes      1
Vegetable  Medium       Never          0
Vegetable  Large        Always         1
Vegetable  Large        Sometimes      0
Vegetable  Large        Never          0
")

Data = read.table(textConnection(Input),header=TRUE)


### Tell R which variables are ordered

Data$Size = factor(Data$Size,
                   ordered = TRUE,
                   levels=unique(Data$Size))

Data$Adopt = factor(Data$Adopt,
                    ordered = TRUE,
                    levels=unique(Data$Adopt))

### Make sure the ordered levels are in the proper order

str(Data$Adopt)


Ord.factor w/ 3 levels "Always"<"Sometimes"<..: 1 2 3 1 2 3 1 2 3 1 ...


levels(Data$Adopt)


 [1] "Always"    "Sometimes" "Never"


str(Data$Size)


Ord.factor w/ 5 levels "Hobbiest"<"Mom-and-pop"<..: 1 1 1 2 2 2 3 3 3 4 ...


levels(Data$Size)


[1] "Hobbiest"    "Mom-and-pop" "Small"       "Medium"      "Large"


Two-dimensional table example


XT = xtabs(Count ~ Size + Adopt,
           data = Data)

XT


             Adopt
Size          Always Sometimes Never
  Hobbiest         0         1     5
  Mom-and-pop      2         3     4
  Small            4         4     4
  Medium           3         2     0
  Large            2         0     0


spineplot(XT)

library(coin)

independence_test(Adopt ~ Size,
                  data = Data,
                  weights = ~ Count)


Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761


Three-dimensional table example

The following example adds Crop as a stratification variable.


ftable(xtabs(Count ~ Crop + Size + Adopt,
             data = Data))


                      Adopt Always Sometimes Never
Crop      Size                                   
Nursery   Hobbiest               0         1     3
          Mom-and-pop            1         1     2
          Small                  2         3     2
          Medium                 2         1     0
          Large                  1         0     0
Vegetable Hobbiest               0         0     2
          Mom-and-pop            1         2     2
          Small                  2         1     2
          Medium                 1         1     0
          Large                  1         0     0


library(coin)

independence_test(Adopt ~ Size | Crop,
                  data = Data,
                  weights = ~ Count)


Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

       stratified by Crop

Z = -3.281, p-value = 0.001034
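
To see what stratification by Crop does, the following base-R sketch computes, within each crop, the centered cross-product of the equally spaced Size and Adopt scores and its conditional variance, Sxx × Syy / (n - 1), then sums both over crops before standardizing.  This follows the general form of a stratified linear-by-linear statistic; lbl_Z_strata is a hypothetical helper and the code is not coin's implementation, but for these data it gives Z = -3.281 with Crop as strata and Z = -3.3276 when all rows are treated as a single stratum.

```r
# Long-format counts; expand.grid varies Adopt fastest, then Size, then Crop,
# matching the row order of Data above
Data <- expand.grid(Adopt = c("Always", "Sometimes", "Never"),
                    Size  = c("Hobbiest", "Mom-and-pop", "Small", "Medium", "Large"),
                    Crop  = c("Nursery", "Vegetable"))
Data$Count <- c(0, 1, 3,  1, 1, 2,  2, 3, 2,  2, 1, 0,  1, 0, 0,   # Nursery
                0, 0, 2,  1, 2, 2,  2, 1, 2,  1, 1, 0,  1, 0, 0)   # Vegetable

# Hypothetical helper: stratified linear-by-linear Z with equally spaced scores
lbl_Z_strata <- function(data, strata) {
  num <- 0   # sum over strata of centered cross-products of the scores
  den <- 0   # sum over strata of their conditional variances
  for (s in split(data, strata)) {
    u <- rep(as.integer(s$Size),  times = s$Count)   # Size score 1 to 5
    v <- rep(as.integer(s$Adopt), times = s$Count)   # Adopt score 1 to 3
    n <- length(u)
    num <- num + sum((u - mean(u)) * (v - mean(v)))
    den <- den + sum((u - mean(u))^2) * sum((v - mean(v))^2) / (n - 1)
  }
  num / sqrt(den)
}

Z.strat <- lbl_Z_strata(Data, strata = Data$Crop)            # stratified by Crop
Z.pool  <- lbl_Z_strata(Data, strata = rep(1, nrow(Data)))   # single stratum
round(c(Z.strat = Z.strat, Z.pool = Z.pool), 4)
```

Stratifying compares operation sizes only within each crop, so the two crops contribute separately centered evidence that is then pooled.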


Spine plots of the two-dimensional table for each crop


XT1 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Nursery")

XT1

spineplot(XT1,
          main = "Nursery")

XT2 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Vegetable")

XT2

spineplot(XT2,
          main = "Vegetable")


References

Agresti, A. 2007. An Introduction to Categorical Data Analysis, 2nd edition. Wiley-Interscience.