## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Association Tests for Ordinal Tables

The linear-by-linear test can be used to test the association among variables in a contingency table with ordered categories (Agresti, 2007).  This test, or a test with a similar function, is sometimes called an “ordinal chi-square” test.

In Agresti, the method used is called the linear-by-linear association model.  In R, the test can be performed by permutation test with the coin package.

An association test can also be performed on a contingency table with one ordered categorical variable and one non-ordered (nominal) variable.  The Cochran–Armitage test is a special case of this when the non-ordered variable has only two categories.

Most of the examples in this chapter use two-dimensional tables, although the coin package can handle three-dimensional tables.  For three-dimensional analyses, it may be easier to use data in the long format, as is shown in the final example in this chapter.

### Packages used in this chapter

The packages used in this chapter include:

•  coin

•  rcompanion

The following commands will install these packages if they are not already installed:

if(!require(coin)){install.packages("coin")}
if(!require(rcompanion)){install.packages("rcompanion")}

### Linear-by-linear test for ordered contingency tables

The lbl_test function in the coin package will automatically treat the variables as ordered, with the levels in the table ordered from smallest to largest.  By default, the levels are equally spaced, but the scores option can be used to specify the distance between the levels of each variable.

The null hypothesis for the linear-by-linear test is that there is no association among the variables in the table.  A significant p-value suggests that there is an association.  This is similar to a chi-square test, except that the categories are ordered in nature.

#### Example of linear-by-linear test 1

For this hypothetical example, farmers were surveyed about how often they use some best management practice.  Responses are organized according to the size of the operation.  Both variables in the contingency table are ordered categories.

Note the placement of the initial quote mark in the Input function.

Input =(
"Adopt      Always  Sometimes  Never
Size
Hobbiest         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum (Tabla)

prop.table(Tabla,
margin = NULL)   ### proportion in the table

             Adopt
Size              Always  Sometimes      Never
Hobbiest    0.00000000 0.02941176 0.14705882
Mom-and-pop 0.05882353 0.08823529 0.11764706
Small       0.11764706 0.11764706 0.11764706
Medium      0.08823529 0.05882353 0.00000000
Large       0.05882353 0.00000000 0.00000000

library(coin)

spineplot(Tabla)

Spine plot for each Size showing the proportion of Always (dark gray), Sometimes (medium gray), and Never (light gray).

library(coin)

LxL = lbl_test(Tabla)

LxL

Asymptotic Linear-by-Linear Association Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761

statistic(LxL)^2

11.07262
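The squared statistic can be cross-checked against the form given by Agresti (2007), M² = (n − 1)r², where r is the correlation between the row and column scores weighted by the cell counts.  The following base-R sketch is an arithmetic check on this example, not a description of how coin computes the test.  The helper name lbl_M2 and the unequal Size scores in the second call are assumptions made for illustration.

```r
# Counts from the farmer survey example, entered directly as a matrix
Tabla = as.table(matrix(c(0, 1, 5,
                          2, 3, 4,
                          4, 4, 4,
                          3, 2, 0,
                          2, 0, 0),
                        nrow = 5, byrow = TRUE,
                        dimnames = list(
                          Size  = c("Hobbiest", "Mom-and-pop", "Small",
                                    "Medium", "Large"),
                          Adopt = c("Always", "Sometimes", "Never"))))

# Hypothetical helper: M^2 = (n - 1) * r^2, with r the count-weighted
# correlation between row scores u and column scores v
lbl_M2 = function(tab,
                  u = seq_len(nrow(tab)),
                  v = seq_len(ncol(tab))) {
  n   = sum(tab)
  U   = u[row(tab)]              # row score for every cell
  V   = v[col(tab)]              # column score for every cell
  Suv = sum(tab * U * V) - sum(tab * U) * sum(tab * V) / n
  Suu = sum(tab * U^2)   - sum(tab * U)^2 / n
  Svv = sum(tab * V^2)   - sum(tab * V)^2 / n
  M2  = (n - 1) * Suv^2 / (Suu * Svv)
  c(M2 = M2, p.value = pchisq(M2, df = 1, lower.tail = FALSE))
}

Default = lbl_M2(Tabla)          # equally spaced scores

Default

# Assumed, unequal spacing for Size, for illustration only
Custom = lbl_M2(Tabla, u = c(1, 2, 4, 8, 16))

Custom
```

With equally spaced scores, M2 should agree with the value of statistic(LxL)^2 reported above.  Changing the spacing of the scores, as in the second call, generally changes both the statistic and the p-value, which is what the scores option of lbl_test controls.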

##### Compare to chi-square test without ordered categories

ChiSq = chisq_test(Tabla)

ChiSq

Asymptotic Pearson Chi-Squared Test

data:  Adopt by Size (Hobbiest, Mom-and-pop, Small, Medium, Large)

chi-squared = 13.495, df = 8, p-value = 0.09593
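For this table, the unordered comparison can also be reproduced with the chisq.test function in base R, which makes the difference in degrees of freedom explicit.  The unordered test spreads the association over (5 − 1)(3 − 1) = 8 degrees of freedom, whereas the linear-by-linear test spends a single degree of freedom on the trend.  This is a sketch for comparison only; the object name Pearson is illustrative.

```r
# Counts from the farmer survey example, entered directly as a matrix
Tabla = as.table(matrix(c(0, 1, 5,
                          2, 3, 4,
                          4, 4, 4,
                          3, 2, 0,
                          2, 0, 0),
                        nrow = 5, byrow = TRUE,
                        dimnames = list(
                          Size  = c("Hobbiest", "Mom-and-pop", "Small",
                                    "Medium", "Large"),
                          Adopt = c("Always", "Sometimes", "Never"))))

# Pearson chi-square test ignoring the ordering of the categories.
# Small expected counts trigger a warning, which is suppressed here.
Pearson = suppressWarnings(chisq.test(Tabla))

Pearson$statistic    # X-squared
Pearson$parameter    # degrees of freedom
Pearson$p.value
```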

#### Example of linear-by-linear test 2

The following example revisits the data from the Converting Numeric Data to Categories chapter.  It tests whether there is an association between Tired and Happy.  Both variables in the contingency table are ordered categories scored from 1 to 5, although no column for Tired = 4 appears in the table below.

Input =(
"Tired 1 2 3 5
Happy
1  0 0 0 3
2  0 0 0 2
3  0 0 3 0
4  2 0 0 0
5  3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum (Tabla)

prop.table(Tabla,
margin = NULL)   ### proportion in the table

Tired
Happy    1    2    3    5
1 0.00 0.00 0.00 0.15
2 0.00 0.00 0.00 0.10
3 0.00 0.00 0.15 0.00
4 0.10 0.00 0.00 0.00
5 0.15 0.10 0.00 0.25

library(coin)

LxL = lbl_test(Tabla)

LxL

Asymptotic Linear-by-Linear Association Test

data:  Tired (ordered) by Happy (1 < 2 < 3 < 4 < 5)

Z = -1.8878, p-value = 0.05906

### Extended Cochran–Armitage test

The chisq_test function in the coin package can be used to conduct a test of association for a contingency table with one ordered categorical variable and one non-ordered (nominal) variable.  The Cochran–Armitage test is a special case of this where the non-ordered variable has only two categories.

The scores option is used to indicate which variable should be treated as ordered, and the spacing of the levels of this variable.
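For the two-category special case, base R provides prop.trend.test, which is commonly used as a Cochran–Armitage-type test for trend in proportions.  The sketch below uses only the Walk and Drive counts from the breakfast example that follows, and assumes equally spaced scores; it is meant as an illustration of the special case, not as a replacement for the coin analysis.

```r
# Walk and Drive counts across the ordered Breakfast levels,
# from Never to Always
Walk  = c(6, 9, 6, 5, 2)
Drive = c(2, 4, 6, 8, 8)

# Test for a linear trend in the proportion of walkers across the
# ordered Breakfast levels, using equally spaced scores
Trend = prop.trend.test(x     = Walk,
                        n     = Walk + Drive,
                        score = 1:5)

Trend
```

The test has one degree of freedom, because a single trend is tested across the ordered levels.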

#### Example of extended Cochran–Armitage test 1

For this hypothetical example, students were surveyed about how often they eat breakfast and how they travel to school.  Breakfast is treated as ordered, and Travel is treated as nominal.

Note the placement of the initial quote mark in the Input function.

Input =(
"Breakfast  Never  Rarely  Sometimes Often  Always
Travel
Walk         6      9       6         5      2
Bus          2      5       8         5      3
Drive        2      4       6         8      8
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum (Tabla)

prop.table(Tabla,
margin = 1)   ### proportion in each row

Breakfast
Travel       Never     Rarely  Sometimes      Often     Always
Walk  0.21428571 0.32142857 0.21428571 0.17857143 0.07142857
Bus   0.08695652 0.21739130 0.34782609 0.21739130 0.13043478
Drive 0.07142857 0.14285714 0.21428571 0.28571429 0.28571429

library(coin)

spineplot(Tabla)

library(coin)

Test = chisq_test(Tabla,
scores = list("Breakfast" = c(-2, -1, 0, 1, 2)))

Test

Asymptotic Generalized Pearson Chi-Squared Test

data:  Breakfast (ordered) by Travel (Walk, Bus, Drive)

chi-squared = 8.6739, df = 2, p-value = 0.01308

statistic(Test)^2

[1] 75.23709
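For this table, the chi-squared value reported by chisq_test can be reproduced in base R as n times the share of the variation in Breakfast scores that lies between the Travel groups, referred to a chi-square distribution with one fewer degrees of freedom than the number of groups.  This is an arithmetic check on this example under the stated scores, not a description of coin's internals, and the object names are illustrative.

```r
Tabla = as.table(matrix(c(6, 9, 6, 5, 2,
                          2, 5, 8, 5, 3,
                          2, 4, 6, 8, 8),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(
                          Travel    = c("Walk", "Bus", "Drive"),
                          Breakfast = c("Never", "Rarely", "Sometimes",
                                        "Often", "Always"))))

Scores = c(-2, -1, 0, 1, 2)     # same scores as in the chisq_test call

n     = sum(Tabla)
n.g   = rowSums(Tabla)          # respondents per Travel group

# Total of the Breakfast scores within each Travel group
Sum.g = apply(Tabla, 1, function(counts) sum(counts * Scores))

# Total and between-group sums of squares of the scores
SS.total   = sum(colSums(Tabla) * Scores^2) - sum(Sum.g)^2 / n
SS.between = sum(Sum.g^2 / n.g)             - sum(Sum.g)^2 / n

ChiSq = n * SS.between / SS.total
df    = nrow(Tabla) - 1
P     = pchisq(ChiSq, df = df, lower.tail = FALSE)

ChiSq
df
P
```

The values should agree with the chi-squared, df, and p-value printed for Test above.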

##### Post-hoc analysis

For a contingency table with one ordered variable and one non-ordered variable, it makes sense to analyze the component tables with pairwise comparisons of the levels of the non-ordered variable.

Results of the compact letter display will be easier to interpret if the table is ordered so that the first row, in this case, ranks highest or lowest in the ordered variable, and so on.

library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
compare="row")

PT

    Comparison p.value p.adjust
1   Walk : Bus 0.12800   0.1570
2 Walk : Drive 0.00462   0.0139
3  Bus : Drive 0.15700   0.1570
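The adjusted p-values in the last column are consistent with a Benjamini–Hochberg (false discovery rate) adjustment of the unadjusted values.  The method is inferred here from the numbers rather than stated in this chapter, so the following base-R check should be read as a sketch under that assumption.

```r
# Unadjusted p-values from the pairwise comparisons above
p.value = c("Walk : Bus"   = 0.12800,
            "Walk : Drive" = 0.00462,
            "Bus : Drive"  = 0.15700)

# Benjamini-Hochberg (false discovery rate) adjustment
p.fdr = p.adjust(p.value, method = "fdr")

signif(p.fdr, 3)
```

Only the Walk : Drive comparison remains below 0.05 after adjustment, which is why Walk and Drive receive different letters in the compact letter display while Bus shares a letter with both.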

library(rcompanion)

cldList(p.value ~ Comparison,
data      = PT,
threshold = 0.05)

Group Letter MonoLetter
1  Walk      a         a
2   Bus     ab         ab
3 Drive      b          b

#### Example of extended Cochran–Armitage test 2

This example revisits the example from the Kruskal–Wallis chapter.  Likert is treated as ordered, and Speaker is treated as non-ordered.

Note the placement of the initial quote mark in the Input function.

Input =(
"Likert 1 2 3 4 5
Speaker
Pooh   0 0 1 6 3
Piglet 1 6 2 1 0
Tigger 0 0 2 6 2
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

sum(Tabla)

prop.table(Tabla,
margin = 1)   ### proportion in each row

Likert
Speaker    1   2   3   4   5
Pooh   0.0 0.0 0.1 0.6 0.3
Piglet 0.1 0.6 0.2 0.1 0.0
Tigger 0.0 0.0 0.2 0.6 0.2

library(coin)

Test = chisq_test(Tabla,
scores = list("Likert" = c(1, 2, 3, 4, 5)))

Test

Asymptotic Generalized Pearson Chi-Squared Test

data:  Likert (ordered) by Speaker (Pooh, Piglet, Tigger)

chi-squared = 18.423, df = 2, p-value = 9.991e-05

library(rcompanion)

PT = pairwiseOrdinalIndependence(Tabla,
compare="row")

PT

       Comparison  p.value p.adjust
1   Pooh : Piglet 0.000310 0.000902
2   Pooh : Tigger 0.474000 0.474000
3 Piglet : Tigger 0.000601 0.000902

library(rcompanion)

cldList(p.value ~ Comparison,
data      = PT,
threshold = 0.05)

Group Letter MonoLetter
1   Pooh      a         a
2 Piglet      b          b
3 Tigger      a         a

### Long-form data and three-dimensional contingency tables

The tests in this chapter can also be performed on data in long format rather than in table format.  Each row of data can represent a single observation, or one variable can hold counts of each category, as the Count variable does below.
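As a minimal sketch of how the two layouts relate, a table can be converted to long format with as.data.frame and back with xtabs in base R.  Note that as.data.frame names the count column Freq, whereas the data in this section use a column named Count; the object names Long and XT are illustrative.

```r
# Counts from the farmer survey example, entered directly as a matrix
Tabla = as.table(matrix(c(0, 1, 5,
                          2, 3, 4,
                          4, 4, 4,
                          3, 2, 0,
                          2, 0, 0),
                        nrow = 5, byrow = TRUE,
                        dimnames = list(
                          Size  = c("Hobbiest", "Mom-and-pop", "Small",
                                    "Medium", "Large"),
                          Adopt = c("Always", "Sometimes", "Never"))))

# Table format to long format: one row per cell, counts in Freq
Long = as.data.frame(Tabla)

head(Long)

# Long format back to a two-dimensional table
XT = xtabs(Freq ~ Size + Adopt,
           data = Long)

XT
```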

For the independence_test function in the coin package to handle ordered and non-ordered variables properly, the user need only specify which variables are to be considered ordered.

The independence_test function can also handle a stratification variable, in which case a two-dimensional table is analyzed within each level of the stratification variable.

Input =("
Crop       Size         Adopt      Count
Nursery    Hobbiest     Always         0
Nursery    Hobbiest     Sometimes      1
Nursery    Hobbiest     Never          3
Nursery    Mom-and-pop  Always         1
Nursery    Mom-and-pop  Sometimes      1
Nursery    Mom-and-pop  Never          2
Nursery    Small        Always         2
Nursery    Small        Sometimes      3
Nursery    Small        Never          2
Nursery    Medium       Always         2
Nursery    Medium       Sometimes      1
Nursery    Medium       Never          0
Nursery    Large        Always         1
Nursery    Large        Sometimes      0
Nursery    Large        Never          0
Vegetable  Hobbiest     Always         0
Vegetable  Hobbiest     Sometimes      0
Vegetable  Hobbiest     Never          2
Vegetable  Mom-and-pop  Always         1
Vegetable  Mom-and-pop  Sometimes      2
Vegetable  Mom-and-pop  Never          2
Vegetable  Small        Always         2
Vegetable  Small        Sometimes      1
Vegetable  Small        Never          2
Vegetable  Medium       Always         1
Vegetable  Medium       Sometimes      1
Vegetable  Medium       Never          0
Vegetable  Large        Always         1
Vegetable  Large        Sometimes      0
Vegetable  Large        Never          0
")

Data = read.table(textConnection(Input), header=TRUE)

### Tell R which variables are ordered

Data$Size = factor(Data$Size,
                   ordered = TRUE,
                   levels=unique(Data$Size))

Data$Adopt = factor(Data$Adopt,
                    ordered = TRUE,
                    levels=unique(Data$Adopt))

### Make sure the ordered levels are in the proper order

str(Data$Adopt)

Ord.factor w/ 3 levels "Always"<"Sometimes"<..: 1 2 3 1 2 3 1 2 3 1 ...

levels(Data$Adopt)

[1] "Always"    "Sometimes" "Never"

str(Data$Size)

Ord.factor w/ 5 levels "Hobbiest"<"Mom-and-pop"<..: 1 1 1 2 2 2 3 3 3 4 ...

levels(Data$Size)

[1] "Hobbiest"    "Mom-and-pop" "Small"       "Medium"      "Large"
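The reason for passing levels = unique(...) is that R otherwise sorts factor levels alphabetically, which would place Never between Always and Sometimes.  The short sketch below uses a made-up vector to show the difference.  Taking the levels from unique works only when the values first appear in the data in the intended order, as they do in this chapter's data.

```r
# Illustrative responses, listed so that each level first appears
# in the intended order
Adopt = c("Always", "Sometimes", "Never", "Always", "Never")

# Default: levels are sorted alphabetically
levels(factor(Adopt))

# Levels taken in order of first appearance and declared ordered
Adopt.ordered = factor(Adopt,
                       ordered = TRUE,
                       levels  = unique(Adopt))

levels(Adopt.ordered)

# Comparisons between levels are meaningful for ordered factors
Adopt.ordered[1] < Adopt.ordered[3]
```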

#### Two-dimensional table example

XT = xtabs(Count ~ Size + Adopt,
data = Data)

XT

             Adopt
Size          Always Sometimes Never
Hobbiest         0         1     5
Mom-and-pop      2         3     4
Small            4         4     4
Medium           3         2     0
Large            2         0     0

spineplot(XT)

library(coin)

LxL = independence_test(Adopt ~ Size,
                        data = Data,
                        weights = ~ Count)

LxL

Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

Z = -3.3276, p-value = 0.0008761

#### Three-dimensional table example

The following example adds Crop as a stratification variable.

ftable(xtabs(Count ~ Crop + Size + Adopt,
data = Data))

                      Adopt Always Sometimes Never
Crop      Size
Nursery   Hobbiest               0         1     3
          Mom-and-pop            1         1     2
          Small                  2         3     2
          Medium                 2         1     0
          Large                  1         0     0
Vegetable Hobbiest               0         0     2
          Mom-and-pop            1         2     2
          Small                  2         1     2
          Medium                 1         1     0
          Large                  1         0     0

library(coin)

LxL = independence_test(Adopt ~ Size | Crop,
                        data = Data,
                        weights = ~ Count)

LxL

Asymptotic General Independence Test

data:  Adopt (ordered) by Size (Hobbiest < Mom-and-pop < Small < Medium < Large)

stratified by Crop

Z = -3.281, p-value = 0.001034

##### Spine plot for two-dimensional tables

XT1 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Nursery")

XT1

spineplot(XT1,
main = "Nursery")

XT2 = xtabs(Count ~ Size + Adopt,
            data = Data,
            subset = Data$Crop=="Vegetable")

XT2

spineplot(XT2,
main = "Vegetable")

### References

Agresti, A. 2007. An Introduction to Categorical Data Analysis, 2nd Edition. Wiley-Interscience.