## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Measures of Association for Ordinal Variables

### Measures of association for one ordinal variable and one nominal variable

The statistics Freeman’s theta and epsilon-squared are used to gauge the strength of the association between one ordinal variable and one nominal variable.  Both of these statistics range from 0 to 1, with 0 indicating no association and 1 indicating perfect association.

As effect sizes, these statistics are not affected by the sample size per se.  Epsilon-squared is usually used as the effect size for a Kruskal–Wallis test, whereas Freeman’s theta is most often used as an effect size for data arranged in a table, such as for a Cochran–Armitage test.  However, it is my understanding that neither statistic assumes or prohibits one variable being designated as the dependent variable.

### Measures of association for two ordinal variables

Measures of association for two ordinal variables include Somers’ D (or delta), Kendall’s tau-b, and Goodman and Kruskal's gamma.

Yule's Q is equal in magnitude to Goodman and Kruskal's gamma for a 2 × 2 table, but it can be either positive or negative depending on the order of the rows and columns in the table.
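For a 2 × 2 table with cells a and b in the first row and c and d in the second, Yule's Q is (ad − bc) / (ad + bc).  Because ad counts the concordant pairs and bc the discordant pairs, this is the same quantity gamma computes.  The short base-R sketch below uses made-up counts to show that reversing the column order flips the sign but not the magnitude; the function name `yule_q` and the counts are hypothetical.

```r
# Yule's Q for a 2 x 2 table laid out as
#   a b
#   c d
# a*d = concordant pairs, b*c = discordant pairs, so Q matches gamma here.
yule_q <- function(tab) {
  tab <- as.matrix(tab)
  stopifnot(all(dim(tab) == c(2, 2)))
  ad <- tab[1, 1] * tab[2, 2]
  bc <- tab[1, 2] * tab[2, 1]
  (ad - bc) / (ad + bc)
}

# Hypothetical counts, for illustration only
tab <- matrix(c(20, 10,
                 5, 15),
              nrow = 2, byrow = TRUE)

yule_q(tab)          # 250 / 350, about 0.714
yule_q(tab[, 2:1])   # columns reversed: same magnitude, opposite sign
```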

### Polychoric and tetrachoric correlation

Polychoric correlation is used to measure the degree of correlation between two ordinal variables with the assumption that each ordinal variable is a discrete summary of an underlying (latent) normally distributed continuous variable.  For example, if an ordinal variable Height were measured as very short, short, average, tall, very tall, one could assume that these categories represent actual height measurements that are continuous and normally distributed.  A similar assumption might be made for Likert items, for example on an agree–disagree spectrum.

Tetrachoric correlation is a special case of polychoric correlation when both variables are dichotomous.
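The latent-variable assumption can be illustrated with a small simulation, sketched here in base R under stated assumptions: two normally distributed variables with a population correlation of 0.6 are cut into five ordered categories at hypothetical cut points, as if only the categories had been recorded.  The Pearson correlation of the category codes is typically attenuated relative to the latent correlation; polychoric correlation (for example `polychoric()` in psych) tries to estimate the latent value from the resulting table, which is not checked here.

```r
set.seed(1)
n   <- 10000
rho <- 0.6

# Latent continuous variables with population correlation rho
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Hypothetical cut points giving five ordered categories, coded 1 to 5
breaks <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
x_ord  <- cut(x, breaks, labels = FALSE)
y_ord  <- cut(y, breaks, labels = FALSE)

latent_r  <- cor(x, y)           # close to 0.6
ordinal_r <- cor(x_ord, y_ord)   # Pearson on category codes, usually smaller

table(x_ord, y_ord)              # the contingency table an analyst would see
c(latent = latent_r, ordinal = ordinal_r)
```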

##### Appropriate data

•  One ordinal variable and one nominal variable, or two ordinal variables.  Usually expressed as a contingency table.

•  Experimental units aren’t paired.
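When the data start as one row per experimental unit rather than as a table, a contingency table can be built with base R's `table()`.  The sketch below uses hypothetical responses.  The main thing to watch is the order of the ordinal categories: unless the levels are given explicitly, `factor()` sorts them alphabetically, which would scramble an ordinal scale.

```r
# Hypothetical raw data: one row per respondent (experimental units not paired)
travel    <- c("Walk", "Bus", "Drive", "Walk", "Drive", "Bus")
breakfast <- c("Never", "Often", "Always", "Rarely", "Often", "Sometimes")

# Give the ordinal variable an explicit level order; by default factor()
# would sort the levels alphabetically (Always, Never, Often, Rarely, Sometimes)
breakfast <- factor(breakfast,
                    levels  = c("Never", "Rarely", "Sometimes", "Often", "Always"),
                    ordered = TRUE)
travel    <- factor(travel, levels = c("Walk", "Bus", "Drive"))

# Cross-tabulate: nominal variable in rows, ordinal variable in columns
Tab <- table(Travel = travel, Breakfast = breakfast)
Tab
```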

##### Hypotheses

•  There are no hypotheses tested directly with these statistics.

##### Other notes and alternative tests

•  Cramér’s V and phi are used for tables with two nominal variables. Cohen’s w is a variant. Goodman and Kruskal’s lambda is also used.

•  Tetrachoric and polychoric correlation are used for two ordinal variables when each ordinal variable is assumed to represent an underlying latent continuous variable.

•  Biserial and polyserial correlation are used for one continuous variable and one ordinal (or dichotomous) variable, when there is an assumption that the ordinal variable represents a latent continuous variable.
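For contrast with the ordinal measures, Cramér’s V for two nominal variables can be computed from the chi-square statistic as V = sqrt(χ² / (n (k − 1))), where k is the smaller of the number of rows and columns.  The base-R sketch below (`cramer_v_sketch` is a hypothetical name) applies it to the perfect-association and zero-association tables used later in this chapter.  Because V ignores category order, shuffling the columns leaves it unchanged.

```r
# Cramér's V from the chi-square statistic:
#   V = sqrt( X^2 / (n * (min(rows, cols) - 1)) )
cramer_v_sketch <- function(tab) {
  tab <- as.matrix(tab)
  # correct = FALSE: no Yates continuity correction (it only affects 2 x 2 tables);
  # warnings about small expected counts are suppressed for this illustration
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab))
  unname(sqrt(chi2 / (n * (k - 1))))
}

perfect <- diag(10, 3)                                # each row in its own column
none    <- matrix(c(5, 10, 15), nrow = 3, ncol = 3)   # same proportions in each row

cramer_v_sketch(perfect)              # 1
cramer_v_sketch(perfect[, c(3, 1, 2)])  # still 1: order is ignored
cramer_v_sketch(none)                 # 0
```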

### Packages used in this chapter

The packages used in this chapter include:

•  rcompanion

•  psych

•  DescTools

The following commands will install these packages if they are not already installed:

if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}

The hypothetical Breakfast example from the previous chapter includes Breakfast as an ordinal variable and Travel as a nominal variable.

Input =(
"Breakfast  Never  Rarely  Sometimes Often  Always
Travel
Walk         6      9       6         5      2
Bus          2      5       8         5      3
Drive        2      4       6         8      8
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

#### Freeman’s theta

library(rcompanion)

freemanTheta(Tabla,
group = "row")

Freeman.theta
0.312

#### Epsilon-squared

library(rcompanion)

epsilonSquared(Tabla,
group = "row")

epsilon.squared
0.11

The hypothetical example with Pooh, Piglet, and Tigger includes Likert as an ordinal variable and Speaker as a nominal variable.

Input =(
"Likert  1 2 3 4 5
Speaker
Pooh     0 0 1 6 3
Piglet   1 6 2 1 0
Tigger   0 0 2 6 2
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

#### Freeman’s theta

library(rcompanion)

freemanTheta(Tabla,
group = "row")

Freeman.theta
0.64
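Freeman’s theta can also be computed by hand by comparing every pair of groups: for two groups, count the cross-group pairs of observations in which the first group’s observation falls in a higher category and those in which it falls in a lower one, take the absolute difference, sum these over all pairs of groups, and divide by the total number of cross-group pairs.  The base-R sketch below is my reading of that definition, not rcompanion’s code, and `freeman_theta_sketch` is a hypothetical name.  For the Pooh, Piglet, and Tigger table it gives 192 / 300, matching the 0.64 reported above.

```r
# Freeman's theta for a table with groups (nominal) in rows and
# ordered categories in columns. Hypothetical helper.
freeman_theta_sketch <- function(tab) {
  tab <- as.matrix(tab)
  k <- nrow(tab)
  stopifnot(k >= 2)
  num <- 0   # sum over group pairs of |pairs ranked higher - pairs ranked lower|
  den <- 0   # total number of cross-group pairs of observations
  for (i in 1:(k - 1)) {
    for (j in (i + 1):k) {
      # pair_counts[p, q] = number of pairs with group i in category p
      # and group j in category q
      pair_counts <- outer(tab[i, ], tab[j, ])
      higher <- sum(pair_counts[lower.tri(pair_counts)])  # p > q
      lower  <- sum(pair_counts[upper.tri(pair_counts)])  # p < q
      num <- num + abs(higher - lower)
      den <- den + sum(tab[i, ]) * sum(tab[j, ])
    }
  }
  num / den
}

pooh <- matrix(c(0, 0, 1, 6, 3,
                 1, 6, 2, 1, 0,
                 0, 0, 2, 6, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(Speaker = c("Pooh", "Piglet", "Tigger"),
                               Likert  = 1:5))

freeman_theta_sketch(pooh)   # 192 / 300 = 0.64
```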

#### Epsilon-squared

library(rcompanion)

epsilonSquared(Tabla,
group = "row")

epsilon.squared
0.581
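Epsilon-squared is commonly defined from the Kruskal–Wallis statistic H as ε² = H / ((n² − 1) / (n + 1)), which simplifies to H / (n − 1).  Assuming rcompanion uses that definition, the base-R sketch below expands the table into one row per observation and runs `kruskal.test()`; the helper names are hypothetical.  For the Pooh, Piglet, and Tigger table it gives about 0.5808, which rounds to the 0.581 reported above.

```r
# Hypothetical helper: expand a table of counts (groups in rows, ordered
# categories in columns) into a data frame with one observation per row
expand_table <- function(tab) {
  tab <- as.matrix(tab)
  groups <- rownames(tab)
  if (is.null(groups)) groups <- paste0("G", seq_len(nrow(tab)))
  counts <- as.vector(tab)   # column-major order
  data.frame(
    group = factor(rep(rep(groups, times = ncol(tab)), times = counts),
                   levels = groups),
    score = rep(rep(seq_len(ncol(tab)), each = nrow(tab)), times = counts)
  )
}

epsilon_squared_sketch <- function(tab) {
  long <- expand_table(tab)
  n <- nrow(long)
  # kruskal.test() returns the tie-corrected Kruskal-Wallis H
  H <- unname(kruskal.test(score ~ group, data = long)$statistic)
  H / ((n^2 - 1) / (n + 1))   # equivalent to H / (n - 1)
}

pooh <- matrix(c(0, 0, 1, 6, 3,
                 1, 6, 2, 1, 0,
                 0, 0, 2, 6, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(Speaker = c("Pooh", "Piglet", "Tigger"),
                               Likert  = 1:5))

epsilon_squared_sketch(pooh)   # about 0.5808
```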

#### Perfect association

Input =(
"Ordinal  1  2  3
Category
A        10  0  0
B         0 10  0
C         0  0 10
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(rcompanion)

freemanTheta(Tabla)

Freeman.theta
1

library(rcompanion)

epsilonSquared(Tabla)

epsilon.squared
1

#### Zero association

Input =(
"Ordinal  1  2  3
Category
A         5  5  5
B        10 10 10
C        15 15 15
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(rcompanion)

freemanTheta(Tabla)

Freeman.theta
0

library(rcompanion)

epsilonSquared(Tabla)

epsilon.squared
0

### Examples for Somers’ D, Kendall’s tau-b, and Goodman and Kruskal's gamma

#### First example

This example includes two ordinal variables, Adopt and Size.

Input =(
Size
Hobbyist         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(DescTools)

SomersDelta(Tabla,
direction  = "column",
conf.level = 0.95)

somers     lwr.ci     upr.ci
-0.4665127 -0.6452336 -0.2877918

The output above is Somers’ D for (Column | Row), with its 95% confidence interval.

library(DescTools)

KendallTauB(Tabla,
conf.level = 0.95)

tau_b     lwr.ci     upr.ci
-0.4960301 -0.6967707 -0.2952895

The output above is Kendall’s tau-b, with its 95% confidence interval.

library(DescTools)

GoodmanKruskalGamma(Tabla,
conf.level = 0.95)

gamma     lwr.ci     upr.ci
-0.6778523 -0.9199026 -0.4358021

The output above is Goodman and Kruskal’s gamma, with its 95% confidence interval.

#### Second example

This example includes two ordinal variables, Tired and Happy.

Input =(
"Tired  1 2 3 5
Happy
1   0 0 0 3
2   0 0 0 2
3   0 0 3 0
4   2 0 0 0
5   3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(DescTools)

SomersDelta(Tabla,
direction  = "row",
conf.level = 0.95)

somers      lwr.ci      upr.ci
-0.32061069 -0.68212474  0.04090337

The output above is Somers’ D for (Row | Column), with its 95% confidence interval.

library(DescTools)

KendallTauB(Tabla,
conf.level = 0.95)

tau_b      lwr.ci      upr.ci
-0.31351142 -0.67251606  0.04549322

The output above is Kendall’s tau-b, with its 95% confidence interval.

library(DescTools)

GoodmanKruskalGamma(Tabla,
conf.level = 0.95)

gamma      lwr.ci      upr.ci
-0.42000000 -0.87863057  0.03863057

The output above is Goodman and Kruskal’s gamma, with its 95% confidence interval.
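All three of these statistics are built from the numbers of concordant (C) and discordant (D) pairs of observations.  Gamma is (C − D) / (C + D) and ignores tied pairs; tau-b divides C − D by the geometric mean of the numbers of pairs not tied on each variable; and Somers’ D (Row | Column) divides C − D by the number of pairs not tied on the column variable.  The base-R sketch below applies these textbook formulas to the Tired and Happy table; it gives point estimates only, no confidence intervals, and the function names are hypothetical.  For this table C = 29 and D = 71, and the results agree with the DescTools values above (−0.42, about −0.3135, and about −0.3206).

```r
# Count concordant (C) and discordant (D) pairs in a table whose rows and
# columns are both in increasing order. Hypothetical helper.
concordant_discordant <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab); nc <- ncol(tab)
  C <- 0; D <- 0
  for (i in seq_len(nr - 1)) {
    below <- tab[(i + 1):nr, , drop = FALSE]   # rows ranked higher than row i
    for (j in seq_len(nc)) {
      if (j < nc) C <- C + tab[i, j] * sum(below[, (j + 1):nc])
      if (j > 1)  D <- D + tab[i, j] * sum(below[, 1:(j - 1)])
    }
  }
  c(C = C, D = D)
}

ordinal_measures_sketch <- function(tab) {
  tab <- as.matrix(tab)
  cd <- concordant_discordant(tab)
  S  <- cd[["C"]] - cd[["D"]]
  n  <- sum(tab)
  P  <- n * (n - 1) / 2                                     # all pairs
  tied_row <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2)    # tied on row variable
  tied_col <- sum(colSums(tab) * (colSums(tab) - 1) / 2)    # tied on column variable
  c(gamma = S / (cd[["C"]] + cd[["D"]]),
    tau_b = S / sqrt((P - tied_row) * (P - tied_col)),
    somers_row_given_col = S / (P - tied_col))
}

tired_happy <- matrix(c(0, 0, 0, 3,
                        0, 0, 0, 2,
                        0, 0, 3, 0,
                        2, 0, 0, 0,
                        3, 2, 0, 5),
                      nrow = 5, byrow = TRUE,
                      dimnames = list(Happy = 1:5, Tired = c(1, 2, 3, 5)))

ordinal_measures_sketch(tired_happy)
```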

### Examples for polychoric correlation

#### First example

This example includes two ordinal variables, Adopt and Size.  A single correlation coefficient is produced.

Input =(
Size
Hobbyist         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(psych)

polychoric(Tabla,
correct=FALSE)

$rho
 -0.678344

#### Second example

This example includes two ordinal variables, Tired and Happy.

Input =(
"Tired  1 2 3 5
Happy
1   0 0 0 3
2   0 0 0 2
3   0 0 3 0
4   2 0 0 0
5   3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

library(psych)

polychoric(Tabla,
correct=FALSE)

$rho
 -0.495164

#### Third example

The following example revisits the Belcher family data.  Here, we want to know whether the scores for the different instructors are correlated with one another.  This makes sense because each rater rated each instructor, so the scores are paired by rater.

Input =("
Rater  Bob   Linda   Tina   Gene   Louisa
a       4     8       7      6      8
b       5     6       5      4      7
c       4     8       7      5      8
d       6     8       8      5      8
e       6     8       8      6      9
f       6     7       9      6      9
g      10    10      10      5      8
h       6     9       9      5     10
")

Data = read.table(textConnection(Input), header = TRUE)

Data.num = Data[c("Bob", "Linda", "Tina", "Gene", "Louisa")]

library(psych)

polychoric(Data.num,
correct=FALSE)

Polychoric correlations
Bob   Linda Tina  Gene  Louis
Bob     1.00
Linda   0.59  1.00
Tina    0.88  0.76  1.00
Gene   -0.03  0.16  0.37  1.00
Louisa  0.35  0.47  0.70  0.64  1.00

pairs(data = Data.num,
~ Bob + Linda + Tina + Gene + Louisa) 