
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Measures of Association for Ordinal Variables

Measures of association for one ordinal variable and one nominal variable

 

The statistics Freeman’s theta and epsilon-squared are used to gauge the strength of the association between one ordinal variable and one nominal variable.  Both of these statistics range from 0 to 1, with 0 indicating no association and 1 indicating perfect association.  In my experience, Freeman’s theta tends to be somewhat larger than epsilon-squared for the same data. 

 

As effect sizes, these statistics are not affected by the sample size per se.  Epsilon-squared is usually used as the effect size for a Kruskal–Wallis test, whereas Freeman’s theta is most often used as an effect size for data arranged in a table, such as for a Cochran–Armitage test.  However, it is my understanding that neither statistic assumes or prohibits one variable being designated as the dependent variable.

 

Measures of association for two ordinal variables

 

Measures of association for ordinal variables include Somers’ D (or delta), Kendall’s tau-b, and Goodman and Kruskal's gamma.

 

Yule's Q is equivalent in magnitude to Goodman and Kruskal's gamma for a 2 x 2 table, but can be either positive or negative depending on the order of the cells in the table.
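
The relationship between Yule's Q and the order of the cells can be illustrated with a short sketch in base R.  In a 2 x 2 table, the concordant pairs number n11 × n22 and the discordant pairs number n12 × n21, so Q is the same ratio that gamma uses.  The function name yule_q and the counts below are hypothetical and chosen only for illustration.

```r
# Yule's Q for a 2 x 2 table of counts laid out as
#   n11 n12
#   n21 n22
# (yule_q is a hypothetical helper, not a function from any package)
yule_q <- function(tab) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]
  (n11 * n22 - n12 * n21) / (n11 * n22 + n12 * n21)
}

# Made-up counts for illustration only
Tabla2 <- matrix(c(10,  5,
                    3, 12),
                 nrow = 2, byrow = TRUE)

q_original <- yule_q(Tabla2)          # rows in their original order
q_reversed <- yule_q(Tabla2[2:1, ])   # same data with the two rows swapped

c(original = q_original, reversed = q_reversed)
```

Swapping the two rows changes the sign of the statistic but not its magnitude, which is why the order of the categories should be chosen deliberately before the result is interpreted.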

 

Polychoric and tetrachoric correlation

 

Polychoric correlation is used to measure the degree of correlation between two ordinal variables with the assumption that each ordinal variable is a discrete summary of an underlying (latent) normally distributed continuous variable.  For example, if an ordinal variable Height were measured as very short, short, average, tall, very tall, one could assume that these categories represent actual height measurements that are continuous and normally distributed.  A similar assumption might be made for Likert items, for example on an agree–disagree spectrum.
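
The latent-variable assumption can be made concrete with a small simulation in base R.  This is only a sketch: the sample size, the latent correlation of 0.7, the cut points, and the seed are all arbitrary choices of mine.  Two correlated normal variables are generated and then cut into five ordered categories, and the Pearson correlation of the category codes is compared with the correlation of the underlying values.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

n   <- 5000                      # hypothetical sample size
rho <- 0.7                       # hypothetical latent correlation

# Two standard normal latent variables with correlation near rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Cut each latent variable into five ordered categories, coded 1 to 5
cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
o1   <- cut(z1, breaks = cuts, labels = FALSE)
o2   <- cut(z2, breaks = cuts, labels = FALSE)

cor_latent  <- cor(z1, z2)       # correlation of the continuous values
cor_ordinal <- cor(o1, o2)       # Pearson correlation of the category codes

table(o1, o2)                    # the contingency table an analyst would observe
c(latent = cor_latent, ordinal = cor_ordinal)
```

In this simulation the correlation of the category codes is smaller than the latent correlation.  Polychoric correlation is intended to estimate the latent correlation from the observed table, under the normality assumption described above.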

 

Tetrachoric correlation is a special case of polychoric correlation when both variables are dichotomous.

 

Appropriate data

•  One ordinal variable and one nominal variable, or two ordinal variables.  Usually expressed as a contingency table.

•  Experimental units aren’t paired.
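
If the data are in long format, with one row per observation, a contingency table can be built in base R with xtabs.  The following is a minimal sketch with made-up observations.  The data frame and its values are hypothetical, and the important step is declaring the order of the levels of the ordinal variable so that the columns of the table appear in the intended order.

```r
# Hypothetical long-format data: one row per respondent
Data <- data.frame(
  Travel    = c("Walk", "Walk", "Bus", "Drive", "Drive", "Bus"),
  Breakfast = c("Never", "Often", "Sometimes", "Always", "Often", "Never")
)

# Declare the order of the ordinal categories explicitly;
# otherwise R would sort the levels alphabetically
Data$Breakfast <- factor(Data$Breakfast,
                         levels = c("Never", "Rarely", "Sometimes",
                                    "Often", "Always"))

# Rows are the nominal variable, columns are the ordinal variable
Tabla.long <- xtabs(~ Travel + Breakfast, data = Data)

Tabla.long
```

Levels with no observations, such as Rarely here, are kept as columns of zeros by default.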

 

Hypotheses

•  There are no hypotheses tested directly with these statistics.

 

Other notes and alternative tests

•  Cramér’s V and phi are used for tables with two nominal variables. Cohen’s w is a variant. Goodman and Kruskal’s lambda is also used.

•  Biserial and polyserial correlation are used for one continuous variable and one ordinal (or dichotomous) variable, when there is an assumption that the ordinal variable represents a latent continuous variable.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  rcompanion

•  psych

•  DescTools

 

The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}

 

Examples for Freeman’s theta and epsilon-squared

 

The hypothetical Breakfast example from the previous chapter includes Breakfast as an ordinal variable and Travel as a nominal variable.


Input =(
"Breakfast  Never  Rarely  Sometimes Often  Always
Travel
Walk         6      9       6         5      2
Bus          2      5       8         5      3
Drive        2      4       6         8      8
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla

Freeman’s theta


library(rcompanion)

freemanTheta(Tabla,
             group = "row")


Freeman.theta
        0.312
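
To show what this statistic measures, the following base R sketch computes theta by hand.  It assumes the usual definition as I understand it: for each pair of groups, count the pairs of observations in which one group is higher and the pairs in which the other is higher, take the absolute difference, and divide the sum of these differences by the total number of between-group pairs.  The function name freeman_theta_sketch is hypothetical, and whether freemanTheta uses exactly these steps internally is an assumption on my part.

```r
# Hand computation of Freeman's theta for a table with groups in rows
# and ordered categories in columns (hypothetical helper function)
freeman_theta_sketch <- function(tab) {
  tab  <- as.matrix(tab)
  k    <- nrow(tab)
  cols <- seq_len(ncol(tab))
  num  <- 0
  den  <- 0
  for (a in seq_len(k - 1)) {
    for (b in (a + 1):k) {
      b_higher <- 0
      a_higher <- 0
      for (j in cols) {
        # pairs where the observation from group b is in a higher category
        b_higher <- b_higher + tab[a, j] * sum(tab[b, cols > j])
        # pairs where the observation from group a is in a higher category
        a_higher <- a_higher + tab[b, j] * sum(tab[a, cols > j])
      }
      num <- num + abs(b_higher - a_higher)
      den <- den + sum(tab[a, ]) * sum(tab[b, ])
    }
  }
  num / den
}

# The Breakfast counts from above, entered as a plain matrix
Breakfast <- matrix(c(6, 9, 6, 5, 2,
                      2, 5, 8, 5, 3,
                      2, 4, 6, 8, 8),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Travel = c("Walk", "Bus", "Drive"),
                                    Breakfast = c("Never", "Rarely", "Sometimes",
                                                  "Often", "Always")))

theta_by_hand <- freeman_theta_sketch(Breakfast)

round(theta_by_hand, 3)
```

For these data the hand computation gives 646 / 2072, which rounds to the 0.312 reported above.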


Epsilon-squared


library(rcompanion)

epsilonSquared(Tabla,
               group = "row")


epsilon.squared
           0.11
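
Because epsilon-squared is tied to the Kruskal–Wallis test, it can be checked in base R.  The sketch below assumes the commonly cited formula epsilon-squared = H / (n – 1), where H is the Kruskal–Wallis statistic and n is the total number of observations.  The function name epsilon_squared_sketch is hypothetical, and whether epsilonSquared computes the value in exactly this way is an assumption on my part.

```r
# Hand computation of epsilon-squared from the Kruskal-Wallis statistic
# for a table with groups in rows and ordered categories in columns
# (hypothetical helper function)
epsilon_squared_sketch <- function(tab) {
  tab    <- as.matrix(tab)
  counts <- as.vector(tab)
  # Expand the table to one value per observation
  group  <- rep(as.vector(row(tab)), times = counts)   # row index = group
  score  <- rep(as.vector(col(tab)), times = counts)   # column index = ordinal score
  H      <- unname(kruskal.test(score, group)$statistic)
  H / (length(score) - 1)
}

# The Breakfast counts from above, entered as a plain matrix
Breakfast <- matrix(c(6, 9, 6, 5, 2,
                      2, 5, 8, 5, 3,
                      2, 4, 6, 8, 8),
                    nrow = 3, byrow = TRUE)

eps_by_hand <- epsilon_squared_sketch(Breakfast)

round(eps_by_hand, 2)
```

For these data the result rounds to the 0.11 reported above.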


The hypothetical example with Pooh, Piglet, and Tigger includes Likert as an ordinal variable and Speaker as a nominal variable.


Input =(
"Likert  1 2 3 4 5
Speaker
Pooh     0 0 1 6 3
Piglet   1 6 2 1 0
Tigger   0 0 2 6 2
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla

Freeman’s theta


library(rcompanion)

freemanTheta(Tabla,
             group = "row")


Freeman.theta
        0.64


Epsilon-squared


library(rcompanion)

epsilonSquared(Tabla,
               group = "row")


epsilon.squared
           0.581


Additional examples for Freeman’s theta and epsilon-squared

 

Perfect association


Input =(
"Ordinal  1  2  3
Category
A        10  0  0
B         0 10  0
C         0  0 10
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla


library(rcompanion)

freemanTheta(Tabla)


Freeman.theta
        1


library(rcompanion)

epsilonSquared(Tabla)


epsilon.squared
           1


Zero association


Input =(
"Ordinal  1  2  3
Category
A         5  5  5
B        10 10 10
C        15 15 15
")

Tabla = as.table(read.ftable(textConnection(Input)))


Tabla


library(rcompanion)

freemanTheta(Tabla)


Freeman.theta
            0


library(rcompanion)

epsilonSquared(Tabla)


epsilon.squared
              0


Examples for Somers’ D, Kendall’s tau-b, and Goodman and Kruskal's gamma

 

First example

 

This example includes two ordinal variables, Adopt and Size.


Input =(
"Adopt      Always  Sometimes  Never
Size
Hobbyist         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla


library(DescTools)

SomersDelta(Tabla,
            direction  = "column",
            conf.level = 0.95)


    somers     lwr.ci     ups.ci
-0.4665127 -0.6452336 -0.2877918

### Somers' D for (Column | Row), with confidence interval


library(DescTools)

KendallTauB(Tabla,
            conf.level = 0.95)


     tau_b     lwr.ci     ups.ci
-0.4960301 -0.6967707 -0.2952895

### Kendall’s tau-b with confidence interval


library(DescTools)

GoodmanKruskalGamma(Tabla,
                    conf.level = 0.95)


     gamma     lwr.ci     ups.ci
-0.6778523 -0.9199026 -0.4358021

### Goodman and Kruskal’s gamma with confidence interval


Second example

 

This example includes two ordinal variables, Tired and Happy.

 

Input =(
"Tired  1 2 3 5
Happy
    1   0 0 0 3
    2   0 0 0 2
    3   0 0 3 0
    4   2 0 0 0
    5   3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla


library(DescTools)

SomersDelta(Tabla,
            direction  = "row",
            conf.level = 0.95)


     somers      lwr.ci      ups.ci
-0.32061069 -0.68212474  0.04090337

### Somers' D for (Row | Column), with confidence interval


library(DescTools)

KendallTauB(Tabla,
            conf.level = 0.95)


      tau_b      lwr.ci      ups.ci
-0.31351142 -0.67251606  0.04549322

### Kendall’s tau-b with confidence interval


library(DescTools)

GoodmanKruskalGamma(Tabla,
                    conf.level = 0.95)


      gamma      lwr.ci      ups.ci
-0.42000000 -0.87863057  0.03863057

### Goodman and Kruskal’s gamma with confidence interval
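
The three point estimates for this table can be reproduced by counting pairs of observations in base R.  This is a sketch under the standard definitions as I understand them, and the names pair_counts and the objects below are hypothetical.  A pair is concordant if one observation is higher on both variables, discordant if it is higher on one and lower on the other, and tied otherwise.  The confidence intervals are not reproduced here.

```r
# Count concordant, discordant, and tied pairs in a two-way table
# (hypothetical helper function)
pair_counts <- function(tab) {
  tab  <- as.matrix(tab)
  rows <- seq_len(nrow(tab))
  cols <- seq_len(ncol(tab))
  conc <- 0
  disc <- 0
  for (i in rows) {
    for (j in cols) {
      conc <- conc + tab[i, j] * sum(tab[rows > i, cols > j])  # below and right
      disc <- disc + tab[i, j] * sum(tab[rows > i, cols < j])  # below and left
    }
  }
  n <- sum(tab)
  list(concordant = conc,
       discordant = disc,
       all_pairs  = n * (n - 1) / 2,
       tied_rows  = sum(choose(rowSums(tab), 2)),   # pairs in the same row
       tied_cols  = sum(choose(colSums(tab), 2)))   # pairs in the same column
}

# The Tired and Happy counts from above, entered as a plain matrix
TiredHappy <- matrix(c(0, 0, 0, 3,
                       0, 0, 0, 2,
                       0, 0, 3, 0,
                       2, 0, 0, 0,
                       3, 2, 0, 5),
                     nrow = 5, byrow = TRUE)

p <- pair_counts(TiredHappy)
s <- p$concordant - p$discordant

gk_gamma <- s / (p$concordant + p$discordant)
tau_b    <- s / sqrt((p$all_pairs - p$tied_rows) * (p$all_pairs - p$tied_cols))

# Somers' D for (Row | Column): pairs tied on the column variable are excluded
somers_row_given_col <- s / (p$all_pairs - p$tied_cols)

round(c(gamma = gk_gamma, tau_b = tau_b, somers = somers_row_given_col), 4)
```

For this table there are 29 concordant and 71 discordant pairs, and the three values agree with the point estimates from DescTools shown above.  The statistics differ only in how tied pairs enter the denominator, which is why gamma, which ignores ties entirely, is the largest in magnitude.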


Examples for polychoric correlation

 

First example

 

This example includes two ordinal variables, Adopt and Size.  A single correlation coefficient is produced.


Input =(
"Adopt      Always  Sometimes  Never
Size
Hobbyist         0          1      5
Mom-and-pop      2          3      4
Small            4          4      4
Medium           3          2      0
Large            2          0      0
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla


library(psych)

polychoric(Tabla,
           correct=FALSE)


$rho
[1] -0.678344


Second example

 

This example includes two ordinal variables, Tired and Happy.

 

Input =(
"Tired  1 2 3 5
Happy
    1   0 0 0 3
    2   0 0 0 2
    3   0 0 3 0
    4   2 0 0 0
    5   3 2 0 5
")

Tabla = as.table(read.ftable(textConnection(Input)))

Tabla


library(psych)

polychoric(Tabla,
           correct=FALSE)


$rho
[1] -0.495164


Third example

 

The following example revisits the Belcher family data.  Here, we want to know whether the scores for the instructors are correlated with one another.  This is a reasonable question because each rater scored every instructor, so the scores are paired by rater.

 

Input =("
Rater  Bob   Linda   Tina   Gene   Louisa
a       4     8       7      6      8
b       5     6       5      4      7
c       4     8       7      5      8
d       6     8       8      5      8
e       6     8       8      6      9
f       6     7       9      6      9
g      10    10      10      5      8
h       6     9       9      5     10
")

Data = read.table(textConnection(Input),header=TRUE)

Data.num = Data[c("Bob", "Linda", "Tina", "Gene", "Louisa")]

library(psych)

polychoric(Data.num,
           correct=FALSE)


Polychoric correlations
       Bob   Linda Tina  Gene  Louis
Bob     1.00                       
Linda   0.59  1.00                 
Tina    0.88  0.76  1.00           
Gene   -0.03  0.16  0.37  1.00     
Louisa  0.35  0.47  0.70  0.64  1.00


pairs(~ Bob + Linda + Tina + Gene + Louisa,
      data = Data.num)