
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Measures of Association for Nominal Variables

The statistics phi and Cramér’s V are used to gauge the strength of the association between two nominal variables.  They are sometimes considered to be measures of effect size for tests of association for nominal variables.

 

Cramér’s V varies from 0 to 1, with 0 indicating no association and 1 indicating a perfect association. phi varies from –1 to 1, with 0 indicating no association and –1 and 1 indicating perfect associations.
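Cramér’s V is computed from the Pearson chi-square statistic for the table as V = sqrt(χ² / (n × (min(r, c) − 1))), where n is the total count and r and c are the numbers of rows and columns. The following is a minimal base-R sketch of that calculation. It assumes the uncorrected chi-square statistic (no continuity correction) and uses the County data from the examples below. The helper name `cramer_v_manual` is hypothetical, and the `cramerV` function in rcompanion is what this chapter actually uses.

```r
### County data from the examples in this chapter
Input = ("
County       Pass  Fail
Bloom          21     5
Cobblestone     6    11
Dougal          7     8
Heimlich       27     5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

### Hypothetical helper: Cramér's V from the uncorrected
###   Pearson chi-square statistic
cramer_v_manual = function(m) {
  chi.sq = suppressWarnings(chisq.test(m, correct = FALSE))$statistic
  n      = sum(m)
  k      = min(nrow(m), ncol(m))
  unname(sqrt(chi.sq / (n * (k - 1))))
}

cramer_v_manual(Matrix)
```

To four decimal places, this agrees with the 0.4387 that `cramerV` reports for the same table below.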

 

The phi statistic is available only for 2 x 2 tables.
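For a 2 x 2 table with cells n11, n12, n21, n22, phi can be computed directly from the cell counts as (n11 n22 − n12 n21) divided by the square root of the product of the four marginal totals. Below is a minimal base-R sketch using the Sex by Pass/Fail counts from the phi example later in this chapter. The helper name `phi_manual` is hypothetical. The sign of phi depends on how the rows and columns are ordered: swapping the two rows flips the sign but leaves the magnitude unchanged.

```r
### Sex by Pass/Fail data from the phi example in this chapter
Input = ("
Sex      Pass Fail
Male       49   64
Female     44   24
")

Matrix.2 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

### Hypothetical helper: phi for a 2 x 2 table
phi_manual = function(m) {
  stopifnot(nrow(m) == 2, ncol(m) == 2)   # formula applies to 2 x 2 only
  storage.mode(m) = "double"              # avoid integer overflow for large counts
  n11 = m[1, 1];  n12 = m[1, 2]
  n21 = m[2, 1];  n22 = m[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

phi_manual(Matrix.2)       # negative with this row order
phi_manual(Matrix.2[2:1, ])  # rows swapped: same magnitude, opposite sign
```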


Goodman and Kruskal's lambda statistic is also used to gauge the strength of the association between two nominal variables. It is formulated so that one dimension of the table is treated as the independent variable and the other as the dependent variable, and the independent variable is used to predict the dependent variable. It varies from 0 to 1, with 0 indicating that knowing the independent variable does not improve the prediction of the dependent variable at all.
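Lambda is a proportional-reduction-in-error measure. E1 is the number of errors made when the overall most common category of the dependent variable is always guessed. E2 is the number of errors made when the most common category within each level of the independent variable is guessed instead. Lambda is then (E1 − E2) / E1. The sketch below is minimal and uses base R. It assumes that the rows are the independent variable and the columns are the dependent variable. The helper name `lambda_manual` is hypothetical. Transposing the table swaps the two roles, and this shows that lambda is not symmetric.

```r
### County data from the examples in this chapter
Input = ("
County       Pass  Fail
Bloom          21     5
Cobblestone     6    11
Dougal          7     8
Heimlich       27     5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

### Hypothetical helper: lambda with rows as the independent
###   variable and columns as the dependent variable
lambda_manual = function(m) {
  n  = sum(m)
  E1 = n - max(colSums(m))          # errors guessing the overall modal column
  E2 = n - sum(apply(m, 1, max))    # errors guessing the modal column within each row
  (E1 - E2) / E1
}

lambda_manual(Matrix)      # County predicts Pass/Fail: 6/29
lambda_manual(t(Matrix))   # Pass/Fail predicts County: 6/58
```

The first value, 0.2069, agrees with the `Lambda(Matrix, direction="column")` result reported later in this chapter.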

 

Appropriate data

•  Two nominal variables with two or more levels each, usually expressed as a contingency table.

•  Experimental units aren’t paired.

•  For phi, the table is 2 x 2 only.
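Data are often recorded with one row per experimental unit instead of as a contingency table. The following is a minimal base-R sketch of how to move between the two forms. It expands the County counts from the examples below into long data, one row per student, and then rebuilds the table with `table()` and `xtabs()`. The column names `County` and `Result` are assumptions chosen for illustration.

```r
### County counts from the examples in this chapter
Input = ("
County       Pass  Fail
Bloom          21     5
Cobblestone     6    11
Dougal          7     8
Heimlich       27     5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

### Frequency form: one row per cell, with a count column
Freq = as.data.frame(as.table(Matrix))
names(Freq) = c("County", "Result", "Freq")

### Long form: one row per experimental unit
Long = Freq[rep(seq_len(nrow(Freq)), times = Freq$Freq),
            c("County", "Result")]

### Rebuild the contingency table from the long data
Table   = table(Long$County, Long$Result)
Table.2 = xtabs(~ County + Result, data = Long)

Table.2
```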

 

Hypotheses

•  There are no hypotheses tested directly with these statistics.

 

Other notes and alternative tests

•  Cramér’s V and phi are used for tables with two nominal variables. Goodman and Kruskal's lambda is also used.

•  Freeman’s theta and epsilon squared are used for tables with one ordinal variable and one nominal variable.

•  For tables with two ordinal variables, Kendall’s tau-b, Goodman and Kruskal's gamma, Somers’ D, and (for 2 x 2 tables) Yule's Q are used.

•  Tetrachoric correlation (for two dichotomous variables) and polychoric correlation (for two ordinal variables) are used when the observed variables are assumed to reflect underlying latent continuous variables.

•  Biserial and polyserial correlation are used for one continuous variable and one ordinal (or dichotomous) variable, when there is an assumption that the ordinal variable represents a latent continuous variable.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  rcompanion

•  vcd

•  psych

•  DescTools

 

The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(vcd)){install.packages("vcd")}
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}


Examples for correlation for nominal variables

 

Cramér’s V


Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


### Cramer’s V

library(rcompanion)

cramerV(Matrix,
        digits=4)


Cramer V
 0.4387


library(vcd)

assocstats(Matrix)


Phi-Coefficient   : NA
Cramer's V        : 0.439

   ###  phi is reported as NA because this table is not 2 x 2.


### Fisher’s exact test

fisher.test(Matrix)


Fisher's Exact Test for Count Data

p-value = 0.000668


phi


Input =("
Sex      Pass Fail
  Male     49   64
  Female   44   24
")

Matrix.2 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.2

library(psych)

phi(Matrix.2,
    digits = 4)


[1] -0.2068


library(rcompanion)

cramerV(Matrix.2)


Cramer V
  0.2068

   ###  Note that Cramer’s V is the same as the absolute value
   ###    of phi for 2 x 2 tables.


### Fisher’s exact test

fisher.test(Matrix.2)


Fisher's Exact Test for Count Data

p-value = 0.005959
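For a 2 x 2 table, the uncorrected Pearson chi-square statistic (no Yates continuity correction) equals n times phi squared. This is why Cramér’s V equals the absolute value of phi in the example above. A short derivation follows, with the cells written as n11, n12, n21, n22, the total as n, and min(r, c) − 1 = 1 for a 2 x 2 table:

```latex
\begin{aligned}
\phi &= \frac{n_{11} n_{22} - n_{12} n_{21}}
             {\sqrt{(n_{11}+n_{12})(n_{21}+n_{22})(n_{11}+n_{21})(n_{12}+n_{22})}} \\
\chi^2 &= \frac{n\,(n_{11} n_{22} - n_{12} n_{21})^2}
               {(n_{11}+n_{12})(n_{21}+n_{22})(n_{11}+n_{21})(n_{12}+n_{22})}
        = n\,\phi^2 \\
V &= \sqrt{\frac{\chi^2}{n\,(\min(r,c) - 1)}}
   = \sqrt{\frac{n\,\phi^2}{n \cdot 1}} = |\phi|
\end{aligned}
```

With phi = −0.2068 for the Sex by Pass/Fail table, this gives V = 0.2068.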


Goodman and Kruskal’s lambda

 

Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


### Goodman Kruskal lambda

library(DescTools)

Lambda(Matrix,
       direction="column")


[1] 0.2068966

   ### direction="column": the column variable (Pass/Fail) is the
   ###   dependent variable, so County predicts Pass/Fail


Optional analysis:  changing the order of the table

 

Note that if we change the order of the rows in the table, the results for Cramér’s V, lambda (with direction=column), and Fisher’s exact test do not change.
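Rows can also be reordered by indexing, which avoids retyping the data. In the minimal base-R sketch below, one reordering is done by row name and one is random. The seed value is arbitrary and only serves reproducibility. The sketch checks that the uncorrected chi-square statistic (on which Cramér’s V is based) and the Fisher exact test p-value stay the same. Both depend only on the cell counts and their row and column totals, and reordering rows does not change either. The helper name `chi_sq` is hypothetical. This would not hold for measures for ordinal variables such as Kendall’s tau-b, which depend on the order of the categories.

```r
### County data, in the original row order
Input = ("
County       Pass  Fail
Bloom          21     5
Cobblestone     6    11
Dougal          7     8
Heimlich       27     5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

### Reorder rows by name instead of retyping the data
Matrix.3 = Matrix[c("Heimlich", "Bloom", "Dougal", "Cobblestone"), ]

### A random reordering (arbitrary seed, for reproducibility)
set.seed(1)
Matrix.r = Matrix[sample(nrow(Matrix)), ]

### Hypothetical helper: uncorrected Pearson chi-square statistic
chi_sq = function(m) {
  unname(suppressWarnings(chisq.test(m, correct = FALSE))$statistic)
}

Tables = list(Matrix, Matrix.3, Matrix.r)

sapply(Tables, chi_sq)                                 # identical statistics
sapply(Tables, function(m) fisher.test(m)$p.value)     # identical p-values
```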


Input =("
County               Pass   Fail
Heimlich             27      5
Bloom                21      5
Dougal                7      8
Cobblestone           6     11
")

Matrix.3 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.3


### Cramer’s V

library(rcompanion)

cramerV(Matrix.3,
        digits=4)


Cramer V
 0.4387


### Goodman Kruskal lambda

library(DescTools)

Lambda(Matrix.3,
       direction="column")

   ###  Treat County as independent variable


[1] 0.2068966


### Fisher’s exact test

fisher.test(Matrix.3)


Fisher's Exact Test for Count Data

p-value = 0.000668


Optional analysis:  comparing statistics

 

The following examples give some sense of how Cramér’s V and lambda differ.


Input =("
X    Y1   Y2
X1   10    0
X2    0   10
")

Matrix.x = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

cramerV(Matrix.x)


 Cramer V
  1


Lambda(Matrix.x,
       direction="column")


 [1] 1


Input =("
X    Y1   Y2
X1   10    0
X2   10   10
")

Matrix.y = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

cramerV(Matrix.y)


 Cramer V
  0.5


Lambda(Matrix.y,
       direction="column")


 [1] 0

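   ### X predicts Y.  lambda is 0 because Y1 is the most common
   ###   (or tied most common) category in every row, so knowing X
   ###   does not reduce the errors in predicting Y.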


Input =("
X    Y1   Y2
X1   10    0
X2   10   10
X3   10   20
X4   10   30
")

Matrix.z = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

cramerV(Matrix.z)


 Cramer V
  0.4488


Lambda(Matrix.z,
       direction="column")


 [1] 0.25

   ### X predicts Y.
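The sketch below puts the error counts behind lambda next to Cramér’s V for the three small tables above. It is a minimal base-R sketch, and the helper names are hypothetical. E1 counts the errors made when the most common Y category overall is always guessed. E2 counts the errors made when the most common Y category within each row of X is guessed. Lambda changes only when knowing X changes the best guess, whereas V responds to any departure from independence.

```r
### The three small tables from the examples above (rows X, columns Y1, Y2)
Matrix.x = matrix(c(10,  0,
                     0, 10), nrow = 2, byrow = TRUE)
Matrix.y = matrix(c(10,  0,
                    10, 10), nrow = 2, byrow = TRUE)
Matrix.z = matrix(c(10,  0,
                    10, 10,
                    10, 20,
                    10, 30), nrow = 4, byrow = TRUE)

### Hypothetical helpers (base R only)
cramer_v_manual = function(m) {
  chi.sq = suppressWarnings(chisq.test(m, correct = FALSE))$statistic
  unname(sqrt(chi.sq / (sum(m) * (min(dim(m)) - 1))))
}

lambda_errors = function(m) {
  n = sum(m)
  c(E1 = n - max(colSums(m)),         # errors ignoring X
    E2 = n - sum(apply(m, 1, max)))   # errors using the modal Y within each X
}

Tables = list(x = Matrix.x, y = Matrix.y, z = Matrix.z)

Summary = t(sapply(Tables, function(m) {
  E = lambda_errors(m)
  c(CramerV = cramer_v_manual(m), E,
    Lambda  = unname((E["E1"] - E["E2"]) / E["E1"]))
}))

round(Summary, 4)
```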