
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Measures of Association for Nominal Variables

There are several statistics that can be used to gauge the strength of the association between two nominal variables.  They are used as measures of effect size for tests of association for nominal variables.

 

The statistics phi and Cramér’s V are commonly used.  Cramér’s V varies from 0 to 1, with 1 indicating a perfect association.  Phi varies from –1 to 1, with –1 and 1 indicating perfect associations.  Phi is available only for 2 x 2 tables.

 

Cohen’s w is similar to Cramér’s V in use, but its upper value is not limited to 1.


Goodman and Kruskal’s lambda statistic is also used to gauge the strength of the association between two nominal variables.  It is formulated so that one dimension of the table is treated as the independent variable and the other as the dependent variable, with the independent variable used to predict the dependent variable.  It varies from 0 to 1.

 

Another measure of association is Tschuprow's T.  It is similar to Cramér’s V, and the two are equivalent for square tables (those with an equal number of rows and columns).

 

Appropriate data

•  Two nominal variables with two or more levels each.  Usually expressed as a contingency table.

•  Experimental units aren’t paired.

•  For phi, the table is 2 x 2 only.

 

Hypotheses

•  There are no hypotheses tested directly with these statistics.

 

Other notes and alternative tests

•  Freeman’s theta and epsilon squared are used for tables with one ordinal variable and one nominal variable.

•  For tables with two ordinal variables, Kendall’s Tau-b, Goodman and Kruskal's gamma, and Somers’ D are used.

•  Tetrachoric and polychoric correlation are used for two ordinal variables when the ordinal variables are assumed to represent underlying latent continuous variables.

•  Biserial and polyserial correlation are used for one continuous variable and one ordinal (or dichotomous) variable, when there is an assumption that the ordinal variable represents a latent continuous variable.

 

Interpretation of statistics

 

The interpretation of measures of association is always relative to the discipline, the specific data, and the aims of the analyst.  Sometimes guidelines are given for “small”, “medium”, and “large” effects, often originating from Cohen (1988), but it is important to remember that these are still relative to the discipline and type of data.  An effect size that is considered “large” in psychology or behavioral science may be considered quite small in a physical science such as chemistry.  The specific conditions of the study are important as well.  For example, one would expect the difference in knowledge between a group completely ignorant of a subject and one educated in the subject to be large, but the difference between two groups educated in the same subject by different methods might be small.

 

Interpretation of measures of association for behavioral sciences

 

Interpretation   Cohen’s w   phi    Cramér’s V,   Cramér’s V,   Cramér’s V,
                                    k = 2*        k = 3*        k = 4*

Small            0.10        0.10   0.10          0.07          0.06
Medium           0.30        0.30   0.30          0.20          0.17
Large            0.50        0.50   0.50          0.35          0.29

___________________________________
Adapted from Cohen (1988).

* k is the minimum number of categories in either rows or columns.
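
The cutoffs in this table can be applied in code.  The following is a minimal sketch and not part of any package: the function name interpret_cramer_v and the label "below small" are hypothetical, and treating each tabulated value as the lower bound of its category is an assumption made here for illustration.

```r
# Hypothetical helper: label a Cramér's V value using the cutoffs tabulated above.
# k is the smaller of the number of rows and the number of columns.
interpret_cramer_v <- function(v, k) {
  cutoffs <- list(
    "2" = c(0.10, 0.30, 0.50),
    "3" = c(0.07, 0.20, 0.35),
    "4" = c(0.06, 0.17, 0.29)
  )
  key <- as.character(k)
  if (!key %in% names(cutoffs)) {
    stop("The table lists cutoffs only for k = 2, 3, or 4.")
  }
  # findInterval() counts how many cutoffs v has reached (0 to 3).
  reached <- findInterval(v, cutoffs[[key]])
  c("below small", "small", "medium", "large")[reached + 1]
}

interpret_cramer_v(0.4387, k = 2)
# [1] "medium"
```

Any such label should still be read in light of the discipline and the data, as discussed above.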

 

Packages used in this chapter

 

The packages used in this chapter include:

•  rcompanion

•  vcd

•  psych

•  DescTools

 

The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(vcd)){install.packages("vcd")}
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}


Examples for measures of association for nominal variables

 

Cramér’s V


Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


library(rcompanion)

cramerV(Matrix,
        digits=4)


Cramer V
 0.4387


library(vcd)

assocstats(Matrix)


Phi-Coefficient   : NA
Cramer's V        : 0.439


library(DescTools)

CramerV(Matrix,
        conf.level=0.95)


Cramer V    lwr.ci    upr.ci
0.4386881 0.1944364 0.6239856
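
As a check on the package results, Cramér’s V can be calculated by hand in base R as the square root of chi-square divided by n × (k – 1), where chi-square is the Pearson statistic without continuity correction and k is the smaller of the number of rows and columns.  This is a sketch for illustration; the object names (counts, expected, chi_sq) are not from the original example.

```r
# The County table from the example, rebuilt so this block runs on its own.
counts <- matrix(c(21,  5,
                    6, 11,
                    7,  8,
                   27,  5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 c("Pass", "Fail")))

n        <- sum(counts)
expected <- outer(rowSums(counts), colSums(counts)) / n   # counts under independence
chi_sq   <- sum((counts - expected)^2 / expected)         # Pearson chi-square
k        <- min(dim(counts))                              # smaller table dimension

cramer_v <- sqrt(chi_sq / (n * (k - 1)))
round(cramer_v, 4)
# [1] 0.4387
```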



phi


Input =("
Sex      Pass Fail
  Male     49   64
  Female   44   24
")

Matrix.2 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.2


library(psych)

phi(Matrix.2,
    digits = 4)


[1] -0.2068


library(DescTools)

Phi(Matrix.2)


[1] 0.206808

   ###  It appears that DescTools always produces a positive value.


library(rcompanion)

cramerV(Matrix.2)


Cramer V
  0.2068

   ###  Note that Cramer’s V is the same as the absolute value
   ###    of phi for 2 x 2 tables.
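
The sign of phi comes from the difference of the cross-products of the 2 x 2 table.  The following base R sketch computes the signed value directly from the cell counts; the object names are illustrative only.

```r
# The Sex table from the example, rebuilt so this block runs on its own.
counts_2x2 <- matrix(c(49, 64,
                       44, 24),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(c("Male", "Female"), c("Pass", "Fail")))

# Difference of cross-products, scaled by the square root of the
# product of all four marginal totals.
cross_diff <- counts_2x2[1, 1] * counts_2x2[2, 2] -
              counts_2x2[1, 2] * counts_2x2[2, 1]
phi_value  <- cross_diff /
              sqrt(prod(rowSums(counts_2x2)) * prod(colSums(counts_2x2)))

round(phi_value, 4)
# [1] -0.2068
```

Swapping the two rows or the two columns flips the sign but not the magnitude, which is why the sign is meaningful only when the order of the categories is.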



Cohen’s w

 

Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


library(rcompanion)

cohenW(Matrix)


Cohen w
 0.4387

   ### Note that because the smallest dimension in the table is 2,
   ###   the value of Cohen’s w is the same as that for
   ###   Cramer’s V.
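
The relationship between the two statistics can be sketched in base R, assuming the usual definition of Cohen’s w for a contingency table as the square root of chi-square divided by n.  Under that definition, Cramér’s V equals w divided by the square root of (k – 1), so the two coincide when k = 2.  Object names here are illustrative.

```r
# The County table from the example, rebuilt so this block runs on its own.
counts <- matrix(c(21, 5, 6, 11, 7, 8, 27, 5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 c("Pass", "Fail")))

n        <- sum(counts)
expected <- outer(rowSums(counts), colSums(counts)) / n
chi_sq   <- sum((counts - expected)^2 / expected)
k        <- min(dim(counts))

cohen_w  <- sqrt(chi_sq / n)          # not divided by (k - 1), so it can exceed 1
cramer_v <- cohen_w / sqrt(k - 1)     # identical to w when k = 2

round(c(w = cohen_w, V = cramer_v), 4)
# both values are 0.4387
```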



Goodman and Kruskal’s lambda

 

Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


library(DescTools)

Lambda(Matrix,
       direction="column")


[1] 0.2068966

   ### County predicts Pass/Fail
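
Lambda is the proportional reduction in prediction errors gained by knowing the independent variable.  The following base R sketch reproduces the value above and also shows the reverse direction, illustrating that lambda is not symmetric.  The object names are illustrative only.

```r
# The County table from the example, rebuilt so this block runs on its own.
counts <- matrix(c(21, 5, 6, 11, 7, 8, 27, 5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 c("Pass", "Fail")))
n <- sum(counts)

# Predicting the column (Pass/Fail) from the row (County):
#   errors without County = n - largest column total
#   errors with County    = n - sum of the largest cell in each row
lambda_col <- (sum(apply(counts, 1, max)) - max(colSums(counts))) /
              (n - max(colSums(counts)))

# Predicting the row (County) from the column (Pass/Fail)
lambda_row <- (sum(apply(counts, 2, max)) - max(rowSums(counts))) /
              (n - max(rowSums(counts)))

round(c(column = lambda_col, row = lambda_row), 4)
# column = 0.2069, row = 0.1034
```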


Tschuprow's T


Input =("
County               Pass   Fail
Bloom                21      5
Cobblestone           6     11
Dougal                7      8
Heimlich             27      5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


library(DescTools)

TschuprowT(Matrix)


[1] 0.3333309
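
Tschuprow's T differs from Cramér’s V only in the denominator: it divides chi-square by n times the square root of (rows – 1) × (columns – 1), rather than by n × (k – 1).  The base R sketch below, with illustrative object names, shows both for the same table.

```r
# The County table from the example, rebuilt so this block runs on its own.
counts <- matrix(c(21, 5, 6, 11, 7, 8, 27, 5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 c("Pass", "Fail")))

n        <- sum(counts)
expected <- outer(rowSums(counts), colSums(counts)) / n
chi_sq   <- sum((counts - expected)^2 / expected)
n_rows   <- nrow(counts)
n_cols   <- ncol(counts)

tschuprow_t <- sqrt(chi_sq / (n * sqrt((n_rows - 1) * (n_cols - 1))))
cramer_v    <- sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))

round(c(T = tschuprow_t, V = cramer_v), 4)
# T = 0.3333, V = 0.4387
```

For a square table the two denominators are equal, so T and V agree; for this 4 x 2 table T is the smaller of the two.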


Optional analysis:  changing the order of the table

 

Note that if we change the order of the rows in the table, the results for Cramér’s V, Cohen’s w, and lambda (with direction=column) do not change.  This is because these statistics treat the variables as nominal and not ordinal.


Input =("
County               Pass   Fail
Heimlich             27      5
Bloom                21      5
Dougal                7      8
Cobblestone           6     11
")

Matrix.3 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.3


### Cramer’s V

library(rcompanion)

cramerV(Matrix.3,
        digits=4)


Cramer V
 0.4387


### Cohen’s w

cohenW(Matrix.3)


Cohen w
 0.4387


### Goodman Kruskal lambda

library(DescTools)

Lambda(Matrix.3,
       direction="column")

   ###  Treat County as independent variable


[1] 0.2068966
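
The same invariance can be checked without retyping the table, by permuting the rows of the matrix by index.  This is a base R sketch; the helper functions cramer_v and lambda_col are defined here for illustration and are not the package functions used above.

```r
counts <- matrix(c(21, 5, 6, 11, 7, 8, 27, 5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 c("Pass", "Fail")))

# Reorder rows to Heimlich, Bloom, Dougal, Cobblestone
reordered <- counts[c(4, 1, 3, 2), ]

cramer_v <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  chi_sq   <- sum((m - expected)^2 / expected)
  sqrt(chi_sq / (sum(m) * (min(dim(m)) - 1)))
}

lambda_col <- function(m) {
  largest_col <- max(colSums(m))
  (sum(apply(m, 1, max)) - largest_col) / (sum(m) - largest_col)
}

isTRUE(all.equal(cramer_v(counts),   cramer_v(reordered)))     # TRUE
isTRUE(all.equal(lambda_col(counts), lambda_col(reordered)))   # TRUE
```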


Optional analysis:  comparing statistics

 

The following examples may give some sense of the difference between Cramér’s V, Cohen’s w, and lambda.


Input =("
X    Y1   Y2
X1   10    0
X2    0   10
")

Matrix.x = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

library(rcompanion)

cramerV(Matrix.x)


 Cramer V
  1


library(rcompanion)

cohenW(Matrix.x)


Cohen w
      1


library(DescTools)

Lambda(Matrix.x,
       direction="column")


 [1] 1


Input =("
X    Y1   Y2
X1   10    0
X2   10   10
")

Matrix.y = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

library(rcompanion)

cramerV(Matrix.y)


 Cramer V
  0.5


library(rcompanion)

cohenW(Matrix.y)


Cohen w
    0.5

library(DescTools)

Lambda(Matrix.y,
       direction="column")


 [1] 0

   ### X predicts Y.


Input =("
X    Y1   Y2
X1   10    0
X2   10   10
X3   10   20
X4   10   30
")

Matrix.z = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

library(rcompanion)

cramerV(Matrix.z)


 Cramer V
  0.4488


library(rcompanion)

cohenW(Matrix.z)


Cohen w
 0.4488


library(DescTools)

Lambda(Matrix.z,
       direction="column")


 [1] 0.25

   ### X predicts Y.
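
The three comparisons can be collected side by side with a short base R sketch.  The helper functions are written here for illustration and are not the package functions; the tables are the X-by-Y examples above, entered without row and column names.

```r
tables <- list(
  x = matrix(c(10, 0,  0, 10), ncol = 2, byrow = TRUE),
  y = matrix(c(10, 0, 10, 10), ncol = 2, byrow = TRUE),
  z = matrix(c(10, 0, 10, 10, 10, 20, 10, 30), ncol = 2, byrow = TRUE)
)

cramer_v <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  chi_sq   <- sum((m - expected)^2 / expected)
  sqrt(chi_sq / (sum(m) * (min(dim(m)) - 1)))
}

# Lambda with the rows (X) predicting the columns (Y)
lambda_col <- function(m) {
  largest_col <- max(colSums(m))
  (sum(apply(m, 1, max)) - largest_col) / (sum(m) - largest_col)
}

results <- sapply(tables, function(m) c(V = cramer_v(m), lambda = lambda_col(m)))
round(results, 4)
# V:      x = 1, y = 0.5, z = 0.4488
# lambda: x = 1, y = 0,   z = 0.25
```

In the second table, knowing X does not change the best guess for Y (Y1 is at least tied for the most common outcome in every row), so lambda is 0 even though the counts depart from independence and Cramér’s V is 0.5.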


References

 

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Routledge.