
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Tests for Paired Nominal Data

Tests of symmetry for nominal data are used when the counts on a contingency table represent values that are paired or repeated in time.

 

As an example, consider a question repeated on a pre-test and a post-test.  We may want to know if the number of correct responses changed from the pre-test to the post-test.


Did students have the correct answer to the question?

            After
Before      Correct   Incorrect
Correct      2        0
Incorrect   21        7

Note that the row names and column names have the same levels, and that counts represent paired responses.  That is, for each observation you must know the individual’s response before and after.

 

Also note that the number of students included in the table is 30, which is the sum of the cell counts.

 

In essence, those students with the same response before and after don’t affect the assessment of the change in responses.  We focus instead on the “discordant” counts.  That is, how many students had incorrect answers before and correct answers after, in contrast to those who had the reverse trend?  Here, because 21 changed from incorrect to correct, and 0 changed from correct to incorrect, we might suspect that there was a significant change in responses from incorrect to correct.
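The split between concordant and discordant counts can be checked directly in R.  The following is a minimal sketch that enters the pre-test and post-test table above as a matrix.  The object names (PrePost, concordant, discordant) are illustrative and are not used elsewhere in this chapter.

```r
# Pre-test / post-test table from above: rows = Before, columns = After
PrePost = matrix(c( 2, 0,
                   21, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Before = c("Correct", "Incorrect"),
                                 After  = c("Correct", "Incorrect")))

# Concordant counts: same response before and after (the diagonal)
concordant = sum(diag(PrePost))

# Discordant counts: response changed (the off-diagonal cells)
discordant = sum(PrePost) - concordant

sum(PrePost)                       # total number of students
concordant
discordant
PrePost["Incorrect", "Correct"]    # changed from incorrect to correct
PrePost["Correct", "Incorrect"]    # changed from correct to incorrect
```

Only the discordant students enter a test of symmetry; the concordant students on the diagonal do not.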

 

To grasp the difference between nominal tests of association and nominal tests of symmetry, be sure to visit the coffee and tea example below in the section “An example without repeated measures, comparing test of symmetry with test of association”.

 

Appropriate data

•  Two nominal variables with two or more levels each, and each with the same levels.

•  Observations are paired or matched between the two variables.

•  McNemar and McNemar–Bowker tests may not be appropriate if discordant cells have low counts.

 

Hypotheses

•  Null hypothesis:  The contingency table is symmetric.  That is, the probability of cell [i, j] is equal to the probability of cell [j, i].

•  Alternative hypothesis (two-sided): The contingency table is not symmetric.

 

Interpretation

Depending on the context, significant results can be reported as e.g. “There was a significant change from answer A to answer B.”  Or, “X was more popular than Y.”

 

Post-hoc analysis

Post-hoc analysis for a contingency table larger than 2 x 2 can be conducted by testing each of the component 2 x 2 tables.  A correction for multiple tests should be applied.

 

Other notes and alternative tests

•  For unpaired data, see the tests in the chapter Association Tests for Nominal Data.

•  For multiple times or groups, Cochran’s Q test can be used.  

 

Packages used in this chapter

 

The packages used in this chapter include:

•  EMT

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(EMT)){install.packages("EMT")}
if(!require(rcompanion)){install.packages("rcompanion")}

McNemar and McNemar–Bowker tests

For a 2 x 2 table, the most common test for symmetry is McNemar’s test.  For larger tables, McNemar’s test is generalized as the McNemar–Bowker symmetry test.  One drawback to the latter test is that it may fail if there are 0’s in certain locations in the matrix.

 

McNemar’s test may not be reliable if there are low counts in the “discordant” cells.  Different authors recommend that these cells sum to at least 5, 10, or 25.

 

Exact tests

Exact tests of symmetry reduce to exact tests for goodness-of-fit.  A 2 x 2 table is analyzed with a binomial exact test, and a larger table is analyzed with a multinomial exact test.  Examples of these are shown in this chapter in the “Optional analyses: conducting exact tests for symmetry” section.

 

I have written a function, nominalSymmetryTest to conduct these exact tests easily.

 

Example of tests for paired nominal data

 

Alucard teaches a Master Gardener training on rain gardens for stormwater management and one on rain barrels.  He wishes to assess if people are more willing to install these green infrastructure practices after attending the training.  His data follow.  Note that there are 46 attendees answering each question.


Are you planning to install a rain barrel?

         After
Before   Yes   No
Yes        9    5
No        17   15


Are you planning to install a rain garden?

         After
Before   Yes   No   Maybe
Yes        6    0   1
No         5    3   7
Maybe     11    1   12


Rain barrel


Input =("
Before       After.yes   After.no
Before.yes     9          5
Before.no     17         15
")

Matrix.1 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.1

sum(Matrix.1)


[1] 46


Rain garden


Input =("
Before         Yes.after   No.after   Maybe.after
Yes.before      6          0           1
No.before       5          3           7
Maybe.before   11          1          12
")

Matrix.2 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.2

sum(Matrix.2)


[1] 46


Exact tests for symmetry

 

Rain barrel


library(rcompanion)

nominalSymmetryTest(Matrix.1,
                    digits = 3)


$Global.test.for.symmetry

  Dimensions p.value
1      2 x 2  0.0169


Rain garden


library(rcompanion)

nominalSymmetryTest(Matrix.2,
                    method="fdr",
                    digits = 3)

   ### Note: This may take a long time
   ###       Use MonteCarlo option for large matrices or counts


$Global.test.for.symmetry

  Dimensions p.value
1      3 x 3   2e-04

$Pairwise.symmetry.tests
                                       Comparison p.value p.adjust
1       Yes.before/Yes.after : No.before/No.after  0.0625   0.0703
2 Yes.before/Yes.after : Maybe.before/Maybe.after 0.00635   0.0190
3   No.before/No.after : Maybe.before/Maybe.after  0.0703   0.0703

$p.adjustment

  Method
1    fdr


Maybe to Yes, p.value = 0.0190

  Before         Yes.after   No.after   Maybe.after
  Yes.before      6          0           1
  No.before       5          3           7
  Maybe.before   11          1          12


McNemar and McNemar–Bowker chi-square tests for symmetry

 

Rain barrel


mcnemar.test(Matrix.1)


McNemar's Chi-squared test with continuity correction

McNemar's chi-squared = 5.5, df = 1, p-value = 0.01902


Rain garden


mcnemar.test(Matrix.2)


McNemar's Chi-squared test

McNemar's chi-squared = 17.833, df = 3, p-value = 0.0004761
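To see where these statistics come from, the sketch below recomputes them by hand from the discordant cells, using the standard formulas.  For a 2 x 2 table with continuity correction, the statistic is (|n12 - n21| - 1)^2 / (n12 + n21).  For a larger table, it is the sum over each pair of mirror-image cells of (nij - nji)^2 / (nij + nji), with degrees of freedom equal to the number of pairs.  The matrices are re-entered without names so that the block runs on its own, and the object names are illustrative.

```r
# Rain barrel (2 x 2): McNemar statistic with continuity correction
Barrel = matrix(c( 9,  5,
                  17, 15),
                nrow = 2, byrow = TRUE)

n12 = Barrel[1, 2]     # Yes before, No after
n21 = Barrel[2, 1]     # No before, Yes after

chi.sq.barrel = (abs(n12 - n21) - 1)^2 / (n12 + n21)
p.barrel      = pchisq(chi.sq.barrel, df = 1, lower.tail = FALSE)

# Rain garden (3 x 3): McNemar-Bowker statistic, no continuity correction
Garden = matrix(c( 6, 0,  1,
                   5, 3,  7,
                  11, 1, 12),
                nrow = 3, byrow = TRUE)

upper = Garden[upper.tri(Garden)]        # cells above the diagonal
lower = t(Garden)[upper.tri(Garden)]     # their mirror-image cells

chi.sq.garden = sum((upper - lower)^2 / (upper + lower))
df.garden     = length(upper)            # number of mirror-image pairs
p.garden      = pchisq(chi.sq.garden, df = df.garden, lower.tail = FALSE)

chi.sq.barrel; p.barrel
chi.sq.garden; df.garden; p.garden
```

These values should agree with the mcnemar.test output above.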


An example without repeated measures, comparing test of symmetry with test of association

As another example, consider a survey of tea and coffee drinking, in which each respondent is asked both if they drink coffee, and if they drink tea.


           Tea
  Coffee   Yes   No
  Yes      37    17
  No        9    25


We would use a test of symmetry in this case if the question we wanted to answer was, Is coffee more popular than tea?  That is, is it more common for someone to drink coffee and not tea than to drink tea and not coffee?  (Those who drink both or drink neither are not relevant to this question.)

 

Note also that this is an example of using a test of symmetry to test the relative frequency of two dichotomous variables when the same subjects are surveyed.


Input =("
Coffee   Yes   No
Yes      37    17
No        9    25
")

Matrix.3 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))


mcnemar.test(Matrix.3)


McNemar's Chi-squared test with continuity correction

McNemar's chi-squared = 1.8846, df = 1, p-value = 0.1698

###  Neither coffee nor tea is more popular, specifically because
###    neither the 9 nor the 17 in the table are large relative to
###    the other.


A test of association answers a very different question.  Namely, Is coffee drinking associated with tea drinking?  That is, is someone more likely to drink tea if they drink coffee?


chisq.test(Matrix.3)


Pearson's Chi-squared test with Yates' continuity correction

X-squared = 13.148, df = 1, p-value = 0.0002878

###  Coffee drinking and tea drinking are associated, in this case people who
###    drink coffee are likely to drink tea. This is a positive association.
###    A negative association could also be significant.
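One way to see the two questions side by side is to look at which proportions each test is concerned with.  The sketch below is illustrative only.  The test of symmetry relates to the marginal proportions (how many respondents drink coffee versus how many drink tea), whereas the test of association relates to conditional proportions (how common tea drinking is among coffee drinkers versus among those who don’t drink coffee).  The object names are assumptions made for this example.

```r
CoffeeTea = matrix(c(37, 17,
                      9, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Coffee = c("Yes", "No"),
                                   Tea    = c("Yes", "No")))

n = sum(CoffeeTea)

# Symmetry: marginal proportions of coffee drinkers and tea drinkers
prop.coffee = sum(CoffeeTea["Yes", ]) / n
prop.tea    = sum(CoffeeTea[, "Yes"]) / n

# The difference in margins depends only on the discordant cells
prop.coffee - prop.tea
(CoffeeTea["Yes", "No"] - CoffeeTea["No", "Yes"]) / n

# Association: proportion drinking tea, within each coffee group
tea.given.coffee    = CoffeeTea["Yes", "Yes"] / sum(CoffeeTea["Yes", ])
tea.given.no.coffee = CoffeeTea["No",  "Yes"] / sum(CoffeeTea["No",  ])

tea.given.coffee
tea.given.no.coffee
```

The margins are fairly close (54 of 88 for coffee, 46 of 88 for tea), while the conditional proportions differ considerably (37 of 54 versus 9 of 34).  This is consistent with the non-significant test of symmetry and the significant test of association.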


Optional analysis: a 4 x 4 example with several 0’s

As an additional example, imagine a religious caucusing event in which advocates try to sway attendees to switch their religions.

 

Matrix row names are the attendees’ original religions, and the column names, with a “2” added, are attendees’ new religions after the caucus.  Note there are several 0 counts in the matrix.

 

Note first that the mcnemar.test function fails, namely because of the position of some 0 counts in the matrix.

 

Second, note that the multinomial.test function used by my nominalSymmetryTest function would take a long time to calculate an exact p-value for this matrix, so the MonteCarlo=TRUE option is used.  The number of samples used in the Monte Carlo approach can be adjusted with the ntrial option.  Some of the p-values in the post-hoc analysis cannot be produced because of the placement of 0 counts, but these should also be considered non-significant results.


Input =("
Before        Pastafarian2   Discordiant2   Dudist2   Jedi2
Pastafarian   7              0              23         0
Discordiant   0              7               0        33
Dudist        3              0               7         1
Jedi          0              1               0         7
")

Matrix.4 = as.matrix(read.table(textConnection(Input),
                     header=TRUE,
                     row.names=1))

Matrix.4


McNemar–Bowker test


mcnemar.test(Matrix.4)


McNemar's Chi-squared test

McNemar's chi-squared = NaN, df = 6, p-value = NA
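The NaN arises because the McNemar–Bowker statistic sums (nij - nji)^2 / (nij + nji) over each pair of mirror-image cells, and a pair in which both cells are 0 contributes 0 / 0.  A minimal sketch of that calculation follows, with the matrix re-entered without names so the block is self-contained.  The names Caucus, upper, lower, and pair.total are illustrative.

```r
Caucus = matrix(c(7, 0, 23,  0,
                  0, 7,  0, 33,
                  3, 0,  7,  1,
                  0, 1,  0,  7),
                nrow = 4, byrow = TRUE)

upper = Caucus[upper.tri(Caucus)]        # cells above the diagonal
lower = t(Caucus)[upper.tri(Caucus)]     # their mirror-image cells

# Total count in each pair of mirror-image cells
pair.total = upper + lower
pair.total

# Contribution of each pair; 0 / 0 gives NaN
terms = (upper - lower)^2 / pair.total
terms

sum(terms)
```

Any pair with a total of 0 makes the whole sum NaN.  The same pairs are the ones for which a pairwise exact test cannot produce a p-value.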


Exact test


library(rcompanion)

nominalSymmetryTest(Matrix.4,
                    method="fdr",
                    digits = 3,
                    MonteCarlo = TRUE,
                    ntrial = 100000)


$Global.test.for.symmetry

  Dimensions p.value
1      4 x 4       0

$Pairwise.symmetry.tests
                                           Comparison  p.value p.adjust
1 Pastafarian/Pastafarian2 : Discordiant/Discordiant2     <NA>       NA
2           Pastafarian/Pastafarian2 : Dudist/Dudist2  8.8e-05 1.32e-04
3               Pastafarian/Pastafarian2 : Jedi/Jedi2     <NA>       NA
4           Discordiant/Discordiant2 : Dudist/Dudist2     <NA>       NA
5               Discordiant/Discordiant2 : Jedi/Jedi2 4.07e-09 1.22e-08
6                         Dudist/Dudist2 : Jedi/Jedi2        1 1.00e+00

$p.adjustment

  Method
1    fdr

A look at the significant results

  Pastafarian to Dudist, p-value = 0.000132

    Before        Pastafarian2   Discordiant2   Dudist2   Jedi2
    Pastafarian   7              0              23         0
    Discordiant   0              7               0        33
    Dudist        3              0               7         1
    Jedi          0              1               0         7


  Discordiant to Jedi, p-value < 0.0001

    Before        Pastafarian2   Discordiant2   Dudist2   Jedi2
    Pastafarian   7              0              23         0
    Discordiant   0              7               0        33
    Dudist        3              0               7         1
    Jedi          0              1               0         7


Optional analyses: conducting exact tests for symmetry

The exact symmetry tests can be conducted directly with the binom.test and multinomial.test functions.  These results match the results of my nominalSymmetryTest function.


Rain barrel

           After
  Before   Yes   No
  Yes        9    5
  No        17   15


Rain garden

           After
  Before   Yes   No   Maybe
  Yes        6   0     1
  No         5   3     7
  Maybe     11   1    12


Rain barrel

For a 2 x 2 matrix, x = the count in one of the discordant cells, and n = the sum of the counts in discordant cells.  The expected proportion is 0.50.

 

You could also follow the method for an n x n matrix below.


x =  5
n =  5 + 17
expected = 0.50

binom.test(x, n, expected)


Exact binomial test

number of successes = 5, number of trials = 22, p-value = 0.0169
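Because the expected proportion is 0.50, the two-sided p-value should not depend on which discordant cell is used as x.  A quick check, as a sketch with illustrative object names:

```r
n = 5 + 17

# Use each discordant cell in turn as the count of "successes"
p.from.5  = binom.test( 5, n, 0.50)$p.value
p.from.17 = binom.test(17, n, 0.50)$p.value

p.from.5
p.from.17

all.equal(p.from.5, p.from.17)
```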


Rain garden

For an n x n matrix, observed = a vector of the counts of discordant cells.  Expected is a vector of length (n * n - n), with each value equal to (1 / (n * n - n)).  For the 3 x 3 rain garden table, this is 6 cells, each with an expected proportion of 1/6.


observed = c(0, 1, 5, 7, 11, 1)
expected = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)

library(EMT)

multinomial.test(observed, expected)


Exact Multinomial Test, distance measure: p

    Events    pObs    p.value
    142506       0      2e-04
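For larger tables, the observed vector can be pulled from the matrix rather than typed by hand.  The sketch below takes every off-diagonal cell.  The order of the counts differs from the vector above, which should not matter here because every expected proportion is the same.  The matrix is re-entered without names, the object names are illustrative, and the call to multinomial.test is left commented out so the block runs without the EMT package.

```r
Garden = matrix(c( 6, 0,  1,
                   5, 3,  7,
                  11, 1, 12),
                nrow = 3, byrow = TRUE)

n.levels = nrow(Garden)

# All off-diagonal (discordant) cells, in column-major order
observed = Garden[row(Garden) != col(Garden)]

# Equal expected proportion for each of the n * n - n discordant cells
n.cells  = n.levels * n.levels - n.levels
expected = rep(1 / n.cells, n.cells)

observed
expected

# library(EMT)
# multinomial.test(observed, expected)
```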


Yes–No

For post-hoc testing, reduce the matrix to a 2 x 2 matrix.


x =  0
n =  0 + 5
expected = 0.50

binom.test(x, n, expected)


Exact binomial test

number of successes = 0, number of trials = 5, p-value = 0.0625


Yes–Maybe


x =  1
n =  1 + 11
expected = 0.50

binom.test(x, n, expected)


Exact binomial test

number of successes = 1, number of trials = 12, p-value = 0.006348


No–Maybe


x =  7
n =  7 + 1
expected = 0.50

binom.test(x, n, expected)


Exact binomial test

number of successes = 7, number of trials = 8, p-value = 0.07031
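The three pairwise p-values can then be adjusted for multiple tests with the p.adjust function in base R.  As a sketch, the following collects them in a vector and applies the fdr method, the same method passed to nominalSymmetryTest earlier in this chapter.  The vector names are illustrative.

```r
# Unadjusted exact binomial p-values for the three 2 x 2 comparisons
p.values = c(Yes.No    = binom.test(0, 0 + 5,  0.50)$p.value,
             Yes.Maybe = binom.test(1, 1 + 11, 0.50)$p.value,
             No.Maybe  = binom.test(7, 7 + 1,  0.50)$p.value)

# Adjust for multiple tests
p.adjusted = p.adjust(p.values, method = "fdr")

signif(p.values, 3)
signif(p.adjusted, 3)
```

The adjusted values should match the p.adjust column reported by nominalSymmetryTest for the rain garden data (0.0703, 0.0190, 0.0703).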


Exercises N


1. Considering Alucard’s data,

How many attendees responded to the rain barrel question?

How many attendees changed their answer on the rain barrel question from no to yes?

How would you interpret the results on the rain barrel question?

How would you interpret the result of the global test of the rain garden question?

 

How would you interpret the result of the post-hoc analysis of the rain garden question?

 

2. Considering the coffee and tea example, do you understand the difference between the hypotheses for tests of association and tests of symmetry?  Would you be comfortable choosing the correct approach?

 

3. Ryuk and Rem held a workshop on planting habitat for pollinators like bees and butterflies.  They wish to know if attendees were more likely to do a planting after the workshop than before.


Will plant?

         After
Before   Yes.after   No.after   Maybe.after
Yes       17         0           0
No         5         9          13
Maybe     15         0           7


For each of the following, answer the question, and show the output from the analyses you used to answer the question.

 

How many attendees responded to this question?

How many attendees changed their answer from no to yes?

How would you interpret the result of the global test?

 

How would you interpret the result of the post-hoc analysis?