[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Cochran’s Q Test for Paired Nominal Data

Cochran’s Q test is an extension of the McNemar test, when the response variable is dichotomous and there are either multiple times for a repeated measure or multiple categories with paired responses.  A dichotomous variable is a nominal variable with only two levels.

 

One case would be extending the rain barrel question from the previous chapter to multiple times.  A more common case would be extending the coffee and tea example to multiple beverages.  In this case, we would test among several beverages which is more popular that the others.

 

Data are not typically arranged in a contingency table, but are either organized in long-format, with each row representing a single observation, or organized in a short-format matrix with each row representing an experimental unit with the repeated measures across columns. 


Long-format

   Student   Beverage   Response
   a         Coffee     Yes
   a         Tea        Yes
   a         Cola       No
   b         Coffee     No
   b         Tea        No
   b         Cola       Yes
   c         Coffee     Yes
   c         Tea        No
   c         Cola       No


Short-format

   Student   Coffee   Tea   Cola
   a         Yes      Yes   No
   b         No       No    Yes
   c         Yes      No    No



Data need to be arranged in an unreplicated complete block design.  That is, for each experimental unit (Student in this case) there needs to be one and only one response per question or time.

 

The package RVAideMemoire has a function cochran.qtest which will conduct Cochran’s Q test on long-format data.  The function symmetry_test in the coin package will also conduct the test using permutation, which a type of resampling technique.

 

A word of caution about the symmetry_test function:  be sure not to use it in cases where the response variable is multinomial; that is, where there are more than two levels in the response.  The function will return a result, but it is not the same kind of symmetry result as the McNemar–Bowker test.  I suspect that for a multinomial response variable, coin treats it like an ordered factor, not like a nominal variable.

 

When there are two questions or times, Cochran’s Q test is equivalent to the McNemar test.

 

Post-hoc analysis can be conducted with pairwise McNemar or exact tests on 2 x 2 tables.  My custom function pairwiseMcnemar will conduct pairwise McNemar, exact, or permutation tests.

 

Appropriate data

•  The response variable is a dichotomous nominal variable.

•  The responses are paired by experimental unit.

•  The responses are measured across two or more times or factors.

•  The data follow an unreplicated complete block design, with each experimental unit treated as the block.

 

Hypotheses

•  Null hypothesis:  The marginal probability of a [positive] response is unchanged across the times or factors.  That is, a positive response is equally likely across times or factors.

•  Alternative hypothesis: Positive responses are not equally likely across times or factors.

 

Interpretation

Significant results can be reported as “There was a significant difference in the proportion of positive responses across times or factors.”  Or, “A positive response for X was more likely than a positive response for Y.”

 

Post-hoc analysis

Post hoc analysis can be conducted by conducting tests for the component 2 x 2 tables.  A correction for multiple tests should be applied.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  RVAideMemoire

•  coin

•  reshape2

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(coin)){install.packages("coin")}
if(!require(reshape2)){install.packages("reshape2")}
if(!require(rcompanion)){install.packages("rcompanion")}


Cochran’s Q test example: long-format

 

Hayley Smith teaches an extension course on environmentally-friendly lawn care.  After the course, she surveys her students about their willingness to adopt good practices: Soil testing, measuring the amount of irrigation, raising the mowing height, and returning the grass clippings to the lawn.  She wants to know which practice students were more willing to adopt.


Input = ("
Practice     Student   Response
SoilTest     a         Yes
SoilTest     b         No
SoilTest     c         Yes
SoilTest     d         Yes
SoilTest     e         No
SoilTest     f         Yes
SoilTest     g         Yes
SoilTest     h         Yes
SoilTest     i         No
SoilTest     j         Yes
SoilTest     k         Yes
SoilTest     l         Yes
SoilTest     m         Yes
SoilTest     n         Yes
Irrigation   a         Yes
Irrigation   b         No
Irrigation   c         No
Irrigation   d         No
Irrigation   e         No
Irrigation   f         No
Irrigation   g         No
Irrigation   h         No
Irrigation   i         Yes
Irrigation   j         No
Irrigation   k         No
Irrigation   l         No
Irrigation   m         No
Irrigation   n         No
MowHeight    a         Yes
MowHeight    b         No
MowHeight    c         Yes
MowHeight    d         Yes
MowHeight    e         Yes
MowHeight    f         Yes
MowHeight    g         Yes
MowHeight    h         Yes
MowHeight    i         No
MowHeight    j         Yes
MowHeight    k         Yes
MowHeight    l         Yes
MowHeight    m         Yes
MowHeight    n         Yes
Clippings    a         Yes
Clippings    b         No
Clippings    c         No
Clippings    d         Yes
Clippings    e         Yes
Clippings    f         No
Clippings    g         No
Clippings    h         Yes
Clippings    i         Yes
Clippings    j         Yes
Clippings    k         Yes
Clippings    l         No
Clippings    m         Yes
Clippings    n         No
")

Data = read.table(textConnection(Input),header=TRUE)


### Order factors otherwise R will alphabetize them

Data$Practice = factor(Data$Practice,
                       levels=unique(Data$Practice))

Data$Response = factor(Data$Response,
                       levels=c("Yes", "No"))


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


### View data as a table

Data$Response.n = as.numeric(Data$Response) - 1   ### Creates a new numeric variable
                                                  ###  that is Response as a 0 or 1

Table = xtabs(Response.n ~ Student + Practice,
              data=Data)

Table


      Practice
Student SoilTest Irrigation MowHeight Clippings
      a        0          0         0         0
      b        1          1         1         1
      c        0          1         0         1
      d        0          1         0         0
      e        1          1         0         0
      f        0          1         0         1
      g        0          1         0         1
      h        0          1         0         0
      i        1          0         1         0
      j        0          1         0         0
      k        0          1         0         0
      l        0          1         0         1
      m        0          1         0         0
      n        0          1         0         1


### View counts of responses

xtabs( ~ Practice + Response,
       data=Data)


            Response
Practice     Yes No
  SoilTest    11  3
  Irrigation   2 12
  MowHeight   12  2
  Clippings    8  6


### Create bar plot

Table = xtabs( ~ Response + Practice,
               data=Data)

Table              


barplot(Table,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 12),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,   ### Text size for bars
        cex.axis = 0.8,    ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box


image


Cochran’s Q test


library(RVAideMemoire)

cochran.qtest(Response ~ Practice | Student,
              data = Data)


Cochran's Q test

Q = 16.9535, df = 3, p-value = 0.0007225

   ### Note that the function also gives you proportion of responses for the
   ###   positive response, which is always the second of the nominal response,
   ###   which in this case is "no", or the response coded as "1".

   ### I wouldn’t use the post-hoc analysis included with the output of the
   ###   function in RVAideMemoire


library(coin)

symmetry_test(Response ~ Practice | Student,
              data = Data,
              teststat = "quad")


Asymptotic General Symmetry Test

chi-squared = 16.953, df = 3, p-value = 0.0007225


Post-hoc analysis for Cochran’s Q test

My pairwiseMcnemar function will conduct pairwise McNemar, binomial exact, or permutation tests analogous to uncorrected McNemar tests.  The permutation tests require the coin package.  As usual, method is the p-value adjustment method (see ?p.adjust for options), and digits indicates the number of digits in the output.  The correct option is used by the chi-square test function.


### Order groups

Data = Data[order(factor(Data$Practice,
                  levels=c("MowHeight", "SoilTest",
                           "Clippings", "Irrigation"))),]
 
Data


### Pairwise McNemar tests

library(rcompanion)

PT = pairwiseMcnemar(x      = Data$Response,
                     g      = Data$Practice,
                     block  = Data$Student,
                     test   = "permutation",
                     method = "fdr",
                     digits = 3)

PT


$Pairwise
                  Comparison p.value p.adjust
1   MowHeight - SoilTest = 0   0.317   0.3170
2  MowHeight - Clippings = 0   0.102   0.1530
3 MowHeight - Irrigation = 0 0.00389   0.0200
4   SoilTest - Clippings = 0   0.257   0.3080
5  SoilTest - Irrigation = 0 0.00666   0.0200
6 Clippings - Irrigation = 0  0.0143   0.0286


### Compact letter display


PT = PT$Pairwise

library(rcompanion)

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)


       Group Letter MonoLetter
1  MowHeight      a         a
2   SoilTest      a         a
3  Clippings      a         a
4 Irrigation      b          b


Table of results

               Proportion "Yes"   Grouping letter
   MowHeight   0.86                a
   SoilTest    0.79                a
   Clippings   0.57                a
   Irrigation  0.14                  b

Optional analysis:  Converting a matrix of p-values to a compact letter display

This compact letter display could also be generated with the multcompView package.  To perform this well, it is helpful to first order the factors by the numeric order of their results, in this case by proportion of “yes” responses.  Then the data need to be arranged into a matrix of pairwise p-values.


Input =("
Column      MowHeight  SoilTest  Clippings  Irrigation
MowHeight   NA         0.317     0.153      0.0200
SoilTest    NA         NA        0.308      0.0200
Clippings   NA         NA        NA         0.0286
Irrigation  NA         NA        NA         NA
")

PT = as.matrix(read.table(textConnection(Input),
               header=TRUE,
               row.names=1))

PT

### Transform the upper matrix to a full matrix

diag(PT) = 1

PT1 = t(PT)

PT[lower.tri(PT)] = PT1[lower.tri(PT1)]

PT

### Produce compact letter display

library(multcompView)

multcompLetters(PT,  
                compare="<",  
                threshold=0.05,  ### p-value to use as significance threshold  
                Letters=letters,  
                reversed = FALSE)


MowHeight   SoilTest  Clippings Irrigation
      "a"        "a"        "a"        "b"


Cochran’s Q test example: short-format in matrix


Input =("
Student  SoilTest  Irrigation  MowHeight  Clippings
a        0         0           0          0
b        1         1           1          1
c        0         1           0          1
d        0         1           0          0
e        1         1           0          0
f        0         1           0          1
g        0         1           0          1
h        0         1           0          0
i        1         0           1          0
j        0         1           0          0
k        0         1           0          0
l        0         1           0          1
m        0         1           0          0
n        0         1           0          1
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))


### Add names to matrix dimensions that will be names of variables in long-format

names(dimnames(Matrix))[1] = "Student"
names(dimnames(Matrix))[2] = "Practice"


Matrix


### Convert matrix to long-format

library(reshape2)

Data = melt(Matrix)


### View data frame

library(psych)

headTail(Data)


    Student  Practice value
1         a  SoilTest     0
2         b  SoilTest     1
3         c  SoilTest     0
4         d  SoilTest     0
...    <NA>      <NA>   ...
53        k Clippings     0
54        l Clippings     1
55        m Clippings     0
56        n Clippings     1

### Note response variable is called "value"


Cochran’s Q test


library(RVAideMemoire)

cochran.qtest(value ~ Practice | Student,
              data = Data)


Cochran's Q test

Q = 16.9535, df = 3, p-value = 0.0007225


library(coin)

symmetry_test(value ~ Practice | Student,
              data = Data,
              teststat = "quad")


Asymptotic General Symmetry Test

chi-squared = 16.953, df = 3, p-value = 0.0007225


Post-hoc analysis for Cochran’s Q test


### Order groups

Data = Data[order(factor(Data$Practice,
                  levels=c("MowHeight", "SoilTest",
                           "Clippings", "Irrigation"))),]
 
Data


### Pairwise McNemar tests

library(rcompanion)

pairwiseMcnemar(x       = Data$value,
                g       = Data$Practice,
                block   = Data$Student,
                test    = "permutation",
                method  = "fdr",
                digits  = 3)
                
   ###  test = "exact", "mcnemar", "permutation"
   ###  method: p-value adjustment, see ?p.adjust
   ###  correct: TRUE, to apply continuity correction for McNemar test
  


$Pairwise
                  Comparison p.value p.adjust
1   MowHeight - SoilTest = 0   0.317   0.3170
2  MowHeight - Clippings = 0   0.102   0.1530
3 MowHeight - Irrigation = 0 0.00389   0.0200
4   SoilTest - Clippings = 0   0.257   0.3080
5  SoilTest - Irrigation = 0 0.00666   0.0200
6 Clippings - Irrigation = 0  0.0143   0.0286