
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Cochran–Mantel–Haenszel Test for 3-Dimensional Tables

The Cochran–Mantel–Haenszel test is an extension of the chi-square test of association.  It tests for an association between two nominal variables when the data are stratified by a third variable, such as group or time.  Each stratum forms its own contingency table, and the test combines the evidence across these tables.

 

The following data investigate whether there is a link between listening to podcasts and using public transportation to get to work, collected across three cities.  This can be thought of as a 2 x 2 contingency table in each of the three cities.

 

Data can be arranged in a table of counts or can be arranged in long-format with or without counts.  If a table is used for input, it should follow R’s ftable format, as shown.

 

Table format


                       City  Bikini.Bottom  Frostbite.Falls  New.New.York
Listen      Transport
Podcast     Drive                       13               17            5
            Public                      27               25           27
No.podcast  Drive                       23               22           17
            Public                      44               31           22


Long-format with counts


City             Listen      Transport  Count
Bikini.Bottom    Podcast     Drive      13
Bikini.Bottom    Podcast     Public     27
Bikini.Bottom    No.podcast  Drive      23
Bikini.Bottom    No.podcast  Public     44
Frostbite.Falls  Podcast     Drive      17
Frostbite.Falls  Podcast     Public     25
Frostbite.Falls  No.podcast  Drive      22
Frostbite.Falls  No.podcast  Public     31
New.New.York     Podcast     Drive       5
New.New.York     Podcast     Public     27
New.New.York     No.podcast  Drive      17
New.New.York     No.podcast  Public     22


The test can be conducted with the mantelhaen.test function in the native stats package.
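
As a minimal sketch, the podcast data above could be read from the long-format layout with counts, converted to a 3-dimensional table, and passed to mantelhaen.test.  The object names Podcast and PodTable are chosen here for illustration only.

```r
### Read the long-format data with counts shown above

Podcast = read.table(header=TRUE, text="
City             Listen      Transport  Count
Bikini.Bottom    Podcast     Drive      13
Bikini.Bottom    Podcast     Public     27
Bikini.Bottom    No.podcast  Drive      23
Bikini.Bottom    No.podcast  Public     44
Frostbite.Falls  Podcast     Drive      17
Frostbite.Falls  Podcast     Public     25
Frostbite.Falls  No.podcast  Drive      22
Frostbite.Falls  No.podcast  Public     31
New.New.York     Podcast     Drive       5
New.New.York     Podcast     Public     27
New.New.York     No.podcast  Drive      17
New.New.York     No.podcast  Public     22
")

### Keep factor levels in the order they appear in the data

Podcast$City      = factor(Podcast$City,      levels=unique(Podcast$City))
Podcast$Listen    = factor(Podcast$Listen,    levels=unique(Podcast$Listen))
Podcast$Transport = factor(Podcast$Transport, levels=unique(Podcast$Transport))

### Build the 3-dimensional table; the grouping variable (City) goes last

PodTable = xtabs(Count ~ Listen + Transport + City,
                 data=Podcast)

ftable(PodTable)

### Cochran-Mantel-Haenszel test across the three cities

mantelhaen.test(PodTable)
```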

 

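One assumption of the test is that there are no three-way interactions in the data; that is, the association between the two inner variables is similar across the strata.  This can be checked with a test such as the Woolf test or Breslow–Day test, where a non-significant result is consistent with the assumption.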

 

Post-hoc analysis can include looking at the individual chi-square, Fisher exact, or G-test for association for each time or group.

 

The component n x n tables can be 2 x 2 or larger.

 

Appropriate data

•  Three nominal variables with two or more levels each.

•  Data can be stratified as n x n tables, with the third variable indicating time or group.

 

Hypotheses

•  Null hypothesis:  There is no association between the two inner variables.

•  Alternative hypothesis (two-sided): There is an association between the two inner variables.

 

Interpretation

Significant results can be reported as “There was a significant association between variable A and variable B [across groups].”

 

Post-hoc analysis

Post-hoc analysis can include looking at the individual chi-square, Fisher exact, or G-test for association for each time or group.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  vcd

•  DescTools

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(vcd)){install.packages("vcd")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(rcompanion)){install.packages("rcompanion")}


C–M–H test example: long-format with counts

Alexander Anderson is concerned that there is a bias in his teaching methods for his pesticide applicator’s course.  He wants to know if there is an association between students’ sex and passing the course across the four counties in which he teaches.  The following are his data.


Input = ("
County       Sex     Result  Count
Bloom        Female  Pass     9
Bloom        Female  Fail     5
Bloom        Male    Pass     7
Bloom        Male    Fail    17
Cobblestone  Female  Pass    11
Cobblestone  Female  Fail    4
Cobblestone  Male    Pass    9
Cobblestone  Male    Fail    21
Dougal       Female  Pass     9
Dougal       Female  Fail     7
Dougal       Male    Pass    19
Dougal       Male    Fail     9
Heimlich     Female  Pass    15
Heimlich     Female  Fail     8
Heimlich     Male    Pass    14
Heimlich     Male    Fail    17
")

Data = read.table(textConnection(Input),header=TRUE)


### Order factors otherwise R will alphabetize them

Data$County = factor(Data$County,
                     levels=unique(Data$County))

Data$Sex    = factor(Data$Sex,
                     levels=unique(Data$Sex))

Data$Result = factor(Data$Result,
                     levels=unique(Data$Result))


###  Check the data frame


library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


Convert data to a table


Table = xtabs(Count ~ Sex + Result + County,
              data=Data)

   ###  Note that the grouping variable is last in the xtabs function

ftable(Table)                     # Display a flattened table


              County Bloom Cobblestone Dougal Heimlich
Sex    Result                                        
Female Pass              9          11      9       15
       Fail              5           4      7        8
Male   Pass              7           9     19       14
       Fail             17          21      9       17


Cochran–Mantel–Haenszel test


mantelhaen.test(Table)


Mantel-Haenszel chi-squared test with continuity correction

Mantel-Haenszel X-squared = 6.7314, df = 1, p-value = 0.009473

alternative hypothesis: true common odds ratio is not equal to 1
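
The alternative hypothesis in this output refers to a common odds ratio across the strata.  For 2 x 2 x k tables, the Mantel–Haenszel estimate of that ratio can be computed by hand, as in the following sketch.  The table is rebuilt with array so that the block runs on its own; the names n11, n12, n21, n22, and OR.MH are illustrative.

```r
### Rebuild the Sex x Result x County table (values from the example above)

Table = as.table(array(
          c( 9,  7,  5, 17,      # Bloom
            11,  9,  4, 21,      # Cobblestone
             9, 19,  7,  9,      # Dougal
            15, 14,  8, 17),     # Heimlich
          dim      = c(2, 2, 4),
          dimnames = list(Sex    = c("Female", "Male"),
                          Result = c("Pass", "Fail"),
                          County = c("Bloom", "Cobblestone",
                                     "Dougal", "Heimlich"))))

### Cell counts for each county

n11 = Table[1, 1, ]      # Female, Pass
n12 = Table[1, 2, ]      # Female, Fail
n21 = Table[2, 1, ]      # Male,   Pass
n22 = Table[2, 2, ]      # Male,   Fail
n   = n11 + n12 + n21 + n22

### Mantel-Haenszel common odds ratio estimate

OR.MH = sum(n11 * n22 / n) / sum(n12 * n21 / n)

OR.MH

### Compare with the estimate reported by mantelhaen.test

MH = mantelhaen.test(Table)

MH$estimate
```

The two values should agree, since mantelhaen.test reports this estimate for 2 x 2 x k tables when the exact option is not used.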


Woolf test


library(vcd)

woolf_test(Table)

### Woolf test for homogeneity of odds ratios across strata.
###   If significant, C-M-H test is not appropriate


Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 7.1376, df = 3, p-value = 0.06764
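
To see what the Woolf test is assessing, it may help to look at the odds ratio within each county.  The following sketch computes these with base R.  The optional Breslow–Day test assumes the DescTools package is installed and is skipped otherwise; the name OR.by.county is illustrative.

```r
### Rebuild the Sex x Result x County table (values from the example above)

Table = as.table(array(
          c( 9,  7,  5, 17,      # Bloom
            11,  9,  4, 21,      # Cobblestone
             9, 19,  7,  9,      # Dougal
            15, 14,  8, 17),     # Heimlich
          dim      = c(2, 2, 4),
          dimnames = list(Sex    = c("Female", "Male"),
                          Result = c("Pass", "Fail"),
                          County = c("Bloom", "Cobblestone",
                                     "Dougal", "Heimlich"))))

### Odds ratio for each county's 2 x 2 table

OR.by.county = apply(Table, 3,
                     function(m){(m[1,1] * m[2,2]) / (m[1,2] * m[2,1])})

round(OR.by.county, 2)

### Optional: Breslow-Day test of homogeneity of odds ratios

if(requireNamespace("DescTools", quietly=TRUE)){
   print(DescTools::BreslowDayTest(Table))
}
```

In these data the odds ratio for Dougal falls below 1 while the other three are above 1, which is the kind of heterogeneity the Woolf and Breslow–Day tests are designed to detect.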


Post-hoc analysis

The groupwiseCMH function conducts analysis of the component n x n tables with Fisher exact, G-test, or chi-square tests of association.  It accepts only a 3-dimensional table.  The group option indicates which dimension should be considered the grouping variable (1, 2, or 3).  It conducts only one type of test at a time.  That is, if more than one of the options fisher, gtest, or chisq is set to TRUE, only one of those tests is run.  As usual, method is the p-value adjustment method (see ?p.adjust for options), and digits indicates the number of digits in the output.  The correct option is used by the chi-square test function.


library(rcompanion)

groupwiseCMH(Table,
             group   = 3,
             fisher  = TRUE,
             gtest   = FALSE,
             chisq   = FALSE,
             method  = "fdr",
             correct = "none",
             digits  = 3)


        Group   Test p.value  adj.p
1       Bloom Fisher  0.0468 0.0936
2 Cobblestone Fisher  0.0102 0.0408
3      Dougal Fisher  0.5230 0.5230
4    Heimlich Fisher  0.1750 0.2330
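
For readers who would like to see the steps, roughly the same post-hoc analysis can be sketched in base R: a Fisher exact test on each county's 2 x 2 table, followed by p.adjust.  This is an illustration under that assumption, not the internal code of groupwiseCMH; the names p.raw and p.adj are illustrative.

```r
### Rebuild the Sex x Result x County table (values from the example above)

Table = as.table(array(
          c( 9,  7,  5, 17,      # Bloom
            11,  9,  4, 21,      # Cobblestone
             9, 19,  7,  9,      # Dougal
            15, 14,  8, 17),     # Heimlich
          dim      = c(2, 2, 4),
          dimnames = list(Sex    = c("Female", "Male"),
                          Result = c("Pass", "Fail"),
                          County = c("Bloom", "Cobblestone",
                                     "Dougal", "Heimlich"))))

### Fisher exact test within each county (dimension 3)

p.raw = apply(Table, 3,
              function(m){fisher.test(m)$p.value})

### Adjust the p-values for multiple tests

p.adj = p.adjust(p.raw, method="fdr")

data.frame(Group   = names(p.raw),
           Test    = "Fisher",
           p.value = signif(p.raw, 3),
           adj.p   = signif(p.adj, 3),
           row.names = NULL)
```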


C–M–H test example: table format

 

The read.ftable function can be very fussy about the formatting of the input.  1) It does not seem to accept a blank first line, so the opening double quote in the input should be on the same line as the column names.  2) It does not accept leading spaces on the input lines.  These may appear when you paste the code into the RStudio Console or R Script area.  One solution is to manually delete these spaces.  R Script files that are saved without these leading spaces should open and run without further modification.

 

The Cochran–Mantel–Haenszel test, Woolf test, and post-hoc analysis would be the same as those conducted on Table above.


Input =(
"              County Bloom Cobblestone Dougal Heimlich
Sex    Result                                        
Female Pass              9          11      9       15
       Fail              5           4      7        8
Male   Pass              7           9     19       14
       Fail             17          21      9       17
")

Table = as.table(read.ftable(textConnection(Input)))

ftable(Table)

   ### Display a flattened table
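
The table and long-format layouts hold the same information.  As a sketch, as.data.frame converts a 3-dimensional table back to long format with counts, and xtabs reverses the step.  The table is rebuilt with array here so that the block runs on its own; the names Long and Rebuilt are illustrative.

```r
### Rebuild the Sex x Result x County table (values from the example above)

Table = as.table(array(
          c( 9,  7,  5, 17,      # Bloom
            11,  9,  4, 21,      # Cobblestone
             9, 19,  7,  9,      # Dougal
            15, 14,  8, 17),     # Heimlich
          dim      = c(2, 2, 4),
          dimnames = list(Sex    = c("Female", "Male"),
                          Result = c("Pass", "Fail"),
                          County = c("Bloom", "Cobblestone",
                                     "Dougal", "Heimlich"))))

### Table format to long format with counts

Long = as.data.frame(Table, responseName="Count")

head(Long)

### Long format with counts back to a 3-dimensional table

Rebuilt = xtabs(Count ~ Sex + Result + County,
                data=Long)

ftable(Rebuilt)
```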