R Companion: Cochran–Mantel–Haenszel Test for Repeated Tests of Independence

The Cochran–Mantel–Haenszel test can be performed in R with the mantelhaen.test function in the native stats package. A few other useful functions come from the package vcd. One is woolf_test, which performs the Woolf test for homogeneity of the odds ratio across strata levels. This has a similar function to the Breslow-Day test mentioned in the Handbook. If this test is significant, the C-M-H test may not be appropriate. The Breslow-Day test itself can be performed with a function in the package DescTools. For cautions about using this test, see the documentation for this function, or other appropriate sources.

library(DescTools); ?BreslowDayTest

There are a couple of different ways to generate the three-way contingency table. The table can be read in with the read.ftable function. Note that the columns are the stratum variable.

Caution should be used with the formatting, since read.ftable can be fussy. I’ve noticed that it doesn’t like leading spaces in the rows. Certain editors, such as the one in RStudio, may add leading spaces when this code is pasted in. To alleviate this, delete those spaces manually, or paste the code into a plain text editor, save the file as a .R file, and then open that file with RStudio.

Another way to generate the contingency table is beginning with a data frame and tabulating the data using the xtabs function. The second example uses this method.

Examples in Summary and Analysis of Extension Program Evaluation

SAEPER: Cochran–Mantel–Haenszel Test for 3-Dimensional Tables

Packages used in this chapter

The following commands will install these packages if they are not already installed:

if(!require(dplyr)){install.packages("dplyr")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(grid)){install.packages("grid")}
if(!require(vcd)){install.packages("vcd")}

See the Handbook for information on these topics.

Examples

Cochran–Mantel–Haenszel Test with data read by read.ftable

### --------------------------------------------------------------
### Handedness example, Cochran–Mantel–Haenszel test, p. 97–98
###    Example using read.ftable
### --------------------------------------------------------------

      # Note no spaces on lines before row names.
      #   read.ftable can be fussy about leading spaces.

Input =(
"                 Group W.Child B.adult PA.white W.men G.soldier
Whorl      Handed
Clockwise Right            708     136      106    109      801
           Left              50      24       32     22      102
CounterCl Right            169     73       17     16      180
           Left              13     14        4     26       25
")

Tabla = as.table(read.ftable(textConnection(Input)))

ftable(Tabla)                        # Display a flattened table

Cochran–Mantel–Haenszel test

mantelhaen.test(Tabla)

Mantel-Haenszel X-squared = 5.9421, df = 1, p-value = 0.01478

Woolf test

library(vcd)

oddsratio(Tabla, log=TRUE) # Show log odds for each 2x2

W.Child B.adult PA.white W.men G.soldier

0.08547173 0.08319894 -0.24921579 2.08581324 0.08680711

library(vcd)

woolf_test(Tabla)                # Woolf test for homogeneity of
                                 #   odds ratios across strata.
                                 #   If significant, C-M-H test
                                 #   is not appropriate

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 22.8165, df = 4, p-value = 0.0001378

Breslow-Day test

library(DescTools)

BreslowDayTest(Tabla)

Breslow-Day Test for Homogeneity of the Odds Ratios

X-squared = 24.7309, df = 4, p-value = 5.698e-05

Individual Fisher exact tests

n = dim(Tabla)[3]

for(i in 1:n){
   Name = dimnames(Tabla)[3]$Group[i]
   P.value = fisher.test(Tabla[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
}

### Note: "Group" must be the name of the stratum variable

W.Child

Fisher test p-value: 0.7435918

B.adult

Fisher test p-value: 0.8545009

PA.white

Fisher test p-value: 0.7859788

W.men

Fisher test p-value: 6.225227e-08

G.soldier

Fisher test p-value: 0.7160507

# # #

Cochran–Mantel–Haenszel Test with data entered as a data frame

### --------------------------------------------------------------
### Mussel example, Cochran–Mantel–Haenszel test, pp. 98–99
###    Example using cross-tabulation of a data frame
### --------------------------------------------------------------

Input =("
Location    Habitat    Allele     Count
Tillamook marine          94     56
Tillamook estuarine       94     69
Tillamook marine      non-94     40
Tillamook estuarine   non-94     77
Yaquina    marine          94     61
Yaquina    estuarine       94    257
Yaquina    marine      non-94     57
Yaquina    estuarine   non-94    301
Alsea      marine          94     73
Alsea      estuarine       94     65
Alsea      marine      non-94     71
Alsea      estuarine   non-94     79
Umpqua     marine          94     71
Umpqua     estuarine       94     48
Umpqua     marine      non-94     55
Umpqua     estuarine   non-94     48
")

Data = read.table(textConnection(Input),header=TRUE)

### Specify the order of factor levels
### Otherwise, R will alphabetize them

library(dplyr)

Data =
mutate(Data,
       Location = factor(Location, levels=unique(Location)),
       Habitat = factor(Habitat, levels=unique(Habitat)),
       Allele = factor(Allele, levels=unique(Allele))
       )

### Cross-tabulate the data
###   Note here, Location is stratum variable (is last)
###              Habitat x Allele are 2 x 2 tables

Data.xtabs = xtabs(Count ~ Allele + Habitat + Location,
                   data=Data)

ftable(Data.xtabs)                      # Display a flattened table

Location Tillamook Yaquina Alsea Umpqua

Allele Habitat

94 marine 56 61 73 71

estuarine 69 257 65 48

non-94 marine 40 57 71 55

estuarine 77 301 79 48

Cochran–Mantel–Haenszel test

mantelhaen.test(Data.xtabs)

Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463

Woolf test

library(vcd)

oddsratio(Data.xtabs, log=TRUE) # Show log odds for each 2x2

Tillamook Yaquina Alsea Umpqua

0.4461712 0.2258568 0.2228401 0.2553467

library(vcd)

woolf_test(Data.xtabs)               # Woolf test for homogeneity of
                                     #   odds ratios across strata.
                                     #   If significant, C-M-H test
                                     #   is not appropriate

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 0.5292, df = 3, p-value = 0.9124

Breslow-Day test

library(DescTools)

BreslowDayTest(Data.xtabs)

Breslow-Day Test for Homogeneity of the Odds Ratios

X-squared = 0.5295, df = 3, p-value = 0.9124

Individual Fisher exact tests

n = dim(Data.xtabs)[3]

for(i in 1:n){
   Name = dimnames(Data.xtabs)[3]$Location[i]
   P.value = fisher.test(Data.xtabs[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
}

### Note: "Location" must be the name of the stratum variable

Tillamook

Fisher test p-value: 0.1145223

Yaquina

Fisher test p-value: 0.2665712

Alsea

Fisher test p-value: 0.4090355

Umpqua

Fisher test p-value: 0.4151874

# # #

Cochran–Mantel–Haenszel Test with data read by read.ftable

### --------------------------------------------------------------
### Niacin example, Cochran–Mantel–Haenszel test, p. 99
###    Example using read.ftable
### --------------------------------------------------------------

      # Note no spaces on lines before row names.
      #   read.ftable can be fussy about leading spaces.

Input =(
"                 Study FATS AFREGS ARBITER.2 HATS CLAS.1
Supplement Revasc
Niacin    Yes          2    4      1         1    2
          No           46   67    86        37   92
Placebo   Yes          11   12      4         6    1
          No           41   60    76        32   93
")

Tabla = as.table(read.ftable(textConnection(Input)))

ftable(Tabla)                        # Display a flattened table

Cochran–Mantel–Haenszel test

mantelhaen.test(Tabla)

Mantel-Haenszel X-squared = 12.7457, df = 1, p-value = 0.0003568

Woolf test

library(vcd)

oddsratio(Tabla, log=TRUE) # Show log odds for each 2x2

FATS AFREGS ARBITER.2 HATS CLAS.1

-1.8198174 -1.2089603 -1.5099083 -1.9369415 0.7039581

woolf_test(Tabla) # Woolf test for homogeneity of

# odds ratios across strata.

# If significant, C-M-H test

# is not appropriate

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 3.4512, df = 4, p-value = 0.4853

Breslow-Day test

library(DescTools)

BreslowDayTest(Tabla)

Breslow-Day Test for Homogeneity of the Odds Ratios

X-squared = 4.4517, df = 4, p-value = 0.3483

Individual Fisher exact tests

n = dim(Tabla)[3]

for(i in 1:n){
   Name = dimnames(Tabla)[3]$Study[i]
   P.value = fisher.test(Tabla[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
}

### Note: "Study" must be the name of the stratum variable

FATS

Fisher test p-value: 0.01581505

AFREGS

Fisher test p-value: 0.0607213

ARBITER.2

Fisher test p-value: 0.1948915

HATS

Fisher test p-value: 0.1075169

CLAS.1

Fisher test p-value: 1

# # #

Graphing the results

Simple bar plot with categories and no error bars

### --------------------------------------------------------------
### Simple bar plot of proportions, p. 99
###      Uses data in a matrix format
### --------------------------------------------------------------

Input =("
Habitat Tillamook Yaquina Alsea   Umpqua
Marine    0.5833     0.5169   0.5069 0.5635
Estuarine 0.4726     0.4606   0.4514 0.5000
")

Matriz = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matriz

barplot(Matriz,
        beside=TRUE,
        legend=TRUE,
        ylim=c(0, 0.9),
        xlab="Location",
        ylab="Lap94 proportion")

# # #

Rplot

Bar plot with categories and error bars

This example includes code to calculate the confidence intervals for the error bars and add them to the data frame. This code could be excluded if these values were calculated manually and added to the data frame.

### --------------------------------------------------------------
### Graph example, bar plot of proportions, p. 99
###    Using ggplot2
###    Plot adapted from:
###       shinyapps.stat.ubc.ca/r-graph-catalog/
### --------------------------------------------------------------

Input =("
Location   Habitat   Allele   Count Total Lap.94.Proportion
Tillamook Marine    94       56     96     0.5833
Tillamook Estuarine 94       69     146    0.4726
Yaquina   Marine    94       61     118    0.5169
Yaquina   Estuarine 94       257    558    0.4606
Alsea     Marine    94       73     144    0.5069
Alsea     Estuarine 94       65     144    0.4514
Umpqua    Marine    94       71     126    0.5635
Umpqua    Estuarine 94       48     96     0.5000
")

Data = read.table(textConnection(Input),header=TRUE)

### Specify the order of factor levels
### Otherwise, R will alphabetize them

library(dplyr)

Data =
mutate(Data,
       Location = factor(Location, levels=unique(Location)),
       Habitat = factor(Habitat, levels=unique(Habitat)),
       Allele = factor(Allele, levels=unique(Data$ Allele))
       )

### Add confidence intervals

Fun.low = function (x){
          binom.test(x["Count"], x["Total"],
          0.5)$ conf.int[1]
         }

Fun.up = function (x){
           binom.test(x["Count"], x["Total"],
           0.5)$ conf.int[2]
          }

Data =
mutate(Data,
          low.ci = apply(Data[c("Count", "Total")], 1, Fun.low),
          upper.ci = apply(Data[c("Count", "Total")], 1, Fun.up)
          )

Data

Location Habitat Allele Count Total Lap.94.Proportion low.ci upper.ci

1 Tillamook Marine 94 56 96 0.5833 0.4782322 0.6831506

2 Tillamook Estuarine 94 69 146 0.4726 0.3894970 0.5568427

3 Yaquina Marine 94 61 118 0.5169 0.4231343 0.6098931

4 Yaquina Estuarine 94 257 558 0.4606 0.4186243 0.5029422

5 Alsea Marine 94 73 144 0.5069 0.4224208 0.5911766

6 Alsea Estuarine 94 65 144 0.4514 0.3684040 0.5364149

7 Umpqua Marine 94 71 126 0.5635 0.4723096 0.6516209

8 Umpqua Estuarine 94 48 96 0.5000 0.3961779 0.6038221

### Plot adapted from:
###   shinyapps.stat.ubc.ca/r-graph-catalog/

library(ggplot2)
library(grid)

ggplot(Data,
   aes(x = Location, y = Lap.94.Proportion, fill = Habitat,
       ymax=upper.ci, ymin=low.ci)) +
       geom_bar(stat="identity", position = "dodge", width = 0.7) +
       geom_bar(stat="identity", position = "dodge",
                colour = "black", width = 0.7,
                show_guide = FALSE) +
       scale_y_continuous(breaks = seq(0, 0.80, 0.1),
                limits = c(0, 0.80),
                expand = c(0, 0)) +
       scale_fill_manual(name = "Count type" ,
                 values = c('grey80', 'grey30'),
                 labels = c("Marine",
                            "Estuarine")) +
       geom_errorbar(position=position_dodge(width=0.7),
                     width=0.0, size=0.5, color="black") +
       labs(x = "Location",
            y = "Lap94 proportion") +
       ## ggtitle("Main title") +
       theme_bw() +
       theme(panel.grid.major.x = element_blank(),
             panel.grid.major.y = element_line(colour = "grey50"),
             plot.title = element_text(size = rel(1.5),
             face = "bold", vjust = 1.5),
             axis.title = element_text(face = "bold"),
             legend.position = "top",
             legend.title = element_blank(),
             legend.key.size = unit(0.4, "cm"),
             legend.key = element_rect(fill = "black"),
             axis.title.y = element_text(vjust= 1.8),
             axis.title.x = element_text(vjust= -0.5)
     )

# # #

Rplot

Bar plot of proportions vs. categories. Error bars indicate 95% confidence intervals for proportion.

Similar tests

See the Handbook for information on this topic.

How to do the test

R code for the SAS example is shown in the “Examples” section above.

An R Companion for the Handbook of Biological Statistics

Cochran–Mantel–Haenszel Test for Repeated Tests of Independence

Examples in Summary and Analysis of Extension Program Evaluation

Packages used in this chapter

Examples

Cochran–Mantel–Haenszel Test with data read by read.ftable

Cochran–Mantel–Haenszel test

Woolf test

Breslow-Day test

Individual Fisher exact tests

Cochran–Mantel–Haenszel Test with data entered as a data frame

Cochran–Mantel–Haenszel test

Woolf test

Breslow-Day test

Individual Fisher exact tests

Cochran–Mantel–Haenszel Test with data read by read.ftable

Cochran–Mantel–Haenszel test

Woolf test

Breslow-Day test

Individual Fisher exact tests

Graphing the results

Simple bar plot with categories and no error bars

Bar plot with categories and error bars