[banner]

An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Cochran–Mantel–Haenszel Test for Repeated Tests of Independence

The Cochran–Mantel–Haenszel test can be performed in R with the mantelhaen.test function in the native stats package.  A few other useful functions come from the package vcd.  One is   woolf_test, which performs the Woolf test for homogeneity of the odds ratio across strata levels.  This has a similar function to the Breslow-Day test mentioned in the Handbook.  If this test is significant, the C-M-H test may not be appropriate.  The Breslow-Day test itself can be performed with a function in the package DescTools. For cautions about using this test, see the documentation for this function, or other appropriate sources.

 

library(DescTools); ?BreslowDayTest

 

There are a couple of different ways to generate the three-way contingency table.  The table can be read in with the read.ftable function.  Note that the columns are the stratum variable.

 

Caution should be used with the formatting, since read.ftable can be fussy.  I’ve noticed that it doesn’t like leading spaces in the rows.  Certain editors, such as the one in RStudio, may add leading spaces when this code is pasted in.  To alleviate this, delete those spaces manually, or paste the code into a plain text editor, save the file as a .R file, and then open that file with RStudio.

 

Another way to generate the contingency table is beginning with a data frame and tabulating the data using the xtabs function.  The second example uses this method.

 

Examples in Summary and Analysis of Extension Program Evaluation

 

SAEPER: Cochran–Mantel–Haenszel Test for 3-Dimensional Tables

 

Packages used in this chapter

The following commands will install these packages if they are not already installed:


if(!require(dplyr)){install.packages("dplyr")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(grid)){install.packages("grid")}
if(!require(vcd)){install.packages("vcd")}

 

 

 

When to use it

Null hypothesis

How the test works

Assumptions

See the Handbook for information on these topics.

 

Examples

Cochran–Mantel–Haenszel Test with data read by read.ftable

 

### --------------------------------------------------------------
### Handedness example, Cochran–Mantel–Haenszel test, p. 97
98
###    Example using read.ftable
### --------------------------------------------------------------

      # Note no spaces on lines before row names.
      #   read.ftable can be fussy about leading spaces.

Input =(
"                  Group W.Child B.adult PA.white W.men G.soldier
Whorl      Handed
Clockwise  Right            708     136      106    109      801
           Left              50      24       32     22      102
CounterCl  Right            169      73       17     16      180
           Left              13      14        4     26       25
")

Tabla = as.table(read.ftable(textConnection(Input)))

ftable(Tabla)                        # Display a flattened table

 

 

Cochran–Mantel–Haenszel test

 

mantelhaen.test(Tabla)

 

Mantel-Haenszel X-squared = 5.9421, df = 1, p-value = 0.01478

 

 

Woolf test

 

 

library(vcd)

oddsratio(Tabla, log=TRUE)            # Show log odds for each 2x2

 

   W.Child     B.adult    PA.white       W.men   G.soldier

0.08547173  0.08319894 -0.24921579  2.08581324  0.08680711

 

 

library(vcd)

woolf_test(Tabla)                # Woolf test for homogeneity of
                                 #   odds ratios across strata.
                                 #   If significant, C-M-H test
                                 #   is not appropriate

 

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 22.8165, df = 4, p-value = 0.0001378

 

 

Breslow-Day test

 

library(DescTools)

BreslowDayTest(Tabla)

 

 

 Breslow-Day Test for Homogeneity of the Odds Ratios
 
X-squared = 24.7309, df = 4, p-value = 5.698e-05

 

 

Individual Fisher exact tests

 

n = dim(Tabla)[3]

for(i in 1:n){
   Name = dimnames(Tabla)[3]$Group[i]
   P.value = fisher.test(Tabla[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
  }

 ### Note: "Group" must be the name of the stratum variable

 

W.Child

Fisher test p-value:  0.7435918

 

B.adult

Fisher test p-value:  0.8545009

 

PA.white

Fisher test p-value:  0.7859788

 

W.men

Fisher test p-value:  6.225227e-08

 

G.soldier

Fisher test p-value:  0.7160507

 

#     #     #

 

 

Cochran–Mantel–Haenszel Test with data entered as a data frame

 

### --------------------------------------------------------------
### Mussel example, Cochran–Mantel–Haenszel test, pp. 98
99
###    Example using cross-tabulation of a data frame
### --------------------------------------------------------------

Input =("
 Location    Habitat     Allele     Count
  Tillamook  marine          94     56
  Tillamook  estuarine       94     69
  Tillamook  marine      non-94     40  
  Tillamook  estuarine   non-94     77
  Yaquina    marine          94     61 
  Yaquina    estuarine       94    257
  Yaquina    marine      non-94     57  
  Yaquina    estuarine   non-94    301
  Alsea      marine          94     73 
  Alsea      estuarine       94     65
  Alsea      marine      non-94     71  
  Alsea      estuarine   non-94     79
  Umpqua     marine          94     71  
  Umpqua     estuarine       94     48
  Umpqua     marine      non-94     55   
  Umpqua     estuarine   non-94     48
 ")

Data = read.table(textConnection(Input),header=TRUE)


### Specify the order of factor levels
### Otherwise, R will alphabetize them

library(dplyr)

Data =
mutate(Data,
       Location = factor(Location, levels=unique(Location)),
       Habitat = factor(Habitat, levels=unique(Habitat)),
       Allele = factor(Allele, levels=unique(Allele))
       )


### Cross-tabulate the data
###   Note here, Location is stratum variable (is last)
###              Habitat x Allele are 2 x 2 tables


Data.xtabs = xtabs(Count ~ Allele + Habitat + Location,
                   data=Data)


ftable(Data.xtabs)                      # Display a flattened table

 

                 Location Tillamook Yaquina Alsea Umpqua

Allele Habitat                                         

94     marine                    56      61    73     71

       estuarine                 69     257    65     48

non-94 marine                    40      57    71     55

       estuarine                 77     301    79     48

 

 

Cochran–Mantel–Haenszel test

 

mantelhaen.test(Data.xtabs)

 

Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463

 

 

Woolf test

 

library(vcd)

oddsratio(Data.xtabs, log=TRUE)       # Show log odds for each 2x2

 

Tillamook   Yaquina     Alsea    Umpqua

0.4461712 0.2258568 0.2228401 0.2553467

 

 

library(vcd)

woolf_test(Data.xtabs)               # Woolf test for homogeneity of
                                     #   odds ratios across strata.
                                     #   If significant, C-M-H test
                                     #   is not appropriate

 

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 0.5292, df = 3, p-value = 0.9124

 

 

Breslow-Day test

 

library(DescTools)

BreslowDayTest(Data.xtabs)

 

 Breslow-Day Test for Homogeneity of the Odds Ratios

X-squared = 0.5295, df = 3, p-value = 0.9124

 

 

Individual Fisher exact tests

 

n = dim(Data.xtabs)[3]

for(i in 1:n){
   Name = dimnames(Data.xtabs)[3]$Location[i]
   P.value = fisher.test(Data.xtabs[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
  }

 ### Note: "Location" must be the name of the stratum variable

 

Tillamook

Fisher test p-value:  0.1145223

 

Yaquina

Fisher test p-value:  0.2665712

 

Alsea

Fisher test p-value:  0.4090355

 

Umpqua

Fisher test p-value:  0.4151874

 

#     #     #

 

 

Cochran–Mantel–Haenszel Test with data read by read.ftable

 

### --------------------------------------------------------------
### Niacin example, Cochran–Mantel–Haenszel test, p. 99
###    Example using read.ftable
### --------------------------------------------------------------

      # Note no spaces on lines before row names.
      #   read.ftable can be fussy about leading spaces.

Input =(
"                 Study FATS AFREGS ARBITER.2 HATS CLAS.1
Supplement Revasc
Niacin     Yes           2    4      1         1    2
           No           46   67     86        37   92
Placebo    Yes          11   12      4         6    1
           No           41   60     76        32   93
")

Tabla = as.table(read.ftable(textConnection(Input)))

ftable(Tabla)                        # Display a flattened table

 

 

Cochran–Mantel–Haenszel test

 

mantelhaen.test(Tabla)

 

Mantel-Haenszel X-squared = 12.7457, df = 1, p-value = 0.0003568

 

 

Woolf test

 

library(vcd)

oddsratio(Tabla, log=TRUE)           # Show log odds for each 2x2

 

      FATS     AFREGS  ARBITER.2       HATS     CLAS.1

-1.8198174 -1.2089603 -1.5099083 -1.9369415  0.7039581

 

 

woolf_test(Tabla)                    # Woolf test for homogeneity of

                                     #   odds ratios across strata.

                                     #   If significant, C-M-H test

                                     #   is not appropriate

 

Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)

X-squared = 3.4512, df = 4, p-value = 0.4853

 

 

Breslow-Day test

 

library(DescTools)

BreslowDayTest(Tabla)



Breslow-Day Test for Homogeneity of the Odds Ratios

X-squared = 4.4517, df = 4, p-value = 0.3483

 

 

Individual Fisher exact tests

 

n = dim(Tabla)[3]

for(i in 1:n){
   Name = dimnames(Tabla)[3]$Study[i]
   P.value = fisher.test(Tabla[,,i])$p.value
   cat(Name, "\n")
   cat("Fisher test p-value: ", P.value, "\n")
   cat("\n")
  }

 ### Note: "Study" must be the name of the stratum variable

 

FATS

Fisher test p-value:  0.01581505

 

AFREGS

Fisher test p-value:  0.0607213

 

ARBITER.2

Fisher test p-value:  0.1948915

 

HATS

Fisher test p-value:  0.1075169

 

CLAS.1

Fisher test p-value:  1

 

#     #     #

 

 

Graphing the results

Simple bar plot with categories and no error bars

 

### --------------------------------------------------------------
### Simple bar plot of proportions, p. 99
###      Uses data in a matrix format
### --------------------------------------------------------------

Input =("
Habitat    Tillamook  Yaquina  Alsea   Umpqua
 Marine    0.5833     0.5169   0.5069  0.5635 
 Estuarine 0.4726     0.4606   0.4514  0.5000
")

Matriz = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matriz

barplot(Matriz,
        beside=TRUE,
        legend=TRUE,
        ylim=c(0, 0.9),
        xlab="Location",
        ylab="Lap94 proportion")

 

#     #     #

 

Rplot

 

 

Bar plot with categories and error bars

This example includes code to calculate the confidence intervals for the error bars and add them to the data frame.  This code could be excluded if these values were calculated manually and added to the data frame.

 

### --------------------------------------------------------------
### Graph example, bar plot of proportions, p. 99
###    Using ggplot2
###    Plot adapted from:
###       shinyapps.stat.ubc.ca/r-graph-catalog/  
### --------------------------------------------------------------

Input =("
Location   Habitat   Allele   Count  Total  Lap.94.Proportion
 Tillamook Marine    94       56     96     0.5833
 Tillamook Estuarine 94       69     146    0.4726
 Yaquina   Marine    94       61     118    0.5169
 Yaquina   Estuarine 94       257    558    0.4606
 Alsea     Marine    94       73     144    0.5069
 Alsea     Estuarine 94       65     144    0.4514
 Umpqua    Marine    94       71     126    0.5635
 Umpqua    Estuarine 94       48     96     0.5000
")

Data = read.table(textConnection(Input),header=TRUE)


### Specify the order of factor levels
### Otherwise, R will alphabetize them

library(dplyr)

Data =
mutate(Data,
       Location = factor(Location, levels=unique(Location)),
       Habitat = factor(Habitat, levels=unique(Habitat)),
       Allele = factor(Allele, levels=unique(Data$ Allele))
       )


### Add confidence intervals

Fun.low = function (x){
          binom.test(x["Count"], x["Total"],
          0.5)$ conf.int[1]
         }

Fun.up = function (x){
           binom.test(x["Count"], x["Total"],
           0.5)$ conf.int[2]
          }

Data =
mutate(Data,
          low.ci = apply(Data[c("Count", "Total")], 1, Fun.low),
          upper.ci = apply(Data[c("Count", "Total")], 1, Fun.up)
          )

Data

 

   Location   Habitat Allele Count Total Lap.94.Proportion    low.ci  upper.ci

1 Tillamook    Marine     94    56    96            0.5833 0.4782322 0.6831506

2 Tillamook Estuarine     94    69   146            0.4726 0.3894970 0.5568427

3   Yaquina    Marine     94    61   118            0.5169 0.4231343 0.6098931

4   Yaquina Estuarine     94   257   558            0.4606 0.4186243 0.5029422

5     Alsea    Marine     94    73   144            0.5069 0.4224208 0.5911766

6     Alsea Estuarine     94    65   144            0.4514 0.3684040 0.5364149

7    Umpqua    Marine     94    71   126            0.5635 0.4723096 0.6516209

8    Umpqua Estuarine     94    48    96            0.5000 0.3961779 0.6038221

 

 

### Plot adapted from:
###   shinyapps.stat.ubc.ca/r-graph-catalog/
 
library(ggplot2)
library(grid)

ggplot(Data,
   aes(x = Location, y = Lap.94.Proportion, fill = Habitat,
       ymax=upper.ci, ymin=low.ci))  +
       geom_bar(stat="identity", position = "dodge", width = 0.7) +
       geom_bar(stat="identity", position = "dodge",
                colour = "black", width = 0.7,
                show_guide = FALSE)  +
       scale_y_continuous(breaks = seq(0, 0.80, 0.1),
                limits = c(0, 0.80),
                expand = c(0, 0))  +
       scale_fill_manual(name = "Count type" ,
                 values = c('grey80', 'grey30'),
                 labels = c("Marine",
                            "Estuarine"))  +
       geom_errorbar(position=position_dodge(width=0.7),
                     width=0.0, size=0.5, color="black")  +
       labs(x = "Location",
            y = "Lap94 proportion")  +
       ## ggtitle("Main title") +
       theme_bw()  +
       theme(panel.grid.major.x = element_blank(),
             panel.grid.major.y = element_line(colour = "grey50"),
             plot.title = element_text(size = rel(1.5),
             face = "bold", vjust = 1.5),
             axis.title = element_text(face = "bold"),
             legend.position = "top",
             legend.title = element_blank(),
             legend.key.size = unit(0.4, "cm"),
             legend.key = element_rect(fill = "black"),
             axis.title.y = element_text(vjust= 1.8),
             axis.title.x = element_text(vjust= -0.5)
     )

 

#     #     #

 

Rplot

Bar plot of proportions vs. categories.  Error bars indicate 95% confidence intervals for  proportion.

 

 

Similar tests

See the Handbook for information on this topic.

 

How to do the test

R code for the SAS example is shown in the “Examples” section above.