The Cochran–Mantel–Haenszel test can be performed in R with the mantelhaen.test function in the native stats package. A few other useful functions come from the package vcd. One is woolf_test, which performs the Woolf test for homogeneity of the odds ratio across strata levels. This has a similar function to the Breslow-Day test mentioned in the Handbook. If this test is significant, the C-M-H test may not be appropriate. The Breslow-Day test itself can be performed with a function in the package DescTools. For cautions about using this test, see the documentation for this function, or other appropriate sources.
library(DescTools); ?BreslowDayTest
There are a couple of different ways to generate the three-way contingency table. The table can be read in with the read.ftable function. Note that the columns are the stratum variable.
Caution should be used with the formatting, since read.ftable can be fussy. I’ve noticed that it doesn’t like leading spaces in the rows. Certain editors, such as the one in RStudio, may add leading spaces when this code is pasted in. To alleviate this, delete those spaces manually, or paste the code into a plain text editor, save the file as a .R file, and then open that file with RStudio.
Another way to generate the contingency table is beginning with a data frame and tabulating the data using the xtabs function. The second example uses this method.
Examples in Summary and Analysis of Extension Program Evaluation
SAEPER: Cochran–Mantel–Haenszel Test for 3-Dimensional Tables
Packages used in this chapter
The following commands will install these packages if they are not already installed:
if(!require(dplyr)){install.packages("dplyr")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(grid)){install.packages("grid")}
if(!require(vcd)){install.packages("vcd")}
See the Handbook for information on these topics.
Examples
Cochran–Mantel–Haenszel Test with data read by read.ftable
###
--------------------------------------------------------------
### Handedness example, Cochran–Mantel–Haenszel test, p. 97–98
### Example using read.ftable
### --------------------------------------------------------------
# Note no spaces on lines before row names.
# read.ftable can be fussy about leading spaces.
Input =(
" Group W.Child B.adult PA.white W.men G.soldier
Whorl Handed
Clockwise Right 708 136 106 109 801
Left 50 24 32 22 102
CounterCl Right 169 73 17 16 180
Left 13 14 4 26 25
")
Tabla = as.table(read.ftable(textConnection(Input)))
ftable(Tabla) # Display a
flattened table
Cochran–Mantel–Haenszel test
mantelhaen.test(Tabla)
Mantel-Haenszel X-squared = 5.9421, df = 1, p-value = 0.01478
Woolf test
library(vcd)
oddsratio(Tabla, log=TRUE) # Show log
odds for each 2x2
W.Child B.adult PA.white W.men G.soldier
0.08547173 0.08319894 -0.24921579 2.08581324 0.08680711
library(vcd)
woolf_test(Tabla) # Woolf test for homogeneity
of
# odds ratios
across strata.
# If
significant, C-M-H test
# is not
appropriate
Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)
X-squared = 22.8165, df = 4, p-value = 0.0001378
Breslow-Day test
library(DescTools)
BreslowDayTest(Tabla)
Breslow-Day Test for Homogeneity of the Odds Ratios
X-squared = 24.7309, df = 4, p-value = 5.698e-05
Individual Fisher exact tests
n = dim(Tabla)[3]
for(i in 1:n){
Name = dimnames(Tabla)[3]$Group[i]
P.value = fisher.test(Tabla[,,i])$p.value
cat(Name, "\n")
cat("Fisher test p-value: ", P.value, "\n")
cat("\n")
}
### Note: "Group" must be the name of
the stratum variable
W.Child
Fisher test p-value: 0.7435918
B.adult
Fisher test p-value: 0.8545009
PA.white
Fisher test p-value: 0.7859788
W.men
Fisher test p-value: 6.225227e-08
G.soldier
Fisher test p-value: 0.7160507
# # #
Cochran–Mantel–Haenszel Test with data entered as a data frame
###
--------------------------------------------------------------
### Mussel example, Cochran–Mantel–Haenszel test, pp. 98–99
### Example using cross-tabulation of a data frame
### --------------------------------------------------------------
Input =("
Location Habitat Allele Count
Tillamook marine 94 56
Tillamook estuarine 94 69
Tillamook marine non-94 40
Tillamook estuarine non-94 77
Yaquina marine 94 61
Yaquina estuarine 94 257
Yaquina marine non-94 57
Yaquina estuarine non-94 301
Alsea marine 94 73
Alsea estuarine 94 65
Alsea marine non-94 71
Alsea estuarine non-94 79
Umpqua marine 94 71
Umpqua estuarine 94 48
Umpqua marine non-94 55
Umpqua estuarine non-94 48
")
Data = read.table(textConnection(Input),header=TRUE)
### Specify the order of factor levels
### Otherwise, R will alphabetize them
library(dplyr)
Data =
mutate(Data,
Location = factor(Location, levels=unique(Location)),
Habitat = factor(Habitat, levels=unique(Habitat)),
Allele = factor(Allele, levels=unique(Allele))
)
### Cross-tabulate the data
### Note here, Location is stratum variable (is last)
### Habitat x Allele are 2 x 2 tables
Data.xtabs = xtabs(Count ~ Allele + Habitat + Location,
data=Data)
ftable(Data.xtabs) # Display a
flattened table
Location Tillamook Yaquina Alsea Umpqua
Allele Habitat
94 marine 56 61 73 71
estuarine 69 257 65 48
non-94 marine 40 57 71 55
estuarine 77 301 79 48
Cochran–Mantel–Haenszel test
mantelhaen.test(Data.xtabs)
Mantel-Haenszel X-squared = 5.0497, df = 1, p-value = 0.02463
Woolf test
library(vcd)
oddsratio(Data.xtabs, log=TRUE) # Show log
odds for each 2x2
Tillamook Yaquina Alsea Umpqua
0.4461712 0.2258568 0.2228401 0.2553467
library(vcd)
woolf_test(Data.xtabs) #
Woolf test for homogeneity of
# odds ratios across strata.
# If significant, C-M-H test
# is not appropriate
Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)
X-squared = 0.5292, df = 3, p-value = 0.9124
Breslow-Day test
library(DescTools)
BreslowDayTest(Data.xtabs)
Breslow-Day Test for Homogeneity of the Odds Ratios
X-squared = 0.5295, df = 3, p-value = 0.9124
Individual Fisher exact tests
n = dim(Data.xtabs)[3]
for(i in 1:n){
Name = dimnames(Data.xtabs)[3]$Location[i]
P.value = fisher.test(Data.xtabs[,,i])$p.value
cat(Name, "\n")
cat("Fisher test p-value: ", P.value, "\n")
cat("\n")
}
### Note: "Location" must be the name of
the stratum variable
Tillamook
Fisher test p-value: 0.1145223
Yaquina
Fisher test p-value: 0.2665712
Alsea
Fisher test p-value: 0.4090355
Umpqua
Fisher test p-value: 0.4151874
# # #
Cochran–Mantel–Haenszel Test with data read by read.ftable
###
--------------------------------------------------------------
### Niacin example, Cochran–Mantel–Haenszel test, p. 99
### Example using read.ftable
### --------------------------------------------------------------
# Note no spaces on lines before row names.
# read.ftable can be fussy about leading spaces.
Input =(
" Study FATS AFREGS ARBITER.2 HATS CLAS.1
Supplement Revasc
Niacin Yes 2 4 1 1 2
No 46 67 86 37 92
Placebo Yes 11 12 4 6 1
No 41 60 76 32 93
")
Tabla = as.table(read.ftable(textConnection(Input)))
ftable(Tabla) # Display a
flattened table
Cochran–Mantel–Haenszel test
mantelhaen.test(Tabla)
Mantel-Haenszel X-squared = 12.7457, df = 1, p-value = 0.0003568
Woolf test
library(vcd)
oddsratio(Tabla, log=TRUE) # Show log
odds for each 2x2
FATS AFREGS ARBITER.2 HATS CLAS.1
-1.8198174 -1.2089603 -1.5099083 -1.9369415 0.7039581
woolf_test(Tabla) # Woolf test for homogeneity of
# odds ratios across strata.
# If significant, C-M-H test
# is not appropriate
Woolf-test on Homogeneity of Odds Ratios (no 3-Way assoc.)
X-squared = 3.4512, df = 4, p-value = 0.4853
Breslow-Day test
library(DescTools)
BreslowDayTest(Tabla)
Breslow-Day Test
for Homogeneity of the Odds Ratios
X-squared = 4.4517, df = 4, p-value =
0.3483
Individual Fisher exact tests
n = dim(Tabla)[3]
for(i in 1:n){
Name = dimnames(Tabla)[3]$Study[i]
P.value = fisher.test(Tabla[,,i])$p.value
cat(Name, "\n")
cat("Fisher test p-value: ", P.value, "\n")
cat("\n")
}
### Note: "Study" must be the name of
the stratum variable
FATS
Fisher test p-value: 0.01581505
AFREGS
Fisher test p-value: 0.0607213
ARBITER.2
Fisher test p-value: 0.1948915
HATS
Fisher test p-value: 0.1075169
CLAS.1
Fisher test p-value: 1
# # #
Graphing the results
Simple bar plot with categories and no error bars
###
--------------------------------------------------------------
### Simple bar plot of proportions, p. 99
### Uses data in a matrix format
### --------------------------------------------------------------
Input =("
Habitat Tillamook Yaquina Alsea Umpqua
Marine 0.5833 0.5169 0.5069 0.5635
Estuarine 0.4726 0.4606 0.4514 0.5000
")
Matriz = as.matrix(read.table(textConnection(Input),
header=TRUE,
row.names=1))
Matriz
barplot(Matriz,
beside=TRUE,
legend=TRUE,
ylim=c(0, 0.9),
xlab="Location",
ylab="Lap94 proportion")
# # #
Bar plot with categories and error bars
This example includes code to calculate the confidence intervals for the error bars and add them to the data frame. This code could be excluded if these values were calculated manually and added to the data frame.
### --------------------------------------------------------------
### Graph example, bar plot of proportions, p. 99
### Using ggplot2
### Plot adapted from:
### shinyapps.stat.ubc.ca/r-graph-catalog/
### --------------------------------------------------------------
Input =("
Location Habitat Allele Count Total Lap.94.Proportion
Tillamook Marine 94 56 96 0.5833
Tillamook Estuarine 94 69 146 0.4726
Yaquina Marine 94 61 118 0.5169
Yaquina Estuarine 94 257 558 0.4606
Alsea Marine 94 73 144 0.5069
Alsea Estuarine 94 65 144 0.4514
Umpqua Marine 94 71 126 0.5635
Umpqua Estuarine 94 48 96 0.5000
")
Data = read.table(textConnection(Input),header=TRUE)
### Specify the order of factor levels
### Otherwise, R will alphabetize them
library(dplyr)
Data =
mutate(Data,
Location = factor(Location, levels=unique(Location)),
Habitat = factor(Habitat, levels=unique(Habitat)),
Allele = factor(Allele, levels=unique(Data$ Allele))
)
### Add confidence intervals
Fun.low = function (x){
binom.test(x["Count"], x["Total"],
0.5)$ conf.int[1]
}
Fun.up = function (x){
binom.test(x["Count"], x["Total"],
0.5)$ conf.int[2]
}
Data =
mutate(Data,
low.ci = apply(Data[c("Count", "Total")], 1, Fun.low),
upper.ci = apply(Data[c("Count", "Total")], 1, Fun.up)
)
Data
Location Habitat Allele Count Total Lap.94.Proportion low.ci upper.ci
1 Tillamook Marine 94 56 96 0.5833 0.4782322 0.6831506
2 Tillamook Estuarine 94 69 146 0.4726 0.3894970 0.5568427
3 Yaquina Marine 94 61 118 0.5169 0.4231343 0.6098931
4 Yaquina Estuarine 94 257 558 0.4606 0.4186243 0.5029422
5 Alsea Marine 94 73 144 0.5069 0.4224208 0.5911766
6 Alsea Estuarine 94 65 144 0.4514 0.3684040 0.5364149
7 Umpqua Marine 94 71 126 0.5635 0.4723096 0.6516209
8 Umpqua Estuarine 94 48 96 0.5000 0.3961779 0.6038221
### Plot adapted from:
### shinyapps.stat.ubc.ca/r-graph-catalog/
library(ggplot2)
library(grid)
ggplot(Data,
aes(x = Location, y = Lap.94.Proportion, fill = Habitat,
ymax=upper.ci, ymin=low.ci)) +
geom_bar(stat="identity", position = "dodge", width
= 0.7) +
geom_bar(stat="identity", position = "dodge",
colour = "black", width = 0.7,
show_guide = FALSE) +
scale_y_continuous(breaks = seq(0, 0.80, 0.1),
limits = c(0, 0.80),
expand = c(0, 0)) +
scale_fill_manual(name = "Count type" ,
values = c('grey80', 'grey30'),
labels = c("Marine",
"Estuarine")) +
geom_errorbar(position=position_dodge(width=0.7),
width=0.0, size=0.5, color="black") +
labs(x = "Location",
y = "Lap94 proportion") +
## ggtitle("Main title") +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey50"),
plot.title = element_text(size = rel(1.5),
face = "bold", vjust = 1.5),
axis.title = element_text(face = "bold"),
legend.position = "top",
legend.title = element_blank(),
legend.key.size = unit(0.4, "cm"),
legend.key = element_rect(fill = "black"),
axis.title.y = element_text(vjust= 1.8),
axis.title.x = element_text(vjust= -0.5)
)
# # #
Bar plot of proportions vs. categories. Error bars indicate 95% confidence intervals for proportion.
See the Handbook for information on this topic.
R code for the SAS example is shown in the “Examples” section above.