Comparisons of values across groups in linear models, cumulative link models, and other models can be conducted easily with the emmeans package. Importantly, it can make comparisons among interactions of factors.
E.M. means stands for estimated marginal means. Estimated marginal means are means for treatment levels that are adjusted for means of other factors in the model. For an example, see the What are Estimated Marginal Means? chapter.
p-value adjustments for multiple comparisons
Adjustment of p-values for multiple comparisons is indicated with the adjust= option in the pairs, cld, and summary functions. Options are "tukey", "scheffe", "sidak", "bonferroni", "dunnettx", "mvt", and "none". For further information on these options, see the summary.emmGrid function section in:
Lenth, R.V., J. Love, and M. Hervé. 2017. Package ‘emmeans: Estimated Marginal Means, aka Least-Squares Means’. cran.r-project.org/web/packages/emmeans/emmeans.pdf.
or use
library(emmeans)
?summary.emmGrid
Confidence intervals for marginal means
To adjust the confidence intervals for the EM means, use e.g. summary(marginal, level=0.99).
Ignore values of emmeans for clm and clmm models
Typically, you should ignore the values of the estimated marginal means themselves (emmeans) when using them with clm and clmm models. See “clm” in help(models, package="emmeans") for details and for other options.
Packages used in this chapter
The packages used in this chapter include:
• psych
• ordinal
• car
• emmeans
• multcomp
The following commands will install these packages if they are not already installed:
if(!require(psych)){install.packages("psych")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}
Multiple comparisons with emmeans
This example uses data set and model from the One-way Ordinal regression with CLM chapter.
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Speaker Likert
Pooh 3
Pooh 5
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 5
Pooh 5
Piglet 2
Piglet 4
Piglet 2
Piglet 2
Piglet 1
Piglet 2
Piglet 3
Piglet 2
Piglet 2
Piglet 3
Tigger 4
Tigger 4
Tigger 4
Tigger 4
Tigger 5
Tigger 3
Tigger 5
Tigger 4
Tigger 4
Tigger 3
")
### Order levels of the factor; otherwise R will alphabetize them
Data$Speaker = factor(Data$Speaker,
levels=unique(Data$Speaker))
### Create a new variable which is the likert scores as an ordered factor
Data$Likert.f = factor(Data$Likert,
ordered = TRUE)
### Check the data frame
library(psych)
headTail(Data)
str(Data)
summary(Data)
Define model and conduct analysis of deviance
library(ordinal)
model = clm(Likert.f ~ Speaker,
data = Data)
anova(model, type="II")
Type II Analysis of Deviance Table with Wald chi-square tests
Df Chisq Pr(>Chisq)
Speaker 2 13.498 0.001172 **
Group separations by emmeans in table format
In the emmeans function, model specifies the model object that was previously fitted. Note the specialized formula where pairs indicates that all pairwise comparisons should be conducted, and Speaker indicates the variable whose levels will be compared.
The output here compares the levels of the grouping variable. For example, a significant p-value in the Pooh – Piglet line suggests that the value of the dependent variable (Likert.f) is different for Pooh compared with Piglet.
Remember to ignore the estimate, SE, and emmean values when using emmeans with a clm or clmm model object, unless specific options in emmeans are selected.
library(emmeans)
marginal = emmeans(model,
~ Speaker)
pairs(marginal,
adjust="tukey")
contrast estimate SE df z.ratio p.value
Pooh - Piglet 4.944 1.376 Inf 3.592 0.0010
Pooh - Tigger 0.634 0.906 Inf 0.700 0.7636
Piglet - Tigger -4.310 1.317 Inf -3.272 0.0031
P value adjustment: tukey method for comparing a family of 3 estimates
### Remember to ignore “estimate” and “SE” of
differences with CLM,
### as well as “emmeans” in the output not shown here
Compact letter display
Here we’ll use the emmeans output called marginal created above, and pass this object to the cld function to create a compact letter display. Note that this function needs to be called from the multcomp package. Options for the cld function can be viewed here:
rdrr.io/cran/emmeans/man/CLD.emmGrid.html
or use
library(emmeans)
?cld.emmGrid
In the output, groups sharing a letter in the .group column are not significantly different, at the alpha level specified.
library(multcomp)
cld(marginal,
alpha=0.05,
Letters=letters, ### Use lower-case
letters for .group
adjust="tukey") ###
Tukey-adjusted comparisons
Speaker emmean SE df asymp.LCL asymp.UCL .group
Piglet -1.78 0.776 Inf -3.636 0.0683 a
Tigger 2.53 0.841 Inf 0.518 4.5353 b
Pooh 3.16 0.891 Inf 1.032 5.2884 b
Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05
### Groups sharing a letter in .group are not
significantly different
### Remember to ignore “emmean”, “SE”, “LCL”, and
“UCL” for CLM
### unless certain options are used.
Optional: Interaction plot of estimated marginal means with mean separation letters
It is often desirable to plot estimated marginal means from an analysis with either their confidence intervals or standard errors. This can be conducted as a one-way plot or an interaction plot. The emmeans and ggplot2 packages make it relatively easy to extract the EM means and the group separation letters and use them for plotting.
Input data and specify linear model
if(!require(car)){install.packages("car")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(ggplot2)){install.packages("ggplot2")}
Location = c(rep("Olympia" , 6), rep("Ventura", 6),
rep("Northampton", 6), rep("Burlington", 6))
Tribe = c(rep(c("Jedi", "Sith"), 12))
Midichlorians = c(10, 4, 12, 5, 15, 4, 15, 9, 15, 11, 18, 12,
8, 13, 8, 15, 10, 17, 22, 22, 20, 22, 20, 25)
Data = data.frame(Tribe, Location, Midichlorians)
str(Data)
model = lm(Midichlorians ~ Tribe + Location + Tribe:Location,
data = Data)
library(car)
Anova(model)
Anova Table (Type II tests)
Response: Midichlorians
Sum Sq Df F value Pr(>F)
Tribe 8.17 1 3.0154 0.1017
Location 591.00 3 72.7385 1.535e-09 ***
Tribe:Location 198.83 3 24.4718 3.218e-06 ***
Residuals 43.33 16
One-way estimated marginal means and plot
library(multcomp)
library(emmeans)
marginal = emmeans(model,
~ Location)
CLD = cld(marginal,
alpha=0.05,
Letters=letters,
adjust="tukey")
CLD
Location emmean SE df lower.CL upper.CL .group
Olympia 8.333333 0.6718548 16 6.449596 10.21707 a
Northampton 11.833333 0.6718548 16 9.949596 13.71707 b
Ventura 13.333333 0.6718548 16 11.449596 15.21707 b
Burlington 21.833333 0.6718548 16 19.949596 23.71707 c
### Order the levels for printing
CLD$Location = factor(CLD$Location,
levels=c("Olympia", "Ventura",
"Northampton", "Burlington"))
### Remove spaces in .group
CLD$.group=gsub(" ", "", CLD$.group)
### Plot
library(ggplot2)
ggplot(CLD,
aes(x = Location,
y = emmean,
label = .group)) +
geom_point(shape = 15,
size = 4) +
geom_errorbar(aes(ymin = lower.CL,
ymax = upper.CL),
width = 0.2,
size = 0.7) +
theme_bw() +
theme(axis.title = element_text(face = "bold"),
axis.text = element_text(face = "bold"),
plot.caption = element_text(hjust = 0)) +
ylab("Estimated marginal mean\nmidichlorian count") +
ggtitle ("Midichlorian counts",
subtitle = "In four U.S. cities") +
labs(caption = paste0("\nMidichlorian counts for four
locations. ",
"Boxes indicate the EM mean. \n",
"Error bars indicate the 95% ",
"confidence interval of the EM mean.
\n",
"Means sharing a letter are not ",
"significantly different
(Tukey-adjusted \n",
"comparisons)."),
hjust=0.5) +
geom_text(nudge_x = c(0, 0, 0, 0),
nudge_y = c(4, 4, 4, 4),
color = "black")

Interaction plot of estimated marginal means
library(multcomp)
library(emmeans)
marginal = emmeans(model,
~ Tribe:Location)
CLD = cld(marginal,
alpha=0.05,
Letters=letters,
adjust="tukey")
CLD
Tribe Location emmean SE df lower.CL upper.CL .group
Sith Olympia 4.333333 0.9501462 16 1.354477 7.31219 a
Jedi Northampton 8.666667 0.9501462 16 5.687810 11.64552 ab
Sith Ventura 10.666667 0.9501462 16 7.687810 13.64552 bc
Jedi Olympia 12.333333 0.9501462 16 9.354477 15.31219 bcd
Sith Northampton 15.000000 0.9501462 16 12.021143 17.97886 cd
Jedi Ventura 16.000000 0.9501462 16 13.021143 18.97886 d
Jedi Burlington 20.666667 0.9501462 16 17.687810 23.64552 e
Sith Burlington 23.000000 0.9501462 16 20.021143 25.97886 e
### Order the levels for printing
CLD$Location = factor(CLD$Location,
levels=c("Olympia", "Ventura",
"Northampton", "Burlington"))
CLD$Tribe = factor(CLD$Tribe,
levels=c("Jedi", "Sith"))
### Remove spaces in .group
CLD$.group=gsub(" ", "", CLD$.group)
CLD
### Plot
library(ggplot2)
pd = position_dodge(0.4) ### How much to jitter
the points on the plot
ggplot(CLD,
aes(x = Location,
y = emmean,
color = Tribe,
label = .group)) +
geom_point(shape = 15,
size = 4,
position = pd) +
geom_errorbar(aes(ymin = lower.CL,
ymax = upper.CL),
width = 0.2,
size = 0.7,
position = pd) +
theme_bw() +
theme(axis.title = element_text(face = "bold"),
axis.text = element_text(face = "bold"),
plot.caption = element_text(hjust = 0)) +
ylab("Estimated marginal mean\nmidichlorian count") +
ggtitle ("Midichlorian counts for Jedi and Sith",
subtitle = "In four U.S. cities") +
labs(caption = paste0("\nMidichlorian counts for two tribes
across ",
"four locations. Boxes indicate
\n",
"the EM mean. ",
"Error bars indicate the 95% confidence
",
"interval ",
"of the EM \n",
"mean. Means sharing a letter are
",
"not significantly different \n",
"(Tukey-adjusted comparisons)."),
hjust=0.5) +
geom_text(nudge_x = c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1),
nudge_y = c(4.5, 4.5, 4.5, 4.5, 4.5 , 4.5, 4.5, 4.5),
color = "black") +
scale_color_manual(values = c("blue", "red"))
