R Handbook: Estimated Marginal Means for Multiple Comparisons

Comparisons of values across groups in linear models, cumulative link models, and other models can be conducted easily with the emmeans package. Importantly, it can make comparisons among interactions of factors.

E.M. means stands for estimated marginal means. Estimated marginal means are means for treatment levels that are adjusted for means of other factors in the model. For an example, see the What are Estimated Marginal Means? chapter.

p-value adjustments for multiple comparisons

Adjustment of p-values for multiple comparisons is indicated with the adjust= option in the pairs, cld, and summary functions. Options are "tukey", "scheffe", "sidak", "bonferroni", "dunnettx", "mvt", and "none". For further information on these options, see the summary.emmGrid function section in:

Lenth, R.V., J. Love, and M. Hervé. 2017. Package ‘emmeans: Estimated Marginal Means, aka Least-Squares Means’. cran.r-project.org/web/packages/emmeans/emmeans.pdf.

or use

library(emmeans)

?summary.emmGrid

Confidence intervals for marginal means

To adjust the confidence intervals for the EM means, use e.g. summary(marginal, level=0.99).

Ignore values of emmeans for clm and clmm models

Typically, you should ignore the values of the estimated marginal means themselves (emmeans) when using them with clm and clmm models. See “clm” in help(models, package="emmeans") for details and for other options.

Packages used in this chapter

The packages used in this chapter include:

• psych

• ordinal

• car

• emmeans

• multcomp

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}

Multiple comparisons with emmeans

This example uses data set and model from the One-way Ordinal regression with CLM chapter.

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Speaker Likert
Pooh      3
Pooh      5
Pooh      4
Pooh      4
Pooh      4
Pooh      4
Pooh      4
Pooh      4
Pooh      5
Pooh      5
Piglet    2
Piglet    4
Piglet    2
Piglet    2
Piglet    1
Piglet    2
Piglet    3
Piglet    2
Piglet    2
Piglet    3
Tigger    4
Tigger    4
Tigger    4
Tigger    4
Tigger    5
Tigger    3
Tigger    5
Tigger    4
Tigger    4
Tigger    3
")

### Order levels of the factor; otherwise R will alphabetize them

Data$Speaker = factor(Data$Speaker,
                      levels=unique(Data$Speaker))

### Create a new variable which is the likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,
                       ordered = TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

Define model and conduct analysis of deviance

library(ordinal)

model = clm(Likert.f ~ Speaker,
data = Data)

anova(model, type="II")

Type II Analysis of Deviance Table with Wald chi-square tests

Df Chisq Pr(>Chisq)
Speaker 2 13.498 0.001172 **

Group separations by emmeans in table format

In the emmeans function, model specifies the model object that was previously fitted. Note the specialized formula where pairs indicates that all pairwise comparisons should be conducted, and Speaker indicates the variable whose levels will be compared.

The output here compares the levels of the grouping variable. For example, a significant p-value in the Pooh – Piglet line suggests that the value of the dependent variable (Likert.f) is different for Pooh compared with Piglet.

Remember to ignore the estimate, SE, and emmean values when using emmeans with a clm or clmm model object, unless specific options in emmeans are selected.

library(emmeans)

marginal = emmeans(model,
~ Speaker)

pairs(marginal,
adjust="tukey")

contrast        estimate    SE df z.ratio p.value
Pooh - Piglet      4.944 1.376 Inf   3.592 0.0010
Pooh - Tigger      0.634 0.906 Inf   0.700 0.7636
Piglet - Tigger   -4.310 1.317 Inf -3.272 0.0031

P value adjustment: tukey method for comparing a family of 3 estimates

   ### Remember to ignore “estimate” and “SE” of differences with CLM,
   ###   as well as “emmeans” in the output not shown here

Compact letter display

Here we’ll use the emmeans output called marginal created above, and pass this object to the cld function to create a compact letter display. Note that this function needs to be called from the multcomp package. Options for the cld function can be viewed here:

rdrr.io/cran/emmeans/man/CLD.emmGrid.html

or use

library(emmeans)

?cld.emmGrid

In the output, groups sharing a letter in the .group column are not significantly different, at the alpha level specified.

library(multcomp)

cld(marginal,
    alpha=0.05,
    Letters=letters,      ### Use lower-case letters for .group
    adjust="tukey")       ### Tukey-adjusted comparisons

Speaker emmean    SE df asymp.LCL asymp.UCL .group
Piglet   -1.78 0.776 Inf    -3.636    0.0683 a
Tigger    2.53 0.841 Inf     0.518    4.5353   b
Pooh      3.16 0.891 Inf     1.032    5.2884   b

Confidence level used: 0.95

Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05

   ### Groups sharing a letter in .group are not significantly different

   ### Remember to ignore “emmean”, “SE”, “LCL”, and “UCL” for CLM
   ###   unless certain options are used.

Optional: Interaction plot of estimated marginal means with mean separation letters

It is often desirable to plot estimated marginal means from an analysis with either their confidence intervals or standard errors. This can be conducted as a one-way plot or an interaction plot. The emmeans and ggplot2 packages make it relatively easy to extract the EM means and the group separation letters and use them for plotting.

Input data and specify linear model

if(!require(car)){install.packages("car")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(ggplot2)){install.packages("ggplot2")}

Location = c(rep("Olympia" , 6), rep("Ventura", 6),
             rep("Northampton", 6), rep("Burlington", 6))

Tribe = c(rep(c("Jedi", "Sith"), 12))

Midichlorians = c(10, 4, 12, 5, 15, 4, 15, 9, 15, 11, 18, 12,
                   8, 13, 8, 15, 10, 17, 22, 22, 20, 22, 20, 25)

Data = data.frame(Tribe, Location, Midichlorians)

str(Data)

model = lm(Midichlorians ~ Tribe + Location + Tribe:Location,
                    data = Data)

library(car)

Anova(model)

Anova Table (Type II tests)

Response: Midichlorians
               Sum Sq Df F value    Pr(>F)
Tribe            8.17 1 3.0154    0.1017
Location       591.00 3 72.7385 1.535e-09 ***
Tribe:Location 198.83 3 24.4718 3.218e-06 ***
Residuals       43.33 16

One-way estimated marginal means and plot

library(multcomp)
library(emmeans)

marginal = emmeans(model,
                   ~ Location)

CLD = cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD

Location       emmean        SE df lower.CL upper.CL .group
Olympia      8.333333 0.6718548 16 6.449596 10.21707 a
Northampton 11.833333 0.6718548 16 9.949596 13.71707   b
Ventura     13.333333 0.6718548 16 11.449596 15.21707   b
Burlington 21.833333 0.6718548 16 19.949596 23.71707    c

### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

### Remove spaces in .group

CLD$.group=gsub(" ", "", CLD$.group)

### Plot

library(ggplot2)

ggplot(CLD,
       aes(x     = Location,
           y     = emmean,
           label = .group)) +

    geom_point(shape = 15,
               size   = 4) +

    geom_errorbar(aes(ymin = lower.CL,
                      ymax = upper.CL),
                      width = 0.2,
                      size = 0.7) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Estimated marginal mean\nmidichlorian count") +
    ggtitle ("Midichlorian counts",

             subtitle = "In four U.S. cities") +

            labs(caption = paste0("\nMidichlorian counts for four locations. ",
                                   "Boxes indicate the EM mean. \n",
                                   "Error bars indicate the 95% ",
                                   "confidence interval of the EM mean. \n",
                                   "Means sharing a letter are not ",
                                   "significantly different (Tukey-adjusted \n",
                                   "comparisons)."),
                            hjust=0.5) +

geom_text(nudge_x = c(0, 0, 0, 0),
            nudge_y = c(4, 4, 4, 4),
            color   = "black")

Interaction plot of estimated marginal means

library(multcomp)
library(emmeans)

marginal = emmeans(model,
                   ~ Tribe:Location)

CLD = cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD

Tribe Location       emmean        SE df lower.CL upper.CL .group
Sith Olympia      4.333333 0.9501462 16 1.354477 7.31219 a
Jedi Northampton 8.666667 0.9501462 16 5.687810 11.64552 ab
Sith Ventura     10.666667 0.9501462 16 7.687810 13.64552   bc
Jedi Olympia     12.333333 0.9501462 16 9.354477 15.31219   bcd
Sith Northampton 15.000000 0.9501462 16 12.021143 17.97886    cd
Jedi Ventura     16.000000 0.9501462 16 13.021143 18.97886     d
Jedi Burlington 20.666667 0.9501462 16 17.687810 23.64552      e
Sith Burlington 23.000000 0.9501462 16 20.021143 25.97886      e

### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

CLD$Tribe = factor(CLD$Tribe,
                       levels=c("Jedi", "Sith"))

### Remove spaces in .group

CLD$.group=gsub(" ", "", CLD$.group)

CLD

### Plot

library(ggplot2)

pd = position_dodge(0.4)    ### How much to jitter the points on the plot

ggplot(CLD,
       aes(x     = Location,
           y     = emmean,
           color = Tribe,
           label = .group)) +

    geom_point(shape = 15,
               size   = 4,
             position = pd) +

    geom_errorbar(aes(ymin = lower.CL,
                      ymax = upper.CL),
                      width = 0.2,
                      size = 0.7,
                      position = pd) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Estimated marginal mean\nmidichlorian count") +
     ggtitle ("Midichlorian counts for Jedi and Sith",
            subtitle = "In four U.S. cities") +

            labs(caption = paste0("\nMidichlorian counts for two tribes across ",
                                   "four locations. Boxes indicate \n",
                                   "the EM mean. ",
                                   "Error bars indicate the 95% confidence ",
                                   "interval ",
                                    "of the EM \n",
                                   "mean. Means sharing a letter are ",
                                   "not significantly different \n",
                                   "(Tukey-adjusted comparisons)."),
                            hjust=0.5) +

geom_text(nudge_x = c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1),
            nudge_y = c(4.5, 4.5, 4.5, 4.5, 4.5 , 4.5, 4.5, 4.5),
            color   = "black") +

scale_color_manual(values = c("blue", "red"))

Summary and Analysis of Extension Program Evaluation in R

Estimated Marginal Means for Multiple Comparisons

Define model and conduct analysis of deviance

Group separations by emmeans in table format

Compact letter display