[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Estimated Marginal Means for Multiple Comparisons

 

Comparisons of values across groups in linear models, cumulative link models, and other models can be conducted easily with the emmeans package.  Importantly, it can make comparisons among interactions of factors.

 

E.M. means stands for estimated marginal means.  Estimated marginal means are means for treatment levels that are adjusted for means of other factors in the model.  For an example, see the What are Estimated Marginal Means? chapter.

 

p-value adjustments for multiple comparisons

Adjustment of p-values for multiple comparisons is indicated with the adjust= option in the pairs, cld, and summary functions.  Options are "tukey", "scheffe", "sidak", "bonferroni", "dunnettx", "mvt", and "none".  For further information on these options, see the summary.emmGrid function section in:

 

Lenth, R.V., J. Love, and M. Hervé. 2017. Package ‘emmeans: Estimated Marginal Means, aka Least-Squares Means’.  cran.r-project.org/web/packages/emmeans/emmeans.pdf.

 

or use

 

library(emmeans)

?summary.emmGrid

Confidence intervals for marginal means

To adjust the confidence intervals for the EM means, use e.g. summary(marginal, level=0.99).

 

Ignore values of emmeans for clm and clmm models

Typically, you should ignore the values of the estimated marginal means themselves (emmeans) when using them with clm and clmm models.  See “clm” in help(models, package="emmeans") for details and for other options.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  ordinal

•  car

•  emmeans

•  multcomp

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}


Multiple comparisons with emmeans

 

This example uses data set and model from the One-way Ordinal regression with CLM chapter.


Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

 Speaker  Likert
 Pooh      3
 Pooh      5
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      5
 Pooh      5
 Piglet    2
 Piglet    4
 Piglet    2
 Piglet    2
 Piglet    1
 Piglet    2
 Piglet    3
 Piglet    2
 Piglet    2
 Piglet    3
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    5
 Tigger    3
 Tigger    5
 Tigger    4
 Tigger    4
 Tigger    3
")


### Order levels of the factor; otherwise R will alphabetize them


Data$Speaker = factor(Data$Speaker,
                      levels=unique(Data$Speaker))


### Create a new variable which is the likert scores as an ordered factor


Data$Likert.f = factor(Data$Likert,
                       ordered = TRUE)


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


Define model and conduct analysis of deviance


library(ordinal)

model = clm(Likert.f ~ Speaker,
            data = Data)

anova(model, type="II")



Type II Analysis of Deviance Table with Wald chi-square tests

        Df  Chisq Pr(>Chisq)  
Speaker  2 13.498   0.001172 **


Group separations by emmeans in table format

In the emmeans function, model specifies the model object that was previously fitted.  Note the specialized formula where pairs indicates that all pairwise comparisons should be conducted, and Speaker indicates the variable whose levels will be compared.

 

The output here compares the levels of the grouping variable.  For example, a significant p-value in the Pooh – Piglet line suggests that the value of the dependent variable (Likert.f) is different for Pooh compared with Piglet.

 

Remember to ignore the estimate, SE, and emmean values when using emmeans with a clm or clmm model object, unless specific options in emmeans are selected.


library(emmeans)

marginal = emmeans(model,
                   ~ Speaker)

pairs(marginal,
      adjust="tukey")


 contrast        estimate    SE  df z.ratio p.value
 Pooh - Piglet      4.944 1.376 Inf   3.592  0.0010
 Pooh - Tigger      0.634 0.906 Inf   0.700  0.7636
 Piglet - Tigger   -4.310 1.317 Inf  -3.272  0.0031

P value adjustment: tukey method for comparing a family of 3 estimates

   ### Remember to ignore “estimate” and “SE” of differences with CLM,
   ###   as well as “emmeans” in the output not shown here



Compact letter display

Here we’ll use the emmeans output called marginal created above, and pass this object to the cld function to create a compact letter display.  Note that this function needs to be called from the multcomp package.  Options for the cld function can be viewed here:

 

rdrr.io/cran/emmeans/man/CLD.emmGrid.html

 

or use


library(emmeans)

?cld.emmGrid


In the output, groups sharing a letter in the .group column are not significantly different, at the alpha level specified.


library(multcomp)

cld(marginal,
    alpha=0.05,
    Letters=letters,      ### Use lower-case letters for .group
    adjust="tukey")       ### Tukey-adjusted comparisons 


 Speaker emmean    SE  df asymp.LCL asymp.UCL .group
 Piglet   -1.78 0.776 Inf    -3.636    0.0683  a   
 Tigger    2.53 0.841 Inf     0.518    4.5353   b  
 Pooh      3.16 0.891 Inf     1.032    5.2884   b  

Confidence level used: 0.95

Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05

   ### Groups sharing a letter in .group are not significantly different

   ### Remember to ignore “emmean”, “SE”, “LCL”, and “UCL” for CLM
   ###   unless certain options are used.


Optional:  Interaction plot of estimated marginal means with mean separation letters

 

It is often desirable to plot estimated marginal means from an analysis with either their confidence intervals or standard errors.  This can be conducted as a one-way plot or an interaction plot.  The emmeans and ggplot2 packages make it relatively easy to extract the EM means and the group separation letters and use them for plotting.

 

Input data and specify linear model


if(!require(car)){install.packages("car")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(ggplot2)){install.packages("ggplot2")}


Location = c(rep("Olympia" , 6), rep("Ventura", 6),
             rep("Northampton", 6), rep("Burlington", 6))

Tribe  = c(rep(c("Jedi", "Sith"), 12))

Midichlorians = c(10,  4, 12,  5, 15,  4, 15,  9, 15, 11, 18, 12,
                   8, 13,  8, 15, 10, 17, 22, 22, 20, 22, 20, 25)


Data = data.frame(Tribe, Location, Midichlorians)

str(Data)


model = lm(Midichlorians ~ Tribe + Location + Tribe:Location,
                    data = Data)

library(car)

Anova(model)


Anova Table (Type II tests)

Response: Midichlorians
               Sum Sq Df F value    Pr(>F)   
Tribe            8.17  1  3.0154    0.1017   
Location       591.00  3 72.7385 1.535e-09 ***
Tribe:Location 198.83  3 24.4718 3.218e-06 ***
Residuals       43.33 16                     


One-way estimated marginal means and plot


library(multcomp)
library(emmeans)

marginal = emmeans(model,
                   ~ Location)

CLD = cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD


Location        emmean        SE df  lower.CL upper.CL .group
 Olympia      8.333333 0.6718548 16  6.449596 10.21707  a   
 Northampton 11.833333 0.6718548 16  9.949596 13.71707   b  
 Ventura     13.333333 0.6718548 16 11.449596 15.21707   b  
 Burlington  21.833333 0.6718548 16 19.949596 23.71707    c 


### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

###  Remove spaces in .group 

CLD$.group=gsub(" ", "", CLD$.group)


### Plot

library(ggplot2)

ggplot(CLD,
       aes(x     = Location,
           y     = emmean,
           label = .group)) +

    geom_point(shape  = 15,
               size   = 4) +

    geom_errorbar(aes(ymin  =  lower.CL,
                      ymax  =  upper.CL),
                      width =  0.2,
                      size  =  0.7) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Estimated marginal mean\nmidichlorian count") +
    ggtitle ("Midichlorian counts",

             subtitle = "In four U.S. cities") +

            labs(caption  = paste0("\nMidichlorian counts for four locations. ",
                                   "Boxes indicate the EM mean. \n",
                                   "Error bars indicate the 95% ",
                                   "confidence interval of the EM mean. \n",
                                   "Means sharing a letter are not ",
                                   "significantly different (Tukey-adjusted \n",
                                   "comparisons)."),
                            hjust=0.5) +

  geom_text(nudge_x = c(0, 0, 0, 0),
            nudge_y = c(4, 4, 4, 4),
            color   = "black")





Interaction plot of estimated marginal means


library(multcomp)
library(emmeans)

marginal = emmeans(model,
                   ~ Tribe:Location)

CLD = cld(marginal,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD


Tribe  Location       emmean        SE df  lower.CL upper.CL .group
 Sith  Olympia      4.333333 0.9501462 16  1.354477  7.31219  a   
 Jedi  Northampton  8.666667 0.9501462 16  5.687810 11.64552  ab  
 Sith  Ventura     10.666667 0.9501462 16  7.687810 13.64552   bc 
 Jedi  Olympia     12.333333 0.9501462 16  9.354477 15.31219   bcd
 Sith  Northampton 15.000000 0.9501462 16 12.021143 17.97886    cd
 Jedi  Ventura     16.000000 0.9501462 16 13.021143 18.97886     d
 Jedi  Burlington  20.666667 0.9501462 16 17.687810 23.64552      e
 Sith  Burlington  23.000000 0.9501462 16 20.021143 25.97886      e


### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

CLD$Tribe = factor(CLD$Tribe,
                       levels=c("Jedi", "Sith"))

###  Remove spaces in .group 

CLD$.group=gsub(" ", "", CLD$.group)


CLD


### Plot

library(ggplot2)

pd = position_dodge(0.4)    ### How much to jitter the points on the plot

ggplot(CLD,
       aes(x     = Location,
           y     = emmean,
           color = Tribe,
           label = .group)) +

    geom_point(shape  = 15,
               size   = 4,
             position = pd) +

    geom_errorbar(aes(ymin  =  lower.CL,
                      ymax  =  upper.CL),
                      width =  0.2,
                      size  =  0.7,
                      position = pd) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Estimated marginal mean\nmidichlorian count") +
     ggtitle ("Midichlorian counts for Jedi and Sith",
            subtitle = "In four U.S. cities") +
 
            labs(caption  = paste0("\nMidichlorian counts for two tribes across ",
                                   "four locations. Boxes indicate \n",
                                   "the EM mean. ",
                                   "Error bars indicate the 95% confidence ",
                                   "interval ",
                                    "of the EM \n",
                                   "mean. Means sharing a letter are ",
                                   "not significantly different \n",
                                   "(Tukey-adjusted comparisons)."),
                            hjust=0.5) +
 
  geom_text(nudge_x = c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1),
            nudge_y = c(4.5,  4.5, 4.5,  4.5, 4.5 , 4.5,  4.5, 4.5),
            color   = "black") +
 
  scale_color_manual(values = c("blue", "red"))