[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Least Square Means for Multiple Comparisons

Comparisons of values across groups in linear models, cumulative link models, and other models can be conducted easily with the lsmeans package. Importantly, it can make comparisons among interactions of factors. 

 

L.S. means stands for least square means.  Least square means are means for treatment levels that are adjusted for means of other factors in the model.  For a more complete explanation, see the What are least square means? chapter.

 

p-value adjustments for multiple comparisons

Adjustment of p-values for multiple comparisons is indicated with the adjust= option in the lsmeans function.  Options are "tukey", "scheffe", "sidak", "bonferroni", "dunnettx", "mvt", and "none".  For further information on these options, see the summary function section in:

 

Lenth, R.V. and M. Hervé. 2015. Package ‘lsmeans’. cran.r-project.org/web/packages/lsmeans/lsmeans.pdf.

 

Note that the adjust= option should also be applied to the cld function if a compact letter display is requested.

 

Ignore values of lsmeans for clm and clmm models

Typically you should ignore the values of the LS means themselves (lsmeans) when using them with clm and clmm models.  With default settings, the values of the LS means and the values of differences among the LS means are not easy to interpret.  See “clm” in help(models, package="lsmeans") for details and for other options.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  ordinal

•  car

•  RVAideMemoire

•  lsmeans

•  multcompView

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(lsmeans)){install.packages("lsmeans")}
if(!require(multcompView)){install.packages("multcompView")}


Multiple comparisons with lsmeans

 

This example uses data set and model from the One-way Ordinal ANOVA with CLM chapter.


Input =("
 Speaker  Likert
 Pooh      3
 Pooh      5
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      5
 Pooh      5
 Piglet    2
 Piglet    4
 Piglet    2
 Piglet    2
 Piglet    1
 Piglet    2
 Piglet    3
 Piglet    2
 Piglet    2
 Piglet    3
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    5
 Tigger    3
 Tigger    5
 Tigger    4
 Tigger    4
 Tigger    3
")

Data = read.table(textConnection(Input),header=TRUE)


### Order levels of the factor; otherwise R will alphabetize them


Data$Speaker = factor(Data$Speaker,
                      levels=unique(Data$Speaker))


### Create a new variable which is the likert scores as an ordered factor


Data$Likert.f = factor(Data$Likert,
                       ordered = TRUE)


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


Define model and conduct analysis of deviance


library(ordinal)

model = clm(Likert.f ~ Speaker,
            data = Data)

library(car)

library(RVAideMemoire)

Anova(model,
      type = "II")


Analysis of Deviance Table (Type II tests)

        LR Chisq Df Pr(>Chisq)   
Speaker   23.395  2  8.315e-06 ***


Group separations by lsmeans in table format

In the lsmeans function, model specifies the model object that was previously fitted.  Note the specialized formula where pairwise indicates that all pairwise comparisons should be conducted, and Speaker indicates the variable whose levels will be compared.

 

The output here compares the levels of the grouping variable.  For example, a significant p-value in the Pooh – Piglet line suggests that the value of the dependent variable (Likert.f) is different for Pooh compared with Piglet.

 

Remember to ignore the estimate, SE, and lsmean values when using lsmeans with a clm or clmm model object, unless specific options in lsmeans are selected.


library(lsmeans)

lsmeans(model,
        pairwise ~ Speaker,
        adjust="tukey")       ### Tukey-adjusted comparisons


$contrasts
 contrast         estimate        SE df    z.ratio p.value
 Pooh - Piglet    4.943822 1.3764706 NA  3.5916658  0.0010
 Pooh - Tigger    0.633731 0.9055691 NA  0.6998152  0.7636
 Piglet - Tigger -4.310091 1.3173294 NA -3.2718403  0.0031

P value adjustment: tukey method for comparing a family of 3 estimates

   ### Remember to ignore “estimate” and “SE” of differences with CLM,
   ###   as well as “lsmeans” in the output not shown here



Compact letter display

Here we’ll create an object of the lsmeans output called leastsquare.  Then we’ll pass this object to the cld function to create a compact letter display.

 

In the output, groups sharing a letter in the .group column are not significantly different, at the alpha level specified.

 


library(multcompView)

library(lsmeans)

leastsquare = lsmeans(model,
                      pairwise ~ Speaker,
                      adjust="tukey")      ###  Tukey-adjusted comparisons

cld(leastsquare,
    alpha=0.05,
    Letters=letters,      ### Use lower-case letters for .group
    adjust="tukey")       ### Tukey-adjusted comparisons 


Speaker    lsmean        SE df  asymp.LCL  asymp.UCL .group
 Piglet  -1.783599 0.7755958 NA -3.6355182 0.06832068  a   
 Tigger   2.526492 0.8413200 NA  0.5176407 4.53534421   b  
 Pooh     3.160223 0.8912968 NA  1.0320404 5.28840661   b  

Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05

   ### Groups sharing a letter in .group are not significantly different

   ### Remember to ignore “lsmean”, “SE”, “LCL”, and “UCL” for CLM
   ###   unless certain options are used.



Optional:  Interaction plot of least square means with mean separation letters

 

It is often desirable to plot least square means from an analysis with either their confidence intervals or standard errors.  This can be conducted as a one-way plot or an interaction plot.  The lsmeans and ggplot2 packages make it relatively easy to extract the LS means and the group separation letters and use them for plotting.

 

Input data and specify linear model


if(!require(car)){install.packages("car")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(lsmeans)){install.packages("lsmeans")}
if(!require(ggplot2)){install.packages("ggplot2")}


Location = c(rep("Olympia" , 6), rep("Ventura", 6),
             rep("Northampton", 6), rep("Burlington", 6))

Tribe  = c(rep(c("Jedi", "Sith"), 12))

Midichlorians = c(10,  4, 12,  5, 15,  4, 15,  9, 15, 11, 18, 12,
                   8, 13,  8, 15, 10, 17, 22, 22, 20, 22, 20, 25)


Data = data.frame(Tribe, Location, Midichlorians)

str(Data)


model = lm(Midichlorians ~ Tribe + Location + Tribe:Location,
                    data = Data)

library(car)
Anova(model)


Anova Table (Type II tests)

Response: Midichlorians
               Sum Sq Df F value    Pr(>F)   
Tribe            8.17  1  3.0154    0.1017   
Location       591.00  3 72.7385 1.535e-09 ***
Tribe:Location 198.83  3 24.4718 3.218e-06 ***
Residuals       43.33 16                      


One-way least square means and plot


library(multcompView)
library(lsmeans)

leastsquare = lsmeans(model,
                      pairwise ~ Location,
                      adjust="tukey")

CLD = cld(leastsquare,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD


Location       lsmean        SE df  lower.CL upper.CL .group
 Olympia      8.333333 0.6718548 16  6.449596 10.21707  a   
 Northampton 11.833333 0.6718548 16  9.949596 13.71707   b  
 Ventura     13.333333 0.6718548 16 11.449596 15.21707   b  
 Burlington  21.833333 0.6718548 16 19.949596 23.71707    c 


### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

###  Remove spaces in .group 

CLD$.group=gsub(" ", "", CLD$.group)


### Plot

library(ggplot2)

ggplot(CLD,
       aes(x     = Location,
           y     = lsmean,
           label = .group)) +

    geom_point(shape  = 15,
               size   = 4,
             position = pd) +

    geom_errorbar(aes(ymin  =  lower.CL,
                      ymax  =  upper.CL),
                      width =  0.2,
                      size  =  0.7,
                   position = pd) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Least square mean\nmidichlorian count") +
    ggtitle ("Midichlorian counts",

             subtitle = "In four U.S. cities") +

            labs(caption  = paste0("\nMidichlorian counts for four locations. ",
                                   "Boxes indicate the LS mean. \n",
                                   "Error bars indicate the 95% ",
                                   "confidence interval of the LS mean. \n",
                                   "Means sharing a letter are not ",
                                   "significantly different (Tukey-adjusted \n",
                                   "comparisons)."),
                            hjust=0.5) +

  geom_text(nudge_x = c(0, 0, 0, 0),
            nudge_y = c(4, 4, 4, 4),
            color   = "black")





Interaction plot of least square means


library(multcompView)
library(lsmeans)

leastsquare = lsmeans(model,
                      pairwise ~ Tribe:Location,
                      adjust="tukey")

CLD = cld(leastsquare,
          alpha=0.05,
          Letters=letters,
          adjust="tukey")

CLD


Tribe Location       lsmean        SE df  lower.CL upper.CL .group
 Sith  Olympia      4.333333 0.9501462 16  1.354477  7.31219  a   
 Jedi  Northampton  8.666667 0.9501462 16  5.687810 11.64552  ab  
 Sith  Ventura     10.666667 0.9501462 16  7.687810 13.64552   bc 
 Jedi  Olympia     12.333333 0.9501462 16  9.354477 15.31219   bcd
 Sith  Northampton 15.000000 0.9501462 16 12.021143 17.97886    cd
 Jedi  Ventura     16.000000 0.9501462 16 13.021143 18.97886     d
 Jedi  Burlington  20.666667 0.9501462 16 17.687810 23.64552      e
 Sith  Burlington  23.000000 0.9501462 16 20.021143 25.97886      e


### Order the levels for printing

CLD$Location = factor(CLD$Location,
                       levels=c("Olympia", "Ventura", "Northampton", "Burlington"))

CLD$Tribe = factor(CLD$Tribe,
                       levels=c("Jedi", "Sith"))

###  Remove spaces in .group 

CLD$.group=gsub(" ", "", CLD$.group)


CLD


### Plot

library(ggplot2)

pd = position_dodge(0.4)    ### How much to jitter the points on the plot

ggplot(CLD,
       aes(x     = Location,
           y     = lsmean,
           color = Tribe,
           label = .group)) +

    geom_point(shape  = 15,
               size   = 4,
             position = pd) +

    geom_errorbar(aes(ymin  =  lower.CL,
                      ymax  =  upper.CL),
                      width =  0.2,
                      size  =  0.7,
                      position = pd) +

    theme_bw() +
    theme(axis.title   = element_text(face = "bold"),
          axis.text    = element_text(face = "bold"),
          plot.caption = element_text(hjust = 0)) +

    ylab("Least square mean\nmidichlorian count") +
     ggtitle ("Midichlorian counts for Jedi and Sith",
            subtitle = "In four U.S. cities") +
 
            labs(caption  = paste0("\nMidichlorian counts for two tribes across ",
                                   "four locations. Boxes indicate \n",
                                   "the LS mean. ",
                                   "Error bars indicate the 95% confidence ",
                                   "interval ",
                                    "of the LS \n",
                                   "mean. Means sharing a letter are ",
                                   "not significantly different \n",
                                   "(Tukey-adjusted comparisons)."),
                            hjust=0.5) +
 
  geom_text(nudge_x = c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1, -0.1, 0.1),
            nudge_y = c(4.5,  4.5, 4.5,  4.5, 4.5 , 4.5,  4.5, 4.5),
            color   = "black") +
 
  scale_color_manual(values = c("blue", "red"))