[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Post-hoc Contrasts and Polynomial Contrasts

Comparisons of estimated marginal means can be used to assess custom contrasts.  These might include comparisons among groups of treatments or a linear trend across treatments.

 

For a couple of simple examples, we’ll revisit the midichlorians data from the Estimated Marginal Means for Multiple Comparisons chapter. And use a simple model of Midichlorians vs. Location. There is a relevant plot at the end of the chapter.


Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Tribe  Location      Midichlorians
Jedi   Olympia       10
Sith   Olympia        4
Jedi   Olympia       12
Sith   Olympia        5
Jedi   Olympia       15
Sith   Olympia        4
Jedi   Ventura       15
Sith   Ventura        9
Jedi   Ventura       15
Sith   Ventura       11
Jedi   Ventura       18
Sith   Ventura       12
Jedi   Northampton    8
Sith   Northampton   13
Jedi   Northampton    8
Sith   Northampton   15
Jedi   Northampton   10
Sith   Northampton   17
Jedi   Burlington    22
Sith   Burlington    22
Jedi   Burlington    20
Sith   Burlington    22
Jedi   Burlington    20
Sith   Burlington    25
")


Custom contrasts to compare groups of treatments

 

Imagine we want to compare the midichlorian counts for locations on the east coast of the U.S. and the west coast of the U.S. With the data we have, that would entail comparing (Burlington and Northampton) vs. (Olympia and Ventura).

 

Let’s also compare locations in states beginning with letters A–M, and locations in states beginning with letters N–Z.  So, (Burlington and Olympia) vs. (Northampton and Ventura).

 

For more complete and varied examples for this kind of analysis, see rcompanion.org/rcompanion/h_01.html or the Post-hoc Contrasts in Models chapter in the An R Companion for the Handbook of Biological Statistics book.

 

First, we’ll need to look at the order of factors in our data.  This will tell us the order of coefficients for the contrasts.


levels(Data$Location)


"Burlington"  "Northampton" "Olympia"     "Ventura"


If we wanted to use a different order, we could re-order the factors.

 

Data$Location = factor(Data$Location,
                       levels=c("Burlington", "Northampton", "Olympia", "Ventura"))

levels(Data$Location)


"Burlington"  "Northampton" "Olympia"     "Ventura"


Create model and test the contrasts


model = lm(Midichlorians ~ Location, data = Data)

library(emmeans)

marginal = emmeans(model, ~ Location)

Contrasts = list(EastCoastWestCoast =  c( 1,  1, -1, -1),
                 StatesNZStatesAM   =  c( 1, -1,  1, -1))

Test = contrast(marginal, Contrasts)

test(Test)


contrast           estimate   SE df t.ratio p.value
EastCoastWestCoast       12 2.89 20   4.154  0.0005
StatesNZStatesAM          5 2.89 20   1.731  0.0989


Here, the results suggest that there is a significant difference between east coast locations (Burlington and Northampton) and west coast locations (Olympia and Ventura).  This makes sense given the plot below.

 

However, the other contrast we tested — (Burlington and Olympia) vs. (Northampton and Ventura) — was not significant.

 

Polynomial contrasts as a test of trend

 

As a different flavor of this analysis, imagine we want to see if there an increasing or decreasing trend across the locations in alphabetical order.  This is a useful technique when the independent variable is a categorical variable that can be ordered.  This variable is treated as categorical, but we also examine trends treating it as an ordinal variable.

 

For more complete examples for this kind of analysis, see rcompanion.org/rcompanion/h_03.html or the Polynomial Contrasts in Models chapter in the An R Companion for the Handbook of Biological Statistics book.

 

First, we’ll need to look at the order of factors in our data.  This will tell us the order of coefficients for the contrasts.


levels(Data$Location)


"Burlington"  "Northampton" "Olympia"     "Ventura"


If we wanted to use a different order, we could re-order the factors.

 

Data$Location = factor(Data$Location,
                       levels=c("Burlington", "Northampton", "Olympia", "Ventura"))

levels(Data$Location)


"Burlington"  "Northampton" "Olympia"     "Ventura"


Create model and test the contrasts


model = lm(Midichlorians ~ Location, data = Data)

library(emmeans)

marginal = emmeans(model, ~ Location)

Contrasts = list(Linear    =  c( -3, -1,  1,  3),
                 Quadratic =  c(  1, -1, -1,  1),
                 Cubic     =  c( -1,  3, -3,  1))

Test = contrast(marginal, Contrasts)

test(Test)


contrast  estimate   SE df t.ratio p.value
Linear         -29 6.46 20  -4.490  0.0002
Quadratic       15 2.89 20   5.193  <.0001
Cubic            2 6.46 20   0.310  0.76


Here, the results suggest that there is a significant linear trend across Locations in alphabetical order.  Looking at the plot below, it’s clear that there is a decrease in Midichlorians as alphabetical Location increases.

 

Also, there is a significant quadratic trend. In this case, there is the suggestion that the mean for  Ventura is greater than that for Olympia, suggesting it begins an upward trend for Locations later in alphabetical order.

 

Plot


Out = as.data.frame(emmeans(model, ~ Location))

library(ggplot2)

png(filename = "MidichlorianPlot.png",
    width  = 5,
    height = 3.75,
    units  = "in",
    res    = 300)

ggplot(Out,
       aes(x     = Location,
           y     = emmean)) +

 geom_point(shape  = 15,
            size   = 4) +

 geom_errorbar(aes(ymin  =  lower.CL,
                   ymax  =  upper.CL),
               width =  0.1,
               size  =  0.7) +

 theme_bw() +
 theme(axis.title   = element_text(face = "bold"),
       axis.text    = element_text(face = "bold"),
       plot.caption = element_text(hjust = 0)) +

 xlab("\nLocation") +
 ylab("Estimated marginal mean\nmidichlorian count\n")

dev.off()