## An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

# Contrasts in Linear Models

Contrasts can be used to make specific comparisons of treatments within a linear model.

One common use is when a factorial design is used, but control or check treatments are used in addition to the factorial design.  In the first example below, there are two treatments (D and C) each at two levels (1 and 2), and then there is a Control treatment. The approach used here is to analyze the experiment as a one-way analysis of variance, and then use contrasts to test various hypotheses.

Another common use is when there are several treatments that could be thought of as members of a group.  In the second example below,  there are measurements for six wines, some of which are red (Merlot, Cabernet, Syrah) and some of which are white (Chardonnay, Riesling, Gewürztraminer).  We can compare the treatments within the red wine group by setting up contrasts and conducting an F-test.  This is analogous to testing the main effect of Red Wine.

The packages lsmeans and multcomp allow for unlimited tests of single-degree contrasts, with a p-value correction for multiple tests.  They also allow for an F-test for multi-line contrasts, for example when testing within groups.  The aov function in the native stats package has more limited functionality.

See the chapters on One-way Anova and Two-way Anova for general considerations on conducting analysis of variance

### Packages used in this chapter

The following commands will install these packages if they are not already installed:

if(!require(car)){install.packages("car")}
if(!require(lsmeans){install.packages("lsmeans")}
if(!require(multcomp)){install.packages("multcomp")}

### Example for single degree-of-freedom contrasts

This hypothetical example could represent an experiment with a factorial design two treatments (D and C) each at two levels (1 and 2), and a control treatment.  The 2-by-2 factorial plus control is treated as a one-way anova with five treatments.

Input = ("
Treatment   Response
'D1:C1'    1.0
'D1:C1'    1.2
'D1:C1'    1.3
'D1:C2'    2.1
'D1:C2'    2.2
'D1:C2'    2.3
'D2:C1'    1.4
'D2:C1'    1.6
'D2:C1'    1.7
'D2:C2'    2.5
'D2:C2'    2.6
'D2:C2'    2.8
'Control'  1.0
'Control'  0.9
'Control'  0.8
")

### Specify the order of factor levels. Otherwise R will alphabetize them.

Data\$Treatment = factor(Data\$Treatment,
levels=unique(Data\$Treatment))

Data

boxplot(Response ~ Treatment,
data = Data,
ylab="Response",
xlab="Treatment")

###  Define linear model

model = lm(Response ~ Treatment,
data = Data)

library(car)

Anova(model, type="II")

summary(model)

#### Example with lsmeans

###  You need to look at order of factor levels to determine the contrasts

levels(Data\$Treatment)

[1] "D1:C1"   "D1:C2"   "D2:C1"   "D2:C2"   "Control"

library(lsmeans)

leastsquare = lsmeans(model, "Treatment")

Contrasts = list(D1vsD2          = c(1,  1, -1, -1,  0),
C1vsC2          = c(1, -1,  1, -1,  0),
InteractionDC   = c(1, -1, -1,  1,  0),
C1vsC2forD1only = c(1, -1,  0,  0,  0),
C1vsC2forD2only = c(0,  0,  1, -1,  0),
TreatsvsControl = c(1,  1,  1,  1, -4),
T1vsC           = c(1,  0,  0,  0, -1),
T2vsC           = c(0,  1,  0,  0, -1),
T3vsC           = c(0,  0,  1,  0, -1),
T4vsC           = c(0,  0,  0,  1, -1))

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

contrast           estimate        SE df t.ratio p.value
D1vsD2          -0.83333333 0.1549193 10  -5.379  0.0031
C1vsC2          -2.10000000 0.1549193 10 -13.555  <.0001
InteractionDC    0.03333333 0.1549193 10   0.215  1.0000
C1vsC2forD1only -1.03333333 0.1095445 10  -9.433  <.0001
C1vsC2forD2only -1.06666667 0.1095445 10  -9.737  <.0001
TreatsvsControl  3.96666667 0.3464102 10  11.451  <.0001
T1vsC            0.26666667 0.1095445 10   2.434  0.3011
T2vsC            1.30000000 0.1095445 10  11.867  <.0001
T3vsC            0.66666667 0.1095445 10   6.086  0.0012
T4vsC            1.73333333 0.1095445 10  15.823  <.0001

### Note that p-values are slightly different than those from multcomp
###  due to different adjustment methods.  If “none” is chosen as
###  the adjustment method for both procedures,
###  p-values and other statistics will be the same.

### With adjust="none", results will be the same as
###   the aov method.

#### Example with multcomp

###  You need to look at order of factor levels to determine the contrasts

levels(Data\$Treatment)

[1] "D1:C1"   "D1:C2"   "D2:C1"   "D2:C2"   "Control"

Input = ("
Contrast.Name     D1C2  D1C2 D2C1 D2C2  Control
D1vsD2            1     1   -1   -1     0
C1vsC2            1    -1    1   -1     0
InteractionDC     1    -1   -1    1     0
C1vsC2forD1only   1    -1    0    0     0
C1vsC2forD2only   0     0    1   -1     0
TreatsvsControl   1     1    1    1    -4
T1vsC             1     0    0    0    -1
T2vsC             0     1    0    0    -1
T3vsC             0     0    1    0    -1
T4vsC             0     0    0    1    -1
")

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

row.names=1))

Matriz

library(multcomp)

G = glht(model,
linfct = mcp(Treatment = Matriz))

G\$linfct

summary(G,

### Adjustment options: "none", "single-step", "Shaffer",
###                     "Westfall", "free", "holm", "hochberg",
###                     "hommel", "bonferroni", "BH", "BY", "fdr"

Estimate Std. Error t value Pr(>|t|)

D1vsD2 == 0          -0.83333    0.15492  -5.379  0.00218 **

C1vsC2 == 0          -2.10000    0.15492 -13.555  < 0.001 ***

InteractionDC == 0    0.03333    0.15492   0.215  0.99938

C1vsC2forD1only == 0 -1.03333    0.10954  -9.433  < 0.001 ***

C1vsC2forD2only == 0 -1.06667    0.10954  -9.737  < 0.001 ***

TreatsvsControl == 0  3.96667    0.34641  11.451  < 0.001 ***

T1vsC == 0            0.26667    0.10954   2.434  0.17428

T2vsC == 0            1.30000    0.10954  11.867  < 0.001 ***

T3vsC == 0            0.66667    0.10954   6.086  < 0.001 ***

T4vsC == 0            1.73333    0.10954  15.823  < 0.001 ***

### With test=adjusted("none"), results will be the same as aov method below.

#     #     #

### Example for global F-test within a group of treatments

This example has treatments consisting of three red wines and three white wines.  We will want to know if there is an effect of the treatments in the red wine group on the response variable, while keeping the individual identities of the wines in the Treatment variable.  This approach is advantageous because post-hoc comparisons could still be made within the red wines, for example comparing Merlot to Cabernet.

Input = ("
Treatment          Response
Merlot             5
Merlot             6
Merlot             7
Cabernet           8
Cabernet           9
Cabernet          10
Syrah             11
Syrah             12
Syrah             13

Chardonnay         1
Chardonnay         2
Chardonnay         3
Riesling           1
Riesling           2
Riesling           2
Gewürtztraminer    1
Gewürtztraminer    2
Gewürtztraminer    4
")

### Specify the order of factor levels. Otherwise R will alphabetize them.

Data\$Treatment = factor(Data\$Treatment,
levels=unique(Data\$Treatment))

Data

boxplot(Response ~ Treatment,
data = Data,
ylab="Response",
xlab="Treatment")

###  You need to look at order of factor levels to determine the contrasts

levels(Data\$Treatment)

[1] "Merlot"  "Cabernet"  "Syrah"  "Chardonnay"  "Riesling"  "Gewürtztraminer"

###  Define linear model

model = lm(Response ~ Treatment,
data = Data)

library(car)

Anova(model, type="II")

summary(model)

#### Tests of contrasts with lsmeans

##### Question: Is there an effect within red wine ?

library(lsmeans)

leastsquare = lsmeans(model, "Treatment")

Contrasts = list(Red_line1   = c(1, -1,  0,  0,  0,  0),
Red_line2   = c(0,  1, -1,  0,  0,  0))

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

Test = contrast(leastsquare, Contrasts)

test(Test, joint=TRUE)

df1 df2    F p.value
2  12 24.3  0.0001

### Note that two lines of contrasts resulted in one hypothesis test
###   using 2 degrees of freedom.  This investigated the effect within
###   a group of 3 treatments.

### Results are essentially the same as those from multcomp

Question: Is there an effect within white wine ?

library(lsmeans)

leastsquare = lsmeans(model, "Treatment")

Contrasts = list(White_line1   = c(0,  0,  0,  1, -1,  0),
White_line2   = c(0,  0,  0,  0,  1, -1))

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

Test = contrast(leastsquare, Contrasts)

test(Test, joint=TRUE)

df1 df2   F p.value
2  12 0.3  0.7462

### Note that two lines of contrasts resulted in one hypothesis test
###   using 2 degrees of freedom.  This investigated the effect within
###   a group of 3 treatments

### Results are the same as those from multcomp

##### Question: Is there a difference between red and white wines?  And, mean separation for red wine

library(lsmeans)

leastsquare = lsmeans(model, "Treatment")

Contrasts = list(Red_vs_white    = c( 1,  1,  1, -1, -1, -1),
Merlot_vs_Cab   = c( 1, -1,  0,  0,  0,  0),
Cab_vs_Syrah    = c( 0,  1, -1,  0,  0,  0),
Syrah_vs_Merlot = c(-1,  0,  1,  0,  0,  0))

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

contrast        estimate       SE df t.ratio p.value

Red_vs_white          21 1.490712 12  14.087  <.0001

Merlot_vs_Cab         -3 0.860663 12  -3.486  0.0179
Cab_vs_Syrah          -3 0.860663 12  -3.486  0.0179
Syrah_vs_Merlot        6 0.860663 12   6.971  0.0001

### Note that p-values are slightly different than those from multcomp
###  due to different adjustment methods.  If “none” is chosen as
###  the adjustment method for both procedures,
###  p-values and other statistics will be the same.

#### Tests of contrasts with multcomp

##### Question: Is there an effect within red wine ?

Input = "
Contrast    Merlot  Cabernet  Syrah  Chardonnay  Riesling  Gewürtztraminer
Red_line1  1       -1         0     0           0         0
Red_line2  0        1        -1     0           0         0
"

### Note: there are two lines of contrasts for a group of three treatments

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

row.names=1))

Matriz

library(multcomp)

G = glht(model, linfct = mcp(Treatment = Matriz))

G\$linfct

summary(G,
test = Ftest())

Global Test:
F  DF1  DF2     Pr(>F)
1  24.3    2   12  6.029e-05

### Note that two lines of contrasts resulted in one hypothesis test
###   using 2 degrees of freedom.  This investigated the effect within

###   a group of 3 treatments.

##### Question: Is there an effect within white wine ?

Input = "
Contrast    Merlot  Cabernet  Syrah  Chardonnay  Riesling  Gewürtztraminer
White_line1  0       0         0      1           -1          0
White_line2  0       0         0      0            1         -1
"

### Note: there are two lines of contrasts for a group of three treatments

### The column names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

row.names=1))

Matriz

library(multcomp)

G = glht(model, linfct = mcp(Treatment = Matriz))

G\$linfct

summary(G,
test = Ftest())

Global Test:
F DF1 DF2 Pr(>F)
1 0.3   2  12 0.7462

### Note that two lines of contrasts resulted in one hypothesis test
###   using 2 degrees of freedom.  This investigated the effect within

###   a group of 3 treatments.

#     #     #

##### Question: Is there a difference between red and white wines?  And, mean separation for red wine

Input = "
Contrast          Merlot  Cabernet  Syrah  Chardonnay  Riesling  Gewürtztraminer
Red_vs_white      1        1         1     -1          -1        -1
Merlot_vs_Cab     1       -1         0      0           0         0
Cab_vs_Syrah      0        1        -1      0           0         0
Syrah_vs_Merlot  -1        0         1      0           0         0
"

names match the order of levels of the treatment variable
### The coefficients of each row sum to 0

row.names=1))

Matriz

library(multcomp)

G = glht(model,
linfct = mcp(Treatment = Matriz))

G\$linfct

summary(G,

### Adjustment options: "none", "single-step", "Shaffer",
###                     "Westfall", "free", "holm", "hochberg",
###                     "hommel", "bonferroni", "BH", "BY", "fdr"

Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
Red_vs_white == 0     21.0000     1.4907  14.087   <0.001 ***

Merlot_vs_Cab == 0    -3.0000     0.8607  -3.486   0.0157 *
Cab_vs_Syrah == 0     -3.0000     0.8607  -3.486   0.0156 *
Syrah_vs_Merlot == 0   6.0000     0.8607   6.971   <0.001 ***

(Adjusted p values reported -- single-step method)

### With test=adjusted("none"), results will be the same as aov method below.

### Tests of contrasts within aov

Another method to use single-degree-of-freedom contrasts within an anova is to use the split option within the summary function for an aov analysis.  The number of degrees of freedom that a factor can be split into for contrast tests is limited.

Input =("
Treatment   Response
'D1:C1'    1.0
'D1:C1'    1.2
'D1:C1'    1.3
'D1:C2'    2.1
'D1:C2'    2.2
'D1:C2'    2.3
'D2:C1'    1.4
'D2:C1'    1.6
'D2:C1'    1.7
'D2:C2'    2.5
'D2:C2'    2.6
'D2:C2'    2.8
'Control'  1.0
'Control'  0.9
'Control'  0.8
")

Data\$Treatment = factor(Data\$Treatment, levels=unique(Data\$Treatment))

### Specify the order of factor levels. Otherwise R will alphabetize them.

Data

boxplot(Response ~ Treatment,
data = Data,
ylab="Response",
xlab="Treatment")

levels(Data\$Treatment)

###  You need to look at order of factor levels to determine the contrasts

[1] "D1:C1"   "D1:C2"   "D2:C1"   "D2:C2"   "Control"

###  Define contrasts

D1vsD2 =          c(1,  1, -1, -1,  0)
C1vsC2 =          c(1, -1,  1, -1,  0)
InteractionDC =   c(1, -1, -1,  1,  0)
TreatsvsControl = c(1,  1,  1,  1, -4)

Matriz = cbind(D1vsD2, C1vsC2,
InteractionDC, TreatsvsControl)

contrasts(Data\$Treatment) = Matriz

CList = list("D1vsD2" = 1,
"C1vsC2" = 2,
"InteractionDC" = 3,
"TreatsvsControl" = 4)

###  Define model and display summary

model = aov(Response ~ Treatment, data = Data)

summary(model,
split=list(Treatment=CList))

Df Sum Sq Mean Sq F value   Pr(>F)
Treatment                     4  6.189   1.547  85.963 1.06e-07 ***
Treatment: D1vsD2           1  0.521   0.521  28.935  0.00031 ***
Treatment: C1vsC2           1  3.307   3.307 183.750 9.21e-08 ***
Treatment: InteractionDC    1  0.001   0.001   0.046  0.83396
Treatment: TreatsvsControl  1  2.360   2.360 131.120 4.53e-07 ***
Residuals                    10  0.180   0.018

#     #     #