R Companion: Polynomial Contrasts in Models

Polynomial contrasts can be used for models with an ordered categorical independent variable to determine if there is a trend across the nominal categories. For this approach to make sense, it’s necessary that the categories can be sensibly ordered.

In R, the independent variable should be a factor variable. For the approach using the summary() results for an lm() model, the factor variable should be ordered. For the emmeans approach, the factor variable does not need to be ordered.

Typically, the categories are equally spaced. That is, if the treatments are fertilizer applied to an agricultural experiment were 0, 50, 100, 150, 200 kg ha^-1, they would equally spaced. That is, the difference between each treatment and subsequent treatment is 50 kg ha^-1. If the treatments aren’t equally spaced, adjustments could be made to the contrast coefficients below, though, in my experience, this is rarely done.

The essence of the method is to use post-hoc contrasts to determine if there is a linear, quadratic, cubic, and so on, trend across the categorical variable for the dependent variable. The usefulness of this method should be clear from the example below.

The following table is adapted from Horsley (n.d.). It shows the standard coefficients that can be used in the polynomial contrasts for a variable with a given number of categories. The example below uses the contrast coefficients for five treatments.

Note that the signs of the coefficients don’t determine the direction of the trend. That is, (–1, 0, +1) could find a positive linear contrast or a negative linear contrast. The t-value in the results would simply have the opposite if a contrast of (+1, 0, –1) were tested instead of (–1, 0, +1).

Number of    Degree of       Polynomial Contrast Coefficients
Treatments   Polynomial
                             _________________________________
       2            1        -1    +1

       3            1        -1     0    +1
                    2        +1    -2    +1

       4            1        -3    -1    +1    +3
                    2        +1    -1    -1    +1
                    3        -1    +3    -3    +1

       5            1        -2    -1     0    +1    +2
                    2        +2    -1    -2    -1    +2
                    3        -1    +2     0    -2    +1
                    4        +1    -4    +6    -4    +1

       6            1        -5    -3    -1    +1    +3    +5
                    2        +5    -1    -4    -4    -1    +5
                    3        -5    +7    +4    -4    -7    +5
                    4        +1    -3    +2    +2    -3    +1
                    5        -1    +5    -10   +10   -5    +1

Adapted from Horsley, no date.

Horsley soybean yield example

The following example is taken from Horsley (n.d.). Different spacings of soybean plants were used in a complete randomized block design (CRBD) to determine how these spacings affect crop yield.

Block = rep( 1:6, each = 5)

Spacing = rep(c(18, 24, 30, 36, 42), 6)

Yield = c(33.6, 31.1, 33.0, 28.4, 31.4,
          37.1, 34.5, 29.5, 29.9, 28.3,
          34.1, 30.5, 29.2, 31.6, 28.9,
          34.6, 32.7, 30.7, 32.3, 28.6,
          35.4, 30.7, 30.7, 28.1, 29.6,
          36.1, 30.3, 27.9, 26.9, 33.4)

Data = data.frame(Block   = factor(Block),
                  Spacing = factor(Spacing),
                  Yield   = Yield)

reshape(Data, idvar = "Block", timevar = "Spacing", direction = "wide")

     ### Show data in a table like in the original article

Block Yield.18 Yield.24 Yield.30 Yield.36 Yield.42
    1     33.6     31.1     33.0     28.4     31.4
    2     37.1     34.5     29.5     29.9     28.3
    3     34.1     30.5     29.2     31.6     28.9
    4     34.6     32.7     30.7     32.3     28.6
    5     35.4     30.7     30.7     28.1     29.6
    6     36.1     30.3     27.9     26.9     33.4

library(rcompanion)

groupwiseSum(Yield ~ Spacing, data = Data)

### Compare sums to those in the original article

Spacing n   Sum
     18 6 210.9
     24 6 189.8
     30 6 181.0
     36 6 177.2
     42 6 180.2

groupwiseSum(Yield ~ Block, data = Data)

### Compare sums to those in the original article

Block n   Sum
    1 5 157.5
    2 5 159.3
    3 5 154.3
    4 5 158.9
    5 5 154.5
    6 5 154.6

model = lm(Yield ~ Spacing + Block, data = Data)

library(car)

Anova(model)

### Compare results to those in the original article

Anova Table (Type II tests)

           Sum Sq Df F value    Pr(>F)
Spacing   125.661 4 8.5000 0.0003544 ***
Block       5.410 5 0.2927 0.9113275
Residuals 73.919 20

Using lm() to automatically report polynomial contrasts

In the summary() results of a lm() model, R will automatically report polynomial contrasts for the independent variables if they are ordered factors.

Data$SpacingOrdered = factor(Data$Spacing, ordered=TRUE)

It’s a good idea to look at the order of factor levels. They should be in the correct order (increasing or decreasing). If not, the variable can be re-ordered.

levels(Data$SpacingOrdered)

"18" "24" "30" "36" "42"

model2 = lm(Yield ~ SpacingOrdered + Block, data = Data)

summary(model2)

                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       31.5000     0.8598 36.638 < 2e-16 ***
SpacingOrdered.L -3.9001     0.7848 -4.969 7.38e-05 ***
SpacingOrdered.Q   2.3697     0.7848   3.019 0.00677 **
SpacingOrdered.C -0.2899     0.7848 -0.369 0.71576
SpacingOrdered^4   0.1813     0.7848   0.231 0.81968

Note that L stands for "linear"; Q for "quadratic"; C stands for "cubic"; and ^4 for fourth-degree polynomial, or "quartic".

The results suggest that there is a significant linear trend across Spacing. Looking at the plot at the end of the chapter, it’s clear there is a general trend, that Yield decreases as Spacing increases.

The results also suggest that there is a significant quadratic trend. In this case, there is the suggestion that the mean for 42 inches is greater than that for 36 inches, suggesting it begins an upward trend for larger spacing.

You can see the contrasts that R is using for the ordered factors with,

contrasts(Data$SpacingOrdered)

.L .Q .C ^4
[1,] -6.324555e-01 0.5345225 -3.162278e-01 0.1195229
[2,] -3.162278e-01 -0.2672612 6.324555e-01 -0.4780914
[3,] -3.510833e-17 -0.5345225 1.755417e-16 0.7171372
[4,] 3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
[5,] 6.324555e-01 0.5345225 3.162278e-01 0.1195229

Using emmeans to determine custom polynomial contrasts

The emmeans package could also be used to assess polynomial contrasts. This approach is useful because you can specify custom contrasts, for example if the treatments aren’t equally spaced.

It’s a good idea to look at the order of factor levels. They should be in the correct order (increasing or decreasing). If not, the variable can be re-ordered.

model = lm(Yield ~ Spacing + Block, data = Data)

levels(Data$Spacing)

"18" "24" "30" "36" "42"

library(emmeans)

marginal = emmeans(model, ~ Spacing)

Contrasts = list(Linear    = c(-2, -1, 0, 1, 2),
                 Quadratic = c( 2, -1, -2, -1, 2),
                 Cubic     = c(-1, 2, 0, -2, 1),
                 Quartic   = c( 1, -4, 6, -4, 1))

Test = contrast(marginal, Contrasts)

test(Test)

contrast estimate   SE df t.ratio p.value
Linear     -12.333 2.48 20 -4.969 0.0001
Quadratic    8.867 2.94 20   3.019 0.0068
Cubic       -0.917 2.48 20 -0.369 0.7158
Quartic      1.517 6.57 20   0.231 0.8197

Note that these p-values are the same as those from the summary() from the lm() model with Spacing treated as an ordered factor. But the coefficients are different.

The coefficients are different because R uses different contrasts by default. These can be called with,

t(contr.poly(5))

[,1] [,2] [,3] [,4] [,5]
.L -0.6324555 -0.3162278 -3.510833e-17 0.3162278 0.6324555
.Q 0.5345225 -0.2672612 -5.345225e-01 -0.2672612 0.5345225
.C -0.3162278 0.6324555 1.755417e-16 -0.6324555 0.3162278
^4 0.1195229 -0.4780914 7.171372e-01 -0.4780914 0.1195229

Note that the 5 indicates that there are five levels of treatment.

You could use these as the contrast definition in emmeans with,

Contrasts2 = list(Linear   = t(contr.poly(5))[1,],
                 Quadratic = t(contr.poly(5))[2,],
                 Cubic     = t(contr.poly(5))[3,],
                 Quartic   = t(contr.poly(5))[4,])

Test2 = contrast(marginal, Contrasts2)

test(Test2)

contrast estimate    SE df t.ratio p.value
Linear      -3.900 0.785 20 -4.969 0.0001
Quadratic    2.370 0.785 20   3.019 0.0068
Cubic       -0.290 0.785 20 -0.369 0.7158
Quartic      0.181 0.785 20   0.231 0.8197

Plotting the results

If the treatments plotted on the x-axis aren’t equally spaced, it may be more useful to plot the x-axis as a continuous, numeric variable. But, here, the x-axis is treated as a categorical variable.

library(emmeans)

marginal = emmeans(model, ~ Spacing)

Out = as.data.frame(emmeans(model, ~ Spacing))

Out

Spacing   emmean        SE df lower.CL upper.CL
18      35.15000 0.7848496 20 33.51283 36.78717
24      31.63333 0.7848496 20 29.99617 33.27050
30      30.16667 0.7848496 20 28.52950 31.80383
36      29.53333 0.7848496 20 27.89617 31.17050
42      30.03333 0.7848496 20 28.39617 31.67050

library(ggplot2)

png(filename = "YieldPlot.png",
    width = 5,
    height = 3.75,
    units = "in",
    res    = 300)

ggplot(Out,
       aes(x = Spacing,
           y = emmean)) +
geom_point(shape = 15,
            size = 4) +
geom_errorbar(aes(ymin = lower.CL,
                   ymax = upper.CL),
               width = 0.05,
               linewidth = 0.5) +

theme_bw() +
theme(axis.title   = element_text(face = "bold")) +
xlab("\nSpacing (in.)") +
ylab("Yield (bu. / ac.)\n")

dev.off()

Plot of soybean yield vs. spacing

A plot of soybean yield vs. spacing with data from Horsley (n.d.). Error bars represent 95% confidence intervals of the estimated marginal mean. Note that in the original the plot has an error in the placement of one data point.

References

Horsley, R.D. No date. Orthogonal Polynomial Contrasts Individual Df Comparisons: Equally Spaced Treatments. North Dakota State University. rcompanion.org/documents/Polycnst.pdf.

An R Companion for the Handbook of Biological Statistics

Polynomial Contrasts in Models

Horsley soybean yield example

Using lm() to automatically report polynomial contrasts

Using emmeans to determine custom polynomial contrasts

Plotting the results

References