[banner]

An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Polynomial Contrasts in Models

Polynomial contrasts can be used for models with an ordered categorical independent variable to determine if there is a trend across the nominal categories.  For this approach to make sense, it’s necessary that the categories can be sensibly ordered.

 

In R, the independent variable should be a factor variable.  For the approach using the summary() results for an lm() model,  the factor variable should be ordered.  For the emmeans approach, the factor variable does not need to be ordered.

 

Typically, the categories are equally spaced.   That is, if the treatments are fertilizer applied to an agricultural experiment were 0, 50, 100, 150, 200 kg ha-1, they would equally spaced.  That is, the difference between each treatment and subsequent treatment is 50 kg ha-1.  If the treatments aren’t equally spaced, adjustments could be made to the contrast coefficients below, though, in my experience, this is rarely done.

 

The essence of the method is to use post-hoc contrasts to determine if there is a linear, quadratic, cubic, and so on, trend across the categorical variable for the dependent variable.  The usefulness of this method should be clear from the example below.

 

The following table is adapted from Horsley (n.d.).  It shows the standard coefficients that can be used in the polynomial contrasts for a variable with a given number of categories.  The example below uses the contrast coefficients for five treatments.

 

Note that the signs of the coefficients don’t determine the direction of the trend.  That is, (–1, 0, +1) could find a positive linear contrast or a negative linear contrast.  The t-value in the results would simply have the opposite if a contrast of (+1, 0, –1) were tested instead of (–1, 0, +1).



Number of    Degree of       Polynomial Contrast Coefficients
Treatments   Polynomial      
                             _________________________________
       2            1        -1    +1

       3            1        -1     0    +1
                    2        +1    -2    +1

       4            1        -3    -1    +1    +3
                    2        +1    -1    -1    +1
                    3        -1    +3    -3    +1

       5            1        -2    -1     0    +1    +2
                    2        +2    -1    -2    -1    +2
                    3        -1    +2     0    -2    +1
                    4        +1    -4    +6    -4    +1

       6            1        -5    -3    -1    +1    +3    +5
                    2        +5    -1    -4    -4    -1    +5
                    3        -5    +7    +4    -4    -7    +5
                    4        +1    -3    +2    +2    -3    +1
                    5        -1    +5    -10   +10   -5    +1

Adapted from Horsley, no date.

 

Horsley soybean yield example

 

The following example is taken from Horsley (n.d.).  Different spacings of soybean plants were used in a complete randomized block design (CRBD) to determine how these spacings affect crop yield.


Block = rep( 1:6, each = 5)

Spacing = rep(c(18, 24, 30, 36, 42), 6)

Yield = c(33.6, 31.1, 33.0, 28.4, 31.4,
          37.1, 34.5, 29.5, 29.9, 28.3,
          34.1, 30.5, 29.2, 31.6, 28.9,
          34.6, 32.7, 30.7, 32.3, 28.6,
          35.4, 30.7, 30.7, 28.1, 29.6,
          36.1, 30.3, 27.9, 26.9, 33.4)

Data = data.frame(Block   = factor(Block),
                  Spacing = factor(Spacing),
                  Yield   = Yield)

reshape(Data, idvar = "Block", timevar = "Spacing", direction = "wide")

     ### Show data in a table like in the original article


Block Yield.18 Yield.24 Yield.30 Yield.36 Yield.42
    1     33.6     31.1     33.0     28.4     31.4
    2     37.1     34.5     29.5     29.9     28.3
    3     34.1     30.5     29.2     31.6     28.9
    4     34.6     32.7     30.7     32.3     28.6
    5     35.4     30.7     30.7     28.1     29.6
    6     36.1     30.3     27.9     26.9     33.4


library(rcompanion)

groupwiseSum(Yield ~ Spacing, data = Data)

    ### Compare sums to those in the original article


Spacing n   Sum
     18 6 210.9
     24 6 189.8
     30 6 181.0
     36 6 177.2
     42 6 180.2


groupwiseSum(Yield ~ Block, data = Data)

    ### Compare sums to those in the original article


Block n   Sum
    1 5 157.5
    2 5 159.3
    3 5 154.3
    4 5 158.9
    5 5 154.5
    6 5 154.6


model = lm(Yield ~ Spacing + Block, data = Data)

library(car)

Anova(model)

    ### Compare results to those in the original article


Anova Table (Type II tests)

           Sum Sq Df F value    Pr(>F)   
Spacing   125.661  4  8.5000 0.0003544 ***
Block       5.410  5  0.2927 0.9113275   
Residuals  73.919 20


Using lm() to automatically report polynomial contrasts

 

In the summary() results of a lm() model, R will automatically report polynomial contrasts for the independent variables if they are ordered factors.


Data$SpacingOrdered = factor(Data$Spacing, ordered=TRUE)

 

It’s a good idea to look at the order of factor levels.  They should be in the correct order (increasing or decreasing).  If not, the variable can be re-ordered.

 

levels(Data$Spacing)    

 

"18" "24" "30" "36" "42"


model = lm(Yield ~ SpacingOrdered + Block, data = Data)

summary(model)


                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)       31.5000     0.8598  36.638  < 2e-16 ***
SpacingOrdered.L  -3.9001     0.7848  -4.969 7.38e-05 ***
SpacingOrdered.Q   2.3697     0.7848   3.019  0.00677 **
SpacingOrdered.C  -0.2899     0.7848  -0.369  0.71576   
SpacingOrdered^4   0.1813     0.7848   0.231  0.81968

Note that L stands for "linear"; Q for "quadratic"; C stands for "cubic"; and ^4 for fourth-degree polynomial, or "quartic".


The results suggest that there is a significant linear trend across Spacing. Looking at the plot at the end of the chapter, it’s clear there is a general trend, that Yield decreases as Spacing increases.

 

The results also suggest that there is a significant quadratic trend.  In this case, there is the suggestion that the mean for 42 inches is greater than that for 36 inches, suggesting it begins an upward trend for larger spacing.

 

Using emmeans to determine custom polynomial contrasts

 

The emmeans package could also be used to assess polynomial contrasts.  This approach is useful because you can specify custom contrasts.

 

It’s a good idea to look at the order of factor levels.  They should be in the correct order (increasing or decreasing).  If not, the variable can be re-ordered.


levels(Data$Spacing)    


"18" "24" "30" "36" "42"


library(emmeans)

marginal = emmeans(model, ~ Spacing)

Contrasts = list(Linear    =  c(-2, -1,  0,  1, 2),
                 Quadratic =  c( 2, -1, -2, -1, 2),
                 Cubic     =  c(-1,  2,  0, -2, 1),
                 Quartic   =  c( 1, -4,  6, -4, 1))

Test = contrast(marginal, Contrasts)

test(Test)


contrast  estimate   SE df t.ratio p.value
Linear     -12.333 2.48 20  -4.969  0.0001
Quadratic    8.867 2.94 20   3.019  0.0068
Cubic       -0.917 2.48 20  -0.369  0.7158
Quartic      1.517 6.57 20   0.231  0.8197


Note that these results are the same as those from the summary() from the lm() model with Spacing treated as an ordered factor.

 

Plotting the results

 

If the treatments plotted on the x-axis aren’t equally spaced, it may be more useful to plot the x-axis as a continuous, numeric variable.  But, here, the x-axis is treated as a categorical variable.


library(emmeans)

marginal = emmeans(model, ~ Spacing)

Out = as.data.frame(emmeans(model, ~ Spacing))

Out


Spacing   emmean        SE df lower.CL upper.CL
18      35.15000 0.7848496 20 33.51283 36.78717
24      31.63333 0.7848496 20 29.99617 33.27050
30      30.16667 0.7848496 20 28.52950 31.80383
36      29.53333 0.7848496 20 27.89617 31.17050
42      30.03333 0.7848496 20 28.39617 31.67050


library(ggplot2)

png(filename = "YieldPlot.png",
    width  = 5,
    height = 3.75,
    units  = "in",
    res    = 300)

ggplot(Out,
       aes(x = Spacing,
           y = emmean)) +
 geom_point(shape = 15,
            size  = 4) +
 geom_errorbar(aes(ymin = lower.CL,
                   ymax = upper.CL),
               width = 0.05,
               linewidth  = 0.5) +

 theme_bw() +
 theme(axis.title   = element_text(face  = "bold")) +
 xlab("\nSpacing (in.)") +
 ylab("Yield (bu. / ac.)\n")

dev.off()


Plot of soybean yield vs. spacing

A plot of soybean yield vs. spacing with data from Horsley (n.d.). Error bars represent 95% confidence intervals of the estimated marginal mean.  Note that in the original the plot has an error in the placement of one data point.

 

References

 

Horsley, R.D. No date. Orthogonal Polynomial Contrasts Individual Df Comparisons: Equally Spaced Treatments. North Dakota State University. www.ndsu.edu/faculty/horsley/Polycnst.pdf.