[banner]

An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Two-way Anova

Examples in Summary and Analysis of Extension Program Evaluation


SAEEPER: Two-way ANOVA

SAEEPER: Using Random Effects in Models
SAEEPER: What are Least Square Means?

 

Packages used in this chapter

The following commands will install these packages if they are not already installed:


if(!require(FSA)){install.packages("FSA")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(car)){install.packages("car")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(lsmeans)){install.packages("lsmeans")}
if(!require(grid)){install.packages("grid")}
if(!require(nlme)){install.packages("nlme")}
if(!require(lme4)){install.packages("lme4")}
if(!require(lmerTest)){install.packages("lmerTest")} if(!require(rcompanion)){install.packages("rcompanion")}

 

When to use it

Null hypotheses

How the test works

Assumptions

See the Handbook for information on these topics.

 

Examples

The rattlesnake example is shown at the end of the “How to do the test” section.

 

How to do the test

For notes on linear models and conducting anova, see the “How to do the test” section in the One-way anova chapter of this book.  For two-way anova with robust regression, see the chapter on Two-way Anova with Robust Estimation.

 

Two-way anova example

 

### --------------------------------------------------------------
### Two-way anova, SAS example, pp. 179–180
### --------------------------------------------------------------

Input = ("
 id Sex     Genotype  Activity
  1 male    ff        1.884
  2 male    ff        2.283
  3 male    fs        2.396
  4 female  ff        2.838  
  5 male    fs        2.956
  6 female  ff        4.216
  7 female  ss        3.620
  8 female  ff        2.889  
  9 female  fs        3.550
 10 male    fs        3.105
 11 female  fs        4.556
 12 female  fs        3.087
 13 male    ff        4.939
 14 male    ff        3.486
 15 female  ss        3.079
 16 male    fs        2.649
 17 female  fs        1.943
 19 female  ff        4.198
 20 female  ff        2.473
 22 female  ff        2.033 
 24 female  fs        2.200
 25 female  fs        2.157
 26 male    ss        2.801
 28 male    ss        3.421
 29 female  ff        1.811
 30 female  fs        4.281
 32 female  fs        4.772
 34 female  ss        3.586
 36 female  ff        3.944
 38 female  ss        2.669
 39 female  ss        3.050
 41 male    ss        4.275 
 43 female  ss        2.963
 46 female  ss        3.236
 48 female  ss        3.673
 49 male    ss        3.110
")

Data = read.table(textConnection(Input),header=TRUE)

 

 

Means and summary statistics by group

 

library(Rmisc)

sum = summarySE(Data,
                measurevar="Activity",
                groupvars=c("Sex","Genotype"))
sum

 

     Sex Genotype N Activity        sd        se        ci

1 female       ff 8  3.05025 0.9599032 0.3393770 0.8024992

2 female       fs 8  3.31825 1.1445388 0.4046556 0.9568584

3 female       ss 8  3.23450 0.3617754 0.1279069 0.3024518

4   male       ff 4  3.14800 1.3745115 0.6872558 2.1871546

5   male       fs 4  2.77650 0.3168433 0.1584216 0.5041684

6   male       ss 4  3.40175 0.6348109 0.3174055 1.0101258

 

 

Interaction plot using summary statistics

 

library(ggplot2)

pd = position_dodge(.2)

ggplot(sum, aes(x=Genotype,
                y=Activity,
                color=Sex)) +
    geom_errorbar(aes(ymin=Activity-se,
                      ymax=Activity+se),
                   width=.2, size=0.7, position=pd) +
    geom_point(shape=15, size=4, position=pd) +
    theme_bw() +
    theme(
          axis.title.y = element_text(vjust= 1.8),
          axis.title.x = element_text(vjust= -0.5),
          axis.title = element_text(face = "bold")) +
    scale_color_manual(values = c("black", "blue"))

### You may see an error, “ymax not defined”
###  In this case, it does not appear to affect anything

 

 

Interaction plot for a two-way anova.  Square points represent means for groups, and error bars indicate standard errors of the mean.

 

 

Simple box plot of main effect and interaction

 

boxplot(Activity ~ Genotype,
        data = Data,
        xlab = "Genotype",
        ylab = "MPI Activity")

 

Rplot

 

 

boxplot(Activity ~ Genotype:Sex,
        data = Data,
        xlab = "Genotype x Sex",
        ylab = "MPI Activity")

 

Rplot03

 

Fit the linear model and conduct ANOVA

 

model = lm(Activity ~ Sex + Genotype + Sex:Genotype,
           data=Data)

library(car)

Anova(model, type="II")                    # Can use type="III"

   ### If you use type="III", you need the following line before the analysis
   ###
options(contrasts = c("contr.sum", "contr.poly"))

 

              Sum Sq Df F value Pr(>F)

Sex           0.0681  1  0.0861 0.7712

Genotype      0.2772  2  0.1754 0.8400

Sex:Genotype  0.8146  2  0.5153 0.6025

Residuals    23.7138 30 

 

 

anova(model)                               # Produces type I sum of squares

 

             Df  Sum Sq Mean Sq F value Pr(>F)

Sex           1  0.0681 0.06808  0.0861 0.7712

Genotype      2  0.2772 0.13862  0.1754 0.8400

Sex:Genotype  2  0.8146 0.40732  0.5153 0.6025

Residuals    30 23.7138 0.79046

 

 

summary(model)     # Produces r-square, overall p-value, parameter estimates

 

Multiple R-squared:  0.04663, Adjusted R-squared:  -0.1123

F-statistic: 0.2935 on 5 and 30 DF,  p-value: 0.9128

 

 

Checking assumptions of the model

 

hist(residuals(model),
     col="darkgray")

 

RPlot

A histogram of residuals from a linear model.  The distribution of these residuals should be approximately normal.

 

 

plot(fitted(model),
     residuals(model))

 

RPlot

 

A plot of residuals vs. predicted values.  The residuals should be unbiased and homoscedastic.  For an illustration of these properties, see this diagram by Steve Jost at DePaul University: condor.depaul.edu/sjost/it223/documents/resid-plots.gif.

 

 

### additional model checking plots with: plot(model)
### alternative: library(FSA); residPlot(model)

 

 

Post-hoc comparison of least-square means

For notes on least-square means, see the “Post-hoc comparison of least-square means” section in the Nested anova chapter in this book.

 

One advantage of the using the lsmeans package for post-hoc tests is that it can produce comparisons for interaction effects.

 

In general, if the interaction effect is significant, you will want to look at comparisons of means for the interactions.  If the interaction effect is not significant but a main effect is, it is appropriate to look at comparisons among the means for that main effect.  In this case, because no effect of Sex, Genotype, or Sex:Genotype was significant, we would not actually perform any mean separation test.

 

Mean separations for main factor with lsmeans

 

library(multcompView)
library(lsmeans)

lsmeans = lsmeans::lsmeans ### Uses the lsmeans function
                           ###  from the lsmeans package,
                           ###  not from the lmerTest package



leastsquare = lsmeans(model,
                      "Genotype",
                      adjust="tukey")

cld(leastsquare,
    alpha=.05,
    Letters=letters)

 

Genotype   lsmean        SE df lower.CL upper.CL .group

 fs       3.047375 0.2722236 30 2.359065 3.735685  a   

 ff       3.099125 0.2722236 30 2.410815 3.787435  a   

 ss       3.318125 0.2722236 30 2.629815 4.006435  a

 

    ### Means sharing a letter in .group are not significantly different

 

 

Mean separations for interaction effect with lsmeans

 

library(multcompView)
library(lsmeans)

lsmeans = lsmeans::lsmeans ### Uses the lsmeans function
                           ###  from the lsmeans package,
                           ###  not from the lmerTest package


leastsquare = lsmeans(model,
                      pairwise ~ Sex:Genotype,
                      adjust="tukey")

 

cld(leastsquare,
    alpha=.05, 
    Letters=letters)

    

Sex    Genotype  lsmean        SE df lower.CL upper.CL .group

 male   fs       2.77650 0.4445393 30 1.524666 4.028334  a   

 female ff       3.05025 0.3143368 30 2.165069 3.935431  a   

 male   ff       3.14800 0.4445393 30 1.896166 4.399834  a   

 female ss       3.23450 0.3143368 30 2.349319 4.119681  a   

 female fs       3.31825 0.3143368 30 2.433069 4.203431  a   

 male   ss       3.40175 0.4445393 30 2.149916 4.653584  a 

 

### Note that means are listed from low to high,

###  not in the same order as summarySE

 

 

Graphing the results

 

Simple bar plot with categories and no error bars

 

### Re-enter data as matrix

Input =("
Sex      ff       fs       ss
 Female  3.05025  3.31825  3.23450
 Male    3.14800  2.77650  3.40175
")

Matriz = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matriz

 

            ff      fs      ss

Female 3.05025 3.31825 3.23450

Male   3.14800 2.77650 3.40175

 

 

barplot(Matriz,
        beside=TRUE,
        legend=TRUE,
        ylim=c(0, 5),
        xlab="Genotype",
        ylab="MPI Activity")

 

RPlot

 

Bar plot with error bars with ggplot2

This plot uses the data frame created by summarySE in Rmisc.  Error bars indicate standard error of the means (se in the data frame).

 

library(Rmisc)

sum = summarySE(Data, measurevar="Activity", groupvars=c("Sex","Genotype"))

sum

 

     Sex Genotype N Activity        sd        se        ci

1 female       ff 8  3.05025 0.9599032 0.3393770 0.8024992

2 female       fs 8  3.31825 1.1445388 0.4046556 0.9568584

3 female       ss 8  3.23450 0.3617754 0.1279069 0.3024518

4   male       ff 4  3.14800 1.3745115 0.6872558 2.1871546

5   male       fs 4  2.77650 0.3168433 0.1584216 0.5041684

6   male       ss 4  3.40175 0.6348109 0.3174055 1.0101258

 

 

### Plot adapted from:
###   shinyapps.stat.ubc.ca/r-graph-catalog/
 
library(ggplot2)
library(grid)

ggplot(sum,
   aes(x = Genotype, y = Activity, fill = Sex,
       ymax=Activity+se, ymin=Activity-se))  +
       geom_bar(stat="identity", position = "dodge", width = 0.7) +
       geom_bar(stat="identity", position = "dodge",
                colour = "black", width = 0.7,
                show_guide = FALSE)  +
       scale_y_continuous(breaks = seq(0, 4, 0.5),
                limits = c(0, 4),
                expand = c(0, 0))  +
       scale_fill_manual(name = "Count type" ,
                 values = c('grey80', 'grey30'),
                 labels = c("Female",
                            "Male"))  +
       geom_errorbar(position=position_dodge(width=0.7),
                     width=0.0, size=0.5, color="black")  +
       labs(x = "Genotype",
            y = "MPI Activity")  +
       ## ggtitle("Main title") +
       theme_bw()  +
       theme(panel.grid.major.x = element_blank(),
             panel.grid.major.y = element_line(colour = "grey50"),
             plot.title = element_text(size = rel(1.5),
             face = "bold", vjust = 1.5),
             axis.title = element_text(face = "bold"),
             legend.position = "top",
             legend.title = element_blank(),
             legend.key.size = unit(0.4, "cm"),
             legend.key = element_rect(fill = "black"),
             axis.title.y = element_text(vjust= 1.8),
             axis.title.x = element_text(vjust= -0.5)
           )

 

#     #     #

 

RPlot

Bar plot for a two-way anova.  Bar heights represent means for groups, and error bars indicate standard errors of the mean.

 

 

Rattlesnake example – two-way anova without replication, repeated measures

This example could be interpreted as two-way anova without replication or as a one-way repeated measures experiment. Below it is analyzed as a two-way fixed effects model using the lm function, and as a mixed effects model using the nlme package and lme4 packages.

 

### --------------------------------------------------------------
### Two-way anova, rattlesnake example, pp. 177–178
### --------------------------------------------------------------

Input = ("
 Day  Snake  Openings
 1    D1        85
 1    D3       107
 1    D5        61
 1    D8        22
 1    D11       40
 1    D12       65
 2    D1        58
 2    D3        51
 2    D5        60
 2    D8        41
 2    D11       45
 2    D12       27
 3    D1        15
 3    D3        30
 3    D5        68
 3    D8        63
 3    D11       28
 3    D12        3
 4    D1        57
 4    D3        12
 4    D5        36
 4    D8        21
 4    D11       10
 4    D12       16
")

Data = read.table(textConnection(Input),header=TRUE)


### Treat Day as a factor variable

Data$Day = as.factor(Data$Day)

 

 

Using two-way fixed effects model

 

Means and summary statistics by group

 

library(Rmisc)

sum = summarySE(Data, measurevar="Openings", groupvars=c("Day"))

sum

 

  Day N Openings       sd        se       ci

1   1 6 63.33333 30.45434 12.432931 31.95987

2   2 6 47.00000 12.21475  4.986649 12.81859

3   3 6 34.50000 25.95958 10.597956 27.24291

4   4 6 25.33333 18.08498  7.383164 18.97903

 

 

Simple box plots

 

boxplot(Openings ~ Day,
        data = Data,
        xlab = "Day",
        ylab = "Openings until tail stops rattling")

 

RPlot

 

Fit the linear model and conduct ANOVA

 

model = lm(Openings ~ Day + Snake,
           data=Data)

library(car)

Anova(model, type="II")                    # Can use type="III"

 

          Sum Sq Df F value  Pr(>F) 

Day       4877.8  3  3.3201 0.04866 *

Snake     3042.2  5  1.2424 0.33818 

Residuals 7346.0 15                 

 

 

anova(model)                               # Produces type I sum of squares

 

          Df Sum Sq Mean Sq F value  Pr(>F) 

Day        3 4877.8 1625.93  3.3201 0.04866 *

Snake      5 3042.2  608.44  1.2424 0.33818 

Residuals 15 7346.0  489.73                 

 

 

summary(model)     # Produces r-square, overall p-value, parameter estimates

 

Multiple R-squared:  0.5188,  Adjusted R-squared:  0.2622

F-statistic: 2.022 on 8 and 15 DF,  p-value: 0.1142

 

 

Checking assumptions of the model

 

hist(residuals(model),
     col="darkgray")

 

RPlot

A histogram of residuals from a linear model.  The distribution of these residuals should be approximately normal.

 

 

plot(fitted(model),
     residuals(model))

 

RPlot

 

A plot of residuals vs. predicted values.  The residuals should be unbiased and homoscedastic.  For an illustration of these properties, see this diagram by Steve Jost at DePaul University: condor.depaul.edu/sjost/it223/documents/resid-plots.gif.

 

### additional model checking plots with: plot(model)
### alternative: library(FSA); residPlot(model)

 

 

Mean separations for main factor with lsmeans

 

For notes on least-square means, see the “Post-hoc comparison of least-square” means section in the Nested anova chapter in this book.  For other mean separation techniques for a main factor in anova, see “Tukey and Least Significant Difference mean separation tests (pairwise comparisons)” section in the One-way anova chapter.

 

library(multcompView)
library(lsmeans)

lsmeans = lsmeans::lsmeans ### Uses the lsmeans function
                           ###  from the lsmeans package,
                           ###  not from the lmerTest package


leastsquare = lsmeans(model,
                      "Day",
                      adjust="tukey")

cld(leastsquare,
    alpha=.05,
    Letters=letters)

 

Day   lsmean       SE df   lower.CL upper.CL .group

 4   25.33333 9.034476 15 -0.2085871 50.87525  a   

 3   34.50000 9.034476 15  8.9580796 60.04192  ab  

 2   47.00000 9.034476 15 21.4580796 72.54192  ab  

 1   63.33333 9.034476 15 37.7914129 88.87525   b 

 

    ### Means sharing a letter in .group are not significantly different

 

  

Using mixed effects model with nlme

This is an abbreviated example using the lme function in the nlme package.


library(nlme)

model = lme(Openings ~ Day, random=~1|Snake,
            data=Data,
            method="REML")

anova.lme(model,
          type="sequential",
          adjustSigma = FALSE)


            numDF denDF  F-value p-value
(Intercept)     1    15 71.38736  <.0001
Day             3    15  3.32005  0.0487



library(multcompView)
library(lsmeans)

lsmeans = lsmeans::lsmeans ### Uses the lsmeans function
                           ###  from the lsmeans package,
                           ###  not from the lmerTest package


leastsquare = lsmeans(model,
              "Day",
               adjust="tukey")

cld(leastsquare,
    alpha=.05,
    Letters=letters)


Day   lsmean       SE df asymp.LCL asymp.UCL .group
 4   25.33333 9.304196 NA  2.157372  48.50929  a   
 3   34.50000 9.304196 NA 11.324038  57.67596  ab  
 2   47.00000 9.304196 NA 23.824038  70.17596  ab  
 1   63.33333 9.304196 NA 40.157372  86.50929   b

   ### Means sharing a letter in .group are not significantly different

 

 

Using mixed effects model with lmer

This is an abbreviated example using the lmer function in the lme4 package.


library(lme4)
library(lmerTest)

model = lmer(Openings ~ Day + (1|Snake),
            data=Data,
            REML=TRUE)

anova(model)


Analysis of Variance Table of type III  with  Satterthwaite
approximation for degrees of freedom

    Sum Sq Mean Sq NumDF DenDF F.value  Pr(>F) 
Day 4877.8  1625.9     3    15  3.3201 0.04866 *



rand(model)


Analysis of Random effects Table:
      Chi.sq Chi.DF p.value
Snake 0.0915      1     0.8


Least square means with the lsmeans package


library(multcompView)
library(lsmeans)

lsmeans = lsmeans::lsmeans ### Uses the lsmeans function
                           ###  from the lsmeans package,
                           ###  not from the lmerTest package


leastsquare = lsmeans(model,
                     pairwise ~ Day,
                     alpha=.05,
                     adjust="tukey")

cld(leastsquare,
    alpha=.05, 
    Letters=letters,
    adjust="tukey")


Day   lsmean       SE    df   lower.CL upper.CL .group
 4   25.33333 9.304196 19.81 -0.1441779 50.81084  a   
 3   34.50000 9.304196 19.81  9.0224887 59.97751  ab  
 2   47.00000 9.304196 19.81 21.5224887 72.47751  ab  
 1   63.33333 9.304196 19.81 37.8558221 88.81084   b  

Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
Conf-level adjustment: sidak method for 4 estimates
P value adjustment: tukey method for comparing a family of 4 estimates
significance level used: alpha = 0.05

   ### Means sharing a letter in .group are not significantly different


Least square means using the lmerTest package


lsmeans = lmerTest::lsmeans ### Uses the lsmeans function
                            ###  from the lmerTest package,
                            ###  not from the lsmeans package

LT = lsmeans(model,
             test.effs = "Day")

LT


Least Squares Means table:
        Day Estimate Standard Error   DF t-value Lower CI Upper CI p-value   
Day  1  1.0    63.33           9.30 19.8    6.81    43.91     82.8  <2e-16 ***
Day  2  2.0    47.00           9.30 19.8    5.05    27.58     66.4   1e-04 ***
Day  3  3.0    34.50           9.30 19.8    3.71    15.08     53.9   0.001 **
Day  4  4.0    25.33           9.30 19.8    2.72     5.91     44.8   0.013 * 


PT = difflsmeans(model,
                 test.effs="Day")

PT


Differences of LSMEANS:

          Estimate Standard Error   DF t-value Lower CI Upper CI p-value  
Day 1 - 2     16.3          12.78 15.0    1.28   -10.90     43.6   0.220  
Day 1 - 3     28.8          12.78 15.0    2.26     1.60     56.1   0.039 *
Day 1 - 4     38.0          12.78 15.0    2.97    10.77     65.2   0.009 **
Day 2 - 3     12.5          12.78 15.0    0.98   -14.73     39.7   0.343  
Day 2 - 4     21.7          12.78 15.0    1.70    -5.57     48.9   0.111  
Day 3 - 4      9.2          12.78 15.0    0.72   -18.07     36.4   0.484  



### Extract lsmeans table


Sum = PT$diffs.lsmeans.table


### Extract comparisons and p-values

Comparison = rownames(Sum)

P.value   = Sum$'p-value'


### Adjust p-values

P.value.adj = p.adjust(P.value,
                       method =  
"none")   

### Fix names of comparisons

Comparison = gsub(
"-", "- Day", Comparison)


### Produce compact letter display

library(rcompanion)

cldList(comparison = Comparison,
        p.value    = P.value.adj,
        threshold = 0.05)

  Group Letter MonoLetter
1  Day1      a         a
2  Day2     ab         ab
3  Day3      b          b
4  Day4      b          b


#     #     #