[banner]

An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Nested Anova

Examples in Summary and Analysis of Extension Program Evaluation


SAEEPER: Using Random Effects in Models
SAEEPER: What are Least Square Means?
SAEEPER: One-way ANOVA with Random Blocks

 

 

Packages used in this chapter

The following commands will install these packages if they are not already installed:


if(!require(nlme)){install.packages("nlme")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(lsmeans)){install.packages("lsmeans")}
if(!require(lme4)){install.packages("lme4")}
if(!require(lmerTest)){install.packages("lmerTest")}
if(!require(TukeyC)){install.packages("TukeyC")}

When to use it

Null hypotheses

How the test works

Partitioning variance and optimal allocation of resources

Unequal sample sizes

Assumptions

Example

Graphing the results

Similar tests

See the Handbook for information on these topics.

 

How to do the test

Nested anova example with mixed effects model (nlme)

One approach to fit a nested anova is to use a mixed effects model.  Here Tech is being treated as a fixed effect, while Rat is treated as a random effect.  Note that the F-value and p-value for the test on Tech agree with the values in the Handbook.  The effect of Rat will be tested by comparing this model to a model without the Rat term.  The model is fit using the lme function in nlme.


### --------------------------------------------------------------
### Nested anova, SAS example, pp. 171-173
### --------------------------------------------------------------

Input =("
 Tech  Rat Protein
 Janet 1   1.119
 Janet 1   1.2996
 Janet 1   1.5407
 Janet 1   1.5084
 Janet 1   1.6181
 Janet 1   1.5962
 Janet 1   1.2617
 Janet 1   1.2288
 Janet 1   1.3471
 Janet 1   1.0206
 Janet 2   1.045
 Janet 2   1.1418
 Janet 2   1.2569
 Janet 2   0.6191
 Janet 2   1.4823
 Janet 2   0.8991
 Janet 2   0.8365
 Janet 2   1.2898
 Janet 2   1.1821
 Janet 2   0.9177
 Janet 3   0.9873
 Janet 3   0.9873
 Janet 3   0.8714
 Janet 3   0.9452
 Janet 3   1.1186
 Janet 3   1.2909
 Janet 3   1.1502
 Janet 3   1.1635
 Janet 3   1.151
 Janet 3   0.9367
 Brad  5   1.3883
 Brad  5   1.104
 Brad  5   1.1581
 Brad  5   1.319
 Brad  5   1.1803
 Brad  5   0.8738
 Brad  5   1.387
 Brad  5   1.301
 Brad  5   1.3925
 Brad  5   1.0832
 Brad  6   1.3952
 Brad  6   0.9714
 Brad  6   1.3972
 Brad  6   1.5369
 Brad  6   1.3727
 Brad  6   1.2909
 Brad  6   1.1874
 Brad  6   1.1374
 Brad  6   1.0647
 Brad  6   0.9486
 Brad  7   1.2574
 Brad  7   1.0295
 Brad  7   1.1941
 Brad  7   1.0759
 Brad  7   1.3249
 Brad  7   0.9494
 Brad  7   1.1041
 Brad  7   1.1575
 Brad  7   1.294
 Brad  7   1.4543
")

Data = read.table(textConnection(Input),header=TRUE)

### Since Rat is read in as an integer variable, convert it to factor

Data$Rat = as.factor(Data$Rat)

library(nlme)

model = lme(Protein ~ Tech, random=~1|Rat,
            data=Data,
            method="REML")

anova.lme(model,
          type="sequential",
          adjustSigma = FALSE)


            numDF denDF  F-value p-value
(Intercept)     1    54 587.8664  <.0001
Tech            1     4   0.2677  0.6322


Post-hoc comparison of means

Note that “Tukey” here instructs the glht function to compare all means, not to perform a Tukey adjustment of multiple comparisons.


library(multcomp)

posthoc = glht(model,
               linfct = mcp(Tech="Tukey"))

mcs = summary(posthoc,
              test=adjusted("single-step"))

mcs

   ### Adjustment options: "none", "single-step", "Shaffer",
   ###                     "Westfall", "free", "holm", "hochberg",
   ###                     "hommel", "bonferroni", "BH", "BY", "fdr"


Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)
Janet - Brad == 0 -0.05060    0.09781  -0.517    0.605


cld(mcs,
    level=0.05,
    decreasing=TRUE)


 Brad Janet
  "a"   "a"

    ### Means sharing a letter are not significantly different

Post-hoc comparison of least-square means

Least squares means are adjusted for other terms in the model.  If the experimental design is unbalanced or there is missing data, the least square means may differ significantly from arithmetic means for treatments, but are generally more representative of the population means than the arithmetic means would be.

Note that the adjustments for multiple comparisons (adjust =”tukey”) appears in both the lsmeans and cld functions.


library(multcompView)
library(lsmeans)

leastsquare = lsmeans(model,
                      pairwise ~ Tech,
                      adjust="tukey")       ###  Tukey-adjusted comparisons

leastsquare


$lsmeans
 Tech    lsmean         SE df  lower.CL upper.CL
 Brad  1.211023 0.06916055  5 1.0332405 1.388806
 Janet 1.160420 0.06916055  4 0.9683995 1.352440

Confidence level used: 0.95

$contrasts
 contrast       estimate         SE df t.ratio p.value
 Brad - Janet 0.05060333 0.09780778  4   0.517  0.6322


cld(leastsquare,
    alpha=0.05,
    Letters=letters,      ### Use lower-case letters for .group
    adjust="tukey")       ### Tukey-adjusted comparisons


Tech    lsmean         SE df asymp.LCL asymp.UCL .group
 Janet 1.160420 0.06916018 NA  1.005745  1.315095  a   
 Brad  1.211023 0.06916018 NA  1.056348  1.365698  a 

    ### Means sharing a letter in .group are not significantly different

Test the significance of the random effect in the mixed effects model

In order to the test the significance of the random effect from our model (Rat), we can fit a new model with only the fixed effects from the model.  For this we use the gls function in the nlme package.  We then compare the two models with the anova fuction.  Note that the p-value does not agree with p-value from the Handbook, because the technique is different, though in this case the conclusion is the same.  As a general precaution, if your models are fit with “REML” (restricted maximum likelihood) estimation, then you should compare only models with the same fixed effects.  If you need to compare models with different fixed effects, use “ML” as the estimation method for all models.

 

model.fixed = gls(Protein ~ Tech,
                  data=Data,
                  method="REML")

anova(model,
      model.fixed)

 

            Model df       AIC       BIC   logLik   Test  L.Ratio p-value

model           1  4 -7.819054 0.4227176 7.909527                       

model.fixed     2  3 -4.499342 1.6819872 5.249671 1 vs 2 5.319713  0.0211

 

 

Checking assumptions of the model

 

hist(residuals(model),
     col="darkgray")

 

 

Rplot

A histogram of residuals from a linear model.  The distribution of these residuals should be approximately normal.

 

 

plot(fitted(model),
     residuals(model)) 

 

Rplot

 

A plot of residuals vs. predicted values.  The residuals should be unbiased and homoscedastic.  For an illustration of these properties, see this diagram by Steve Jost at DePaul University: condor.depaul.edu/sjost/it223/documents/resid-plots.gif.

 

 

### additional model checking plots with: plot(model)

 

Mixed effects model with lmer

The following is an abbreviated example of a nested anova using the lmer function in the lme4 package.  See the previous example in this chapter for explanation and model-checking.

 

### --------------------------------------------------------------
### Nested anova, SAS example, pp. 171-173
### --------------------------------------------------------------

Input =("
 Tech  Rat Protein
 Janet 1   1.119
 Janet 1   1.2996
 Janet 1   1.5407
 Janet 1   1.5084
 Janet 1   1.6181
 Janet 1   1.5962
 Janet 1   1.2617
 Janet 1   1.2288
 Janet 1   1.3471
 Janet 1   1.0206
 Janet 2   1.045
 Janet 2   1.1418
 Janet 2   1.2569
 Janet 2   0.6191
 Janet 2   1.4823
 Janet 2   0.8991
 Janet 2   0.8365
 Janet 2   1.2898
 Janet 2   1.1821
 Janet 2   0.9177
 Janet 3   0.9873
 Janet 3   0.9873
 Janet 3   0.8714
 Janet 3   0.9452
 Janet 3   1.1186
 Janet 3   1.2909
 Janet 3   1.1502
 Janet 3   1.1635
 Janet 3   1.151
 Janet 3   0.9367
 Brad  5   1.3883
 Brad  5   1.104
 Brad  5   1.1581
 Brad  5   1.319
 Brad  5   1.1803
 Brad  5   0.8738
 Brad  5   1.387
 Brad  5   1.301
 Brad  5   1.3925
 Brad  5   1.0832
 Brad  6   1.3952
 Brad  6   0.9714
 Brad  6   1.3972
 Brad  6   1.5369
 Brad  6   1.3727
 Brad  6   1.2909
 Brad  6   1.1874
 Brad  6   1.1374
 Brad  6   1.0647
 Brad  6   0.9486
 Brad  7   1.2574
 Brad  7   1.0295
 Brad  7   1.1941
 Brad  7   1.0759
 Brad  7   1.3249
 Brad  7   0.9494
 Brad  7   1.1041
 Brad  7   1.1575
 Brad  7   1.294
 Brad  7   1.4543
")

Data = read.table(textConnection(Input),header=TRUE)

Data$Rat = as.factor(Data$Rat)

library(lme4)
library(lmerTest)

model = lmer(Protein ~ Tech + (1|Rat),
             data=Data,
             REML=TRUE)

anova(model)


Analysis of Variance Table of type III  with  Satterthwaite
approximation for degrees of freedom

        Sum Sq   Mean Sq NumDF DenDF F.value Pr(>F)
Tech 0.0096465 0.0096465     1     4 0.26768 0.6322


rand(model)


Analysis of Random effects Table:

    Chi.sq Chi.DF p.value 
Rat   5.32      1    0.02 *



difflsmeans(model,
            test.effs="Tech")


Differences of LSMEANS:
                  Estimate Standard Error  DF t-value Lower CI Upper CI p-value
Tech Brad - Janet      0.1         0.0978 4.0    0.52   -0.221    0.322     0.6

 

 

library(multcomp)

posthoc = glht(model,

               linfct = mcp(Tech="Tukey"))

mcs = summary(posthoc,

              test=adjusted("single-step"))

mcs


Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)
Janet - Brad == 0 -0.05060    0.09781  -0.517    0.605
(Adjusted p values reported -- single-step method)

 


cld(mcs,
    level=0.05,
    decreasing=TRUE)


Brad Janet
  "a"   "a"

#     #     #

Nested anova example with the aov function


### --------------------------------------------------------------
### Nested anova, SAS example, pp. 171-173
### --------------------------------------------------------------

Input =("
Tech  Rat Protein
Janet 1   1.119
Janet 1   1.2996
Janet 1   1.5407
Janet 1   1.5084
Janet 1   1.6181
Janet 1   1.5962
Janet 1   1.2617
Janet 1   1.2288
Janet 1   1.3471
Janet 1   1.0206
Janet 2   1.045
Janet 2   1.1418
Janet 2   1.2569
Janet 2   0.6191
Janet 2   1.4823
Janet 2   0.8991
Janet 2   0.8365
Janet 2   1.2898
Janet 2   1.1821
Janet 2   0.9177
Janet 3   0.9873
Janet 3   0.9873
Janet 3   0.8714
Janet 3   0.9452
Janet 3   1.1186
Janet 3   1.2909
Janet 3   1.1502
Janet 3   1.1635
Janet 3   1.151
Janet 3   0.9367
Brad  5   1.3883
Brad  5   1.104
Brad  5   1.1581
Brad  5   1.319
Brad  5   1.1803
Brad  5   0.8738
Brad  5   1.387
Brad  5   1.301
Brad  5   1.3925
Brad  5   1.0832
Brad  6   1.3952
Brad  6   0.9714
Brad  6   1.3972
Brad  6   1.5369
Brad  6   1.3727
Brad  6   1.2909
Brad  6   1.1874
Brad  6   1.1374
Brad  6   1.0647
Brad  6   0.9486
Brad  7   1.2574
Brad  7   1.0295
Brad  7   1.1941
Brad  7   1.0759
Brad  7   1.3249
Brad  7   0.9494
Brad  7   1.1041
Brad  7   1.1575
Brad  7   1.294
Brad  7   1.4543
")

Data = read.table(textConnection(Input),header=TRUE)


### Since Rat is read in as an integer variable, convert it to factor

Data$Rat = as.factor(Data$Rat)

 

Using the aov function for a nested anova

The aov function in the native stats package allows you to specify an error component to the model.  When formulating this model in R, the correct error is Rat, not Tech/Rat (Rat within Tech) as used in the SAS example.  The SAS model will tolerate Rat or Rat(Tech).

The summary of the aov will produce the correct test for Tech.  The test for Rat can be performed by manually calculating the p-value for the F-test using the output for Error:Rat and Error:Within.

See the rattlesnake example in the Two-way anova chapter for designating an error term in a repeated-measures model.


fit = aov(Protein ~ Tech + Error(Rat), data=Data)
summary(fit)


Error: Rat
          Df Sum Sq Mean Sq F value Pr(>F)
Tech       1 0.0384 0.03841   0.268  0.632
Residuals  4 0.5740 0.14349  

   ### This matches “use for groups” in the Handbook


Using Mean Sq and Df values to get p-value for H = Tech and Error = Rat


pf(q=0.03841/0.14349,
   df1=1,
   df2=4,
   lower.tail=FALSE)


[1] 0.6321845

   ### Note: This is same test as summary(fit)


Using Mean Sq and Df values to get p-value for H = Rat and Error = Within


summary(fit)


Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 54  1.946 0.03604


pf(q=0.14349/0.03604,
   df1=4,
   df2=54,
   lower.tail=F)


[1] 0.006663615  

   ### Matches “use for subgroups” in the Handbook


Post-hoc comparison of means with Tukey

The aov function with an Error component produces an object of aovlist type, which unfortunately isn’t handled by many post-hoc testing functions.  However, in the TukeyC package, you can specify a model and error term.  For unbalanced data, the dispersion parameter may need to be modified.


library(TukeyC)

tuk = TukeyC(Data,
             model = 'Protein ~ Tech + Error(Rat)',
             error = 'Rat',
             which = 'Tech',
             fl1=1,
             sig.level = 0.05)

summary(tuk)


Groups of means at sig.level = 0.05
      Means G1
Brad   1.21  a
Janet  1.16  a


#     #     #