 Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Using Random Effects in Models

This book will not investigate the concept of random effects in models in any substantial depth.  The goal of this chapter is to empower the reader to include random effects in models in cases of paired data or repeated measures.

Random effects in models for paired and repeated measures

As an example, if we are measuring the left hand and right hand of several individuals, the measurements are paired within each individual.  That is, we want to statistically match the left hand of Individual A to the right hand of Individual A, since we suppose that someone with a large left hand will have a large right hand.  Therefore, the variable Individual will be included in the model as a random variable.  Each Individual could be thought of has a block including a measurement for the left hand and a measurement for the right hand.

As a second example, imagine we are measuring the scores of instructors by raters multiple times, and we can match each rater’s score at each time to the same rater’s score at the other times.  So, we want to statistically match the score of Rater A at Time 1 to the score of Rater A at Time 2, since we suppose that someone who scores an instructor high at one time might score an instructor high at another time.  Therefore, the variable Rater will be included in the model as a random variable.  Each Rater could be thought of as a block including a measurement for each of the times.

The concept of random variables

Getting a firm grasp of the concept of random variables is not an easy task, so it is okay to not fret too much about completely understanding the concept at this point.  It is also helpful to keep in mind that whether a variable should be considered a fixed or random variable will depend on the interpretation of the person doing the analysis.

In the previous chapter, the variable Speaker was used as a fixed effect in the model.  Conceptually, the idea is that we are interested in the effect of each of the specific levels of that variable.  That is, we care specifically about Pooh and Piglet and their scores.

In many cases of blocking variables, though, we don’t necessarily care about the values for those specific blocks.

For example, if we conducted a study in two different schools focusing on two different curricula, we may want to know about the effect of the specific curricula, but chose the schools more or less at random.  We don’t really care if Springfield Elementary had a higher score than Shelbyville Elementary.  The two schools represent any two random schools, not those two specific schools.  But we definitely want to include the School effect in the model, because we suppose that one school could do better with both curricula than would the other school.

Another example of this is when we have instructors rated by student raters.  If we are studying the performance of the instructors, we don’t necessarily care about how Nelson or Milhouse or Ralph as individuals rate instructors.  They are representing students chosen at random.  We just need to include the effect of the variable Rater in the model to statistically account for the fact that each rater might tend to rate all instructors lower or higher than other raters.

In these examples, School and Rater could be included in their respective models as random effects.

Mixed effects models

When a model includes both fixed effects and random effects, it is called a mixed effects model.  Or the term hierarchical model may be used.

Optional technical note:  Random effects in more complex models

For more complex models, specifying random effects can become difficult.  Random effects can be crossed with one another or can be nested within one another.  Also, correlation structures for the random effects can be specified.  However, approaching this complexity is beyond the scope of this book.

The examples in this book treat random variables as simple intercept-only variables, and as simple blocks that are not nested within other variables.

Packages used in this chapter

The packages used in this chapter include:

•  FSA

•  psych

•  lme4

•  lmerTest

•  nlme

•  car

The following commands will install these packages if they are not already installed:

if(!require(FSA)){install.packages("FSA")}
if(!require(psych)){install.packages("psych")}
if(!require(lme4)){install.packages("lme4")}
if(!require(lmerTest)){install.packages("lmerTest")}
if(!require(nlme)){install.packages("nlme")}
if(!require(car)){install.packages("car")}

An example of a mixed model

For this example, we will revisit the hand size example from the Independent and Paired Values chapter.  The variable hand Length will be treated as an interval/ratio variable.

Note that the model includes the blocking variable, Individual, and so the data do not need to be in a certain order to match the paired observations.

Also note that mixed models may make certain assumptions about the distributions of the data.  For simplicity, this example will not check if those assumption are met.

By default, an analysis of variance for a mixed model doesn’t test the significance of the random effects in the model.  However, the effect of random terms can be tested by comparing the model to a model including only the fixed effects and excluding the random effects, or with the ranova function from the lmerTest package if the lme4 package is used to specify the model.

As a technical note, the lmerTest package has options to use Satterthwaite or Kenward–Roger degrees of freedom, and options for type-III or type-II tests in the analysis of variance, if the lme4 package is used to specify the model.

Individual  Hand     Length
A           Left     17.5
B           Left     18.4
C           Left     16.2
D           Left     14.5
E           Left     13.5
F           Left     18.9
G           Left     19.5
H           Left     21.1
I           Left     17.8
J           Left     16.8
K           Left     18.4
L           Left     17.3
M           Left     18.9
N           Left     16.4
O           Left     17.5
P           Left     15.0
A           Right    17.6
B           Right    18.5
C           Right    15.9
D           Right    14.9
E           Right    13.7
F           Right    18.9
G           Right    19.5
H           Right    21.5
I           Right    18.5
J           Right    17.1
K           Right    18.9
L           Right    17.5
M           Right    19.5
N           Right    16.5
O           Right    17.4
P           Right    15.6
")

###  Check the data frame

library(psych)

str(Data)

summary(Data)

Mixed model with lmer

One way to construct a mixed effects model for interval/ratio data is with the lmer function in the lme4 package.  The lmerTest package is used to produce an analysis of variance with p-values for model effects.

Notice the grammar in the lmer function that defines the model:  the term (1|Individual) is added to the model to indicate that Individual is the random term.

As a technical note, the 1 indicates that an intercept is to be fitted for each level of the random variable.

As another technical note, REML stands for restricted maximum likelihood.  It is a method of fitting the model and is often considered better than fitting with a conventional ML (maximum likelihood) method.

Define model and conduct analysis of variance

library(lme4)

library(lmerTest)

model = lmer(Length ~ Hand + (1|Individual),
data=Data,
REML=TRUE)

anova(model)

Analysis of Variance Table of type III  with  Satterthwaite
approximation for degrees of freedom

Sum Sq Mean Sq NumDF DenDF F.value   Pr(>F)
Hand 0.45125 0.45125     1    15  11.497 0.004034 **

Test the random effects in the model

The ranova function from the lmerTest package will test the random effects in the model.

ranova(model)

ANOVA-like table for random-effects: Single term deletions

npar  logLik     AIC    LRT Df Pr(>Chisq)
<none>              4 -36.313  80.626
(1 | Individual)    3 -65.532 137.064 58.438  1  2.098e-14 ***

Mixed model with nlme

Another way to construct a mixed effects model for interval/ratio data is with the lme function in the nlme package.

Notice the grammar in the lme function that defines the model:  the option random=~1|Individual is added to the model to indicate that Individual is the random term.

As a technical note, the 1 indicates that an intercept is to be fitted for each level of the random variable.

As another technical note, REML stands for restricted maximum likelihood.  It is a method of fitting the model, and is often considered better than fitting with a conventional ML (maximum likelihood) method.

Define model and conduct analysis of deviance

library(nlme)

model = lme(Length ~ Hand, random=~1|Individual,
data=Data,
method="REML")

library(car)

Anova(model)

Analysis of Deviance Table (Type II tests)

Chisq Df Pr(>Chisq)
Hand 11.497  1  0.0006972 ***

Test the random effects in the model

The random effects in the model can be tested by comparing the model to a model fitted with just the fixed effects and excluding the random effects.  Because there are not random effects in this second model, the gls function in the nlme package is used to fit this model.

We will use a similar method for cumulative link models.

model.fixed = gls(Length ~ Hand,
data=Data,
method="REML")

anova(model,
model.fixed)

Model df       AIC       BIC    logLik   Test L.Ratio p-value
model           1  4  80.62586  86.23065 -36.31293
model.fixed     2  3 137.06356 141.26715 -65.53178 1 vs 2 58.4377  <.0001