This book will not investigate the concept of random effects
in models in any substantial depth. The goal of this chapter is to empower the
reader to include random effects in models in cases of paired data or repeated
measures.

##### Random effects in models for paired and repeated measures

As an example, if we are measuring the left hand and right hand
of several individuals, the measurements are paired within each individual.
That is, we want to statistically match the left hand of *Individual A* to
the right hand of *Individual A*, since we suppose that someone with a
large left hand will have a large right hand. Therefore, the variable *Individual*
will be included in the model as a random variable. Each *Individual* could
be thought of as a block including a measurement for the left hand and a measurement
for the right hand.

As a second example, imagine we are measuring the scores of
instructors by raters multiple times, and we can match each rater’s score at
each time to the same rater’s score at the other times. So, we want to
statistically match the score of *Rater A* at *Time 1* to the score
of *Rater A* at *Time 2*, since we suppose that someone who scores an
instructor high at one time might score an instructor high at another time.
Therefore, the variable *Rater* will be included in the model as a random
variable. Each *Rater* could be thought of as a block including a
measurement for each of the times.

##### The concept of random variables

Getting a firm grasp of the concept of random variables is
not an easy task, so don't fret too much if the concept isn't completely
clear at this point. It is also helpful to keep in mind
that whether a variable should be considered a fixed or random variable will
depend on the interpretation of the person doing the analysis.

In the previous chapter, the variable *Speaker* was
used as a *fixed effect* in the model. Conceptually, the idea is that we
are interested in the effect of each of the specific levels of that variable.
That is, we care specifically about Pooh and Piglet and their scores.

In many cases of blocking variables, though, we don’t
necessarily care about the values for those specific blocks.

For example, if we conducted a study in two different
schools focusing on two different curricula, we may want to know about the
effect of the specific curricula, but chose the schools more or less at random.
We don’t really care if Springfield Elementary had a higher score than Shelbyville Elementary. The two schools represent any two
random schools, not those two specific schools. But we definitely want to
include the *School* effect in the model, because we suppose that one
school could do better with both curricula than would the other school.

Another example of this is when we have instructors rated by
student raters. If we are studying the performance of the instructors, we
don’t necessarily care about how Nelson or Milhouse or Ralph as individuals
rate instructors. They are representing students chosen at random. We just
need to include the effect of the variable *Rater* in the model to
statistically account for the fact that each rater might tend to rate all
instructors lower or higher than other raters.

In these examples, *School* and *Rater* could be
included in their respective models as random effects.

##### Mixed effects models

When a model includes both fixed effects and random effects,
it is called a *mixed effects* model. The term *hierarchical model*
is also sometimes used.

##### Optional technical note: Random effects in more complex models

For more complex models, specifying random effects can become difficult. Random effects can be crossed with one another or can be nested within one another. Also, correlation structures for the random effects can be specified. However, approaching this complexity is beyond the scope of this book.

The examples in this book treat random variables as simple
intercept-only variables, and as simple blocks that are not nested within other
variables.
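The intercept-only, non-nested structure used in this book can be contrasted with more complex structures using lme4-style formula notation. The sketch below uses hypothetical variable names (*Score*, *Curriculum*, *School*, *Classroom*, *Rater*) purely for illustration; these are ordinary R formula objects, and no model is fitted.

```r
# A sketch of lme4-style random-effects notation, with hypothetical
# variable names.  These are plain R formula objects; nothing is fitted.

# Simple random intercept (the only form used in this book):
f_simple  = Score ~ Curriculum + (1 | School)

# Crossed random effects: School and Rater vary independently of each other
f_crossed = Score ~ Curriculum + (1 | School) + (1 | Rater)

# Nested random effects: Classroom is nested within School
f_nested  = Score ~ Curriculum + (1 | School/Classroom)

all.vars(f_nested)   # the variable names appearing in the nested formula
```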

### Packages used in this chapter

The packages used in this chapter include:

• FSA

• psych

• lme4

• lmerTest

• nlme

• car

The following commands will install these packages if they are not already installed:

if(!require(FSA)){install.packages("FSA")}
if(!require(psych)){install.packages("psych")}
if(!require(lme4)){install.packages("lme4")}
if(!require(lmerTest)){install.packages("lmerTest")}
if(!require(nlme)){install.packages("nlme")}
if(!require(car)){install.packages("car")}

### An example of a mixed model

For this example, we will revisit the hand size example from
the *Independent and Paired Values* chapter. The hand length variable, *Length*,
will be treated as an interval/ratio variable.

Note that the model includes the blocking variable, *Individual*,
and so the data do not need to be in a certain order to match the paired
observations.

Also note that mixed models may make certain assumptions
about the distributions of the data. For simplicity, this example will not
check if those assumptions are met.

By default, an analysis of variance for a mixed model
doesn’t test the significance of the random effects in the model. However, the
effect of random terms can be tested by comparing the model to a model
including only the fixed effects and excluding the random effects, or with the *ranova*
function from the *lmerTest* package if the *lme4* package is used to
specify the model.

As a technical note, the *lmerTest* package has options
to use Satterthwaite or Kenward–Roger degrees of freedom, and options for
type-III or type-II tests in the analysis of variance, if the *lme4*
package is used to specify the model.

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Individual  Hand   Length
A           Left   17.5
B           Left   18.4
C           Left   16.2
D           Left   14.5
E           Left   13.5
F           Left   18.9
G           Left   19.5
H           Left   21.1
I           Left   17.8
J           Left   16.8
K           Left   18.4
L           Left   17.3
M           Left   18.9
N           Left   16.4
O           Left   17.5
P           Left   15.0
A           Right  17.6
B           Right  18.5
C           Right  15.9
D           Right  14.9
E           Right  13.7
F           Right  18.9
G           Right  19.5
H           Right  21.5
I           Right  18.5
J           Right  17.1
K           Right  18.9
L           Right  17.5
M           Right  19.5
N           Right  16.5
O           Right  17.4
P           Right  15.6
")

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

#### Mixed model with lmer

One way to construct a mixed effects model for interval/ratio
data is with the *lmer* function in the *lme4* package. The *lmerTest*
package is used to produce an analysis of variance with *p*-values for
model effects.

Notice the grammar in the *lmer* function that defines
the model: the term *(1|Individual)* is added to the model to indicate
that *Individual* is the random term.

As a technical note, the *1* indicates that an
intercept is to be fitted for each level of the random variable.

As another technical note, *REML* stands for *restricted
maximum likelihood*. It is a method of fitting the model and is often
considered better than fitting with a conventional *ML* (*maximum
likelihood*) method.
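As an informal illustration of the difference, the familiar *n* − 1 denominator of the sample variance is the simplest example of a REML-type correction: dividing by *n*, the ML estimate, is biased downward because the mean is estimated from the same data. The sketch below uses the first eight left-hand lengths from the chapter's data.

```r
# Informal illustration of REML vs. ML, using variance estimation.
# The ML estimate divides by n and is biased downward, because the mean
# is estimated from the same data; the REML-type estimate divides by
# n - 1, accounting for the degree of freedom used by the mean.

x = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1)  # left-hand lengths
n = length(x)

var_ml   = sum((x - mean(x))^2) / n        # ML estimate (biased low)
var_reml = sum((x - mean(x))^2) / (n - 1)  # REML-type estimate

c(var_ml, var_reml, var(x))   # R's var() uses the n - 1 denominator
```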

##### Define model and conduct analysis of variance

library(lme4)
library(lmerTest)

model = lmer(Length ~ Hand + (1|Individual),
             data = Data,
             REML = TRUE)

anova(model)

Analysis of Variance Table of type III with Satterthwaite
approximation for degrees of freedom

      Sum Sq Mean Sq NumDF DenDF F.value   Pr(>F)
Hand 0.45125 0.45125     1    15  11.497 0.004034 **
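As a sanity check, for this simple two-level within-subject design the mixed-model *F* test is equivalent to a paired t-test on the same data, with *F* = *t*². The vectors below repeat the chapter's data in base R.

```r
# Sanity check: for a two-level within-subject design with a random
# intercept, the mixed-model F test is equivalent to a paired t-test,
# with F = t^2 and the same p-value.

left  = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1,
          17.8, 16.8, 18.4, 17.3, 18.9, 16.4, 17.5, 15.0)
right = c(17.6, 18.5, 15.9, 14.9, 13.7, 18.9, 19.5, 21.5,
          18.5, 17.1, 18.9, 17.5, 19.5, 16.5, 17.4, 15.6)

result = t.test(right, left, paired = TRUE)

result$statistic^2   # approximately 11.497, matching the F value
result$p.value       # approximately 0.004034
```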

##### Test the random effects in the model

The *ranova* function from the *lmerTest* package
will test the random effects in the model.

ranova(model)

ANOVA-like table for random-effects: Single term deletions

                 npar  logLik     AIC    LRT Df Pr(>Chisq)
<none>              4 -36.313  80.626
(1 | Individual)    3 -65.532 137.064 58.438  1  2.098e-14 ***
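The likelihood ratio statistic in this table can be reproduced by hand from the two log-likelihoods: twice their difference, compared to a chi-square distribution with one degree of freedom.

```r
# Reproducing the likelihood ratio test from the log-likelihoods above
logLik_full    = -36.313   # model with (1 | Individual)
logLik_reduced = -65.532   # model without the random effect

LRT = 2 * (logLik_full - logLik_reduced)
LRT                                         # 58.438, as in the table

p = pchisq(LRT, df = 1, lower.tail = FALSE)
p                                           # about 2.1e-14
```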

#### Mixed model with nlme

Another way to construct a mixed effects model for
interval/ratio data is with the *lme* function in the *nlme* package.

Notice the grammar in the *lme* function that defines
the model: the option *random=~1|Individual* is added to the model to
indicate that *Individual* is the random term.

As a technical note, the *1* indicates that an
intercept is to be fitted for each level of the random variable.

As another technical note, *REML* stands for *restricted
maximum likelihood*. It is a method of fitting the model, and is often
considered better than fitting with a conventional *ML* (*maximum
likelihood*) method.

##### Define model and conduct analysis of deviance

library(nlme)

model = lme(Length ~ Hand, random = ~1|Individual,
            data = Data,
            method = "REML")

library(car)

Anova(model)

Analysis of Deviance Table (Type II tests)

      Chisq Df Pr(>Chisq)
Hand 11.497  1  0.0006972 ***
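As a quick check, the p-value reported here can be reproduced from the chi-square statistic in base R. (With one degree of freedom, the chi-square statistic also matches the *F* value from the *lmer* analysis earlier.)

```r
# Reproducing the p-value from the chi-square statistic above
pchisq(11.497, df = 1, lower.tail = FALSE)   # about 0.0006972
```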

##### Test the random effects in the model

The random effects in the model can be tested by comparing
the model to a model fitted with just the fixed effects and excluding the
random effects. Because there are no random effects in this second model, the
*gls* function in the *nlme* package is used to fit this model.

We will use a similar method for cumulative link models.

model.fixed = gls(Length ~ Hand,
                  data = Data,
                  method = "REML")

anova(model,
      model.fixed)

            Model df       AIC       BIC    logLik   Test L.Ratio p-value
model           1  4  80.62586  86.23065 -36.31293
model.fixed     2  3 137.06356 141.26715 -65.53178 1 vs 2 58.4377  <.0001
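The AIC values in this comparison follow directly from the log-likelihoods and parameter counts, via AIC = −2 · logLik + 2 · df.

```r
# AIC = -2 * logLik + 2 * df, using the values from the table above
aic_mixed = -2 * (-36.31293) + 2 * 4   # 80.62586, the mixed model
aic_fixed = -2 * (-65.53178) + 2 * 3   # 137.06356, the fixed-only model

c(aic_mixed, aic_fixed)
```

The lower AIC for the mixed model agrees with the likelihood ratio test: including the *Individual* random effect substantially improves the model.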