## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Two-way Repeated Ordinal Regression with CLMM

A two-way repeated ordinal analysis of variance can address an experimental design with two independent variables, each of which is a factor variable, plus a blocking variable.  The main effect of each independent variable can be tested, as well as the effect of the interaction of the two factors.

The example here looks at students’ knowledge scores for three speakers across two different times.  Because we know which student has each score at each time, we will use the Student as a blocking variable, with the suspicion that one student might have consistently lower knowledge than another student.  If we hadn’t recorded this information about students, we could use two-way ordinal regression.

As a matter of practical interpretation, we may not care about the absolute scores for each of the three speakers, only if the knowledge scores for students increased from Time 1 to Time 2.  At Time 1, students’ knowledge scores are lower for Piglet that for the other two speakers.  This isn’t the fault of Piglet; he is just speaking about a topic that the students have little knowledge of.

The clmm function can specify more complex models with multiple independent variables of different types, but this book will not explore more complex examples.

Post-hoc analysis to determine which groups are different can be conducted on each significant main effect and on the interaction effect if it is significant.

### Packages used in this chapter

The packages used in this chapter include:

•  psych

•  FSA

•  lattice

•  ggplot2

•  ordinal

•  car

•  RVAideMemoire

•  emmeans

•  multcomp

•  rcompanion

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(lattice)){install.packages("lattice")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(multcomp)){install.packages("multcomp")}
if(!require(rcompanion)){install.packages("rcompanion")}

### Two-way repeated ordinal regression example

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Speaker  Student  Time  Likert
Pooh        a       1       3
Pooh        b       1       5
Pooh        c       1       4
Pooh        d       1       4
Pooh        e       1       4
Pooh        f       1       3
Pooh        g       1       4
Pooh        h       1       4

Pooh        a       2       4
Pooh        b       2       5
Pooh        c       2       5
Pooh        d       2       5
Pooh        e       2       5
Pooh        f       2       4
Pooh        g       2       5
Pooh        h       2       5

Piglet      i       1       1
Piglet      j       1       2
Piglet      k       1       1
Piglet      l       1       1
Piglet      m       1       1
Piglet      n       1       2
Piglet      o       1       3
Piglet      p       1       1

Piglet      i       2       2
Piglet      j       2       4
Piglet      k       2       2
Piglet      l       2       1
Piglet      m       2       2
Piglet      n       2       2
Piglet      o       2       4
Piglet      p       2       2

Eeyore      q       1       4
Eeyore      r       1       5
Eeyore      s       1       4
Eeyore      t       1       4
Eeyore      u       1       4
Eeyore      v       1       4
Eeyore      w       1       4
Eeyore      x       1       4

Eeyore      q       2       5
Eeyore      r       2       4
Eeyore      s       2       4
Eeyore      t       2       4
Eeyore      u       2       3
Eeyore      v       2       4
Eeyore      w       2       4
Eeyore      x       2       4
")

### Change Time into a factor variable

Data\$Time = factor(Data\$Time)

### Order levels of the factor; otherwise R will alphabetize them

Data\$Speaker = factor(Data\$Speaker,
levels=unique(Data\$Speaker))

### Create a new variable which is the likert scores as an ordered factor

Data\$Likert.f = factor(Data\$Likert,
ordered = TRUE)

###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

#### Summarize data treating Likert scores as factors

xtabs( ~ Time + Likert.f + Speaker,
data = Data)

Likert.f
Time 1 2 3 4 5
1 0 0 2 5 1
2 0 0 0 2 6

, , Speaker = Piglet

Likert.f
Time 1 2 3 4 5
1 5 2 1 0 0
2 1 5 0 2 0

, , Speaker = Eeyore

Likert.f
Time 1 2 3 4 5
1 0 0 0 7 1
2 0 0 1 6 1

#### Bar plots by group

Note that these plots do not take into the effect of blocking variable.

library(lattice)

histogram(~ Likert.f | Speaker,
data=Data,
layout=c(1,3)      #  columns and rows of individual plots
)

library(lattice)

histogram(~ Likert.f | Time,
data=Data,
layout=c(1,2)      #  columns and rows of individual plots
)

library(lattice)

histogram(~ Likert.f | Time + Speaker,
data=Data,
layout=c(2,3)      #  columns and rows of individual plots
)

#### Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Speaker + Time,
data=Data,
digits=3)

Speaker Time n nvalid  mean    sd min   Q1 median  Q3 max percZero
1    Pooh    1 8      8 3.875 0.641   3 3.75      4 4.0   5        0
2  Piglet    1 8      8 1.500 0.756   1 1.00      1 2.0   3        0
3  Eeyore    1 8      8 4.125 0.354   4 4.00      4 4.0   5        0
4    Pooh    2 8      8 4.750 0.463   4 4.75      5 5.0   5        0
5  Piglet    2 8      8 2.375 1.061   1 2.00      2 2.5   4        0
6  Eeyore    2 8      8 4.000 0.535   3 4.00      4 4.0   5        0

#### Plot Likert scores before and after

Since we are most interested in pairing scores from Time 1 to Time 2, we can plot the data in a way that pairs points.

Points above and to left of the blue line indicate cases in which the score for Time 2 is greater than that for Time 1.

Note that the data must be ordered by Student and Speaker for this code to plot the data correctly.  That is, the first observation for Time==1 must be the same student and speaker as the first observation for Time==2

Also note that the points on the plot are jittered so that multiple points with the same values will be visible.

Time.1  = Data\$Likert[Data\$Time==1]
Time.2  = Data\$Likert[Data\$Time==2]
Speaker = Data\$Speaker[Data\$Time==2]

plot(Time.1, jitter(Time.2),
pch = 16,
cex = 1,
col = Speaker,
xlab="Time 1",
ylab="Time 2")

abline(0,1, col="blue", lwd=2)

legend('bottomright',
legend = unique(Speaker),
col = 1:3,
cex = 1,
pch = 16)

#### Produce interaction plot with medians and confidence intervals

The following plot will show the interaction between Speaker and Time.  The median for each Speaker x Time combination is plotted, and the confidence intervals for each are shown with error bars.  In this case some of the medians and confidence limits are the same values, so some error bars are not visible.

Be aware that confidence intervals determined by bootstrap for medians may not be appropriate with discrete data, like these Likert-item data, or with a small sample size.

The plot makes it easy to see the change in median score for each speaker from Time 1 to Time 2.

library(rcompanion)

Sum = groupwiseMedian(Likert ~ Speaker + Time,
data       = Data,
conf       = 0.95,
R          = 5000,
percentile = TRUE,
bca        = FALSE,
digits     = 3)
Sum

library(ggplot2)

pd = position_dodge(.2)

ggplot(Sum, aes(x=Time,
y=Median,
color=Speaker)) +
geom_errorbar(aes(ymin=Percentile.lower,
ymax=Percentile.upper),
width=.2, size=0.7, position=pd) +
geom_point(shape=15, size=4, position=pd) +
theme_bw() +
theme(axis.title = element_text(face = "bold")) +

ylab("Median Likert score")

#### Two-way repeated ordinal regression

In the model notation in the clmm function, here, Likert.f is the dependent variable and Speaker and Time are the independent variables.  The term Time:Spreaker adds the interaction effect of these two independent variables to the model.  Student is used as a blocking variable and is entered as a random variable.  The data= option indicates the data frame that contains the variables.  For the meaning of other options, see ?clmm.

Here the threshold = "equidistant" option is used in order to avoid errors for this data set.  This option does not need to be used routinely.

##### Optional technical note

In the model below Student is treated as a simple random blocking variable, specified by (1|Student).  Note that in the data, students were lettered ax straight through the Speakers.  If the students had been lettered ah for each Speaker, but there were actually 24 students, we would need to tell the model that Student was nested within Speaker, by specifying that the random effect was (1|Speaker/Student), or that the random effect of interest was the interaction of Speaker and Student, by specifying (1|Speaker:Student).  This second specification is equivalent to that in the code below, with the original data, and will produce the same results.

I consider it a good practice to assign unique individuals with unique labels.  This practice is not always followed, but following the practice will avoid mis-specifying models.  In this case with 24 students, if they had been lettered ah for each Speaker, and the original model with the (1|Student) effect had been used, the results would be wrong because the model would treat Pooh’s student a as the same individual as Piglet’s student a.

##### Define model and conduct analysis of deviance

library(ordinal)

model = clmm(Likert.f ~ Time + Speaker + Time:Speaker + (1|Student),
data = Data,
threshold = "equidistant")

anova(model, type="II")

Error in eval(predvars, data, env) : object 'Likert.f' not found

### I don’t know why this error occurs

library(car)

library(RVAideMemoire)

Anova.clmm(model,
type = "II")

Analysis of Deviance Table (Type II tests)

LR Chisq Df Pr(>Chisq)
Time            9.774  1    0.00177 **
Speaker        34.485  2  3.249e-08 ***
Time:Speaker   28.202  2  7.517e-07 ***

##### p-value for model  and pseudo R-squared

In order to use the nagelkerke function for a clmm model object, a null model has to be specified explicitly.  (This is not the case with a clm model object).

We want the null model to have no fixed effects except for an intercept, indicated with a 1 on the right side of the ~.  But we also may want to include the random effects, in this case (1 | Student).

We can compare our full model against a model with an overall intercept and an intercept for each student.

Be sure that the threshold option in model.null is the same as in the original model.

model.null = clmm(Likert.f ~ 1 + (1 | Student),
data = Data
,
threshold = "equidistant")

library(rcompanion)

nagelkerke(fit  = model,
null = model.null
)

\$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden                             0.544198
Cox and Snell (ML)                   0.787504
Nagelkerke (Cragg and Uhler)         0.836055

\$Likelihood.ratio.test
Df.diff LogLik.diff  Chisq   p.value
-5     -37.172 74.344 1.275e-14

Another approach to determining the p-value and pseudo R-squared for a clmm model is comparing the model to a null model with only an intercept and neither the fixed nor the random effects.

model.null.2 = clm(Likert.f ~ 1,
data = Data
,
threshold = "equidistant")

library(rcompanion)

nagelkerke(fit  = model,
null = model.null.2
)

\$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden                             0.587734
Cox and Snell (ML)                   0.842667
Nagelkerke (Cragg and Uhler)         0.880526

\$Likelihood.ratio.test
Df.diff LogLik.diff  Chisq    p.value
-6     -44.385 88.771 5.4539e-17

##### Determine the effect of the random variable

The effect of the random variable (Student, in this case) can be determined by comparing the full model to a model with only the fixed effects (Speaker, Time, and Speaker:Time in this case).

First, model.fixed is defined, and then the two models are passed to the anova function.

Be sure that the threshold option in model.fixed is the same as in the original model.

model.fixed = clm(Likert.f ~ Time + Speaker + Time:Speaker,
data = Data,
threshold = "equidistant")

anova(model,
model.fixed)

Likelihood ratio tests of cumulative link models:

no.par     AIC  logLik LR.stat df Pr(>Chisq)
model.fixed      7 101.197 -43.598
model            8  78.268 -31.134  24.928  1   5.95e-07 ***

#### Post-hoc tests

For clmm model objects, the emmean, SE, LCL, and UCL values should be ignored, unless specific options in emmeans are selected.  See ?emmeans::models for more information.

Because the interaction term in the model was significant, the group separations for the interaction is explored.

library(emmeans)

marginal = emmeans(model,
~ Speaker + Time)
pairs(marginal,
adjust="tukey")

library(multcomp)

cld(marginal, Letters=letters)

Speaker  Time    emmean          SE df asymp.LCL asymp.UCL .group
Piglet  1    -36.08909 0.002735251 NA -36.09629 -36.08190  a
Piglet  2    -17.88982 0.003252687 NA -17.89838 -17.88127   b
Eeyore  2     17.46501 2.047095879 NA  12.07902  22.85100    c
Pooh    1     18.19627 0.002535027 NA  18.18960  18.20294    c
Eeyore  1     19.30751 2.030625019 NA  13.96486  24.65016    c
Pooh    2     36.72566 0.002910484 NA  36.71800  36.73332     d

Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05

NOTE: Compact letter displays can be misleading
because they show NON-findings rather than findings.
Consider using 'pairs()', 'pwpp()', or 'pwpm()' instead.

### Groups sharing a letter in .group are not significantly different

### Remember to ignore “emmean”, “LCL”, “LCL”, and “UCL” with CLMM

##### Focusing on meaningful comparisons

Because we wanted to focus on the comparing each speaker at Time1 and Time 2, let’s rearrange the emmeans cld table to focus on these comparisons.

With the results in this format, it is easy to see that the scores for both Pooh and Piglet improved from Time 1 to Time 2, but that the scores for poor Eeyore did not.

Speaker Time    emmean          SE df  asymp.LCL asymp.UCL .group
Pooh    1     18.19627 0.002535027 NA  18.18960  18.20294    c
Pooh    2     36.72566 0.002910484 NA  36.71800  36.73332     d

Speaker Time    emmean          SE df  asymp.LCL asymp.UCL .group
Piglet  1    -36.08909 0.002735251 NA -36.09629 -36.08190  a
Piglet  2    -17.88982 0.003252687 NA -17.89838 -17.88127   b

Speaker Time    emmean          SE df  asymp.LCL asymp.UCL .group
Eeyore  1     19.30751 2.030625019 NA  13.96486  24.65016    c
Eeyore  2     17.46501 2.047095879 NA  12.07902  22.85100    c

#### Interaction plot with group separation letters

Group separation letters can be added manually to the plot of medians.

##### Check model assumptions

At the time of writing, the nominal_test and scale_test functions don’t work with clmm model objects, so we will define a similar model with clm and no random effects, and test the proportional odds assumption on this model.

model.clm = clm(Likert.f ~ Time + Speaker + Time:Speaker + Student,
data = Data,
threshold = "equidistant")

Warning message:
(1) Hessian is numerically singular: parameters are not uniquely determined
In addition: Absolute convergence criterion was met, but relative criterion was not met.

###  The fitting of this model failed.  We’ll try a model without student.

model.clm = clm(Likert.f ~ Time + Speaker + Time:Speaker,
data = Data,
threshold = "equidistant")

nominal_test(model.clm)

Tests of nominal effects

formula: Likert.f ~ Time + Speaker + Time:Speaker
Df  logLik     AIC    LRT Pr(>Chi)
<none>          -43.598 101.197
Time          1 -43.149 102.298 0.8983  0.34325
Speaker       2 -40.053  98.106 7.0903  0.02886 *
Time:Speaker  5 -39.393 102.786 8.4110  0.13499

###  Note the significant result for speaker in this test.

scale_test(model.clm)

Warning messages:
1: (-1) Model failed to converge with max|grad| = 2.26258e-05 (tol = 1e-06)
In addition: iteration limit reached

###  This test failed.  It would be possible to adjust the fitting parameters,
###    and try to get the model to converge.