## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Friedman Test

The Friedman test determines if there are differences among groups for two-way data structured in a specific way, namely in an unreplicated complete block design.  In this design, one variable serves as the treatment or group variable, and another variable serves as the blocking variable.  It is the differences among treatments or groups that we are interested in.  We aren’t necessarily interested in differences among blocks, but we want our statistics to take into account differences in the blocks.  In the unreplicated complete block design, each block has one and only one observation of each treatment.
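As a quick check of this structure, the following sketch cross-tabulates blocks against treatments and confirms that every cell holds exactly one observation. The data frame `toy` and its column names are invented for illustration and are not part of this chapter's data.

```r
# Hypothetical data: 3 blocks, 3 treatments, one observation per cell
toy <- data.frame(
  Block     = rep(c("b1", "b2", "b3"), each  = 3),
  Treatment = rep(c("t1", "t2", "t3"), times = 3),
  Score     = c(4, 6, 5, 3, 7, 6, 5, 8, 7)
)

# Count observations in each Block x Treatment cell
counts <- table(toy$Block, toy$Treatment)
counts

# TRUE only when every block has exactly one observation of each treatment
is_complete_unreplicated <- all(counts == 1)
is_complete_unreplicated

# Dropping one row leaves an empty cell, so the check fails
toy_incomplete <- toy[-1, ]
all(table(toy_incomplete$Block, toy_incomplete$Treatment) == 1)
```

If the check fails because some cells are empty, the design is incomplete; if some cells hold more than one observation, the design is replicated. In either case the Friedman test as described here does not apply directly.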

For an example of this structure, look at the Belcher family data below.  Rater is considered the blocking variable, and each rater has one observation for each Instructor.  The test will determine if there are differences among values for Instructor, taking into account any consistent effect of a Rater.  For example, if Rater a rated consistently low and Rater g rated consistently high, the Friedman test can account for this statistically.
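One way to see how a consistent rater effect is handled: the test works on ranks within each block, so shifting all of one rater's scores up or down by a constant leaves the result unchanged. The sketch below uses a small made-up matrix (the names `scores` and `shifted` are hypothetical), with raters in rows and instructors in columns.

```r
# Hypothetical ratings: raters (blocks) in rows, instructors (groups) in columns
scores <- matrix(c(4, 8, 7,
                   5, 6, 5,
                   4, 8, 7,
                   6, 8, 9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Rater      = c("a", "b", "c", "d"),
                                 Instructor = c("I1", "I2", "I3")))

# Make rater "a" uniformly harsher and rater "d" uniformly more generous
shifted <- scores
shifted["a", ] <- shifted["a", ] - 2
shifted["d", ] <- shifted["d", ] + 1

# Within-rater ranks are unchanged, so the test statistic is unchanged
original_stat <- friedman.test(scores)$statistic
shifted_stat  <- friedman.test(shifted)$statistic

all.equal(original_stat, shifted_stat)
```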

In other cases, the blocking variable might be the class where the ratings were done or the school where the ratings were done.  If you were testing differences among curricula or other teaching treatments with different instructors, different instructors might be used as blocks.

Some people critique the Friedman test for having low power in detecting differences among groups.  It has been suggested, however, that the Friedman test may be powerful when there are five or more groups.

In general, you may want to choose a more powerful test.  For an ordinal dependent variable, ordinal regression can be used, with the blocking variable being used as a random effect in the model.  For a continuous dependent variable, the Quade test is an option, or aligned ranks transformation anova (ART anova) could be used, with the blocking variable being used as a random effect in the model.

##### Post-hoc tests

The outcome of the Friedman test tells you if there are differences among the groups, but doesn’t tell you which groups are different from other groups.  In order to determine which groups are different from others, post-hoc testing can be conducted.  Several are presented here.

##### Appropriate data

•  Two-way data arranged in an unreplicated complete block design

•  Dependent variable is ordinal, interval, or ratio

•  Treatment or group independent variable is a factor with two or more levels.  That is, two or more groups

•  Blocking variable is a factor with two or more levels

•  Blocks are independent of each other and have no interaction with treatments

##### Hypotheses

•  Null hypothesis:  The distributions of values for each group are equal.

•  Alternative hypothesis (two-sided): There is a systematic difference in the distribution of values for the groups.

##### Interpretation

Significant results can be reported as “There was a significant difference in values among groups.”

##### Other notes and alternative tests

The Quade test is used for the same kinds of data and hypotheses, but can be more powerful in some cases.  It has been suggested that the Friedman test may be preferable when there are a larger number of groups (five or more), while the Quade test is preferable for fewer groups.  The Quade test is described in the next chapter.
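As a minimal sketch of how the two tests can be run side by side, base R's `quade.test` accepts the same blocks-in-rows matrix as `friedman.test`. The matrix `ratings` below re-enters the Likert scores from this chapter's example with shortened instructor names; the object names are chosen for illustration only, and the call is shown purely to demonstrate the interface.

```r
# Likert scores from the example: instructors in rows, raters a-h in columns
ratings <- matrix(
  c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
    8, 6, 8, 8, 8, 7, 10,  9,    # Linda
    7, 5, 7, 8, 8, 9, 10,  9,    # Tina
    6, 4, 5, 5, 6, 6,  5,  5,    # Gene
    8, 7, 8, 8, 9, 9,  8, 10),   # Louise
  nrow = 5, byrow = TRUE,
  dimnames = list(Instructor = c("Bob", "Linda", "Tina", "Gene", "Louise"),
                  Rater      = letters[1:8]))

# Both functions expect blocks in rows and groups in columns, so transpose
by_rater <- t(ratings)

ft <- friedman.test(by_rater)
qt <- quade.test(by_rater)

# Compare the two p-values
c(Friedman = ft$p.value, Quade = qt$p.value)
```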

Cumulative link models for ordinal data (ordinal regression) are appropriate when the dependent variable is ordinal.  Otherwise, aligned ranks transformation anova may be appropriate.  Either of these approaches allows for more flexibility in design than the Friedman or Quade tests.

If the unreplicated block design is partially incomplete, the Skillings–Mack test can be used.

### Packages used in this chapter

The packages used in this chapter include:

•  psych

•  FSA

•  lattice

•  coin

•  PMCMRplus

•  rcompanion

•  DescTools

### Friedman test example

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Instructor        Rater  Likert
'Bob Belcher'        a      4
'Bob Belcher'        b      5
'Bob Belcher'        c      4
'Bob Belcher'        d      6
'Bob Belcher'        e      6
'Bob Belcher'        f      6
'Bob Belcher'        g     10
'Bob Belcher'        h      6
'Linda Belcher'      a      8
'Linda Belcher'      b      6
'Linda Belcher'      c      8
'Linda Belcher'      d      8
'Linda Belcher'      e      8
'Linda Belcher'      f      7
'Linda Belcher'      g     10
'Linda Belcher'      h      9
'Tina Belcher'       a      7
'Tina Belcher'       b      5
'Tina Belcher'       c      7
'Tina Belcher'       d      8
'Tina Belcher'       e      8
'Tina Belcher'       f      9
'Tina Belcher'       g     10
'Tina Belcher'       h      9
'Gene Belcher'       a      6
'Gene Belcher'       b      4
'Gene Belcher'       c      5
'Gene Belcher'       d      5
'Gene Belcher'       e      6
'Gene Belcher'       f      6
'Gene Belcher'       g      5
'Gene Belcher'       h      5
'Louise Belcher'     a      8
'Louise Belcher'     b      7
'Louise Belcher'     c      8
'Louise Belcher'     d      8
'Louise Belcher'     e      9
'Louise Belcher'     f      9
'Louise Belcher'     g      8
'Louise Belcher'     h     10
")

### Order levels of the factor; otherwise R will alphabetize them

Data$Instructor = factor(Data$Instructor,
                         levels=unique(Data$Instructor))

### Create a new variable which is the likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,
                       ordered=TRUE)

###  Check the data frame

library(psych)

str(Data)

summary(Data)

#### Summarize data treating Likert scores as factors

xtabs( ~ Instructor + Likert.f,
data = Data)

Likert.f
Instructor       4 5 6 7 8 9 10
Bob Belcher    2 1 4 0 0 0  1
Linda Belcher  0 0 1 1 4 1  1
Tina Belcher   0 1 0 2 2 2  1
Gene Belcher   1 4 3 0 0 0  0
Louise Belcher 0 0 0 1 4 2  1

XT = xtabs( ~ Instructor + Likert.f,
data = Data)

prop.table(XT,
margin = 1)

Likert.f
Instructor           4     5     6     7     8     9    10
Bob Belcher    0.250 0.125 0.500 0.000 0.000 0.000 0.125
Linda Belcher  0.000 0.000 0.125 0.125 0.500 0.125 0.125
Tina Belcher   0.000 0.125 0.000 0.250 0.250 0.250 0.125
Gene Belcher   0.125 0.500 0.375 0.000 0.000 0.000 0.000
Louise Belcher 0.000 0.000 0.000 0.125 0.500 0.250 0.125

#### Bar plots by group

Note that the bar plots don’t show the effect of the blocking variable.

library(lattice)

histogram(~ Likert.f | Instructor,
data=Data,
layout=c(1,5),
col="darkgray")

####  (1,5) indicates the columns and rows for the plots
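Because the bar plots pool the raters, a profile plot with one line per rater can help show the blocking effect. The sketch below uses base R's `interaction.plot`; the matrix `ratings` re-enters the Likert scores from the data above with shortened instructor names, and the object names are chosen for illustration.

```r
# Likert scores from the example: instructors in rows, raters a-h in columns
ratings <- matrix(
  c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
    8, 6, 8, 8, 8, 7, 10,  9,    # Linda
    7, 5, 7, 8, 8, 9, 10,  9,    # Tina
    6, 4, 5, 5, 6, 6,  5,  5,    # Gene
    8, 7, 8, 8, 9, 9,  8, 10),   # Louise
  nrow = 5, byrow = TRUE,
  dimnames = list(Instructor = c("Bob", "Linda", "Tina", "Gene", "Louise"),
                  Rater      = letters[1:8]))

# Convert to long format: one row per Instructor x Rater combination
long <- as.data.frame(as.table(ratings), responseName = "Likert")

# One line per rater; roughly parallel lines suggest a consistent rater effect
interaction.plot(x.factor     = long$Instructor,
                 trace.factor = long$Rater,
                 response     = long$Likert,
                 type         = "b",
                 xlab         = "Instructor",
                 ylab         = "Likert score",
                 trace.label  = "Rater")
```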

#### Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Instructor,
data=Data,
digits=3)

Instructor n  mean    sd min   Q1 median   Q3 max percZero
1    Bob Belcher 8 5.875 1.885   4 4.75      6 6.00  10        0
2  Linda Belcher 8 8.000 1.195   6 7.75      8 8.25  10        0
3   Tina Belcher 8 7.875 1.553   5 7.00      8 9.00  10        0
4   Gene Belcher 8 5.250 0.707   4 5.00      5 6.00   6        0
5 Louise Belcher 8 8.375 0.916   7 8.00      8 9.00  10        0

#### Friedman test example

This example uses the formula notation indicating that Likert is the dependent variable, Instructor is the independent variable, and Rater is the blocking variable.  The data= option indicates the data frame that contains the variables.  For the meaning of other options, see ?friedman.test or documentation for other employed functions.

friedman.test(Likert ~ Instructor | Rater,
data = Data)

Friedman rank sum test

Friedman chi-squared = 23.139, df = 4, p-value = 0.0001188
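To make the calculation less of a black box, the following sketch reproduces the chi-squared statistic by hand: scores are ranked within each rater, ranks are summed per instructor, and the usual tie-corrected formula is applied. The matrix `ratings` re-enters the example data with shortened instructor names; all object names are chosen for illustration.

```r
# Likert scores from the example: instructors in rows, raters a-h in columns
ratings <- matrix(
  c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
    8, 6, 8, 8, 8, 7, 10,  9,    # Linda
    7, 5, 7, 8, 8, 9, 10,  9,    # Tina
    6, 4, 5, 5, 6, 6,  5,  5,    # Gene
    8, 7, 8, 8, 9, 9,  8, 10),   # Louise
  nrow = 5, byrow = TRUE,
  dimnames = list(Instructor = c("Bob", "Linda", "Tina", "Gene", "Louise"),
                  Rater      = letters[1:8]))

y <- t(ratings)          # blocks (raters) in rows, groups in columns
n <- nrow(y)             # number of blocks
k <- ncol(y)             # number of groups

# Rank within each rater; ties receive average ranks
r <- t(apply(y, 1, rank))

# Rank sums per instructor: 15.5, 31, 28, 11, 34.5
rank_sums <- colSums(r)
rank_sums

# Tie correction: sum of (t^3 - t) over tied sets within each block
tie_term <- sum(apply(r, 1, function(x) { tt <- table(x); sum(tt^3 - tt) }))

# Tie-corrected Friedman chi-squared statistic
stat <- 12 * sum((rank_sums - n * (k + 1) / 2)^2) /
        (n * k * (k + 1) - tie_term / (k - 1))
round(stat, 3)           # 23.139

# p-value from the chi-squared distribution with k - 1 degrees of freedom
p_value <- pchisq(stat, df = k - 1, lower.tail = FALSE)
p_value
```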

library(coin)

friedman_test(Likert ~ Instructor | Rater,
data = Data)

Asymptotic Friedman Test

chi-squared = 23.139, df = 4, p-value = 0.0001188

library(PMCMRplus)

friedmanTest(y      = Data$Likert,
             groups = Data$Instructor,
             blocks = Data$Rater)

Friedman rank sum test

Friedman chi-squared = 23.139, df = 4, p-value = 0.0001188

#### Effect size

Kendall’s W, or Kendall’s coefficient of concordance, can be used as an effect size statistic for Friedman’s test.

The following interpretations are based on personal intuition. They are not intended to be universal.

| Kendall’s W | small | medium | large |
|---|---|---|---|
| k = 3 | < 0.10 | 0.10 – < 0.30 | ≥ 0.30 |
| k = 5 | < 0.10 | 0.10 – < 0.25 | ≥ 0.25 |
| k = 7 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 |
| k = 9 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 |

XT = xtabs(Likert ~ Instructor + Rater,
data = Data)

XT

Instructor        a  b  c  d  e  f  g  h
Bob Belcher     4  5  4  6  6  6 10  6
Linda Belcher   8  6  8  8  8  7 10  9
Tina Belcher    7  5  7  8  8  9 10  9
Gene Belcher    6  4  5  5  6  6  5  5
Louise Belcher  8  7  8  8  9  9  8 10

For the KendallW function, groups must be in rows, and raters must be in columns.

library(DescTools)

KendallW(XT,
correct=TRUE,
test=TRUE)

Kendall's coefficient of concordance Wt

Kendall chi-squared = 23.139, df = 4, subjects = 5, raters = 8,
p-value = 0.0001188

sample estimates:
Wt
0.7230903

In the output above, check that the correct number of groups and raters is listed under "subjects" and "raters", respectively.
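The chi-squared value reported with Kendall’s W is the same as the Friedman statistic because the two are directly related: with n blocks and k groups, W equals the Friedman chi-squared divided by n(k − 1). The sketch below checks this with base R only; `ratings` re-enters the example data with shortened instructor names, and the object names are chosen for illustration.

```r
# Likert scores from the example: instructors in rows, raters a-h in columns
ratings <- matrix(
  c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
    8, 6, 8, 8, 8, 7, 10,  9,    # Linda
    7, 5, 7, 8, 8, 9, 10,  9,    # Tina
    6, 4, 5, 5, 6, 6,  5,  5,    # Gene
    8, 7, 8, 8, 9, 9,  8, 10),   # Louise
  nrow = 5, byrow = TRUE,
  dimnames = list(Instructor = c("Bob", "Linda", "Tina", "Gene", "Louise"),
                  Rater      = letters[1:8]))

k <- nrow(ratings)   # number of groups (instructors)
n <- ncol(ratings)   # number of blocks (raters)

# friedman.test expects blocks in rows, so transpose the table
chi_sq <- unname(friedman.test(t(ratings))$statistic)

# Kendall's W (tie-corrected, because the Friedman statistic is tie-corrected)
W <- chi_sq / (n * (k - 1))
round(W, 3)          # 0.723
```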

library(rcompanion)

kendallW(XT, correct=TRUE)

W
0.723

kendallW(XT, correct=TRUE, ci=TRUE)

W lower.ci upper.ci
1 0.723    0.547    0.917

###  Confidence intervals by bootstrap may vary
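For readers curious about where such an interval can come from, here is a minimal sketch of a percentile bootstrap that resamples raters (blocks) with replacement and recomputes W each time. This is one plausible scheme written in base R, not necessarily the procedure `rcompanion` uses, so its limits need not match the output above. The helper `kendall_w` and the other object names are hypothetical.

```r
# Likert scores from the example: instructors in rows, raters a-h in columns
ratings <- matrix(
  c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
    8, 6, 8, 8, 8, 7, 10,  9,    # Linda
    7, 5, 7, 8, 8, 9, 10,  9,    # Tina
    6, 4, 5, 5, 6, 6,  5,  5,    # Gene
    8, 7, 8, 8, 9, 9,  8, 10),   # Louise
  nrow = 5, byrow = TRUE)

y <- t(ratings)      # blocks (raters) in rows, groups in columns

# Hypothetical helper: W from the tie-corrected Friedman chi-squared
kendall_w <- function(m) {
  unname(friedman.test(m)$statistic) / (nrow(m) * (ncol(m) - 1))
}

set.seed(1234)       # results vary with the seed

# Resample whole raters with replacement and recompute W
boot_w <- replicate(1000, {
  idx <- sample(nrow(y), replace = TRUE)
  kendall_w(y[idx, , drop = FALSE])
})

# Percentile interval
ci <- quantile(boot_w, probs = c(0.025, 0.975))
ci
```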

#### Post-hoc tests

##### Conover test

### Order groups by median

Data$Instructor = factor(Data$Instructor,
                         levels = c("Linda Belcher", "Louise Belcher",
                                    "Tina Belcher", "Bob Belcher",
                                    "Gene Belcher"))

library(PMCMRplus)

CT = frdAllPairsConoverTest(y      = Data$Likert,
                            groups = Data$Instructor,
                            blocks = Data$Rater,
                            p.adjust.method = "single-step")

CT

Pairwise comparisons using Conover's all-pairs test for a two-way balanced complete block design

Linda Belcher Louise Belcher Tina Belcher Bob Belcher
Louise Belcher 0.9794        -              -            -
Tina Belcher   0.9884        0.8278         -            -
Bob Belcher    0.0853        0.0169         0.2490       -
Gene Belcher   0.0099        0.0012         0.0447       0.9489

library(rcompanion)

CTT =PMCMRTable(CT)

CTT

Comparison p.value
1  Louise Belcher - Linda Belcher = 0   0.979
2    Tina Belcher - Linda Belcher = 0   0.988
3     Bob Belcher - Linda Belcher = 0  0.0853
4    Gene Belcher - Linda Belcher = 0 0.00993
5   Tina Belcher - Louise Belcher = 0   0.828
6    Bob Belcher - Louise Belcher = 0  0.0169
7   Gene Belcher - Louise Belcher = 0 0.00123
8      Bob Belcher - Tina Belcher = 0   0.249
9     Gene Belcher - Tina Belcher = 0  0.0447
10     Gene Belcher - Bob Belcher = 0   0.949

library(rcompanion)

cldList(p.value ~ Comparison, data = CTT)

Group Letter MonoLetter
1 LouiseBelcher      a        a
2   TinaBelcher     ab        ab
3    BobBelcher     bc         bc
4   GeneBelcher      c          c
5  LindaBelcher     ab        ab

##### Exact test

library(PMCMRplus)

ET = frdAllPairsExactTest(y      = Data$Likert,
                          groups = Data$Instructor,
                          blocks = Data$Rater,
                          p.adjust.method = "fdr")

ET

Pairwise comparisons using Eisinga, Heskes, Pelzer & Te Grotenhuis all-pairs test with exact p-values for a two-way balanced complete block design

data: y, groups and blocks

Linda Belcher Louise Belcher Tina Belcher Bob Belcher
Louise Belcher 0.65081       -              -            -
Tina Belcher   0.69729       0.44188        -            -
Bob Belcher    0.02456       0.00768        0.07761      -
Gene Belcher   0.00601       0.00047        0.01833      0.60368

library(rcompanion)

ETT =PMCMRTable(ET)

ETT

Comparison  p.value
1  Louise Belcher - Linda Belcher = 0    0.651
2    Tina Belcher - Linda Belcher = 0    0.697
3     Bob Belcher - Linda Belcher = 0   0.0246
4    Gene Belcher - Linda Belcher = 0  0.00601
5   Tina Belcher - Louise Belcher = 0    0.442
6    Bob Belcher - Louise Belcher = 0  0.00768
7   Gene Belcher - Louise Belcher = 0 0.000467
8      Bob Belcher - Tina Belcher = 0   0.0776
9     Gene Belcher - Tina Belcher = 0   0.0183
10     Gene Belcher - Bob Belcher = 0    0.604

library(rcompanion)

cldList(p.value ~ Comparison, data = ETT)

Group Letter MonoLetter
1 LouiseBelcher      a        a
2   TinaBelcher     ab        ab
3    BobBelcher     bc         bc
4   GeneBelcher      c          c
5  LindaBelcher      a        a

##### Nemenyi test

library(PMCMRplus)

NT = frdAllPairsNemenyiTest(Likert ~ Instructor | Rater, data = Data)

NT

Pairwise comparisons using Nemenyi-Wilcoxon-Wilcox all-pairs test for a two-way balanced complete block design

Linda Belcher Louise Belcher Tina Belcher Bob Belcher
Louise Belcher 0.9816        -              -            -
Tina Belcher   0.9897        0.8426         -            -
Bob Belcher    0.1021        0.0224         0.2775       -
Gene Belcher   0.0136        0.0019         0.0557       0.9540

library(rcompanion)

NTT =PMCMRTable(NT)

NTT

Comparison p.value
1  Louise Belcher - Linda Belcher = 0   0.982
2    Tina Belcher - Linda Belcher = 0    0.99
3     Bob Belcher - Linda Belcher = 0   0.102
4    Gene Belcher - Linda Belcher = 0  0.0136
5   Tina Belcher - Louise Belcher = 0   0.843
6    Bob Belcher - Louise Belcher = 0  0.0224
7   Gene Belcher - Louise Belcher = 0 0.00189
8      Bob Belcher - Tina Belcher = 0   0.278
9     Gene Belcher - Tina Belcher = 0  0.0557
10     Gene Belcher - Bob Belcher = 0   0.954

library(rcompanion)

cldList(p.value ~ Comparison, data = NTT)

Group Letter MonoLetter
1 LouiseBelcher      a        a
2   TinaBelcher    abc        abc
3    BobBelcher     bc         bc
4   GeneBelcher      b         b
5  LindaBelcher     ac        a c

##### Siegel test

library(PMCMRplus)

ST = frdAllPairsSiegelTest(y      = Data$Likert,
                           groups = Data$Instructor,
                           blocks = Data$Rater,
                           p.adjust.method = "fdr")

ST

Pairwise comparisons using Siegel-Castellan all-pairs test for a two-way balanced complete block design

Linda Belcher Louise Belcher Tina Belcher Bob Belcher
Louise Belcher 0.6353        -              -            -
Tina Belcher   0.6353        0.4344         -            -
Bob Belcher    0.0285        0.0089         0.0802       -
Gene Belcher   0.0078        0.0020         0.0180       0.5960

library(rcompanion)

STT =PMCMRTable(ST)

STT

Comparison p.value
1  Louise Belcher - Linda Belcher = 0   0.635
2    Tina Belcher - Linda Belcher = 0   0.635
3     Bob Belcher - Linda Belcher = 0  0.0285
4    Gene Belcher - Linda Belcher = 0 0.00783
5   Tina Belcher - Louise Belcher = 0   0.434
6    Bob Belcher - Louise Belcher = 0 0.00888
7   Gene Belcher - Louise Belcher = 0 0.00203
8      Bob Belcher - Tina Belcher = 0  0.0802
9     Gene Belcher - Tina Belcher = 0   0.018
10     Gene Belcher - Bob Belcher = 0   0.596

library(rcompanion)

cldList(p.value ~ Comparison, data = STT)

Group Letter MonoLetter
1 LouiseBelcher      a        a
2   TinaBelcher     ab        ab
3    BobBelcher     bc         bc
4   GeneBelcher      c          c
5  LindaBelcher      a        a

### Example from Conover

This example is taken from the Friedman test section of Conover (1999).

Conover1 = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Homeowner Grass1 Grass2 Grass3 Grass4
1        4      3      2      1
2        4      2      3      1
3        3      1.5    1.5    4
4        3      1      2      4
5        4      2      1      3
6        2      2      2      4
7        1      3      2      4
8        2      4      1      3
9        3.5    1      2      3.5
10        4      1      3      2
11        4      2      3      1
12        3.5    1      2      3.5
")

if(!require(tidyr)){install.packages("tidyr")}

library(tidyr)

Conover = gather(Conover1, Grass, Rating, Grass1:Grass4, factor_key=TRUE)
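The `gather` call converts the wide table (one column per grass) into long format (one row per homeowner and grass). For readers without tidyr, or who want to see what the reshaping does, the following base R sketch performs the same kind of conversion with `stack` on the first three homeowners only. The object names `wide`, `stacked`, and `long` are chosen for illustration. Note that tidyr now marks `gather` as superseded by `pivot_longer`, although `gather` still works.

```r
# First three homeowners from the example, in wide format
wide <- data.frame(Homeowner = 1:3,
                   Grass1    = c(4, 4, 3),
                   Grass2    = c(3, 2, 1.5),
                   Grass3    = c(2, 3, 1.5),
                   Grass4    = c(1, 1, 4))

grass_cols <- c("Grass1", "Grass2", "Grass3", "Grass4")

# stack() appends the columns one after another and records the source column
stacked <- stack(wide[ , grass_cols])

# Reassemble into long format; homeowner ids repeat once per grass column
long <- data.frame(Homeowner = rep(wide$Homeowner, times = length(grass_cols)),
                   Grass     = stacked$ind,      # factor, levels in column order
                   Rating    = stacked$values)

long
```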

###  Check the data frame

library(psych)

str(Conover)

summary(Conover)

###  Friedman test

friedman.test(Rating ~ Grass | Homeowner,
data = Conover)

Friedman rank sum test

Friedman chi-squared = 8.0973, df = 3, p-value = 0.04404

GT = xtabs(Rating ~ Grass + Homeowner,
data = Conover)

GT

Homeowner
Grass      1   2   3   4   5   6   7   8   9  10  11  12
Grass1 4.0 4.0 3.0 3.0 4.0 2.0 1.0 2.0 3.5 4.0 4.0 3.5
Grass2 3.0 2.0 1.5 1.0 2.0 2.0 3.0 4.0 1.0 1.0 2.0 1.0
Grass3 2.0 3.0 1.5 2.0 1.0 2.0 2.0 1.0 2.0 3.0 3.0 2.0
Grass4 1.0 1.0 4.0 4.0 3.0 4.0 4.0 3.0 3.5 2.0 1.0 3.5

library(DescTools)

KendallW(GT, correct=TRUE, test=TRUE)

Kendall's coefficient of concordance Wt

Kendall chi-squared = 8.0973, df = 3, subjects = 4, raters = 12, p-value = 0.04404

sample estimates:
Wt
0.2249263

library(PMCMRplus)

frdAllPairsExactTest(y      = Conover$Rating,
                     groups = Conover$Grass,
                     blocks = Conover$Homeowner,
                     p.adjust.method = "fdr")

Pairwise comparisons using Eisinga, Heskes, Pelzer & Te Grotenhuis all-pairs test with exact p-values for a two-way balanced complete block design

Grass1 Grass2 Grass3
Grass2 0.094  -      -
Grass3 0.094  0.938  -
Grass4 0.701  0.194  0.201