 ## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Friedman Test

The Friedman test determines if there are differences among groups for two-way data structured in a specific way, namely in an unreplicated complete block design.  In this design, one variable serves as the treatment or group variable, and another variable serves as the blocking variable.  It is the differences among treatments or groups that we are interested in.  We aren’t necessarily interested in differences among blocks, but we want our statistics to take into account differences in the blocks.  In the unreplicated complete block design, each block has one and only one observation of each treatment.

For an example of this structure, look at the Belcher family data below.  Rater is considered the blocking variable, and each rater has one observation for each Instructor.  The test will determine if there are differences among values for Instructor, taking into account any consistent effect of a Rater.  For example if Rater a rated consistently low and Rater g rated consistently high, the Friedman test can account for this statistically.

In other cases, the blocking variable might be the class where the ratings were done or the school where the ratings were done.  If you were testing differences among curricula or other teaching treatments with different instructors, different instructors might be used as blocks.
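Whether a data set really has this structure can be checked by cross-tabulating blocks against treatments. The following is a minimal sketch using a small hypothetical data frame; the names `Toy`, `Block`, `Treatment`, `Value`, and `is.urcb` are invented for illustration and are not part of the chapter's example.

```r
# Hypothetical toy data: 3 blocks (e.g. raters) by 2 treatments
Toy = data.frame(Block     = c("a", "a", "b", "b", "c", "c"),
                 Treatment = c("X", "Y", "X", "Y", "X", "Y"),
                 Value     = c(4, 8, 5, 6, 4, 8))

# Count the observations in each block-by-treatment cell
Counts = table(Toy$Block, Toy$Treatment)

Counts

# TRUE only if every block has exactly one observation of each treatment
is.urcb = all(Counts == 1)

is.urcb
```

A cell count of 0 would indicate an incomplete design, and a count above 1 would indicate replication within a block.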

If the distributions of the differences in scores between each pair of groups are all symmetrical, or if the distribution of values for each group is similar in shape and spread, the Friedman test determines if there is a difference in medians among groups.  If not, the test determines if there is a systematic difference in the values among the groups.

Some people criticize the Friedman test for having low power in detecting differences among groups.  It has been suggested, however, that the Friedman test may be powerful when there are five or more groups.

##### Post-hoc tests

The outcome of the Friedman test tells you if there are differences among the groups, but doesn’t tell you which groups are different from other groups.  In order to determine which groups are different from others, post-hoc testing can be conducted.

For a post-hoc analysis, the function pairwiseSignTest in the rcompanion package can be used.  It performs a two-sample paired sign test on each pair of groups.

##### Appropriate data

•  Two-way data arranged in an unreplicated complete block design

•  Dependent variable is ordinal, interval, or ratio

•  Treatment or group independent variable is a factor with two or more levels.  That is, two or more groups

•  Blocking variable is a factor with two or more levels

•  Blocks are independent of each other and have no interaction with treatments

•  In order to be a test of medians, the distributions of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread.  Otherwise the test is a test of distributions.

##### Hypotheses

If the distributions of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread:

•  Null hypothesis:  The medians of values for each group are equal.

•  Alternative hypothesis (two-sided): The medians of values for each group are not equal.

If the above conditions are not met:

•  Null hypothesis:  The distributions of values for each group are equal.

•  Alternative hypothesis (two-sided): There is a systematic difference in the distribution of values for the groups.

##### Interpretation

If the distributions of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread:

Significant results can be reported as “There was a significant difference in median values across groups.”

Post-hoc analysis allows you to say “The median for group A was higher than the median for group B”, and so on.

If the above conditions are not met:

Significant results can be reported as “There was a significant difference in values among groups.”

##### Other notes and alternative tests

The Quade test is used for the same kinds of data and hypotheses, but can be more powerful in some cases.  It has been suggested that the Friedman test may be preferable when there is a larger number of groups (five or more), while the Quade test is preferable for fewer groups.  The Quade test is described in the next chapter.

Another alternative is to use cumulative link models for ordinal data, which are described later in this book.

### Packages used in this chapter

The packages used in this chapter include:

•  psych

•  FSA

•  lattice

•  BSDA

•  multcompView

•  PMCMR

•  rcompanion

•  DescTools

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(lattice)){install.packages("lattice")}
if(!require(BSDA)){install.packages("BSDA")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(PMCMR)){install.packages("PMCMR")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(DescTools)){install.packages("DescTools")}

### Friedman test example

Input =("
Instructor        Rater  Likert
'Bob Belcher'        a      4
'Bob Belcher'        b      5
'Bob Belcher'        c      4
'Bob Belcher'        d      6
'Bob Belcher'        e      6
'Bob Belcher'        f      6
'Bob Belcher'        g     10
'Bob Belcher'        h      6
'Linda Belcher'      a      8
'Linda Belcher'      b      6
'Linda Belcher'      c      8
'Linda Belcher'      d      8
'Linda Belcher'      e      8
'Linda Belcher'      f      7
'Linda Belcher'      g     10
'Linda Belcher'      h      9
'Tina Belcher'       a      7
'Tina Belcher'       b      5
'Tina Belcher'       c      7
'Tina Belcher'       d      8
'Tina Belcher'       e      8
'Tina Belcher'       f      9
'Tina Belcher'       g     10
'Tina Belcher'       h      9
'Gene Belcher'       a      6
'Gene Belcher'       b      4
'Gene Belcher'       c      5
'Gene Belcher'       d      5
'Gene Belcher'       e      6
'Gene Belcher'       f      6
'Gene Belcher'       g      5
'Gene Belcher'       h      5
'Louise Belcher'     a      8
'Louise Belcher'     b      7
'Louise Belcher'     c      8
'Louise Belcher'     d      8
'Louise Belcher'     e      9
'Louise Belcher'     f      9
'Louise Belcher'     g      8
'Louise Belcher'     h     10
")

Data = read.table(textConnection(Input),header=TRUE)

### Order levels of the factor; otherwise R will alphabetize them

Data$Instructor = factor(Data$Instructor,
                         levels=unique(Data$Instructor))

### Create a new variable which is the Likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,
                       ordered=TRUE)

###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

#### Summarize data treating Likert scores as factors

xtabs( ~ Instructor + Likert.f,
data = Data)

Likert.f
Instructor       4 5 6 7 8 9 10
Bob Belcher    2 1 4 0 0 0  1
Linda Belcher  0 0 1 1 4 1  1
Tina Belcher   0 1 0 2 2 2  1
Gene Belcher   1 4 3 0 0 0  0
Louise Belcher 0 0 0 1 4 2  1

XT = xtabs( ~ Instructor + Likert.f,
data = Data)

prop.table(XT,
margin = 1)

Likert.f
Instructor           4     5     6     7     8     9    10
Bob Belcher    0.250 0.125 0.500 0.000 0.000 0.000 0.125
Linda Belcher  0.000 0.000 0.125 0.125 0.500 0.125 0.125
Tina Belcher   0.000 0.125 0.000 0.250 0.250 0.250 0.125
Gene Belcher   0.125 0.500 0.375 0.000 0.000 0.000 0.000
Louise Belcher 0.000 0.000 0.000 0.125 0.500 0.250 0.125

#### Bar plots by group

Note that the bar plots don’t show the effect of the blocking variable.

library(lattice)

histogram(~ Likert.f | Instructor,
data=Data,
layout=c(1,5)      #  columns and rows of individual plots
)

#### Bar plots of differences between groups

We can make a bar plot of the differences in values for two groups, just as we did for the sign test previously.  Values for Bob and Louise were chosen because their bar plots have very different shapes.

The resulting plot shows a distribution that is clearly not symmetrical.

Note that the data must be ordered by the blocking variable so that the first observation for Louise will be paired with the first observation for Bob, and so on.

Also note that we had to specify the levels in the factor function defining Diff.f.  This is so that the values with zero counts will be displayed on the plot.

Bob    = Data$Likert[Data$Instructor == "Bob Belcher"]
Louise = Data$Likert[Data$Instructor == "Louise Belcher"]

Difference = Bob - Louise

Diff.f = factor(Difference,
                ordered = TRUE,
                levels = c("-4", "-3", "-2", "-1", "0", "1", "2", "3", "4")
                )

X = xtabs(~ Diff.f)

barplot(X,
col="dark gray",
xlab="Difference in Likert",
ylab="Frequency")

#### pairwiseDifferences function to produce bar plots of differences between all groups

The function pairwiseDifferences will create a new data frame of differences for all pairs of groups.  The plotit=TRUE option will produce bar plots of the counts for each pair.  Otherwise the lattice package can be used to show these plots in one large trellis plot.

Note that the data must be ordered by the blocking variable so that the first observation for Bob will be paired with the first observation for Linda, and so on.
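One way to guarantee this pairing is to sort the data frame by group and then by block before computing differences. The sketch below uses a small hypothetical data frame in deliberately scrambled order; `Toy` and `Toy.sorted` are illustrative names, not objects from the chapter.

```r
# Hypothetical data entered in scrambled order: 2 instructors, 3 raters
Toy = data.frame(Instructor = c("B", "A", "B", "A", "A", "B"),
                 Rater      = c("b", "c", "a", "a", "b", "c"),
                 Likert     = c(5, 8, 4, 8, 6, 4))

# Sort by group first, then by the blocking variable within each group,
# so the i-th row of each group belongs to the same rater
Toy.sorted = Toy[order(Toy$Instructor, Toy$Rater), ]

Toy.sorted
```

After sorting, the raters appear in the same order (a, b, c) within each instructor, so subtracting one group's values from another's pairs observations from the same rater.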

library(rcompanion)

Data.diff = pairwiseDifferences(Likert ~ Instructor,
data      = Data,
factorize = TRUE,
plotit    = TRUE)

library(psych)

headTail(Data.diff)

Comparison Difference Difference.f
1     Bob Belcher - Linda Belcher         -4           -4
2     Bob Belcher - Linda Belcher         -1           -1
3     Bob Belcher - Linda Belcher         -4           -4
4     Bob Belcher - Linda Belcher         -2           -2
...                          <NA>        ...         <NA>
77  Gene Belcher - Louise Belcher         -3           -3
78  Gene Belcher - Louise Belcher         -3           -3
79  Gene Belcher - Louise Belcher         -3           -3
80  Gene Belcher - Louise Belcher         -5           -5

library(lattice)

histogram(~ Difference | Comparison,
data=Data.diff,
type = "count",
layout=c(2,5)      #  columns and rows of individual plots
)

#### Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Instructor,
data=Data,
digits=3)

Instructor n  mean    sd min   Q1 median   Q3 max percZero
1    Bob Belcher 8 5.875 1.885   4 4.75      6 6.00  10        0
2  Linda Belcher 8 8.000 1.195   6 7.75      8 8.25  10        0
3   Tina Belcher 8 7.875 1.553   5 7.00      8 9.00  10        0
4   Gene Belcher 8 5.250 0.707   4 5.00      5 6.00   6        0
5 Louise Belcher 8 8.375 0.916   7 8.00      8 9.00  10        0

#### Friedman test example

This example uses the formula notation indicating that Likert is the dependent variable, Instructor is the independent variable, and Rater is the blocking variable.  The data= option indicates the data frame that contains the variables.  For the meaning of other options, see ?friedman.test

friedman.test(Likert ~ Instructor | Rater,
data = Data)

Friedman rank sum test

Friedman chi-squared = 23.139, df = 4, p-value = 0.0001188
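To make the role of the blocks concrete, the statistic reported above can be reproduced by hand: scores are ranked within each rater, the ranks are summed for each instructor, and the rank sums are compared with their expected value. The following is a sketch in base R, not the internals of any package; the matrix `M` re-enters the chapter's data with raters in rows and instructors in columns, and the object names are illustrative.

```r
# Chapter data: rows = raters (blocks), columns = instructors (groups)
M = matrix(c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
             8, 6, 8, 8, 8, 7, 10,  9,    # Linda
             7, 5, 7, 8, 8, 9, 10,  9,    # Tina
             6, 4, 5, 5, 6, 6,  5,  5,    # Gene
             8, 7, 8, 8, 9, 9,  8, 10),   # Louise
           nrow = 8,
           dimnames = list(Rater      = letters[1:8],
                           Instructor = c("Bob", "Linda", "Tina",
                                          "Gene", "Louise")))

n = nrow(M)   # number of blocks
k = ncol(M)   # number of groups

# Rank the scores within each rater (ties get average ranks)
R = t(apply(M, 1, rank))

# Sum of ranks for each instructor
rank.sums = colSums(R)

# Tie term: sum of (t^3 - t) over each set of tied scores within a rater
tie.term = sum(apply(M, 1, function(x) {
                 counts = table(x)
                 sum(counts^3 - counts)
               }))

# Friedman chi-squared with the correction for ties
chi.sq = 12 * sum((rank.sums - n * (k + 1) / 2)^2) /
         (n * k * (k + 1) - tie.term / (k - 1))

rank.sums
chi.sq

# Compare with the built-in test on the same matrix
friedman.test(M)
```

Because ranking is done separately within each rater, a rater who scores everyone high or low contributes the same set of ranks as any other rater, which is how the test accounts for the blocking variable.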

#### Effect size

Kendall’s W, or Kendall’s coefficient of concordance, can be used as an effect size statistic for Friedman’s test.

The following interpretations are based on personal intuition. They are not intended to be universal.

| Kendall’s W | small | medium | large |
|---|---|---|---|
| k = 3 | < 0.10 | 0.10 – < 0.30 | ≥ 0.30 |
| k = 5 | < 0.10 | 0.10 – < 0.25 | ≥ 0.25 |
| k = 7 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 |
| k = 9 | < 0.10 | 0.10 – < 0.20 | ≥ 0.20 |

XT = xtabs(Likert ~ Instructor + Rater,
data = Data)

XT

Instructor        a  b  c  d  e  f  g  h
Bob Belcher     4  5  4  6  6  6 10  6
Linda Belcher   8  6  8  8  8  7 10  9
Tina Belcher    7  5  7  8  8  9 10  9
Gene Belcher    6  4  5  5  6  6  5  5
Louise Belcher  8  7  8  8  9  9  8 10

For the KendallW function, groups must be in rows, and raters must be in columns.

library(DescTools)

KendallW(XT,
correct=TRUE,
test=TRUE)

Kendall's coefficient of concordance Wt

Kendall chi-squared = 23.139, df = 4, subjects = 5, raters = 8,
p-value = 0.0001188

sample estimates:
Wt
0.7230903

In the output, check that the correct number of groups and raters is listed under "subjects" and "raters", respectively.
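For these data, the value of W can also be recovered directly from the Friedman chi-squared statistic by dividing it by the number of blocks times one less than the number of groups. The sketch below uses only base R; `M` re-enters the chapter's data with raters in rows and instructors in columns, and `FT` and `W` are illustrative names.

```r
# Chapter data: rows = raters (blocks), columns = instructors (groups)
M = matrix(c(4, 5, 4, 6, 6, 6, 10,  6,    # Bob
             8, 6, 8, 8, 8, 7, 10,  9,    # Linda
             7, 5, 7, 8, 8, 9, 10,  9,    # Tina
             6, 4, 5, 5, 6, 6,  5,  5,    # Gene
             8, 7, 8, 8, 9, 9,  8, 10),   # Louise
           nrow = 8)

FT = friedman.test(M)

# W = chi-squared / (number of blocks * (number of groups - 1))
W = unname(FT$statistic) / (nrow(M) * (ncol(M) - 1))

W
```

The result agrees with the tie-corrected value reported by KendallW above, which is a useful check that groups and raters were placed in the intended rows and columns.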

#### Post-hoc test: pairwise sign test for multiple comparisons of groups

Post-hoc testing can be conducted with the functions pairwiseSignTest and pairwiseSignMatrix.  These functions conduct a paired two-sample sign test on each pair of groups, and output the results as either a table or a matrix, respectively.  The matrix output can be converted to a compact letter display using the multcompLetters function in the multcompView package.

To prevent the inflation of type I error rates, adjustments to the p-values can be made using the p.adjust.method option.  See ?p.adjust for details on available p-value adjustment methods.
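As an illustration of what the adjustment does, the base R function p.adjust can be applied directly to a vector of p-values. The sketch below uses the ten unadjusted p-values from the pairwise sign tests reported in this chapter, entered by hand at their printed precision; `p.raw` and `p.fdr` are illustrative names.

```r
# Unadjusted p-values from the ten pairwise sign tests in this chapter
p.raw = c(0.375, 0.625, 0.01563, 0.007812, 0.2187,
          0.07031, 0.007812, 0.03125, 0.007812, 0.6875)

# False discovery rate (Benjamini-Hochberg) adjustment
p.fdr = p.adjust(p.raw, method = "fdr")

# Side-by-side comparison of raw and adjusted values
data.frame(p.value  = p.raw,
           p.adjust = signif(p.fdr, 4))
```

Adjusted p-values are never smaller than the unadjusted ones, so some comparisons that appear significant before adjustment may no longer be significant afterward.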

It has been suggested that the sign test may lack power in detecting differences in paired data sets.  It is useful, however, because it makes few assumptions about the distributions of the data being compared, and because it is the test analogous to the Friedman test with two groups.
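To show what a single paired sign test involves, the comparison of Linda and Bob can be worked by hand with an exact binomial test in base R. This is a sketch of the general technique and is not meant to describe the internals of pairwiseSignTest; the vectors re-enter the chapter's scores in rater order (a to h), and the object names are illustrative.

```r
# Scores from the chapter's data, in rater order a to h
Bob   = c(4, 5, 4, 6, 6, 6, 10, 6)
Linda = c(8, 6, 8, 8, 8, 7, 10, 9)

# Paired differences; zero differences carry no sign and are dropped
Diff = Linda - Bob
Diff = Diff[Diff != 0]

# Number of positive differences out of the non-zero differences
n.pos = sum(Diff > 0)

# Under the null hypothesis, positive and negative signs are equally likely
ST = binom.test(n.pos, length(Diff), p = 0.5)

n.pos
ST$p.value
```

Only the signs of the differences are used, not their sizes, which is why the test needs so few assumptions and also why it can lack power.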

##### Table format and compact letter display

Note that the data must be ordered by the blocking variable so that the first observation for Bob will be paired with the first observation for Linda, and so on.

### Order groups by median

Data$Instructor = factor(Data$Instructor,
                         levels = c("Linda Belcher", "Louise Belcher",
                                    "Tina Belcher", "Bob Belcher",
                                    "Gene Belcher"))

### Pairwise sign tests

library(rcompanion)

PT = pairwiseSignTest(Likert ~ Instructor,
data   = Data,
method = "fdr")
# Adjusts p-values for multiple comparisons; see ?p.adjust for options

PT

Comparison S  p.value p.adjust
1  Linda Belcher - Louise Belcher = 0 1    0.375  0.46880
2    Linda Belcher - Tina Belcher = 0 3    0.625  0.68750
3     Linda Belcher - Bob Belcher = 0 7  0.01563  0.03908
4    Linda Belcher - Gene Belcher = 0 8 0.007812  0.02604
5   Louise Belcher - Tina Belcher = 0 5   0.2187  0.31240
6    Louise Belcher - Bob Belcher = 0 7  0.07031  0.11720
7   Louise Belcher - Gene Belcher = 0 8 0.007812  0.02604
8      Tina Belcher - Bob Belcher = 0 6  0.03125  0.06250
9     Tina Belcher - Gene Belcher = 0 8 0.007812  0.02604
10     Bob Belcher - Gene Belcher = 0 4   0.6875  0.68750

### Compact letter display

library(rcompanion)

cldList(p.adjust ~ Comparison,
        data       = PT,
        threshold  = 0.05)

Group Letter MonoLetter
1  LindaBelcher      a        a
2 LouiseBelcher     ab        ab
3   TinaBelcher     ab        ab
4    BobBelcher     bc         bc
5   GeneBelcher      c          c

Groups sharing a letter are not significantly different (alpha = 0.05).

##### Matrix format and compact letter display

Note that the data must be ordered by the blocking variable so that the first observation for Bob will be paired with the first observation for Linda, and so on.

### Order groups by median

Data$Instructor = factor(Data$Instructor,
                         levels = c("Linda Belcher", "Louise Belcher",
                                    "Tina Belcher", "Bob Belcher",
                                    "Gene Belcher"))

### Pairwise sign tests

library(rcompanion)

PM = pairwiseSignMatrix(Likert ~ Instructor,
data   = Data,
method = "fdr")
# Adjusts p-values for multiple comparisons; see ?p.adjust for options

PM

Linda Belcher Louise Belcher Tina Belcher Bob Belcher Gene Belcher
Linda Belcher        1.00000        0.46880      0.68750     0.03908      0.02604
Louise Belcher       0.46880        1.00000      0.31240     0.11720      0.02604
Tina Belcher         0.68750        0.31240      1.00000     0.06250      0.02604
Bob Belcher          0.03908        0.11720      0.06250     1.00000      0.68750
Gene Belcher         0.02604        0.02604      0.02604     0.68750      1.00000

library(multcompView)

multcompLetters(PM$Adjusted,     # matrix of adjusted p-values
                compare="<",
                threshold=0.05,  # p-value to use as significance threshold
                Letters=letters,
                reversed = FALSE)

Linda Belcher Louise Belcher   Tina Belcher    Bob Belcher   Gene Belcher
"a"           "ab"           "ab"           "bc"           "c"

### Groups sharing a letter are not significantly different.

#### Post-hoc Conover test

A different post-hoc test for the Friedman test can be conducted with the posthoc.friedman.conover.test function in the PMCMR package.  The output can be converted to a compact letter display using the multcompLetters function in the multcompView package.

### Order groups by median

Data$Instructor = factor(Data$Instructor,
                         levels = c("Linda Belcher", "Louise Belcher",
                                    "Tina Belcher", "Bob Belcher",
                                    "Gene Belcher"))

### Conover test

library(PMCMR)

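PT = posthoc.friedman.conover.test(y      = Data$Likert,
                                   groups = Data$Instructor,
                                   blocks = Data$Rater,
                                   p.adjust.method = "fdr")
# Adjusts p-values for multiple comparisons; see ?p.adjust for options.
# The "fdr" method is assumed here, matching the sign test examples above.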

PT

Pairwise comparisons using Conover's test for a two-way
balanced complete block design

Linda Belcher Louise Belcher Tina Belcher Bob Belcher
Louise Belcher 0.27303       -              -            -
Tina Belcher   0.31821       0.05154        -            -
Bob Belcher    2.8e-05       1.9e-06        0.00037      -
Gene Belcher   1.2e-06       1.1e-07        8.8e-06      0.17328

### Compact letter display

PT0 = as.matrix(PT$p.value)

library(rcompanion)

PT1 = fullPTable(PT0)

library(multcompView)

multcompLetters(PT1,
compare="<",
threshold=0.05,
Letters=letters,
reversed = FALSE)

Linda Belcher Louise Belcher   Tina Belcher    Bob Belcher   Gene Belcher
"a"            "a"            "a"            "b"            "b"