Mood’s median test compares the medians of two or more groups. It is often less powerful than the Kruskal–Wallis test, but it can be used when the spreads of values in the groups are different.

The test can be conducted with the *mood.medtest*
function in the *RVAideMemoire* package.

##### Post-hoc tests

The outcome of Mood’s median test tells you if there are
differences among the groups, but doesn’t tell you *which* groups differ
from which others. To determine this, post-hoc testing can be conducted. My custom function *pairwiseMedianTest*
can perform these post-hoc tests.

##### Appropriate data

• One-way data with two or more groups

• Dependent variable is ordinal, interval, or ratio

• Independent variable is a factor with levels indicating groups

• Observations between groups are independent. That is, not paired or repeated measures data

• Distributions of values for each group are similar in shape; however, the test is not sensitive to outliers

##### Hypotheses

• Null hypothesis: The medians of values for each group are equal.

• Alternative hypothesis (two-sided): The medians of values for each group are not equal.

##### Interpretation

Significant results can be reported as “There was a significant difference in the median values among groups.”

Post-hoc analysis allows you to say “The median for group A was higher than the median for group B”, and so on.

### Packages used in this chapter

The packages used in this chapter include:

• RVAideMemoire

• rcompanion

The following commands will install these packages if they are not already installed:

if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(rcompanion)){install.packages("rcompanion")}

### Example using the *RVAideMemoire* package

This example uses the formula notation indicating that *Likert*
is the dependent variable and *Speaker* is the independent variable. The *data=*
option indicates the data frame that contains the variables. For the meaning
of other options, see *?mood.medtest*.

A significant *p*-value for Mood’s median test
indicates that not all medians among groups are equal.

For appropriate plots and data frame checking, see the *Kruskal–Wallis
Test* chapter.

Input =("
Speaker  Likert
Pooh     3
Pooh     5
Pooh     4
Pooh     4
Pooh     4
Pooh     4
Pooh     4
Pooh     4
Pooh     5
Pooh     5
Piglet   2
Piglet   4
Piglet   2
Piglet   2
Piglet   1
Piglet   2
Piglet   3
Piglet   2
Piglet   2
Piglet   3
Tigger   4
Tigger   4
Tigger   4
Tigger   4
Tigger   5
Tigger   3
Tigger   5
Tigger   4
Tigger   4
Tigger   3
")

Data = read.table(textConnection(Input), header=TRUE)

### Check the data frame

library(psych)
headTail(Data)
str(Data)
summary(Data)

### Remove unnecessary objects

rm(Input)

#### Mood’s median test

library(RVAideMemoire)

mood.medtest(Likert ~ Speaker,
             data = Data)

Mood's median test

p-value = 0.3211

An interesting thing happened with the result here. The
test counts how many observations in each group are greater than the global
median for all groups together, in this case 4. It then tests whether there is a
significant difference in this proportion among groups. For this data set,
however, both Pooh and Tigger have a majority of observations equal to the
global median. Because they are *equal* to the global median, they are
not *greater than* the global median, and so aren’t much different from
Piglet’s scores on this count. The result in this case is a non-significant *p*-value.
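To see this counting concretely, the tabulation behind the test can be reproduced by hand. This is a minimal sketch, assuming the example data above (re-created here so the code stands alone); the call to *fisher.test* on the resulting table is an illustrative choice, not necessarily the exact procedure *mood.medtest* uses internally.

```r
# Illustrative sketch of the counting behind Mood's median test.
# Re-creates the example data; fisher.test on the table is an
# assumption for illustration, not mood.medtest's internal code.

Data = data.frame(
  Speaker = rep(c("Pooh", "Piglet", "Tigger"), each = 10),
  Likert  = c(3,5,4,4,4,4,4,4,5,5,    # Pooh
              2,4,2,2,1,2,3,2,2,3,    # Piglet
              4,4,4,4,5,3,5,4,4,3))   # Tigger

GlobalMedian = median(Data$Likert)    # 4 for this data set

Above     = tapply(Data$Likert, Data$Speaker, function(x) sum(x >  GlobalMedian))
AtOrBelow = tapply(Data$Likert, Data$Speaker, function(x) sum(x <= GlobalMedian))

rbind(Above, AtOrBelow)
#           Piglet Pooh Tigger
# Above          0    3      2
# AtOrBelow     10    7      8

fisher.test(rbind(Above, AtOrBelow))
```

Note that Pooh and Tigger, despite their high scores, contribute only 3 and 2 observations to the "Above" row, which is why the table looks unremarkable to the test.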

But the test would come out differently if we were counting
observations *less than* the global median, because Pooh and Tigger have
few of these, and Piglet has relatively many.
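To illustrate, here is a hypothetical re-tabulation counting observations strictly *less than* the global median. *mood.medtest* does not offer this as an option, so the example data are re-created by hand for this sketch.

```r
# Hypothetical re-tabulation: counts strictly below the global median (4).
# The example data are re-created here so the sketch is self-contained.
Likert  = c(3,5,4,4,4,4,4,4,5,5,    # Pooh
            2,4,2,2,1,2,3,2,2,3,    # Piglet
            4,4,4,4,5,3,5,4,4,3)    # Tigger
Speaker = rep(c("Pooh", "Piglet", "Tigger"), each = 10)

Below = tapply(Likert, Speaker, function(x) sum(x < median(Likert)))
Below
#  Piglet   Pooh Tigger
#       9      1      2
```

Counted this way, Piglet clearly stands apart from the other two groups.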

This is a quirk of Mood’s median test; this kind of behavior isn’t common in statistical tests.

One possible solution would be to re-code the function to count observations less than the global median.

But it is easier to simply invert the scale we are using. This is really an arbitrary change, but for this test, it can make a difference. Imagine if our original scale interpreted 5 to be the best, and 1 to be the worst. When we designed the survey tool, we could just as easily have made 1 the best and 5 the worst. And then instead of ranking “good” with a 4, the respondents would have marked it 2, and so on. By the way the calculations are done, this arbitrary change in scale will change the results of Mood’s median test.

For a 5-point scale, we do this inversion simply by making a new variable equal to 6 minus the original score.

With Mood’s median test, I recommend making this kind of
inversion in cases where many values are equal to the global median. Then use
whichever result has a lower *p*-value.

Data$Likert.inv = 6 - Data$Likert

library(psych)
headTail(Data)

   Speaker Likert Likert.inv
1     Pooh      3          3
2     Pooh      5          1
3     Pooh      4          2
4     Pooh      4          2
...   <NA>    ...        ...
27  Tigger      5          1
28  Tigger      4          2
29  Tigger      4          2
30  Tigger      3          3

library(RVAideMemoire)

mood.medtest(Likert.inv ~ Speaker,
             data = Data)

Mood's median test

p-value = 0.000402

#### Post-hoc test: pairwiseMedianTest function

If Mood’s median test is significant, a post-hoc analysis can be performed to determine which groups differ from which others.

For this we will use my custom *pairwiseMedianTest*
function, which conducts Mood’s median test on all pairs of groups from one-way
data.

Because the post-hoc test will produce multiple *p*-values,
adjustments to the *p*-values can be made to avoid inflating the
possibility of making a type-I error. There are a variety of methods for
controlling the familywise error rate or for controlling the false discovery
rate. See *?p.adjust* for details on these methods.
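As a sketch of how such an adjustment behaves, *p.adjust* can be applied directly to a vector of raw *p*-values. The values used here are the raw *p*-values reported for this example's pairwise comparisons.

```r
# Sketch: fdr (Benjamini-Hochberg) adjustment of three raw p-values.
# These are the raw p-values from the pairwise comparisons in this example.
p.raw = c(1, 0.001093, 0.005477)
p.adjust(p.raw, method = "fdr")
```

The fdr method controls the false discovery rate and is generally less conservative than familywise methods such as `"bonferroni"`.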

### Order groups by median

Data = Data[order(factor(Data$Speaker,
                         levels=c("Pooh", "Tigger", "Piglet"))),]

### Pairwise median tests

library(rcompanion)

PT = pairwiseMedianTest(x      = Data$Likert,
                        g      = Data$Speaker,
                        exact  = NULL,
                        method = "fdr")

# Adjusts p-values for multiple comparisons;
# See ?p.adjust for options
PT

           Comparison  p.value p.adjust
1   Pooh - Tigger = 0 1        1.000000
2   Pooh - Piglet = 0 0.001093 0.003279
3 Tigger - Piglet = 0 0.005477 0.008216

### Compact letter display

library(rcompanion)

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)

   Group Letter MonoLetter
1   Pooh      a         a
2 Tigger      a         a
3 Piglet      b          b

Groups sharing a letter are not significantly different (alpha = 0.05).

#### Post-hoc test: pairwiseMedianMatrix function

### Order groups by median

Data = Data[order(factor(Data$Speaker,
                         levels=c("Pooh", "Tigger", "Piglet"))),]

Data

### Pairwise median tests

library(rcompanion)

PT = pairwiseMedianMatrix(x      = Data$Likert,
                          g      = Data$Speaker,
                          exact  = NULL,
                          method = "fdr")

# Adjusts p-values for multiple comparisons;
# See ?p.adjust for options

PT

library(multcompView)

multcompLetters(PT$Adjusted,
                compare   = "<",
                threshold = 0.05,
                Letters   = letters)

  Pooh Tigger Piglet
   "a"    "a"    "b"