Permutation tests can be used to compare medians or percentiles among groups. This is useful, for example, to compare the 25th percentile or 75th percentile among groups.
The examples presented here use the percentileTest function in the rcompanion package, which can compare only two groups. For more complicated models, quantile regression could be used.
The percentileTest function can also be used for means, variance, interquartile range, or proportion of observations less than a threshold value.
In general, there are not assumptions about the distributions of the data in the groups. Medians and percentiles are meaningful for ordinal or discrete data. If the data within groups are not symmetrical, comparisons of means or variances may not be useful.
Pairwise tests
Pairwise tests across multiple groups can be conducted with the pairwisePercentileTest function.
Confidence intervals for percentiles
Confidence intervals for percentiles across groups can be determined with the groupwisePercentile function.
Packages used in this chapter
The packages used in this chapter include:
• FSA
• ggplot2
• rcompanion
The following commands will install these packages if they are not already installed:
if(!require(FSA)){install.packages("FSA")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(rcompanion)){install.packages("rcompanion")}
Examples
The examples in this chapter will revisit the income data from the Choosing a Statistical Test chapter, with the addition of data for a third town.
### Import and modify the data
TwoTowns = read.table("http://rcompanion.org/documents/TwoTowns.csv",
header=TRUE, sep=",")
Town.A = TwoTowns$Income[TwoTowns$Town=="Town.A"]
Town.B = TwoTowns$Income[TwoTowns$Town=="Town.B"]
Town.C = TwoTowns$Income[TwoTowns$Town=="Town.B"] + 20000
Income = c(Town.A, Town.B, Town.C)
Town = c(rep("Town.A", 101), rep("Town.B", 101),
rep("Town.C", 101))
ThreeTowns = data.frame(Town, Income)
str(ThreeTowns)
### Statistics by groups
library(FSA)
Summarize(Income ~ Town, data = ThreeTowns, digits = 3)
Town n mean sd min Q1 median Q3 max
1 Town.A 101 48146.43 10851.67 23557 40971 48007 56421 77774
2 Town.B 101 115275.22 163878.16 29050 34142 47223 108216 880027
3 Town.C 101 135275.22 163878.16 49050 54142 67223 128216 900027
Box plots
library(ggplot2)
ggplot(ThreeTowns,
aes(x = Town, y = Income)) +
geom_boxplot() +
coord_trans(y = "log10") +
ylab("Income \n(y axis is log scaled)") +
xlab("Town")
Confidence intervals for percentiles
Note that the percentiles below match Q1, median, and Q3 above, respectively, except that they are rounded to three significant figures. The confidence limits are determined by bootstrap, so your values may differ from the values here.
library(rcompanion)
groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau = 0.25,
R = 1000)
Town n tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.25 41000 0.95 36000 43600
2 Town.B 101 0.25 34100 0.95 31400 36700
3 Town.C 101 0.25 54100 0.95 51400 56700
groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau = 0.50,
R = 1000)
Town n tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.5 48000 0.95 45600 50100
2 Town.B 101 0.5 47200 0.95 40700 59800
3 Town.C 101 0.5 67200 0.95 60800 79800
groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau = 0.75,
R = 1000)
Town n tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.75 56400 0.95 52400 59400
2 Town.B 101 0.75 108000 0.95 70900 144000
3 Town.C 101 0.75 128000 0.95 91500 164000
Test for percentiles between two groups
The percentileTest function compares only two groups. It uses the first two levels of the grouping variable, treating it as a factor variable. The p-value is determined by permutation test, so your result may differ from the value here. If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.
### Test 25th percentiles
library(rcompanion)
percentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau = 0.25,
r = 5000)
$Test
Formula Data Test tau
1 Income ~ Town ThreeTowns percentile 0.25
$Summary
n mean sd min p25 median p75 max iqr
1 Town.A 101 48150 10850 23560 40970 48010 56420 77770 15450
2 Town.B 101 115300 163900 29050 34140 47220 108200 880000 74070
$Result
p.value
1 p-value 0.0162
### Test medians
percentileTest(Income ~ Town,
data = ThreeTowns,
test = "median",
r = 5000)
$Test
Formula Data Test
1 Income ~ Town ThreeTowns median
$Summary
n mean sd min p25 median p75 max iqr
1 Town.A 101 48150 10850 23560 40970 48010 56420 77770 15450
2 Town.B 101 115300 163900 29050 34140 47220 108200 880000 74070
$Result
p.value
1 p-value 0.7458
### Test 75th perentiles
percentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau = 0.75,
r = 5000)
$Test
Formula Data Test tau
1 Income ~ Town ThreeTowns percentile 0.75
$Summary
n mean sd min p25 median p75 max iqr
1 Town.A 101 48150 10850 23560 40970 48010 56420 77770 15450
2 Town.B 101 115300 163900 29050 34140 47220 108200 880000 74070
$Result
p.value
1 p-value 0
Test for percentiles among multiple groups
The pairwisePercentileTest function will compare percentiles among multiple groups. The p-values are determined by permutation test, so your results may differ from the values here. If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.
### Test 25th percentiles
library(rcompanion)
PT25 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau = 0.25,
r = 5000)
PT25
Comparison p.value p.adjust
1 Town.A - Town.B = 0 0.0166 0.0166
2 Town.A - Town.C = 0 0 0.0000
3 Town.B - Town.C = 0 0 0.0000
cldList(p.adjust ~ Comparison, data = PT25)
Group Letter MonoLetter
1 Town.A a a
2 Town.B b b
3 Town.C c c
### Test medians
PT50 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "median",
r = 5000)
PT50
Comparison p.value p.adjust
1 Town.A - Town.B = 0 0.7538 0.753800
2 Town.A - Town.C = 0 0 0.000000
3 Town.B - Town.C = 0 0.00155 0.002325
cldList(p.adjust ~ Comparison, data = PT50)
Group Letter MonoLetter
1 Town.A a a
2 Town.B a a
3 Town.C b b
### Test 75th percentiles
PT75 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau = 0.75,
r = 5000)
PT75
Comparison p.value p.adjust
1 Town.A - Town.B = 0 0 0.000
2 Town.A - Town.C = 0 0 0.000
3 Town.B - Town.C = 0 0.456 0.456
cldList(p.adjust ~ Comparison, data = PT75)
Group Letter MonoLetter
1 Town.A a a
2 Town.B b b
3 Town.C b b
Summary table of results
|
―――― Grouping letters * ―――― |
||
|
25th percentile |
Median |
75th percentile |
Town |
|
|
|
A |
a |
a |
a |
B |
b |
a |
b |
C |
c |
b |
b |
* Groups sharing a letter in each column are not significantly different. |