## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Advertisement

# Permutation Tests for Medians and Percentiles

Advertisement

Permutation tests can be used to compare medians or percentiles among groups.  This is useful, for example, to compare the 25th percentile or 75th percentile among groups.

The examples presented here use the percentileTest function in the rcompanion package, which can compare only two groups.  This approach may be more powerful than using Mood’s median test for medians.  For more complicated models, quantile regression could be used.

The percentileTest function can also be used for means, variance, interquartile range, or proportion of observations less than a threshold value.

In general, there are not assumptions about the distributions of the data in the groups.   Medians and percentiles are meaningful for ordinal or discrete data.  If the data within groups is not symmetrical, means or variances may not be useful.

##### Pairwise tests

Pairwise tests across multiple groups can be conducted with the pairwisePercentileTest function.

##### Confidence intervals for percentiles

Confidence intervals for percentiles across groups can be determined with the groupwisePercentile function.

### Packages used in this chapter

The packages used in this chapter include:

•  FSA

•  ggplot2

•  rcompanion

The following commands will install these packages if they are not already installed:

if(!require(FSA)){install.packages("FSA")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(rcompanion)){install.packages("rcompanion")}

### Examples

The examples in this chapter will revisit the income data from the Choosing a Statistical Test chapter, with the addition of data for a third town.

### Import and modify the data

TwoTowns = read.table("http://rcompanion.org/documents/TwoTowns.csv",
header=TRUE, sep=",")

Town.A  = TwoTowns\$Income[TwoTowns\$Town=="Town.A"]
Town.B  = TwoTowns\$Income[TwoTowns\$Town=="Town.B"]
Town.C  = TwoTowns\$Income[TwoTowns\$Town=="Town.B"] + 20000

Income  = c(Town.A, Town.B, Town.C)
Town    = c(rep("Town.A", 101), rep("Town.B", 101), rep("Town.C", 101))

ThreeTowns = data.frame(Town, Income)

str(ThreeTowns)

### Statistics by groups

library(FSA)

Summarize(Income ~ Town, data = ThreeTowns, digits = 3)

Town   n      mean        sd   min    Q1 median     Q3    max
1 Town.A 101  48146.43  10851.67 23557 40971  48007  56421  77774
2 Town.B 101 115275.22 163878.16 29050 34142  47223 108216 880027
3 Town.C 101 135275.22 163878.16 49050 54142  67223 128216 900027

#### Box plots

library(ggplot2)

ggplot(ThreeTowns,
aes(x = Town, y = Income)) +
geom_boxplot() +
coord_trans(y = "log10") +
ylab("Income \n(y axis is log scaled)") +
xlab("Town")

#### Confidence intervals for percentiles

Note that the percentiles below match Q1, median, and Q3 above, respectively, except that they are rounded to three significant figures.  The confidence limits are determined by bootstrap, so your values may differ from the values here.

library(rcompanion)

groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau  = 0.25,
R    = 1000)

Town   n  tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.25      41000       0.95     36000     43600
2 Town.B 101 0.25      34100       0.95     31400     36700
3 Town.C 101 0.25      54100       0.95     51400     56700

groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau  = 0.50,
R    = 1000)

Town   n tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.5      48000       0.95     45600     50100
2 Town.B 101 0.5      47200       0.95     40700     59800
3 Town.C 101 0.5      67200       0.95     60800     79800

groupwisePercentile(Income ~ Town,
data = ThreeTowns,
tau  = 0.75,
R    = 1000)

Town   n  tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.75      56400       0.95     52400     59400
2 Town.B 101 0.75     108000       0.95     70900    144000
3 Town.C 101 0.75     128000       0.95     91500    164000

#### Test for percentiles between two groups

The percentileTest function compares only two groups.  It uses the first two levels of the grouping variable, treating it as a factor.  The p-value is determined by permutation test, so your result may differ from the value here.  If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.

percentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau  = 0.25,
r    = 5000)

\$Test
Formula       Data       Test  tau
1 Income ~ Town ThreeTowns percentile 0.25

\$Summary
n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

\$Result
p.value
1 p-value   0.0162

percentileTest(Income ~ Town,
data = ThreeTowns,
test = "median",
r    = 5000)

\$Test
Formula       Data   Test
1 Income ~ Town ThreeTowns median

\$Summary
n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

\$Result
p.value
1 p-value   0.7458

percentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau  = 0.75,
r    = 5000)

\$Test
Formula       Data       Test  tau
1 Income ~ Town ThreeTowns percentile 0.75

\$Summary
n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

\$Result
p.value
1 p-value       0

#### Test for percentiles among multiple groups

The pairwisePercentileTest function will compare percentiles among multiple groups.  The p-values are determined by permutation test, so your results may differ from the values here.  If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.

PT25 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau  = 0.25,
r    = 5000)

PT25

Comparison p.value p.adjust
1 Town.A - Town.B = 0  0.0166   0.0166
2 Town.A - Town.C = 0       0   0.0000
3 Town.B - Town.C = 0       0   0.0000

cldList(p.adjust ~ Comparison, data = PT25)

Group Letter MonoLetter
1 Town.A      a        a
2 Town.B      b         b
3 Town.C      c          c

PT50 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "median",
r    = 5000)

PT50

Comparison p.value p.adjust
1 Town.A - Town.B = 0  0.7538 0.753800
2 Town.A - Town.C = 0       0 0.000000
3 Town.B - Town.C = 0 0.00155 0.002325

cldList(p.adjust ~ Comparison, data = PT50)

Group Letter MonoLetter
1 Town.A      a        a
2 Town.B      a        a
3 Town.C      b         b

PT75 = pairwisePercentileTest(Income ~ Town,
data = ThreeTowns,
test = "percentile",
tau  = 0.75,
r    = 5000)

PT75

Comparison p.value p.adjust
1 Town.A - Town.B = 0       0    0.000
2 Town.A - Town.C = 0       0    0.000
3 Town.B - Town.C = 0   0.456    0.456

cldList(p.adjust ~ Comparison, data = PT75)

Group Letter MonoLetter
1 Town.A      a        a
2 Town.B      b         b
3 Town.C      b         b

#### Summary table of results

 ――――  Grouping letters *  ―――― 25th percentile Median 75th percentile Town A a a a B b a b C c b b *  Groups sharing a letter in each column are not significantly different.