[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Permutation Tests for Medians and Percentiles

Permutation tests can be used to compare medians or percentiles among groups.  This is useful, for example, to compare the 25th percentile or 75th percentile among groups.

 

The examples presented here use the percentileTest function in the rcompanion package, which can compare only two groups.  This approach may be more powerful than using Mood’s median test for medians.  For more complicated models, quantile regression could be used.

 

The percentileTest function can also be used for means, variance, interquartile range, or proportion of observations less than a threshold value.

 

In general, there are not assumptions about the distributions of the data in the groups.   Medians and percentiles are meaningful for ordinal or discrete data.  If the data within groups is not symmetrical, means or variances may not be useful.

 

Pairwise tests

Pairwise tests across multiple groups can be conducted with the pairwisePercentileTest function.

 

Confidence intervals for percentiles

Confidence intervals for percentiles across groups can be determined with the groupwisePercentile function. 

 

Packages used in this chapter

 

The packages used in this chapter include:

•  FSA

•  ggplot2

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(FSA)){install.packages("FSA")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(rcompanion)){install.packages("rcompanion")}


Examples

 

The examples in this chapter will revisit the income data from the Choosing a Statistical Test chapter, with the addition of data for a third town.

 

### Import and modify the data

TwoTowns = read.table("http://rcompanion.org/documents/TwoTowns.csv",
                        header=TRUE, sep=",")

Town.A  = TwoTowns$Income[TwoTowns$Town=="Town.A"]
Town.B  = TwoTowns$Income[TwoTowns$Town=="Town.B"]
Town.C  = TwoTowns$Income[TwoTowns$Town=="Town.B"] + 20000

Income  = c(Town.A, Town.B, Town.C)
Town    = c(rep("Town.A", 101), rep("Town.B", 101), rep("Town.C", 101))

ThreeTowns = data.frame(Town, Income)

str(ThreeTowns)


### Statistics by groups

library(FSA)

Summarize(Income ~ Town, data = ThreeTowns, digits = 3)


    Town   n      mean        sd   min    Q1 median     Q3    max
1 Town.A 101  48146.43  10851.67 23557 40971  48007  56421  77774
2 Town.B 101 115275.22 163878.16 29050 34142  47223 108216 880027
3 Town.C 101 135275.22 163878.16 49050 54142  67223 128216 900027


Box plots


library(ggplot2)

ggplot(ThreeTowns,
       aes(x = Town, y = Income)) +
  geom_boxplot() +
     coord_trans(y = "log10") +
  ylab("Income \n(y axis is log scaled)") +
  xlab("Town")




Confidence intervals for percentiles

Note that the percentiles below match Q1, median, and Q3 above, respectively, except that they are rounded to three significant figures.  The confidence limits are determined by bootstrap, so your values may differ from the values here.


library(rcompanion)

groupwisePercentile(Income ~ Town,
                    data = ThreeTowns,
                    tau  = 0.25,
                    R    = 1000)


    Town   n  tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.25      41000       0.95     36000     43600
2 Town.B 101 0.25      34100       0.95     31400     36700
3 Town.C 101 0.25      54100       0.95     51400     56700


groupwisePercentile(Income ~ Town,
                    data = ThreeTowns,
                    tau  = 0.50,
                    R    = 1000)


    Town   n tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.5      48000       0.95     45600     50100
2 Town.B 101 0.5      47200       0.95     40700     59800
3 Town.C 101 0.5      67200       0.95     60800     79800


groupwisePercentile(Income ~ Town,
                    data = ThreeTowns,
                    tau  = 0.75,
                    R    = 1000)


    Town   n  tau Percentile Conf.level Bca.lower Bca.upper
1 Town.A 101 0.75      56400       0.95     52400     59400
2 Town.B 101 0.75     108000       0.95     70900    144000
3 Town.C 101 0.75     128000       0.95     91500    164000


Test for percentiles between two groups

The percentileTest function compares only two groups.  It uses the first two levels of the grouping variable, treating it as a factor.  The p-value is determined by permutation test, so your result may differ from the value here.  If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.

 

 

percentileTest(Income ~ Town,
               data = ThreeTowns,
               test = "percentile",
               tau  = 0.25,
               r    = 5000)


$Test
        Formula       Data       Test  tau
1 Income ~ Town ThreeTowns percentile 0.25

$Summary
           n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

$Result
          p.value
1 p-value   0.0162


percentileTest(Income ~ Town,
               data = ThreeTowns,
               test = "median",
               r    = 5000)


$Test
        Formula       Data   Test
1 Income ~ Town ThreeTowns median

$Summary
           n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

$Result
          p.value
1 p-value   0.7458


percentileTest(Income ~ Town,
               data = ThreeTowns,
               test = "percentile",
               tau  = 0.75,
               r    = 5000)


$Test
        Formula       Data       Test  tau
1 Income ~ Town ThreeTowns percentile 0.75

$Summary
           n   mean     sd   min   p25 median    p75    max   iqr
1 Town.A 101  48150  10850 23560 40970  48010  56420  77770 15450
2 Town.B 101 115300 163900 29050 34140  47220 108200 880000 74070

$Result
          p.value
1 p-value       0


Test for percentiles among multiple groups

The pairwisePercentileTest function will compare percentiles among multiple groups.  The p-values are determined by permutation test, so your results may differ from the values here.  If the test takes too long, the r value can be decreased, but a greater r value will result in a more precise p-value.


PT25 = pairwisePercentileTest(Income ~ Town,
                              data = ThreeTowns,
                              test = "percentile",
                              tau  = 0.25,
                              r    = 5000)

PT25


           Comparison p.value p.adjust
1 Town.A - Town.B = 0  0.0166   0.0166
2 Town.A - Town.C = 0       0   0.0000
3 Town.B - Town.C = 0       0   0.0000


cldList(p.adjust ~ Comparison, data = PT25)


   Group Letter MonoLetter
1 Town.A      a        a 
2 Town.B      b         b
3 Town.C      c          c


PT50 = pairwisePercentileTest(Income ~ Town,
                              data = ThreeTowns,
                              test = "median",
                              r    = 5000)

PT50


           Comparison p.value p.adjust
1 Town.A - Town.B = 0  0.7538 0.753800
2 Town.A - Town.C = 0       0 0.000000
3 Town.B - Town.C = 0 0.00155 0.002325


cldList(p.adjust ~ Comparison, data = PT50)


   Group Letter MonoLetter
1 Town.A      a        a 
2 Town.B      a        a
3 Town.C      b         b


PT75 = pairwisePercentileTest(Income ~ Town,
                              data = ThreeTowns,
                              test = "percentile",
                              tau  = 0.75,
                              r    = 5000)

PT75


           Comparison p.value p.adjust
1 Town.A - Town.B = 0       0    0.000
2 Town.A - Town.C = 0       0    0.000
3 Town.B - Town.C = 0   0.456    0.456


cldList(p.adjust ~ Comparison, data = PT75)


   Group Letter MonoLetter
1 Town.A      a        a 
2 Town.B      b         b
3 Town.C      b         b


Summary table of results

 

  ――――  Grouping letters *  ――――

 

25th percentile

Median

75th percentile

Town

 

 

 

  A

  a

  a

  a

  B

  b

  a

  b

  C

  c

  b

  b

*  Groups sharing a letter in each column are not significantly different.