Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Permutation Tests

Permutation tests do not rely on assumptions about the distribution of the sampled populations, as some other tests do.  It is my understanding, however, that certain permutation tests, for example those specifically testing a difference in means, do carry assumptions about the underlying populations.  For example, the Fisher-Pitman test is sensitive to the mean and the dispersion simultaneously.

Permutation tests work by resampling the observed data many times in order to determine a p-value for the test.  Recall that the p-value is the probability of obtaining data at least as extreme as the observed data when the null hypothesis is true.  If the data are shuffled many times in a way consistent with the null hypothesis (for example, by randomly reassigning group labels when the null hypothesis states that the groups do not differ), the p-value can be estimated as the proportion of shuffled data sets whose test statistic is at least as extreme as that of the observed data.
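As a minimal sketch of this idea in base R, the following code runs a two-sample permutation test on a difference in means.  The two small vectors are hypothetical values made up for illustration, and the number of shuffles (10000) and the seed are arbitrary choices.

```r
# Hypothetical example data: two small groups (illustration only)
group_a = c(12, 14, 15, 11, 13)
group_b = c(18, 20, 17, 19, 21)

# Observed test statistic: difference in means
observed = mean(group_a) - mean(group_b)

pooled = c(group_a, group_b)
n_a    = length(group_a)

# Under the null hypothesis the group labels are exchangeable,
# so shuffle the pooled values and recompute the statistic many times
set.seed(1234)
n_shuffles = 10000
permuted = replicate(n_shuffles, {
  shuffled = sample(pooled)
  mean(shuffled[1:n_a]) - mean(shuffled[-(1:n_a)])
})

# Two-sided p-value: proportion of shuffles at least as extreme as observed
p_value = mean(abs(permuted) >= abs(observed))
p_value
```

Because every value in group_b exceeds every value in group_a, only 2 of the choose(10, 5) = 252 possible splits of the pooled data are as extreme as the observed one, so the estimate should land near 2/252, about 0.008.  A common variant adds one to both the count and the number of shuffles so that the estimated p-value can never be exactly zero.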

The advantages of permutation tests are:

•  the lack of assumptions about the distribution of the underlying data,

•  their flexibility in the kinds of data they can handle (nominal, ordinal, interval/ratio),

•  and their being relatively straightforward to conduct and interpret.

The disadvantages of permutation tests are:

•  the limited complexity of designs they can handle,

•  and their unfamiliarity to many readers.

R packages

The coin package offers a very flexible framework for conducting permutation tests.  It provides functions for common permutation tests and, through its general framework, can handle nominal, ordinal, and interval/ratio data.

Another useful package is lmPerm, which conducts analyses analogous to general linear models (lm in R) using permutation tests.

There are other packages that implement permutation tests.

Packages used in this chapter

The packages used in this chapter include:

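•  psych

•  FSA

•  coin

•  rcompanion

•  lmPerm

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(coin)){install.packages("coin")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(lmPerm)){install.packages("lmPerm")}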

Permutation test example

The following example uses the data from the One-way Anova chapter.

Note that results from permutation tests may vary due to the resampling procedure and the number of iterations.

Input = ("
Instructor       Student  Sodium
'Brendon Small'      a    1200
'Brendon Small'      b    1400
'Brendon Small'      c    1350
'Brendon Small'      d     950
'Brendon Small'      e    1400
'Brendon Small'      f    1150
'Brendon Small'      g    1300
'Brendon Small'      h    1325
'Brendon Small'      i    1425
'Brendon Small'      j    1500
'Brendon Small'      k    1250
'Brendon Small'      l    1150
'Brendon Small'      m     950
'Brendon Small'      n    1150
'Brendon Small'      o    1600
'Brendon Small'      p    1300
'Brendon Small'      q    1050
'Brendon Small'      r    1300
'Brendon Small'      s    1700
'Brendon Small'      t    1300
'Coach McGuirk'      u    1100
'Coach McGuirk'      v    1200
'Coach McGuirk'      w    1250
'Coach McGuirk'      x    1050
'Coach McGuirk'      y    1200
'Coach McGuirk'      z    1250
'Coach McGuirk'      aa   1350
'Coach McGuirk'      ab   1350
'Coach McGuirk'      ac   1325
'Coach McGuirk'      ad   1525
'Coach McGuirk'      ae   1225
'Coach McGuirk'      af   1125
'Coach McGuirk'      ag   1000
'Coach McGuirk'      ah   1125
'Coach McGuirk'      ai   1400
'Coach McGuirk'      aj   1200
'Coach McGuirk'      ak   1150
'Coach McGuirk'      al   1400
'Coach McGuirk'      am   1500
'Coach McGuirk'      an   1200
'Melissa Robins'     ao    900
'Melissa Robins'     ap   1100
'Melissa Robins'     aq   1150
'Melissa Robins'     ar    950
'Melissa Robins'     as   1100
'Melissa Robins'     at   1150
'Melissa Robins'     au   1250
'Melissa Robins'     av   1250
'Melissa Robins'     aw   1225
'Melissa Robins'     ax   1325
'Melissa Robins'     ay   1125
'Melissa Robins'     az   1025
'Melissa Robins'     ba    950
'Melissa Robins'     bc    925
'Melissa Robins'     bd   1200
'Melissa Robins'     be   1100
'Melissa Robins'     bf    950
'Melissa Robins'     bg   1300
'Melissa Robins'     bh   1400
'Melissa Robins'     bi   1100
")

Data = read.table(textConnection(Input), header = TRUE)

###  Order factors by the order in data frame
###  Otherwise, R will alphabetize them

Data$Instructor = factor(Data$Instructor,
                         levels = unique(Data$Instructor))

###  Check the data frame

library(psych)

str(Data)

summary(Data)
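The effect of the levels = unique(...) argument can be seen on a short, hypothetical character vector.  By default, factor() sorts the unique values to form the levels, which for these names gives alphabetical order; supplying levels = unique(x) keeps the levels in order of first appearance instead.

```r
# Hypothetical instructor names, in the order they appear in a data set
x = c("Coach McGuirk", "Brendon Small", "Melissa Robins", "Coach McGuirk")

# Default: factor() sorts the unique values to form the levels
default_levels = levels(factor(x))
default_levels
# levels: "Brendon Small", "Coach McGuirk", "Melissa Robins"

# levels = unique(x): levels follow the order of first appearance
ordered_levels = levels(factor(x, levels = unique(x)))
ordered_levels
# levels: "Coach McGuirk", "Brendon Small", "Melissa Robins"
```

The level order determines the order in which groups appear in summaries and in the list of pairwise comparisons.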

Summarize data by group

library(FSA)

Summarize(Sodium ~ Instructor,
          data   = Data,
          digits = 3)

      Instructor  n    mean      sd  min      Q1 median      Q3  max
1  Brendon Small 20 1287.50 193.734  950 1150.00 1300.0 1400.00 1700
2  Coach McGuirk 20 1246.25 142.412 1000 1143.75 1212.5 1350.00 1525
3 Melissa Robins 20 1123.75 143.149  900 1006.25 1112.5 1231.25 1400

Fisher-Pitman permutation test

library(coin)

oneway_test(Sodium ~ Instructor,
            data = Data)

Asymptotic K-Sample Fisher-Pitman Permutation Test

chi-squared = 9.6282, df = 2, p-value = 0.008114
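The word "Asymptotic" in this output indicates that coin used a large-sample approximation to the permutation distribution (its default) rather than actually shuffling the data.  As a rough base-R sketch of a resampling version, the code below shuffles the sodium values across instructors and compares the between-group sum of squares with its observed value.  Because the total sum of squares and the group sizes do not change under shuffling, this statistic orders the shuffles the same way the one-way anova F statistic would.  The data are typed in from the example above, and the 5000 shuffles and the seed are arbitrary choices.  Within coin itself, a resampling version can be requested through the distribution argument, for example distribution = approximate(nresample = 10000); it is worth checking help(oneway_test) for the installed version.

```r
# Sodium data from the example above, typed in by group (20 students each);
# 1525 is the Coach McGuirk maximum reported in the summary table
sodium = c(1200, 1400, 1350,  950, 1400, 1150, 1300, 1325, 1425, 1500,
           1250, 1150,  950, 1150, 1600, 1300, 1050, 1300, 1700, 1300,
           1100, 1200, 1250, 1050, 1200, 1250, 1350, 1350, 1325, 1525,
           1225, 1125, 1000, 1125, 1400, 1200, 1150, 1400, 1500, 1200,
            900, 1100, 1150,  950, 1100, 1150, 1250, 1250, 1225, 1325,
           1125, 1025,  950,  925, 1200, 1100,  950, 1300, 1400, 1100)

groups     = c("Brendon Small", "Coach McGuirk", "Melissa Robins")
instructor = factor(rep(groups, each = 20), levels = groups)

# Between-group sum of squares: sum over groups of n_j * (mean_j - grand mean)^2
between_ss = function(y, g) {
  group_means = tapply(y, g, mean)
  group_sizes = tapply(y, g, length)
  sum(group_sizes * (group_means - mean(y))^2)
}

observed = between_ss(sodium, instructor)

# Shuffle sodium values across instructors, keeping group sizes fixed
set.seed(2024)
n_shuffles = 5000
permuted = replicate(n_shuffles, between_ss(sample(sodium), instructor))

# Large between-group differences are evidence against the null hypothesis,
# so count shuffles whose statistic is at least as large as the observed one
p_value = mean(permuted >= observed)
p_value
```

The observed statistic here equals the Instructor sum of squares (290146) in the lmPerm anova table later in this chapter, and the estimated p-value should land in the neighborhood of 0.008, varying somewhat with the seed.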

Post-hoc analysis with pairwise tests

library(rcompanion)

PT = pairwisePermutationTest(Sodium ~ Instructor,
                             data     = Data,
                             method   = "fdr")

PT

                          Comparison   Stat  p.value p.adjust
1  Brendon Small - Coach McGuirk = 0 0.5949   0.4405  0.44050
2 Brendon Small - Melissa Robins = 0   7.63  0.00574  0.01722
3 Coach McGuirk - Melissa Robins = 0  6.329  0.01188  0.01782

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)

          Group Letter MonoLetter
1  BrendonSmall      a         a
2  CoachMcGuirk      a         a
3 MelissaRobins      b          b
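The idea behind pairwise post-hoc permutation tests can be sketched in base R: run a two-group permutation test for each pair of groups, then adjust the resulting p-values for multiple comparisons with p.adjust.  The function pairwise_perm below is a hypothetical helper written only for illustration, using a difference in means as the test statistic; it is not how rcompanion's pairwisePermutationTest is implemented, and its p-values will differ somewhat from the output above.

```r
# Hypothetical helper: pairwise two-sample permutation tests on a difference
# in means, followed by a p-value adjustment for multiple comparisons
pairwise_perm = function(y, g, n_shuffles = 5000, method = "fdr") {
  g        = droplevels(as.factor(g))
  pairs    = combn(levels(g), 2)          # one column per pair of groups
  p_values = numeric(ncol(pairs))

  for (i in seq_len(ncol(pairs))) {
    y1       = y[g == pairs[1, i]]
    y2       = y[g == pairs[2, i]]
    pooled   = c(y1, y2)
    n1       = length(y1)
    observed = mean(y1) - mean(y2)

    # Shuffle the two groups' values together and recompute the difference
    permuted = replicate(n_shuffles, {
      s = sample(pooled)
      mean(s[1:n1]) - mean(s[-(1:n1)])
    })

    # Two-sided Monte Carlo p-value
    p_values[i] = mean(abs(permuted) >= abs(observed))
  }

  data.frame(Comparison = paste(pairs[1, ], "-", pairs[2, ]),
             p.value    = p_values,
             p.adjust   = p.adjust(p_values, method = method))
}
```

With the data frame from this chapter, it could be called as pairwise_perm(Data$Sodium, Data$Instructor), after setting a seed if reproducible results are wanted.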

Permutation test with lmPerm

library(lmPerm)

model = lmp(Sodium ~ Instructor, data = Data,
            perm = "Prob",
            seqs = FALSE)

anova(model)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  2   290146    145073 5000    0.008 **
Residuals  57  1487812     26102

summary(model)

Multiple R-Squared: 0.1632
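The multiple R-squared can be checked by hand from the anova table above, as the Instructor sum of squares divided by the total sum of squares:

```latex
R^2 = \frac{SS_{\text{Instructor}}}{SS_{\text{Instructor}} + SS_{\text{Residuals}}}
    = \frac{290146}{290146 + 1487812}
    = \frac{290146}{1777958}
    \approx 0.1632
```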

Post-hoc analysis with pairwise tests

### Brendon Small vs. Coach McGuirk

model.1 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Brendon Small" |
                          Data$Instructor=="Coach McGuirk" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.1)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1    17016     17016  120   0.4583
Residuals  38  1098469     28907

### Brendon Small vs. Melissa Robins

model.2 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Brendon Small" |
                          Data$Instructor=="Melissa Robins" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.2)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1   268141    268141 5000   0.0026 **
Residuals  38  1102469     29012

### Coach McGuirk vs. Melissa Robins

model.3 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Coach McGuirk" |
                          Data$Instructor=="Melissa Robins" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.3)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1   150062    150062 4969  0.01992 *
Residuals  38   774688     20387

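### Adjust the three p-values above for multiple comparisons

P.values = c(0.4583, 0.0026, 0.01992)

p.adjust(P.values, method = "fdr")

[1] 0.45830 0.00780 0.02988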
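The "fdr" adjustment used in this chapter is the Benjamini-Hochberg procedure.  For the three unadjusted p-values from the pairwise lmPerm tables (0.4583, 0.0026, and 0.01992), each p-value, taken in ascending order, is multiplied by m/i, where m = 3 is the number of comparisons and i is its rank:

```latex
\begin{aligned}
p_{(1)} &= 0.0026:  & 0.0026  \times 3/1 &= 0.0078 \\
p_{(2)} &= 0.01992: & 0.01992 \times 3/2 &= 0.02988 \\
p_{(3)} &= 0.4583:  & 0.4583  \times 3/3 &= 0.4583
\end{aligned}
```

A running minimum is then taken from the largest p-value downward (with values capped at 1).  Because these products already increase with rank, they are unchanged, giving adjusted values of 0.00780 (Brendon Small vs. Melissa Robins), 0.02988 (Coach McGuirk vs. Melissa Robins), and 0.45830 (Brendon Small vs. Coach McGuirk).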

References

Hothorn, T., K. Hornik, M.A. van de Wiel, and A. Zeileis. 2015. Implementing a Class of Permutation Tests: The coin Package. cran.r-project.org/web/packages/coin/vignettes/Implementation.pdf.

library(coin); help(package="coin")

library(lmPerm); help(package="lmPerm")

library(lmPerm); vignette("lmPerm")