Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Traditional Nonparametric Tests

Packages used in this chapter


The packages used in this chapter include:

•  rcompanion


The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}




The traditional nonparametric tests presented in this book are primarily rank-based tests.  Instead of using the numeric values of the dependent variable directly, these tests convert the values into relative ranks.


For example, imagine we have the heights of eight students in centimeters.

Height = c(110, 132, 137, 139, 140, 142, 142, 145)

names(Height) = letters[1:8]

Height


  a   b   c   d   e   f   g   h
110 132 137 139 140 142 142 145


rank(Height)


  a   b   c   d   e   f   g   h
1.0 2.0 3.0 4.0 5.0 6.5 6.5 8.0

a has the smallest height and so is ranked 1.  b has the next smallest height and so is ranked 2.  And so on.  Note that f and g are tied for spots 6 and 7, and so share a rank of 6.5. 


Also note that the value of a is quite a bit smaller than the others, but its rank is simply 1.  Information about the absolute height values is lost, and only the relative ranking is retained in the ranks.  That is, if the value of a were changed to 100 or 5 or –10, its rank would remain 1 in this data set.
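
The claim that the rank of a does not depend on its exact value can be checked directly.  The following is a minimal sketch in base R; the replacement values are the ones mentioned above, and the object names Changed and Same are chosen here only for illustration.

```r
# Heights from the example above; student a has the smallest value
Height = c(110, 132, 137, 139, 140, 142, 142, 145)
names(Height) = letters[1:8]

# Replace the height of student a with each alternative value
# and check whether the full set of ranks is unchanged
Same = sapply(c(100, 5, -10), function(NewValue) {
  Changed = Height
  Changed["a"] = NewValue
  identical(rank(Changed), rank(Height))
})

Same
```

Each element of Same is TRUE as long as the new value of a remains below the next smallest height of 132.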


The advantage of using these rank-based tests is that they don’t make many assumptions about the distribution of the data.  Instead, their conclusions are based on the relative ranks of values in the groups being tested.


Advantages of nonparametric tests


•  Most of the traditional nonparametric tests presented in this section are widely used, so your audience is likely to be familiar with them.

•  They are appropriate for interval/ratio or ordinal dependent variables.


•  Their nonparametric nature makes them appropriate for data that don’t meet the assumptions of parametric analyses.  These include data that are skewed, non-normal, contain outliers, or possibly are censored.  (Censored data are data in which there is an upper or lower limit to the reported values, for example when ages under 5 are reported only as “under 5”.)


Disadvantages of nonparametric tests


•  These tests are typically named after their authors, with names like Mann–Whitney, Kruskal–Wallis, and Wilcoxon signed-rank.  It may be difficult to remember these names, or to remember which test is used in which situation.


•  Most of the traditional nonparametric tests presented here are limited in the types of experimental designs they can address.  They are typically restricted to a one-way comparison of independent groups (e.g. Kruskal–Wallis), or to an unreplicated complete block design for paired samples (e.g. Friedman).  The aligned-ranks approach, however, allows for more complicated designs.


•  Readers are likely to find a lot of contradictory information in different sources about the hypotheses and assumptions of these tests.  In particular, authors will often treat the hypotheses of some tests as corresponding to tests of medians, and then list the assumptions of the test as corresponding to these hypotheses.  However, if this is not explicitly explained, the result is that different sources list different assumptions that data must meet in order for the test to be valid.  This creates unnecessary confusion for students trying to employ these tests correctly.


Interpretation of nonparametric tests


In general, these tests determine if there is a systematic difference among groups.  This may be due to a difference in location (e.g. median) or in the shape or spread of the distribution of the data.  Tests like Mann–Whitney and Kruskal–Wallis use a null hypothesis of “stochastic equality”, with the alternative sometimes called “stochastic dominance”.  It is therefore appropriate to report significant results as, e.g., “There is a significant difference between Likert scores from the pre-test and the post-test.”  Or, “The significant Mann–Whitney test indicates that Likert scores from the two classes come from different populations.”


For the Mann–Whitney and Kruskal–Wallis tests, if the distributions of the groups have the same shape and spread, then it can be assumed that the difference between groups is a difference in medians.  Otherwise, the difference is a difference in distributions.


You should look at the distributions of each group in these tests, with histograms or box plots, so that your conclusions can accurately reflect the data.  You don't want to imply that differences between two treatments are differences in medians when they are really differences in the shape or spread of the distributions.  On the other hand, if it really does look like the difference is a difference in location, you may want to be clear about this.
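
As a minimal sketch of this kind of check, the following base R code summarizes and plots two groups before running a Mann–Whitney test.  The data frame Data and its Likert scores are hypothetical values invented for illustration; they were chosen so that the two classes share a median but differ in spread.

```r
# Hypothetical Likert scores for two classes (illustrative data only)
Data = data.frame(
  Class = factor(rep(c("A", "B"), each = 10)),
  Score = c(3, 3, 3, 3, 3, 3, 3, 4, 2, 3,     # Class A: tightly clustered
            1, 1, 2, 3, 3, 3, 3, 4, 5, 5))    # Class B: widely spread

# Median and interquartile range for each class
Medians = tapply(Data$Score, Data$Class, median)
Spreads = tapply(Data$Score, Data$Class, IQR)

Medians
Spreads

# Box plots to compare the shape and spread of the two distributions
boxplot(Score ~ Class, data = Data, ylab = "Likert score")

# Mann-Whitney test; exact = FALSE uses the normal approximation,
# which avoids the warning R gives for exact p-values with ties
Test = wilcox.test(Score ~ Class, data = Data, exact = FALSE)

Test$p.value
```

Here the medians are the same while the interquartile ranges differ, so any difference detected between these classes would be better described as a difference in distributions than as a difference in medians.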


As a point of interest, Mangiafico (2015) and McDonald (2014) in the “References” section provide an example of a significant Kruskal–Wallis test where the groups have identical medians.


Effect size statistics


Effect size statistics for traditional nonparametric tests include Freeman’s “coefficient of determination” (Freeman’s theta) for Mann–Whitney and Kruskal–Wallis (Freeman, 1965), epsilon-squared for Kruskal–Wallis, and r for Mann–Whitney.  An r value can be calculated for the paired signed-rank test as well, and Kendall’s W can be used for Friedman’s test.


A couple of accessible resources on effect sizes for these tests are Tomczak and Tomczak (2014) and King and Rosopa (2010).


Effect size statistics included here determine the degree to which one group has data with higher ranks than other groups.  They tend to vary from 0 (groups have data that are stochastically equal) to 1 (one group stochastically dominates).  They are related to the probability that a value from one group will be greater than a value from another group.


As rank-based measures, these effect size statistics do not indicate the difference in absolute values between groups.  That is, if you were to replace the 5’s in the second example below with 100’s, the value of the Freeman’s theta statistic would not change, because in either case the 5’s or 100’s are the highest-ranked numbers.  For a practical interpretation of results, it is usually important to consider the absolute values of data such as with descriptive statistics.



library(rcompanion)

### Identical groups: theta is 0

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

Y = c(A, B)
G = factor(c(rep("A", 12), rep("B", 12)))

freemanTheta(Y, G)


### B shifted up by one unit: theta is approximately 0.44

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

Y = c(A, B)
G = factor(c(rep("A", 12), rep("B", 12)))

freemanTheta(Y, G)


### B shifted up by two units: theta is 0.75

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(3,3,3, 4,4,4, 5,5,5, 6,6,6)

Y = c(A, B)
G = factor(c(rep("A", 12), rep("B", 12)))

freemanTheta(Y, G)


### No overlap between groups: theta is 1

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(5,5,5, 6,6,6, 7,7,7, 8,8,8)

Y = c(A, B)
G = factor(c(rep("A", 12), rep("B", 12)))

freemanTheta(Y, G)
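

The probability interpretation of these effect sizes can be illustrated by hand in base R.  The following is a minimal sketch, not the rcompanion implementation: the function name Dominance is hypothetical, and it simply computes the proportion of all pairs in which the value from B exceeds the value from A, minus the proportion in which the value from A exceeds the value from B.  For two groups, the size of this difference is assumed here to correspond to what Freeman’s theta measures.

```r
# Proportion of (A, B) pairs where B is larger, minus the
# proportion of pairs where A is larger; ties count for neither
Dominance = function(A, B) {
  Diff = outer(B, A, "-")         # every B value minus every A value
  mean(Diff > 0) - mean(Diff < 0)
}

A    = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B    = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)
B100 = replace(B, B == 5, 100)    # replace the 5's with 100's

Dominance(A, B)
Dominance(A, B100)
```

Both calls return 0.4375.  Of the 144 possible pairs, B is larger in 90 and A is larger in 27, and changing the 5’s to 100’s does not alter which value in any pair is larger.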


Optional:  Appropriate use of traditional nonparametric tests


Using traditional nonparametric tests with ordinal data

Some authors caution against using traditional nonparametric tests with ordinal dependent variables, since many of them were developed for use with continuous (interval/ratio) data.  Some authors have further concerns about situations where there are likely to be many ties in ranks, such as with Likert data.


Other authors argue that, since these tests rank-transform data before analysis and have adjustments for tied ranks, they are appropriate for ordinal data.
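

To see how heavily tied data are rank-transformed, consider the following minimal sketch in base R.  The Likert responses are hypothetical values invented for illustration.  By default, rank() assigns tied observations the average of the ranks they would otherwise occupy.

```r
# Hypothetical Likert responses with many ties (illustrative data only)
Likert = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)

# Default ties.method = "average": tied values share the mean of their ranks
Ranks = rank(Likert)

data.frame(Likert, Ranks)

# The ranks still sum to n(n + 1)/2, as they would without ties
sum(Ranks)
```

With only five possible responses, the ten observations collapse to five distinct rank values, which is the situation that concerns some authors.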


Simulations comparing traditional nonparametric tests to ordinal regression are presented in the “Optional:  Simulated comparisons of traditional nonparametric tests and ordinal regression” section of the Introduction to Likert Data chapter.



Using traditional nonparametric tests with interval/ratio data

These nonparametric tests are commonly used for interval/ratio data when the data fail to meet the assumptions of parametric analysis. 


Some authors discourage using common nonparametric tests for interval/ratio data in some circumstances.


•  One issue is the interpretation of the results mentioned above.  That is, often results are incorrectly interpreted as a difference in medians when they are really describing a stochastic difference in distributions.


•  Another problem is the lack of flexibility in the designs these tests can handle.


•  Finally, these tests may lack power relative to their parametric equivalents.


Given these considerations and the fact that parametric statistics are often relatively robust to minor deviations in their assumptions, some authors argue that it is often better to stick with parametric analyses for interval/ratio data if it’s possible to make them work.




References


Freeman, L.C. 1965. Elementary Applied Statistics: For Students in Behavioral Science. John Wiley & Sons. New York.


King, B.M. and P.J. Rosopa. 2010. Some (Almost) Assumption-Free Tests. In Statistical Reasoning in the Behavioral Sciences, 6th ed. Wiley.


“Kruskal–Wallis Test” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.09. rcompanion.org/rcompanion/d_06.html.


“Kruskal–Wallis Test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/kruskalwallis.html.


Tomczak, M. and Tomczak, E. 2014. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sports Sciences 1(21):1–25. www.tss.awf.poznan.pl/files/3_Trends_Vol21_2014__no1_20.pdf.