 ## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Introduction to Traditional Nonparametric Tests

### Packages used in this chapter

The packages used in this chapter include:

•  effsize

The following commands will install these packages if they are not already installed:

if(!require(effsize)){install.packages("effsize")}

### Introduction

The traditional nonparametric tests presented in this book are primarily rank-based tests.  Instead of using the numeric values of the dependent variable, these tests convert the values into relative ranks.

For example, imagine we have the heights of eight students in centimeters.

Height = c(110, 132, 137, 139, 140, 142, 142, 145)

names(Height) = letters[1:8]

Height

a   b   c   d   e   f   g   h
110 132 137 139 140 142 142 145

rank(Height)

a   b   c   d   e   f   g   h
1.0 2.0 3.0 4.0 5.0 6.5 6.5 8.0

a has the smallest height and so is ranked 1.  b has the next smallest height and so is ranked 2.  And so on.  Note that f and g are tied for spots 6 and 7, and so share a rank of 6.5.

Also note that the value of a is quite a bit smaller than the others, but its rank is simply 1.  Information about the absolute height values is lost, and only the relative ranking is retained in the ranks.  That is, if the value of a were changed to 100 or 5 or –10, its rank would remain 1 in this data set.
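As a quick check of this point, the following sketch replaces the value of a with each of the values mentioned above and compares the resulting ranks to the original ranks.  It uses only base R, and the name Changed is introduced here for illustration only.

```r
# Heights from the example above
Height = c(110, 132, 137, 139, 140, 142, 142, 145)
names(Height) = letters[1:8]

# Replace the smallest value with progressively more extreme values
for (value in c(100, 5, -10)) {
  Changed = Height
  Changed["a"] = value
  # TRUE each time: the ranks match those of the original data
  print(identical(rank(Changed), rank(Height)))
}
```

Each comparison prints TRUE, since a remains the smallest value in every case.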

The advantage of using these rank-based tests is that they don’t make many assumptions about the distribution of the data.  Instead, their conclusions are based on the relative ranks of values in the groups being tested.

•  Most of the traditional nonparametric tests presented in this section are relatively common, and your audience is relatively likely to be familiar with them.

•  They are appropriate for interval/ratio or ordinal dependent variables.

•  Their nonparametric nature makes them appropriate for data that don’t meet the assumptions of parametric analyses.  These include data that are skewed, non-normal, contain outliers, or possibly are censored. (Censored data are data where values beyond an upper or lower limit are not recorded exactly, for example when ages under 5 are reported as “under 5”.)

•  These tests are typically named after their authors, with names like Mann–Whitney, Kruskal–Wallis, and Wilcoxon signed-rank.  It may be difficult to remember these names, or to remember which test is used in which situation.

•  Most of the traditional nonparametric tests presented here are limited by the types of experimental designs they can address.   They are typically limited to a one-way comparison of independent groups (e.g. Kruskal–Wallis), or to unreplicated complete block design for paired samples (e.g. Friedman).  The aligned ranks transformation approach, however, allows for more complicated designs.

•  Readers are likely to find a lot of contradictory information in different sources about the hypotheses and assumptions of these tests.  In particular, authors will often treat the hypotheses of some tests as corresponding to tests of medians, and then list the assumptions of the test as corresponding to these hypotheses.   However, if this is not explicitly explained, the result is that different sources list different assumptions that data must meet in order for the test to be valid.  This creates unnecessary confusion in the minds of students trying to correctly employ these tests.

### Interpretation of nonparametric tests

In general, these tests determine if there is a systematic difference among groups.  This may be due to a difference in location (e.g. median) or in the shape or spread of the distribution of the data.  With the Mann–Whitney and Kruskal–Wallis tests, the difference among groups that is of interest is the probability of an observation from one group being larger than an observation from another group.  If this probability is 0.50, this is termed “stochastic equality”, and when this probability is far from 0.50, it is sometimes called “stochastic dominance”.
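The probability described here can be estimated directly by comparing every observation in one group with every observation in the other.  The sketch below does this in base R.  The data values and the names X, Y, and ProbSuperiority are hypothetical, and counting ties as one half is an assumption (it is the convention used by Vargha and Delaney’s A).

```r
# Hypothetical scores for two groups (illustrative values only)
X = c(3, 4, 4, 5, 6)
Y = c(1, 2, 4, 4, 5)

# Compare every observation in X with every observation in Y
Greater = outer(X, Y, ">")
Tied    = outer(X, Y, "==")

# Proportion of pairs where X is larger, counting ties as one half
# (the tie handling is an assumption, not something the tests require)
ProbSuperiority = mean(Greater) + 0.5 * mean(Tied)

ProbSuperiority
```

Here 15 of the 25 pairs have the X value larger and 5 pairs are tied, so the estimate is 0.7.  A value of 0.5 would indicate stochastic equality.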

Optional technical note: Without additional assumptions about the distribution of the data, the Mann–Whitney and Kruskal–Wallis tests do not test hypotheses about the group medians.  Mangiafico (2015) and McDonald (2014) in the “References” section provide an example of a significant Kruskal–Wallis test where the groups have identical medians, but differ in their stochastic dominance.

### Effect size statistics

Effect size statistics for traditional nonparametric tests include Cliff’s delta and Vargha and Delaney’s A for Mann–Whitney, and epsilon-squared and Freeman’s “coefficient of determination” (Freeman’s theta) (Freeman, 1965) for Kruskal–Wallis.  There is also an r statistic for Mann–Whitney and the paired signed-rank test. Kendall’s W can be used for Friedman’s test.

A couple of accessible resources on effect sizes for these tests are Tomczak and Tomczak (2014) and King, Rosopa, and Minium (2018).
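As one illustration, epsilon-squared can be computed from the Kruskal–Wallis H statistic.  The sketch below assumes the formula given by Tomczak and Tomczak (2014), H divided by (n² – 1)/(n + 1), and uses base R’s kruskal.test function.  The data and the variable names are hypothetical.

```r
# Hypothetical data: three groups of five observations (illustrative only)
Value = c(1, 2, 3, 4, 5,   3, 4, 5, 6, 7,   6, 7, 8, 9, 10)
Group = factor(rep(c("A", "B", "C"), each = 5))

# Kruskal-Wallis H statistic from base R
H = unname(kruskal.test(Value ~ Group)$statistic)
n = length(Value)

# Epsilon-squared, assuming the formula H / ((n^2 - 1) / (n + 1)),
#   which simplifies to H / (n - 1)
EpsilonSquared = H / ((n^2 - 1) / (n + 1))

EpsilonSquared
```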

Some effect size statistics included here determine the degree to which one group has data with higher ranks than other groups.  In absolute value, they tend to vary from 0 (groups have data that are stochastically equal) to 1 (one group stochastically dominates).  Cliff’s delta can also be negative, and Vargha and Delaney’s A is centered on 0.5 rather than 0.  These statistics are related to the probability that a value from one group will be greater than a value from another group.

As rank-based measures, these effect size statistics do not indicate the difference in absolute values between groups.  That is, if you were to replace the 5’s in the second example below with 100’s, the value of the effect size statistics would not change, because in either case the 5’s or 100’s are the highest-ranked numbers.  For a practical interpretation of results, it is usually important to consider the absolute values of data such as with descriptive statistics.

library(effsize)

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

cliff.delta(B, A)

Cliff's Delta

delta estimate: 0

### This corresponds to a VDA of 0.5,
###  the probability of an observation in B being larger than
###  an observation in A.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

cliff.delta(B, A)

Cliff's Delta

delta estimate: 0.4375

### This corresponds to a VDA of 0.719,
###  the probability of an observation in B being larger than
###  an observation in A.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(3,3,3, 4,4,4, 5,5,5, 6,6,6)

cliff.delta(B, A)

Cliff's Delta

delta estimate: 0.75

### This corresponds to a VDA of 0.875,
###  the probability of an observation in B being larger than
###  an observation in A.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(5,5,5, 6,6,6, 7,7,7, 8,8,8)

cliff.delta(B, A)

Cliff's Delta

delta estimate: 1

### This corresponds to a VDA of 1,
###  the probability of an observation in B being larger than
###  an observation in A.
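The values above can be reproduced without the effsize package.  The following sketch computes Cliff’s delta as the proportion of pairs where B is larger minus the proportion where B is smaller, and then converts it to VDA as (delta + 1) / 2.  The function name CliffDelta and the object B100 are hypothetical names used only for this illustration; this is not the implementation used by effsize.

```r
# Base-R sketch of Cliff's delta: P(x > y) - P(x < y) over all pairs
CliffDelta = function(x, y) {
  mean(outer(x, y, ">")) - mean(outer(x, y, "<"))
}

# Data from the second example above
A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

Delta = CliffDelta(B, A)   # 0.4375, the delta estimate reported above
VDA   = (Delta + 1) / 2    # 0.71875, reported above as 0.719

# Replacing the 5's with 100's leaves the ordering of values,
#   and so the effect size, unchanged
B100 = replace(B, B == 5, 100)

CliffDelta(B100, A) == Delta   # TRUE
```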

### Optional:  Appropriate use of traditional nonparametric tests

#### Using traditional nonparametric tests with ordinal data

Some authors caution against using traditional nonparametric tests with ordinal dependent variables, since many of them were developed for use with continuous (interval/ratio) data.  Some authors have further concerns about situations where there are likely to be many ties in ranks, such as Likert data.

Other authors argue that, since these tests rank-transform data before analysis and have adjustments for tied ranks, they are appropriate for ordinal data.

Simulations comparing traditional nonparametric tests to ordinal regression are presented in the “Optional:  Simulated comparisons of traditional nonparametric tests and ordinal regression” section of the Introduction to Likert Data chapter.

#### Using traditional nonparametric tests with interval/ratio data

These nonparametric tests are commonly used for interval/ratio data when the data fail to meet the assumptions of parametric analysis.

Some authors discourage using common nonparametric tests for interval/ratio data in some circumstances.

•  One issue is the interpretation of the results mentioned above.  That is, often results are incorrectly interpreted as a difference in medians when they are really describing a stochastic difference in distributions.

•  Another problem is the lack of flexibility in the designs these tests can handle.

•  Finally, these tests may lack power relative to their parametric equivalents.

Given these considerations and the fact that parametric statistics are often relatively robust to minor deviations in their assumptions, some authors argue that it is often better to stick with parametric analyses for interval/ratio data if it’s possible to make them work.

### References

Freeman, L.C. 1965. Elementary Applied Statistics: For Students in Behavioral Science. John Wiley & Sons. New York.

King, B.M., P.J. Rosopa, and E.W. Minium. 2018. Some (Almost) Assumption-Free Tests. In Statistical Reasoning in the Behavioral Sciences, 7th ed. Wiley.

“Kruskal–Wallis Test” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.09. rcompanion.org/rcompanion/d_06.html.

“Kruskal–Wallis Test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/kruskalwallis.html.

Tomczak, M. and Tomczak, E. 2014. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sports Sciences 1(21):1–25. www.tss.awf.poznan.pl/files/3_Trends_Vol21_2014__no1_20.pdf.