### Packages used in this chapter

The packages used in this chapter include:

• rcompanion

The following commands will install these packages if they are not already installed:

```r
if(!require(rcompanion)){install.packages("rcompanion")}
```

### Introduction

The traditional nonparametric tests presented in this book are primarily rank-based tests. Instead of using the numeric values of the dependent variable directly, these tests convert the dependent variable into relative ranks.

For example, imagine we have the heights of eight students in centimeters.

```r
Height = c(110, 132, 137, 139, 140, 142, 142, 145)

names(Height) = letters[1:8]

Height
```

```
  a   b   c   d   e   f   g   h
110 132 137 139 140 142 142 145
```

```r
rank(Height)
```

```
  a   b   c   d   e   f   g   h
1.0 2.0 3.0 4.0 5.0 6.5 6.5 8.0
```

*a* has the smallest height and so is ranked 1. *b* has the next smallest height and so is ranked 2, and so on. Note that *f* and *g* are tied for positions 6 and 7, and so each receives the average of those two ranks, 6.5.

Also note that the value of *a* is quite a bit smaller
than the others, but its rank is simply 1. Information about the absolute
height values is lost, and only the relative ranking is retained in the ranks.
That is, if the value of *a* were changed to 100 or 5 or –10, its rank
would remain 1 in this data set.
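A quick check in base R confirms this: changing the height of *a* to -10 leaves every rank unchanged. `Height.2` is only an illustrative name for the modified copy.

```r
### Heights of eight students, in centimeters
Height = c(110, 132, 137, 139, 140, 142, 142, 145)
names(Height) = letters[1:8]

### Illustrative copy with the smallest value moved much further down
Height.2 = Height
Height.2["a"] = -10

### The ordering of the values is unchanged, so the ranks are identical
rank(Height.2)
identical(rank(Height), rank(Height.2))
```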

The advantage of using these rank-based tests is that they don’t make many assumptions about the distribution of the data. Instead, their conclusions are based on the relative ranks of values in the groups being tested.

### Advantages of nonparametric tests

• Most of the traditional nonparametric tests presented in
this section are relatively common, and your audience is relatively likely to
be familiar with them.

• They are appropriate for interval/ratio or, often, ordinal dependent variables.

• Their nonparametric nature makes them appropriate for data that don’t meet the assumptions of parametric analyses. These include data that are skewed, non-normal, contain outliers, or, possibly, are censored. (Censored data are data for which values beyond an upper or lower limit are not recorded exactly; for example, ages under 5 might be reported only as “under 5”.)
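As a small illustration of the censored case, suppose, hypothetically, that ages below 5 are recorded only as “under 5”. In this simple situation, where every censored value lies below every observed value, any placeholder code below 5 yields the same ranks. The ages and the placeholder codes 4 and 0 are invented for this sketch.

```r
### Hypothetical ages; the first two were recorded only as "under 5"
Age.coded.4 = c(4, 4, 6, 9, 12, 15)   ### "under 5" coded as 4
Age.coded.0 = c(0, 0, 6, 9, 12, 15)   ### "under 5" coded as 0

rank(Age.coded.4)
rank(Age.coded.0)

### Both codings give the same ranks: the two censored
### values tie for the lowest positions
identical(rank(Age.coded.4), rank(Age.coded.0))
```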

### Disadvantages of nonparametric tests

• These tests are typically named after their authors, with names like Mann–Whitney, Kruskal–Wallis, and Wilcoxon signed-rank. It may be difficult to remember these names, or to remember which test is used in which situation.

• Most of the traditional nonparametric tests presented here are limited by the types of experimental designs they can address. They are typically limited to a one-way comparison of independent groups (e.g. Kruskal–Wallis), or to unreplicated complete block designs for paired samples (e.g. Friedman).

• There may be more flexible approaches that can cover more complex designs. The aligned ranks transformation is one nonparametric approach. Ordinal regression is appropriate when there is an ordinal dependent variable. Permutation tests may be applicable in some cases.

• Readers are likely to find a lot of contradictory information in different sources about the hypotheses and assumptions of these tests. In particular, authors will often treat the hypotheses of some tests as corresponding to tests of medians, and then list the assumptions of the test as corresponding to these hypotheses. However, if this is not explicitly explained, the result is that different sources list different assumptions that the underlying populations must meet in order for the test to be valid. This creates unnecessary confusion in the minds of students trying to correctly employ these tests.

### Interpretation of nonparametric tests

In general, these tests determine if there is a *systematic*
difference among groups. This may be due to a difference in location (e.g.
median) or in the shape or spread of the distribution of the data. With the Mann–Whitney
and Kruskal–Wallis tests, the difference among groups that is of interest is
the probability of an observation from one group being larger than an
observation from another group. If this probability is 0.50, this is termed “stochastic
equality”, and when this probability is far from 0.50, it is sometimes called
“stochastic dominance”.
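For two groups, this probability can be read from the Mann–Whitney statistic in base R. The `W` reported by `wilcox.test` counts the pairs in which the value from the first group is larger, with tied pairs counted as one-half, so dividing it by the number of pairs gives the proportion. The data are the illustrative values from the second effect size example later in this chapter; `exact = FALSE` is set only because the data contain ties.

```r
### Illustrative data for two independent groups
A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

### Normal approximation, since the data contain ties
Test = wilcox.test(A, B, exact = FALSE)

### W = number of (a, b) pairs with a > b, plus one-half of the tied pairs
W = unname(Test$statistic)

### Proportion of pairs in which the observation from A is larger
W / (length(A) * length(B))
```

The result, about 0.28, is well below 0.50, consistent with values in *B* tending to be larger than those in *A*.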

*Optional technical note*: Without additional
assumptions about the distribution of the data, the Mann–Whitney and
Kruskal–Wallis tests do not test hypotheses about the group medians. Mangiafico
(2015) and McDonald (2014) in the “References” section provide an example of a
significant Kruskal–Wallis test where the groups have identical medians, but
differ in their stochastic dominance.
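A small made-up example shows how two groups can share a median without being stochastically equal. The sketch counts, over all pairs, how often the value from `A` exceeds the value from `B`, counting ties as one-half. It only illustrates the quantity of interest; it is not the Kruskal–Wallis example from the cited references and does not perform a hypothesis test.

```r
### Made-up data: both groups have a median of 3
A = c(1, 2, 3, 10, 11)
B = c(1, 2, 3,  4,  5)

median(A)
median(B)

### Proportion of (a, b) pairs with a > b, counting ties as one-half
P.A.greater = mean(outer(A, B, ">")) + 0.5 * mean(outer(A, B, "=="))
P.A.greater
```

Here the medians are identical, but the proportion is 0.58 rather than 0.50.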

### Effect size statistics

Effect size statistics for traditional nonparametric tests include Cliff’s *delta* and Vargha and Delaney’s *A* for Mann–Whitney, and *epsilon*-squared and Freeman’s “coefficient of determination” (Freeman’s *theta*) (Freeman, 1965) for Kruskal–Wallis. Rank biserial correlation is appropriate for Mann–Whitney and the paired signed-rank test. Kendall’s *W* can be used for Friedman’s test.

A couple of accessible resources on effect sizes for these tests are Tomczak and Tomczak (2014) and King, Rosopa, and Minium (2018).
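As a base-R sketch of two of these statistics, the code below applies commonly given formulas: *epsilon*-squared as *H* / ((*n*² − 1) / (*n* + 1)), where *H* is the Kruskal–Wallis statistic and *n* the total number of observations, and Kendall’s *W* as the Friedman chi-square statistic divided by *N*(*k* − 1), for *N* blocks and *k* treatments. The data are invented, and the variable names are only illustrative; in practice, dedicated package functions may be preferable.

```r
### Invented data: one-way design with three independent groups
Value = c(2, 4, 3, 5, 6, 5, 8, 9, 7)
Group = factor(rep(c("A", "B", "C"), each = 3))

KW = kruskal.test(Value ~ Group)
H  = unname(KW$statistic)
n  = length(Value)

### Epsilon-squared for Kruskal-Wallis
Epsilon.squared = H / ((n^2 - 1) / (n + 1))
Epsilon.squared

### Invented data: unreplicated complete block design
### (rows are blocks, columns are treatments)
Scores = matrix(c(1, 2, 3,
                  2, 3, 4,
                  1, 3, 2,
                  2, 4, 5),
                nrow = 4, byrow = TRUE,
                dimnames = list(Block = 1:4, Treatment = c("X", "Y", "Z")))

FT = friedman.test(Scores)

### Kendall's W for Friedman's test
Kendall.W = unname(FT$statistic) / (nrow(Scores) * (ncol(Scores) - 1))
Kendall.W
```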

Some of the effect size statistics included here, such as Cliff’s *delta* and the rank biserial correlation, measure the degree to which one group has data with higher ranks than another group. They range from –1 to 1: a value of 0 indicates that the groups have data that are stochastically equal, 1 that the first group stochastically dominates, and –1 that the second group stochastically dominates. They are related to the probability that a value from one group will be greater than a value from another group.

As rank-based measures, these effect size statistics do not indicate the difference in absolute values between groups. That is, if you were to replace the *5*s in the second example below with *100*s, the values of the effect size statistics would not change, because in either case the *5*s or *100*s are the highest-ranked numbers. For a practical interpretation of results, it is usually important to also consider the absolute values of the data, for example with descriptive statistics.
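This can be checked directly with the data from the second example below. In the sketch, replacing the 5s in `B` with 100s (the copy `B.100` is an illustrative name) leaves the ranks of the combined data, and so any statistic computed from those ranks, unchanged, while the group means differ considerably.

```r
A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

### Illustrative copy of B with every 5 replaced by 100
B.100 = replace(B, B == 5, 100)

### The ranks of the combined observations are the same in both cases
identical(rank(c(A, B)), rank(c(A, B.100)))

### Descriptive statistics, however, reflect the change in absolute values
mean(B)
mean(B.100)
```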

```r
library(rcompanion)

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

B = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

cliffDelta(x=A, y=B)
```

```
Cliff.delta
          0
```

```r
### This corresponds to a VDA of 0.5,
### the probability of an observation in B being larger than
### an observation in A.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

B = c(2,2,2, 3,3,3, 4,4,4, 5,5,5)

cliffDelta(x=A, y=B)
```

```
Cliff.delta
     -0.438
```

```r
### Note that a negative Cliff’s delta suggests that the values in
### B tend to be larger than those in A.
### This corresponds to a VDA of 0.281,
### the probability of an observation in A being larger than
### an observation in B.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

B = c(3,3,3, 4,4,4, 5,5,5, 6,6,6)

cliffDelta(x=A, y=B)
```

```
Cliff.delta
      -0.75
```

```r
### This corresponds to a VDA of 0.125,
### the probability of an observation in A being larger than
### an observation in B.

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)

B = c(5,5,5, 6,6,6, 7,7,7, 8,8,8)

cliffDelta(x=A, y=B)
```

```
Cliff.delta
         -1
```

```r
### This corresponds to a VDA of 0,
### the probability of an observation in A being larger than
### an observation in B.
```
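To see the arithmetic behind these values, Cliff’s *delta* can be computed as the proportion of all (*a*, *b*) pairs with *a* > *b* minus the proportion with *a* < *b*, and Vargha and Delaney’s *A* as the proportion with *a* > *b* plus one-half the proportion of ties, so that VDA = (*delta* + 1) / 2. This base-R sketch is only for checking the results above; `delta.pairs` and `vda.pairs` are hypothetical helper names, not functions from rcompanion.

```r
### Cliff's delta: P(a > b) - P(a < b) over all pairs (hypothetical helper)
delta.pairs = function(x, y) {
  mean(outer(x, y, ">")) - mean(outer(x, y, "<"))
}

### Vargha and Delaney's A: P(a > b) + 0.5 * P(a == b) (hypothetical helper)
vda.pairs = function(x, y) {
  mean(outer(x, y, ">")) + 0.5 * mean(outer(x, y, "=="))
}

A = c(1,1,1, 2,2,2, 3,3,3, 4,4,4)
B = c(3,3,3, 4,4,4, 5,5,5, 6,6,6)

delta.pairs(A, B)              ### -0.75
vda.pairs(A, B)                ### 0.125
(delta.pairs(A, B) + 1) / 2    ### 0.125, the same value
```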

### Optional: Appropriate use of traditional nonparametric tests

#### Using traditional nonparametric tests with ordinal data

Some authors caution against using traditional nonparametric tests with ordinal dependent variables, since many of them were developed for use with continuous (interval/ratio) data. Some authors have further concerns about situations where there are likely to be many ties in ranks, such as with Likert data.

Other authors argue that, since these tests rank-transform data before analysis and have adjustments for tied ranks, they are appropriate for ordinal data.

Simulations comparing traditional nonparametric tests to ordinal regression are presented in the “Optional: Simulated comparisons of traditional nonparametric tests and ordinal regression” section of the *Introduction to Likert Data* chapter, and in Mangiafico (2019).

#### Using traditional nonparametric tests with interval/ratio data

These nonparametric tests are commonly used for interval/ratio data when the data fail to meet the assumptions of parametric analysis.

Some authors discourage using common nonparametric tests for interval/ratio data in some circumstances.

• One issue is the interpretation of the results mentioned above. That is, often results are incorrectly interpreted as a difference in medians when they are really describing a stochastic difference in distributions.

• Another problem is the lack of flexibility in the designs that these tests can handle.

• Finally, these tests may lack power relative to their parametric equivalents.

Given these considerations and the fact that parametric statistics are often relatively robust to minor deviations from their assumptions, some authors argue that it is often better to stick with parametric analyses for interval/ratio data if it’s possible to make them work. Often, with a parametric approach, a *generalized* linear model would be appropriate where *general* linear models aren’t appropriate.

### References

Freeman, L.C. 1965. *Elementary Applied Statistics: For Students in Behavioral Science*. John Wiley & Sons. New York.

King, B.M., P.J. Rosopa, and E.W. Minium. 2018. Some (Almost)
Assumption-Free Tests. In *Statistical Reasoning in the Behavioral Sciences*,
7th ed. Wiley.

“Kruskal–Wallis Test” in Mangiafico, S.S. 2015. *An R
Companion for the Handbook of Biological Statistics*, version 1.09. rcompanion.org/rcompanion/d_06.html.

“Kruskal–Wallis Test” in McDonald, J.H. 2014. *Handbook of
Biological Statistics*. www.biostathandbook.com/kruskalwallis.html.

Mangiafico, S.S. 2019. How Should We Analyze Likert Item Data? *Journal
of the National Association of County Agricultural Agents* 12(2). www.nacaa.com/journal/index.php?jid=1001.

Tomczak, M. and E. Tomczak. 2014. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. *Trends in Sports Sciences* 1(21):1–25. www.tss.awf.poznan.pl/files/3_Trends_Vol21_2014__no1_20.pdf.