
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Traditional Nonparametric Tests

The ordinal tests presented in this book are common nonparametric tests.  They are primarily rank-based tests that use the ranks of data instead of their numeric values.  This makes them appropriate for data sets where the dependent variable is interval/ratio or ordinal.  Some authors urge caution when using these tests with data where there are likely to be many ties, including Likert data.
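As a minimal sketch of what "rank-based" means, the base R function rank() can be applied to a small set of hypothetical Likert responses. The data values and the object names likert and ranks are invented for illustration. Tied values receive the average of the ranks they span, so data with only a few distinct values produce many tied ranks.

```r
# Hypothetical Likert responses (invented for illustration)
likert <- c(3, 5, 4, 3, 1, 4, 3)

# rank() assigns the average rank to tied values by default
ranks <- rank(likert)

ranks
# 3.0 7.0 5.5 3.0 1.0 5.5 3.0

# Number of observations sharing each value (the size of each tie group)
table(likert)
```

Here the three responses of 3 share ranks 2, 3, and 4, so each receives a rank of 3.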

 

Nonparametric test assumptions

 

Nonparametric tests do not assume that the underlying data have any specific distribution.  However, it is important to understand the assumptions of each specific test before using it.

 

Advantages of nonparametric tests

 

•  The tests presented in this section are relatively common, so your audience is likely to be familiar with them.

•  They are appropriate for ordinal dependent variables.

 

•  They are robust to outliers, and may be appropriate for censored data.  Censored data are data in which values beyond an upper or lower limit are not recorded exactly, for example, when ages under 5 are reported as “under 5”.

 

•  Their nonparametric nature makes them appropriate for interval/ratio data that don’t meet the assumptions of parametric analyses.  These include data that are skewed, non-normal, or contain outliers.

 

•  For skewed data, it is often more desirable to report medians than means.  An example of this with skewed data for household income is shown in the “Statistics of location for interval/ratio data” section of the Descriptive statistics for interval/ratio data chapter.

The image in the “References” section from Wikimedia Commons shows the actual distribution of household income in the U.S. in 2010.

The video by Dr. Nic in the “References” section discusses cases when it is better to use the median rather than the mean.
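The difference between the mean and the median for skewed data, and the robustness of the median to outliers, can be sketched with a few hypothetical household incomes. The values and the object names income and income2 are invented for illustration, and are not the data used in the chapter mentioned above.

```r
# Hypothetical household incomes in thousands of dollars, right-skewed
# (invented for illustration)
income <- c(25, 30, 35, 40, 45, 50, 60, 75, 150, 490)

mean(income)     # 100, pulled upward by the two large values
median(income)   # 47.5, the middle of the ordered values

# Replace the largest value with a far more extreme one
income2 <- replace(income, which.max(income), 4900)

mean(income2)    # 541, changed considerably
median(income2)  # 47.5, unchanged
```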

 

Interpretation of nonparametric tests

 

It is tempting to think of these common nonparametric tests as testing for a difference in medians between groups, but reporting results this way is not always appropriate.  In general, these tests determine if there is a systematic difference between groups.  This may be due to a difference in location (e.g. median) or in the shape or spread of the distribution of the data.  It is therefore appropriate to report significant results as, e.g., “There is a significant difference between Likert scores from the pre-test and the post-test.”  Or, “The significant Mann–Whitney test indicates that Likert scores from the two classes come from different populations.”

 

For the Mann–Whitney, Kruskal–Wallis, and Friedman tests, if the distributions of the groups have the same shape and spread, then a significant result can be interpreted as a difference in medians.  Otherwise, it should be interpreted as a difference in distributions.

 

In practice, you should look at the distribution of each group in these tests, with histograms or boxplots, so that your conclusions accurately reflect the data.  You don't want to imply that differences between two treatments are differences in medians when they are really differences in the shape or spread of the distributions.  On the other hand, if it really does look like the differences are differences in location, you want to be clear about this.

 

Mangiafico (2015) and McDonald (2014) in the “References” section provide an example of a significant Kruskal–Wallis test where the groups have identical medians.
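A similar situation can be sketched for two groups with the Mann–Whitney test, using the base R function wilcox.test().  The scores and the object names classA and classB below are invented for illustration, and are not the data from the references cited above.  Both groups have a median of 3, but the test is significant because the distributions differ.

```r
# Hypothetical Likert scores for two classes (invented for illustration)
classA <- c(rep(1, 4), rep(3, 6))
classB <- c(rep(3, 6), rep(5, 4))

# Both medians are 3
median(classA)
median(classB)

# The distributions differ: class A includes low scores,
# class B includes high scores
table(factor(classA, levels = 1:5))
table(factor(classB, levels = 1:5))

# Mann-Whitney test.  exact = FALSE requests the normal approximation,
# since an exact p-value cannot be computed when there are ties.
result <- wilcox.test(classA, classB, exact = FALSE)

result$p.value   # below 0.05 for these data
```

Reporting this result as a difference in medians would be misleading.  It is better described as a difference in the distributions of scores between the two classes.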

 

Using permutation tests and ordinal regression for ordinal data

 

The following sections of this book, Ordinal Tests with Cumulative Link Models and Permutation Tests, describe different approaches to handling ordinal data that may be preferable to the traditional nonparametric tests described in this section, at least in some cases.

 

Using traditional nonparametric tests with ordinal data

 

Some authors caution against using traditional nonparametric tests with ordinal dependent variables, since many of them were developed for use with continuous (interval/ratio) data.  Other authors argue that, since these tests rank-transform data before analysis and have adjustments for tied ranks, they are appropriate for ordinal data.  Some authors have further concerns about situations where there are likely to be many ties in ranks, such as Likert data.

 

Using traditional nonparametric tests with interval/ratio data

 

These nonparametric tests are commonly used for interval/ratio data when the data fail to meet the assumptions of parametric analysis. 

 

Some authors discourage using common nonparametric tests for interval/ratio data.

 

•  One issue is the interpretation of the results mentioned above.  That is, often results are incorrectly interpreted as a difference in medians when they are really describing a difference in distributions.

 

•  Another problem is the lack of flexibility in the designs these tests can handle.  For example, there is no common nonparametric equivalent of a parametric two-way analysis of variance.

 

•  Finally, these tests may lack power relative to their parametric equivalents.

 

Given these considerations and the fact that parametric statistics are often relatively robust to minor deviations from their assumptions, some authors argue that it is often better to stick with parametric analyses for interval/ratio data if it’s possible to make them work.
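The question of relative power can be explored with a small simulation.  The following is only a sketch under stated assumptions: the data are drawn from normal distributions, and the number of simulations, the sample size, the shift between groups, and the object names are all chosen arbitrarily for illustration.  With normally distributed data the t-test would be expected to have slightly higher power than the Mann–Whitney test, but the ordering can reverse for heavy-tailed or strongly skewed data, so results from one scenario should not be generalized.

```r
set.seed(1)    # arbitrary seed, for reproducibility only

n_sim <- 500   # number of simulated experiments (assumed)
n     <- 20    # observations per group (assumed)
shift <- 0.8   # true difference in means, in standard deviations (assumed)

# For each simulated experiment, record the p-value from each test
p_values <- replicate(n_sim, {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = shift)
  c(t_test   = t.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value)
})

# Estimated power: proportion of experiments with p < 0.05
power <- rowMeans(p_values < 0.05)

power
```

Replacing rnorm() with a function that generates skewed or heavy-tailed data is one way to see how the comparison changes when parametric assumptions are violated.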

 

References

 

“Kruskal–Wallis Test” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.09. rcompanion.org/rcompanion/d_06.html.

 

“Kruskal–Wallis test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/kruskalwallis.html.

 

“Distribution of Annual Household Income in the United States 2010”. 2011. Wikimedia Commons. upload.wikimedia.org/wikipedia/commons/0/0d/Distribution_of_Annual_Household_Income_in_the_United_States_2010.png.

 

"The median outclasses the mean" from Dr. Nic. 2013. Learn and Teach Statistics & Operations Research. learnandteachstatistics.wordpress.com/2013/04/29/median/.