### When to use this test

The two-sample Mann–Whitney U test compares values for two groups. A significant result suggests that the values for the two groups are different. It is equivalent to a two-sample Wilcoxon rank-sum test.

In the context of this book, the test is useful to compare the scores or ratings from two speakers, two different presentations, or two groups of audiences.

If the shapes and spreads of the distributions of values for the two groups are similar, then the test compares the medians of the two groups. Otherwise, the test is really testing whether there is a systematic difference in the values of the two groups.

The test assumes that the observations are independent. That is, it is not appropriate for paired observations or repeated measures data.

The test is performed with the *wilcox.test* function.

If the distributions of values of each group are similar in shape, but have outliers, then Mood’s median test is an appropriate alternative.

##### Appropriate data

• Two-sample data. That is, one-way data with two groups only

• Dependent variable is ordinal, interval, or ratio

• Independent variable is a factor with two levels. That is, two groups

• Observations between groups are independent. That is, not paired or repeated measures data

• In order to be a test of medians, the distributions of values for each group need to be of similar shape and spread; outliers affect the spread. Otherwise the test is a test of distributions.

##### Hypotheses

*If the distributions of the two groups are similar in shape and spread:*

• Null hypothesis: The medians of values for each group are equal.

• Alternative hypothesis (two-sided): The medians of values for each group are not equal.

*If the distributions of the two groups are not similar in shape and spread:*

• Null hypothesis: The distributions of values for each group are equal.

• Alternative hypothesis (two-sided): There is a systematic difference in the distribution of values for the groups.

##### Interpretation

*If the distributions of the two groups are similar in shape and spread:*

Significant results can be reported as e.g. “The median value of group A was significantly different from that of group B.”

*If the distributions of the two groups are not similar in shape and spread:*

Significant results can be reported as e.g. “Values for group A were significantly different from those for group B.”

##### Other notes and alternative tests

The Mann–Whitney U test can be considered equivalent to the Kruskal–Wallis test with only two groups.

Mood’s median test compares the medians of two groups. It is described in the next chapter.

Another alternative is to use cumulative link models for ordinal data, which are described later in this book.
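The equivalence with the Kruskal–Wallis test noted above can be checked with a short sketch. It uses the Pooh and Piglet scores from the example later in this chapter, entered here as plain vectors so the code stands alone. The comparison assumes that the normal approximation is used for *wilcox.test* and that the continuity correction is turned off, since *kruskal.test* applies neither an exact method nor a continuity correction.

```r
# Pooh and Piglet scores from the example in this chapter
Likert  = c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
            2, 4, 2, 2, 1, 2, 3, 2, 2, 3)

Speaker = factor(rep(c("Pooh", "Piglet"), each = 10))

# Mann-Whitney test, normal approximation, no continuity correction
MW = wilcox.test(Likert ~ Speaker, exact = FALSE, correct = FALSE)

# Kruskal-Wallis test on the same two groups
KW = kruskal.test(Likert ~ Speaker)

MW$p.value
KW$p.value

# TRUE if the two p-values agree within numerical tolerance
all.equal(MW$p.value, KW$p.value)
```

With the default options of *wilcox.test* (continuity correction on), the two *p*-values will be close but not identical.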

### Packages used in this chapter

The packages used in this chapter include:

• psych

• FSA

• lattice

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}

if(!require(FSA)){install.packages("FSA")}

if(!require(lattice)){install.packages("lattice")}

### Two-sample Mann–Whitney U test example

This example re-visits the Pooh and Piglet data from the *Descriptive
Statistics with the likert Package* chapter.

It answers the question, “Are Pooh's scores significantly different from those of Piglet?”

The Mann–Whitney U test is conducted with the *wilcox.test*
function, which produces a *p*-value for the hypothesis. First the
data are summarized and examined using bar plots for each group.

Because the bar plots show that the distributions of scores for Pooh and Piglet are relatively similar in shape, the Mann–Whitney U test can be interpreted as a test of medians.

Input =("

Speaker Likert

Pooh 3

Pooh 5

Pooh 4

Pooh 4

Pooh 4

Pooh 4

Pooh 4

Pooh 4

Pooh 5

Pooh 5

Piglet 2

Piglet 4

Piglet 2

Piglet 2

Piglet 1

Piglet 2

Piglet 3

Piglet 2

Piglet 2

Piglet 3

")

Data = read.table(textConnection(Input),header=TRUE)

### Create a new variable which is the Likert scores as an ordered factor

Data$Likert.f = factor(Data$Likert,

ordered = TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

#### Summarize data treating Likert scores as factors

xtabs( ~ Speaker + Likert.f,

data = Data)

Likert.f

Speaker 1 2 3 4 5

Piglet 1 6 2 1 0

Pooh 0 0 1 6 3

XT = xtabs( ~ Speaker + Likert.f,

data = Data)

prop.table(XT,

margin = 1)

Likert.f

Speaker 1 2 3 4 5

Piglet 0.1 0.6 0.2 0.1 0.0

Pooh 0.0 0.0 0.1 0.6 0.3

#### Bar plots of data by group

library(lattice)

histogram(~ Likert.f | Speaker,

data=Data,

layout=c(1,2) # columns and rows of individual plots

)

#### Summarize data treating Likert scores as numeric

library(FSA)

Summarize(Likert ~ Speaker,

data=Data,

digits=3)

Speaker n mean sd min Q1 median Q3 max percZero

1 Piglet 10 2.3 0.823 1 2 2 2.75 4 0

2 Pooh 10 4.2 0.632 3 4 4 4.75 5 0
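If the FSA package is not available, the medians and quartiles in the table above can be reproduced with base R. This is a minimal sketch, not a replacement for *Summarize*. It enters the scores as plain vectors so it runs on its own, and it relies on the default quantile method of the *quantile* function (type 7), which gives the same quartiles as those shown above for these data.

```r
# Pooh and Piglet scores from the example
Likert  = c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
            2, 4, 2, 2, 1, 2, 3, 2, 2, 3)

Speaker = factor(rep(c("Pooh", "Piglet"), each = 10))

# Split the scores by speaker and compute Q1, median, and Q3 for each group.
# The result is a matrix with one column per speaker.
Quart = sapply(split(Likert, Speaker),
               quantile,
               probs = c(0.25, 0.50, 0.75))

Quart
```

The rows of the result are labeled 25%, 50%, and 75%, so the median for each speaker is in the 50% row.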

#### Two-sample Mann–Whitney U test example

This example uses the formula notation indicating that *Likert*
is the dependent variable and *Speaker* is the independent variable. The *data=*
option indicates the data frame that contains the variables. For the meaning
of other options, see *?wilcox.test*.

wilcox.test(Likert ~ Speaker,

data=Data)

Wilcoxon rank sum test with continuity correction

W = 5, p-value = 0.0004713

alternative hypothesis: true location shift is not equal to 0

### You may get a "cannot compute exact p-value with ties" warning.

### You can ignore this or use the exact=FALSE option.
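To see where the reported W = 5 comes from, the statistic can be computed by hand from the joint ranks of the scores. The sketch below does this and then runs the test with *exact=FALSE*, which requests the normal approximation directly and so avoids the warning about ties. The scores are entered as plain vectors so the code is self-contained. Because *Piglet* comes first alphabetically, it is the first factor level, and W is reported for that group.

```r
# Pooh and Piglet scores from the example
Likert  = c(3, 5, 4, 4, 4, 4, 4, 4, 5, 5,
            2, 4, 2, 2, 1, 2, 3, 2, 2, 3)

Speaker = factor(rep(c("Pooh", "Piglet"), each = 10))

# Joint ranks of all 20 scores; tied scores share their average rank
R = rank(Likert)

# Number of observations in the first group (Piglet)
n1 = sum(Speaker == "Piglet")

# W is the rank sum of the first group minus its smallest possible value
W = sum(R[Speaker == "Piglet"]) - n1 * (n1 + 1) / 2

W

# The same test, with the normal approximation requested explicitly
Test = wilcox.test(Likert ~ Speaker, exact = FALSE)

Test$statistic

Test$p.value
```

Both the hand calculation and *wilcox.test* give W = 5, and the *p*-value matches the output shown above.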

### Exercises J

1. Considering Pooh and Piglet’s data,

a. What was the median score for each speaker?

b. What were the first and third quartiles for each speaker’s scores?

c. Are the data for both speakers reasonably similar in shape and spread?

d. Based on your previous answer, what is the null hypothesis for the Mann–Whitney test?

e. According to the Mann–Whitney test, is there a difference in scores between the speakers?

f. How would you summarize the results of the descriptive statistics and tests? Include practical considerations of any differences.

2. Brian and Stewie Griffin want to assess the education level of students in their courses on creative writing for adults. They want to know the median education level for each class, and whether the education levels of the classes differ between instructors.

They used the following table to code their data.

Code Abbreviation Level

1 < HS Less than high school

2 HS High school

3 BA Bachelor’s

4 MA Master’s

5 PhD Doctorate

The following are the course data.

Instructor Student Education

'Brian Griffin' a 3

'Brian Griffin' b 2

'Brian Griffin' c 3

'Brian Griffin' d 3

'Brian Griffin' e 3

'Brian Griffin' f 3

'Brian Griffin' g 4

'Brian Griffin' h 5

'Brian Griffin' i 3

'Brian Griffin' j 4

'Brian Griffin' k 3

'Brian Griffin' l 2

'Stewie Griffin' m 4

'Stewie Griffin' n 5

'Stewie Griffin' o 4

'Stewie Griffin' p 4

'Stewie Griffin' q 4

'Stewie Griffin' r 4

'Stewie Griffin' s 3

'Stewie Griffin' t 5

'Stewie Griffin' u 4

'Stewie Griffin' v 4

'Stewie Griffin' w 3

'Stewie Griffin' x 2

For each of the following, answer the question, and **show the output from the analyses you used to answer the question**.

a. What was the median score for each instructor? (Be sure to report the education level, not just the numeric code!)

b. What were the first and third quartiles for each
instructor’s scores?

c. Are the data for both instructors reasonably
similar in shape and spread?

d. Based on your previous answer, what is the null hypothesis
for the Mann–Whitney test?

e. According to the Mann–Whitney test, is there a difference
in scores between the instructors?

f. Plot Brian and Stewie’s data in a way that helps you visualize the data. Do the results reflect what you would expect from looking at the plot?

g. How would you summarize the results of the descriptive statistics and tests? Include your practical interpretation.