Two-sample paired Wilcoxon signed rank test in SAEPER
For a discussion of this test, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/F_06.html).
Importing packages in this chapter
The following commands import the packages used in this chapter and assign them common aliases. You may need to install these libraries first.
import io
import os
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import pingouin as pg
import seaborn as sns
Setting your working directory
You may wish to set your working directory for exported plots.
os.chdir("C:/Users/Sal Mangiafico/Desktop")
print(os.getcwd())
Example of paired Wilcoxon signed rank test
Data = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""
Speaker Time Student Likert
Pooh 1 a 1
Pooh 1 b 4
Pooh 1 c 3
Pooh 1 d 3
Pooh 1 e 3
Pooh 1 f 3
Pooh 1 g 4
Pooh 1 h 3
Pooh 1 i 3
Pooh 1 j 3
Pooh 2 a 4
Pooh 2 b 5
Pooh 2 c 4
Pooh 2 d 5
Pooh 2 e 4
Pooh 2 f 5
Pooh 2 g 3
Pooh 2 h 4
Pooh 2 i 3
Pooh 2 j 4
"""))
### Convert Speaker, Student, and Time to category type
Data['Speaker'] = Data['Speaker'].astype('category')
Data['Student'] = Data['Student'].astype('category')
Data['Time'] = Data['Time'].astype('category')
Data.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Speaker 20 non-null category
1 Time 20 non-null category
2 Student 20 non-null category
3 Likert 20 non-null int64
Number of observations per group
It is helpful to check the data to be sure there is one observation per student per time.
pd.crosstab(Data['Time'], Data['Student'])
Student a b c d e f g h i j
Time
1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
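The same check can be made programmatically rather than by eye. The following is a minimal sketch, assuming a small stand-in data frame (DataCheck, a hypothetical name) with the same columns as Data. It is an illustration only, not part of the original analysis.

```python
import pandas as pd

# Hypothetical stand-in with the same structure as Data (three students only)
DataCheck = pd.DataFrame({
    'Time':    [1, 1, 1, 2, 2, 2],
    'Student': ['a', 'b', 'c', 'a', 'b', 'c'],
    'Likert':  [1, 4, 3, 4, 5, 4]})

Counts = pd.crosstab(DataCheck['Time'], DataCheck['Student'])

# True only if every Time x Student cell holds exactly one observation
OnePerCell = bool((Counts == 1).all().all())
print(OnePerCell)
```

If OnePerCell is False, a student is missing at one of the times or appears more than once, and the pairing should be fixed before continuing.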
Plot the paired data
For plotting and the subsequent analysis, we’ll extract arrays for Time 1 and Time 2. It’s important at this point that data are ordered so that the first observation in Time 1 is paired with the first observation in Time 2, and so on.
Time1 = np.array(Data['Likert'][Data['Time']==1])
Time2 = np.array(Data['Likert'][Data['Time']==2])
Difference = Time2 - Time1
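Because the extraction above relies on row order, one way to make the pairing explicit is to reshape the data to wide format, with one row per student. The following is a minimal sketch, assuming a small stand-in data frame (DataLong, a hypothetical name) whose rows are deliberately out of order.

```python
import pandas as pd

# Hypothetical stand-in data; rows are intentionally not sorted by Time or Student
DataLong = pd.DataFrame({
    'Time':    [2, 1, 1, 2, 1, 2],
    'Student': ['b', 'a', 'c', 'a', 'b', 'c'],
    'Likert':  [5, 1, 3, 4, 4, 4]})

# One row per student, one column per time; pairing is by Student, not row order
Wide = DataLong.pivot(index='Student', columns='Time', values='Likert')

T1 = Wide[1].to_numpy()
T2 = Wide[2].to_numpy()
Diff = T2 - T1
print(Wide)
```

With this layout, each element of Diff belongs to a single student regardless of how the original rows were ordered.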
Scatter plot with one-to-one line
We’ll have to jitter one of the arrays so that all the points will be displayed on the plot.
Note that in the scatter plot, the points tend to be above and to the left of the one-to-one line, suggesting that Time 2 tends to have higher values than Time 1.
Time1Jitter = Time1 + np.random.normal(0, 0.3, Time1.shape)
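Because the jitter is random, the plot will look slightly different on each run. The following is a minimal sketch of a reproducible version, assuming NumPy's Generator interface. The seed value and the name JitterRng are illustrative choices, not from the original.

```python
import numpy as np

# Time 1 scores from the example data
Time1 = np.array([1, 4, 3, 3, 3, 3, 4, 3, 3, 3])

# Seeded generator so the same jitter is produced on every run
JitterRng = np.random.default_rng(12345)
Time1Jitter = Time1 + JitterRng.normal(0, 0.3, Time1.shape)
```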
Simple seaborn call
sns.scatterplot(x=Time1Jitter, y=Time2, color='#000000')
plt.axline(xy1=(0,0), slope=1)
plt.show()
Formatting and export as file
sns.scatterplot(x=Time1Jitter, y=Time2, color='#000000')
plt.title('')
plt.xlabel("\nTime 1")
plt.ylabel('Time 2\n')
plt.xlim(-0.2, 5.2)
plt.ylim(-0.2, 5.2)
plt.axline(xy1=(0,0), slope=1)
plt.tight_layout()
plt.savefig('ScatterAndLine.png', format='png', dpi=300)
plt.show()

Bar plot of differences
Note that there are more differences greater than zero than less than zero. This suggests that the difference (Time 2 – Time 1) tends to be positive, that is, that Time 2 tends to have higher values than Time 1.
Simple seaborn call
sns.countplot(x=Difference)
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.countplot(x=Difference)
plt.title('')
plt.xlabel("\nDifference in score (Time 2 – Time 1)")
plt.ylabel('Frequency\n')
plt.tight_layout()
plt.savefig('BarPlotDifferences.png', format='png', dpi=300)
plt.show()
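The counts read from the bar plot can also be tabulated directly. The following is a minimal sketch using the Time 1 and Time 2 scores from the example data. The names NPositive, NNegative, and NZero are illustrative.

```python
import numpy as np

# Scores from the example data, in student order a through j
Time1 = np.array([1, 4, 3, 3, 3, 3, 4, 3, 3, 3])
Time2 = np.array([4, 5, 4, 5, 4, 5, 3, 4, 3, 4])
Difference = Time2 - Time1

# Count differences above, below, and equal to zero
NPositive = int(np.sum(Difference > 0))
NNegative = int(np.sum(Difference < 0))
NZero = int(np.sum(Difference == 0))
print(NPositive, NNegative, NZero)   # 8 1 1
```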

Paired-samples Wilcoxon signed-rank test
Using pingouin
pg.wilcoxon(x=Time1, y=Time2)
W-val alternative p-val RBC CLES
Wilcoxon 3.5 two-sided 0.023552 -0.844444 0.16
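Note that an effect size statistic, the matched-pairs rank biserial correlation coefficient (RBC), is reported by default. Here it is negative because x is Time 1 and y is Time 2, and the Time 1 scores tend to be lower.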
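To show where the RBC value comes from, the following sketch computes a matched-pairs rank biserial correlation by hand. It assumes that zero differences are dropped and that differences are taken as Time1 - Time2, matching the order of x and y in the call above. Under these assumptions the result agrees with the value of -0.844444 reported above, but this is an illustration and not pingouin's own code.

```python
import numpy as np
from scipy.stats import rankdata

# Scores from the example data, in student order a through j
Time1 = np.array([1, 4, 3, 3, 3, 3, 4, 3, 3, 3])
Time2 = np.array([4, 5, 4, 5, 4, 5, 3, 4, 3, 4])

D = Time1 - Time2
D = D[D != 0]                  # drop zero differences (assumption)
Ranks = rankdata(np.abs(D))    # average ranks for tied absolute differences

RPlus = Ranks[D > 0].sum()     # sum of ranks for positive differences
RMinus = Ranks[D < 0].sum()    # sum of ranks for negative differences

RBC = (RPlus - RMinus) / (RPlus + RMinus)
print(RPlus, RMinus, round(RBC, 6))
```

The smaller of the two rank sums, 3.5, is the W statistic shown in the output above.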
Using scipy.stats
Note that the continuity correction is not used by default in scipy.stats.wilcoxon, whereas it is used by default in pingouin.
stats.wilcoxon(Time1, Time2, correction=True)
WilcoxonResult(statistic=3.5, pvalue=0.020041916312799807)
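To see the effect of the continuity correction, the two settings can be run side by side. The following is a minimal sketch, assuming the other arguments of scipy.stats.wilcoxon are left at their defaults. Exact p-values may vary with the scipy version, so none are shown here.

```python
import numpy as np
import scipy.stats as stats

# Scores from the example data, in student order a through j
Time1 = np.array([1, 4, 3, 3, 3, 3, 4, 3, 3, 3])
Time2 = np.array([4, 5, 4, 5, 4, 5, 3, 4, 3, 4])

# Same test with and without the continuity correction
Uncorrected = stats.wilcoxon(Time1, Time2, correction=False)
Corrected = stats.wilcoxon(Time1, Time2, correction=True)

print(Uncorrected.statistic, Uncorrected.pvalue)
print(Corrected.statistic, Corrected.pvalue)
```

The test statistic is the same in both calls. The correction only adjusts the normal approximation, so the corrected p-value should be slightly larger.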