
A Python Companion to Extension Program Evaluation

Salvatore S. Mangiafico

Friedman Test


 

For a discussion of this test, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/F_10.html).

 

Importing packages in this chapter

 

The following commands will import the packages used in this chapter and assign them their common aliases.  You may need to install these libraries first.

 

import io

 

import os

 

import numpy as np

 

import scipy.stats as stats

 

import pandas as pd

 

import pingouin as pg

 

import scikit_posthocs as sp

 

import matplotlib.pyplot as plt

 

import seaborn as sns

 

 

Setting your working directory

 

You may wish to set your working directory for exported plots.

 

os.chdir("C:/Users/Sal Mangiafico/Desktop")

 

print(os.getcwd())

 

 

Example of Friedman test

 

Data = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""

 

Instructor        Rater  Likert
 "Bob Belcher"        a      4
 "Bob Belcher"        b      5
 "Bob Belcher"        c      4
 "Bob Belcher"        d      6
 "Bob Belcher"        e      6
 "Bob Belcher"        f      6
 "Bob Belcher"        g     10
 "Bob Belcher"        h      6
 "Linda Belcher"      a      8
 "Linda Belcher"      b      6
 "Linda Belcher"      c      8
 "Linda Belcher"      d      8
 "Linda Belcher"      e      8
 "Linda Belcher"      f      7
 "Linda Belcher"      g     10
 "Linda Belcher"      h      9
 "Tina Belcher"       a      7
 "Tina Belcher"       b      5
 "Tina Belcher"       c      7
 "Tina Belcher"       d      8
 "Tina Belcher"       e      8
 "Tina Belcher"       f      9
 "Tina Belcher"       g     10
 "Tina Belcher"       h      9
 "Gene Belcher"       a      6
 "Gene Belcher"       b      4
 "Gene Belcher"       c      5
 "Gene Belcher"       d      5
 "Gene Belcher"       e      6
 "Gene Belcher"       f      6
 "Gene Belcher"       g      5
 "Gene Belcher"       h      5
 "Louise Belcher"     a      8
 "Louise Belcher"     b      7
 "Louise Belcher"     c      8
 "Louise Belcher"     d      8
 "Louise Belcher"     e      9
 "Louise Belcher"     f      9
 "Louise Belcher"     g      8
 "Louise Belcher"     h     10 
"""))

 

### Convert Instructor and Rater to category type

 

Data['Instructor']  = Data['Instructor'].astype('category')

 

Data['Rater']  = Data['Rater'].astype('category')

 

 

### Create a new variable, Likert.f, with the Likert scores as a category type

 

Data['Likert.f']  = Data['Likert'].astype('category')

 

 

### Order Instructor by desired values

 

InstructorLevels = ['Bob Belcher', 'Linda Belcher', 'Tina Belcher',

                 'Gene Belcher', 'Louise Belcher']

 

Data['Instructor'] = Data['Instructor'].cat.reorder_categories(InstructorLevels)

 

 

print(Data['Instructor'].cat.categories)

Index(['Bob Belcher', 'Linda Belcher', 'Tina Belcher', 'Gene Belcher',

       'Louise Belcher'],

      dtype='object')

 

 

print(Data.info())

 

 #   Column      Non-Null Count  Dtype  

---  ------      --------------  -----  

 0   Instructor  40 non-null     category

 1   Rater       40 non-null     category

 2   Likert      40 non-null     int64  

 3   Likert.f    40 non-null     category

 

 

 

Summarize data treating Likert scores as categories

 

pd.crosstab(Data['Instructor'], Data['Likert.f'])

 

Likert.f        4  5  6  7  8  9  10

Instructor                         

Bob Belcher     2  1  4  0  0  0   1

Linda Belcher   0  0  1  1  4  1   1

Tina Belcher    0  1  0  2  2  2   1

Gene Belcher    1  4  3  0  0  0   0

Louise Belcher  0  0  0  1  4  2   1

 

 

pd.crosstab(Data['Instructor'], Data['Likert.f'], normalize='index')

 

Likert.f            4      5      6      7     8      9     10

Instructor                                                   

Bob Belcher     0.250  0.125  0.500  0.000  0.00  0.000  0.125

Linda Belcher   0.000  0.000  0.125  0.125  0.50  0.125  0.125

Tina Belcher    0.000  0.125  0.000  0.250  0.25  0.250  0.125

Gene Belcher    0.125  0.500  0.375  0.000  0.00  0.000  0.000

Louise Belcher  0.000  0.000  0.000  0.125  0.50  0.250  0.125

 

 

Bar plots of data by group

 

sns.set_theme(style='white')

 

Plot = sns.FacetGrid(data=Data, row='Instructor',

                  margin_titles=True, height=2, aspect= 2)

 

Plot.map(sns.countplot, 'Likert.f')

 

Plot.tight_layout()

 

Plot.savefig('LikertBarBelcher.png', format='png', dpi=300)

 

[Plot: bar plots of Likert scores for each Instructor]

 

 

Summarize data treating Likert scores as numeric

 

Summary = Data.groupby('Instructor')['Likert'].describe()

 

print(Summary)

 

                count   mean       std  min   25%  50%   75%   max

Instructor                                                       

Bob Belcher       8.0  5.875  1.885092  4.0  4.75  6.0  6.00  10.0

Linda Belcher     8.0  8.000  1.195229  6.0  7.75  8.0  8.25  10.0

Tina Belcher      8.0  7.875  1.552648  5.0  7.00  8.0  9.00  10.0

Gene Belcher      8.0  5.250  0.707107  4.0  5.00  5.0  6.00   6.0

Louise Belcher    8.0  8.375  0.916125  7.0  8.00  8.0  9.00  10.0

 

 

Friedman test example

 

Using pingouin

 

pg.friedman(data=Data, dv="Likert", within="Instructor", subject="Rater")

 

              Source        W  ddof1          Q     p-unc

Friedman  Instructor  0.72309      4  23.138889  0.000119

 

### Note that the effect size statistic, Kendall’s W, is included in the output.
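
If the value of W is needed on its own, it can be pulled out of the returned data frame.  A minimal example, storing the result in a variable first:

Result = pg.friedman(data=Data, dv="Likert", within="Instructor", subject="Rater")

W = Result.loc['Friedman', 'W']

round(W, 3)

0.723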

 

 

Using scipy.stats

 

Bob    = np.array(Data['Likert'][Data['Instructor']== 'Bob Belcher'])

Linda  = np.array(Data['Likert'][Data['Instructor']== 'Linda Belcher'])

Tina   = np.array(Data['Likert'][Data['Instructor']== 'Tina Belcher'])

Gene   = np.array(Data['Likert'][Data['Instructor']== 'Gene Belcher'])

Louise = np.array(Data['Likert'][Data['Instructor']== 'Louise Belcher'])

 

stats.friedmanchisquare(Bob, Linda, Tina, Gene, Louise)

 

FriedmanchisquareResult(statistic=23.138888888888907, pvalue=0.00011878735218879764)

 

 

Stat, Pvalue = stats.friedmanchisquare(Bob, Linda, Tina, Gene, Louise)

 

round(Stat, 3)

 

23.139

 

round(Pvalue, 6)

 

0.000119

 

 

Post-hoc tests for multiple comparisons of groups

 

Some results below differ from those reported by R.  This may be due to differences in the p-value adjustment methods.  For comparable tests, using the p_adjust=None option here gives the same results as the corresponding test in R with the p.adjust.method="none" option.

 

The following call will prevent pandas from truncating the output.

 

pd.set_option('display.max_columns', 500)

 

 

The following will order the Instructor categories by their median responses.  It appears, though, that this ordering isn’t used in the following post-hoc functions.

 

InstructorLevels = ['Linda Belcher', 'Louise Belcher','Tina Belcher',

                 'Bob Belcher','Gene Belcher']

 

Data['Instructor'] = Data['Instructor'].cat.reorder_categories(InstructorLevels)

 

 

print(Data['Instructor'].cat.categories)

 

Index(['Linda Belcher', 'Louise Belcher', 'Tina Belcher', 'Bob Belcher',

       'Gene Belcher'],

      dtype='object')
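
The medians used for this ordering can also be checked directly from the data.  A minimal sketch (ties, as here, still need to be put in a chosen order by hand):

print(Data.groupby('Instructor')['Likert'].median())

Instructor
Linda Belcher     8.0
Louise Belcher    8.0
Tina Belcher      8.0
Bob Belcher       6.0
Gene Belcher      5.0
Name: Likert, dtype: float64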

 

 

Conover test

 

Several different p-value adjustment methods are available. See the function documentation for the options.

 

sp.posthoc_conover_friedman(Data, melted=True,

                            y_col='Likert', group_col='Instructor',

                            block_col='Rater', block_id_col='Rater',

                            p_adjust=None)

                 

                Bob Belcher  Linda Belcher  Tina Belcher  Gene Belcher  Louise Belcher

Bob Belcher        1.000000       0.000085      0.000925  1.932115e-01   4.987865e-06

Linda Belcher      0.000085       1.000000      0.381682  2.237578e-06   3.086386e-01

Tina Belcher       0.000925       0.381682      1.000000  2.509627e-05   6.434725e-02

Gene Belcher       0.193212       0.000002      0.000025  1.000000e+00   1.433972e-07

Louise Belcher     0.000005       0.308639      0.064347  1.433972e-07   1.000000e+00
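
For example, Holm-adjusted p-values could be requested with the p_adjust option (output not shown here):

sp.posthoc_conover_friedman(Data, melted=True,
                            y_col='Likert', group_col='Instructor',
                            block_col='Rater', block_id_col='Rater',
                            p_adjust='holm')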

 

 

Nemenyi test

 

sp.posthoc_nemenyi_friedman(Data, melted=True,

                            y_col='Likert', group_col='Instructor',

                            block_col='Rater', block_id_col='Rater')

 

                Bob Belcher  Linda Belcher  Tina Belcher  Gene Belcher  Louise Belcher

Bob Belcher        1.000000       0.102135      0.277518      0.953951  0.022372

Linda Belcher      0.102135       1.000000      0.989665      0.013570  0.981551 

Tina Belcher       0.277518       0.989665      1.000000      0.055728  0.842625

Gene Belcher       0.953951       0.013570      0.055728      1.000000  0.001894

Louise Belcher     0.022372       0.981551      0.842625      0.001894  1.000000

 

 

Siegel test

 

Several different p-value adjustment methods are available. See the function documentation for the options.

 

sp.posthoc_siegel_friedman(Data, melted=True,

                           y_col='Likert', group_col='Instructor',

                           block_col='Rater', block_id_col='Rater',

                           p_adjust=None)

 

 

                Bob Belcher  Linda Belcher  Tina Belcher  Gene Belcher Louise Belcher 

Bob Belcher        1.000000       0.014255      0.048107      0.476767   0.002663 

Linda Belcher      0.014255       1.000000      0.635256      0.001565   0.579991

Tina Belcher       0.048107       0.635256      1.000000      0.007190   0.304072

Gene Belcher       0.476767       0.001565      0.007190      1.000000   0.000203 

Louise Belcher     0.002663       0.579991      0.304072      0.000203   1.000000

 

 

Miller test

 

sp.posthoc_miller_friedman(Data, melted=True,

                            y_col='Likert', group_col='Instructor',

                            block_col='Rater', block_id_col='Rater')

  

 

                Bob Belcher  Linda Belcher  Tina Belcher  Gene Belcher  Louise Belcher

Bob Belcher        1.000000       0.198682      0.418842      0.972890   0.060478

Linda Belcher      0.198682       1.000000      0.994127      0.040428   0.989407

Tina Belcher       0.418842       0.994127      1.000000      0.124465   0.901150

Gene Belcher       0.972890       0.040428      0.124465      1.000000   0.007940

Louise Belcher     0.060478       0.989407      0.901150      0.007940   1.000000

 

 

Example from Conover

 

This example is taken from the Friedman test section of Conover (1999).  Note that here the data aren’t in long format, but in wide format.  For data in this format, the easiest approach is to subset the data into a data frame, here called Conover1, containing just the columns of observations.

 

Conover = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""

Homeowner Grass1 Grass2 Grass3 Grass4

 1        4      3      2      1

 2        4      2      3      1

 3        3      1.5    1.5    4

 4        3      1      2      4

 5        4      2      1      3

 6        2      2      2      4

 7        1      3      2      4

 8        2      4      1      3

 9        3.5    1      2      3.5

10        4      1      3      2

11        4      2      3      1

12        3.5    1      2      3.5

"""))

 

Columns = ['Grass1', 'Grass2', 'Grass3', 'Grass4']

 

Conover1 = Conover[Columns]

 

Conover1

 

    Grass1  Grass2  Grass3  Grass4

0      4.0     3.0     2.0     1.0

1      4.0     2.0     3.0     1.0

2      3.0     1.5     1.5     4.0

3      3.0     1.0     2.0     4.0

4      4.0     2.0     1.0     3.0

5      2.0     2.0     2.0     4.0

6      1.0     3.0     2.0     4.0

7      2.0     4.0     1.0     3.0

8      3.5     1.0     2.0     3.5

9      4.0     1.0     3.0     2.0

10     4.0     2.0     3.0     1.0

11     3.5     1.0     2.0     3.5

 

 

pg.friedman(Conover1)

 

          Source         W  ddof1         Q     p-unc

Friedman  Within  0.224926      3  8.097345  0.044042
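
Alternatively, the wide-format data could be reshaped to long format with pandas.melt and analyzed as in the earlier example.  A minimal sketch, using Grass and Rank as hypothetical names for the reshaped columns; it should give the same result as above.

ConoverLong = pd.melt(Conover, id_vars='Homeowner', value_vars=Columns,
                      var_name='Grass', value_name='Rank')

pg.friedman(data=ConoverLong, dv='Rank', within='Grass', subject='Homeowner')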

 

 

sp.posthoc_conover_friedman(Conover1)  

 

          Grass1    Grass2    Grass3    Grass4

Grass1  1.000000  0.014895  0.022603  0.483434

Grass2  0.014895  1.000000  0.860437  0.071737

Grass3  0.022603  0.860437  1.000000  0.101742

Grass4  0.483434  0.071737  0.101742  1.000000

 

 

References

 

Conover, W.J. 1999. Practical Nonparametric Statistics, 3rd ed. John Wiley & Sons.