
A Python Companion to Extension Program Evaluation

Salvatore S. Mangiafico

Basic Plots

Basic plots in SAEPER

 

For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/C_04.html).

 

•  The need to understand plots

•  Some advice on producing plots

•  Describing histogram shapes

•  Misleading and disorienting plots

 

Importing packages in this chapter

 

The following commands will import the packages used in this chapter and assign them their common aliases.  You may need to install these libraries first.

 

import io

 

import os

 

import numpy as np

 

import scipy.stats as stats

 

import pandas as pd

 

import matplotlib.pyplot as plt

 

import seaborn as sns

 

from statsmodels.graphics.mosaicplot import mosaic
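If you are not sure which of these libraries are already installed, the following is a minimal sketch of one way to check before running the imports. The helper name missing_packages is hypothetical, and the sketch assumes the installed package names match the import names listed here.

```python
import importlib.util

def missing_packages(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

### Top-level packages imported in this chapter
Required = ['numpy', 'scipy', 'pandas', 'matplotlib', 'seaborn', 'statsmodels']

Missing = missing_packages(Required)

if Missing:
    print('Packages to install first:', ' '.join(Missing))
else:
    print('All required packages are available.')
```

Any names reported as missing can then be installed with your usual package manager.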

 

 

Setting your working directory

 

You may wish to set your working directory for exported plots.

 

os.chdir("C:/Users/Sal Mangiafico/Desktop")

 

print(os.getcwd())
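As an alternative to changing the working directory, the sketch below builds an output folder with pathlib and constructs a full file path that could be passed to plt.savefig(). The folder name Plots is a hypothetical choice for illustration.

```python
from pathlib import Path

### Hypothetical output folder inside the current working directory
OutDir = Path.cwd() / 'Plots'

### Create the folder if it does not already exist
OutDir.mkdir(parents=True, exist_ok=True)

### Full path for an exported plot, e.g. plt.savefig(OutFile, dpi=300)
OutFile = OutDir / 'StepsHist.png'

print(OutFile)
```

This approach leaves the working directory unchanged, which may be preferable when a script is shared between computers.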

 

 

Themes and formatting in seaborn and pyplot

 

When using the seaborn library, the basic theme for plots can be set with seaborn.set_theme().  The available styles are “white”, “dark”, “whitegrid”, “darkgrid”, and “ticks”.

 

Further formatting, such as defining and formatting the y-axis labels, can be achieved with options in the pyplot package in matplotlib. Some formatting is shown in the examples below.
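To compare the theme styles side by side, the following sketch draws the same small plot under each style and saves one file per style. It uses the first five Steps values from this chapter's data. The file names are hypothetical. Note that the seaborn theme also applies to plots drawn directly with pyplot, as here.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; omit to display plots on screen
import matplotlib.pyplot as plt
import seaborn as sns

Styles = ['white', 'dark', 'whitegrid', 'darkgrid', 'ticks']

### First five Steps values from the chapter's data
Student = [1, 2, 3, 4, 5]
Steps   = [8000, 9000, 10000, 7000, 6000]

for Style in Styles:
    sns.set_theme(style=Style)          # set the seaborn theme
    plt.figure(figsize=(3, 2.25))
    plt.plot(Student, Steps, marker='o')
    plt.title(Style)                    # pyplot formatting options
    plt.xlabel('\nStudent')
    plt.ylabel('Steps\n')
    plt.tight_layout()
    plt.savefig('Theme_' + Style + '.png', format='png', dpi=100)
    plt.close()
```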

 

Examples of basic plots for interval/ratio and ordinal data

 

Data = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""

Student  Gender  Teacher  Steps  Rating
a        female  Catbus    8000   7
b        female  Catbus    9000  10
c        female  Catbus   10000   9
d        female  Catbus    7000   5
e        female  Catbus    6000   4
f        female  Catbus    8000   8
g        male    Catbus    7000   6
h        male    Catbus    5000   5
i        male    Catbus    9000  10
j        male    Catbus    7000   8
k        female  Satsuki   8000   7
l        female  Satsuki   9000   8
m        female  Satsuki   9000   8
n        female  Satsuki   8000   9
o        male    Satsuki   6000   5
p        male    Satsuki   8000   9
q        male    Satsuki   7000   6
r        female  Totoro   10000  10
s        female  Totoro    9000  10
t        female  Totoro    8000   8
u        female  Totoro    8000   7
v        female  Totoro    6000   7
w        male    Totoro    6000   8
x        male    Totoro    8000  10
y        male    Totoro    7000   7
z        male    Totoro    7000   7
"""))

 

### Convert Gender and Teacher to category type

 

Data['Gender']  = Data['Gender'].astype('category')

 

Data['Teacher'] = Data['Teacher'].astype('category')

 

 

### Order Teacher by its order in the original data

 

TeacherLevels = ['Catbus', 'Satsuki', 'Totoro']

 

Data['Teacher'] = Data['Teacher'].cat.reorder_categories(TeacherLevels)

 

 

### Change the levels of Gender to be capitalized

 

GenderNames = {'female': 'Female', 'male': 'Male'}

 

Data['Gender']  = Data['Gender'].cat.rename_categories(GenderNames)

 

 

### Display the structure of the data frame

 

print(Data.info())

 

#   Column   Non-Null Count  Dtype  

---  ------   --------------  -----  

 0   Student  26 non-null     object 

 1   Gender   26 non-null     category

 2   Teacher  26 non-null     category

 3   Steps    26 non-null     int64  

 4   Rating   26 non-null     int64

 

 

print(Data['Gender'].cat.categories)

 

Index(['Female', 'Male'], dtype='object')

 

 

print(Data['Teacher'].cat.categories)

 

Index(['Catbus', 'Satsuki', 'Totoro'], dtype='object')

 

 

Histograms

 

Simple histogram

 

Simple seaborn call

 

sns.histplot(data=Data, x='Steps')

 

 

Formatting and export as file

 

sns.set_theme(style='white')

 

plt.figure(figsize=(5, 3.75))

 

sns.histplot(data=Data, x='Steps')

plt.title('')

plt.xlabel('\nSteps')

plt.ylabel('')

plt.tight_layout()

 

plt.savefig('StepsHist.png', format='png', dpi=300)

 

plt.show()

 

Plot001

 

 

Histogram with density line

 

Simple seaborn call

 

sns.histplot(data=Data, x='Steps', kde=True)

 

 

Formatting and export as file

 

sns.set_theme(style='white')

 

plt.figure(figsize=(5, 3.75))

 

sns.histplot(data=Data, x='Steps', kde=True)

plt.title('')

plt.xlabel('\nSteps')

plt.ylabel('')

plt.tight_layout()

 

plt.savefig('StepsHistDensity.png', format='png', dpi=300)

 

plt.show()

 

Plot002

 

 

Histogram with normal curve

 

Simple seaborn call

 

Plot = sns.histplot(x=Data['Steps'], stat='density')

 

Sum = Data['Steps'].describe()

Curve = np.linspace(Sum['min'], Sum['max'])

Plot.plot(Curve, stats.norm.pdf(Curve, Sum['mean'], Sum['std']))

 

 

Formatting and export as file

 

sns.set_theme(style='white')

 

plt.figure(figsize=(5, 3.75))

 

sns.histplot(x=Data['Steps'], stat='density')

 

Sum = Data['Steps'].describe()

Curve = np.linspace(Sum['min'], Sum['max'])

 

plt.plot(Curve, stats.norm.pdf(Curve, Sum['mean'], Sum['std']))

plt.title('')

plt.xlabel('\nSteps')

plt.ylabel('')

plt.tight_layout()

 

plt.savefig('StepsHistNormal.png', format='png', dpi=300)

 

plt.show()

 

Plot003

 

 

Histograms for one-way data

 

Simple seaborn call

 

Plot = sns.FacetGrid(data=Data, row='Gender',

                  margin_titles=True, height=2, aspect= 2)

 

Plot.map(sns.histplot, 'Steps')

 

 

Formatting and export as file

 

sns.set_theme(style='white')

 

Plot = sns.FacetGrid(data=Data, row='Gender',

                  margin_titles=True, height=2, aspect= 2)

 

Plot.map(sns.histplot, 'Steps')

 

Plot.tight_layout()

 

Plot.savefig('StepsHistFacetOne.png', format='png', dpi=300)

 

Plot004

 

 

Histograms for two-way data

 

Simple seaborn call

 

Plot = sns.FacetGrid(data=Data, row='Gender', col='Teacher',

                  margin_titles=True, height=2, aspect= 2)

 

Plot.map(sns.histplot, 'Steps')

 

 

Formatting and export as file

 

sns.set_theme(style='white')

 

Plot = sns.FacetGrid(data=Data, row='Gender', col='Teacher',

                  margin_titles=True, height=2, aspect= 1.5)

 

Plot.map(sns.histplot, 'Steps')

 

Plot.tight_layout()

 

Plot.savefig('StepsHistFacetTwo.png', format='png', dpi=300)

 

Plot005

 

 

Box plots

 

Box plots can also be made with seaborn.catplot(), which is used for the plots of means below, by setting the option kind='box'.  Some of the other options shown here would need to be changed.
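As a sketch of the catplot() approach, the following draws a one-way box plot with kind='box'. Because catplot() creates its own figure, the plot size is set with height and aspect rather than with plt.figure(), and the labels are set through the returned object. The data frame is rebuilt from the Gender and Steps columns of this chapter's data so that the block runs on its own. The output file name is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; omit to display plots on screen
import pandas as pd
import seaborn as sns

### Gender and Steps columns from the chapter's data, in the original row order
Data = pd.DataFrame({
    'Gender': (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 +
               ['Male'] * 3 + ['Female'] * 5 + ['Male'] * 4),
    'Steps':  [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000,
               7000, 8000, 9000, 9000, 8000, 6000, 8000, 7000, 10000,
               9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]})

sns.set_theme(style='darkgrid')

### Figure size is controlled by height and aspect in catplot()
Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind='box',
                   width=0.5, height=4, aspect=1.25)

Plot.set_xlabels('\nGender')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()

Plot.savefig('BoxplotCatplot.png', format='png', dpi=300)
```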

 

Simple box plot

 

Simple seaborn call

 

sns.boxplot(data=Data, y='Steps', orient='v', width=0.2)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 3.75))

 

sns.boxplot(y=Data['Steps'], width=0.2)

plt.title('')

plt.xlabel('')

plt.ylabel('Steps\n')

plt.tight_layout()

 

plt.savefig('Boxplot.png', format='png', dpi=300)

 

plt.show()

 

Plot006

 

 

Box plot with mean

 

Simple seaborn call

 

sns.boxplot(y=Data['Steps'], width=0.2, showmeans=True)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

sns.boxplot(y=Data['Steps'], width=0.2, showmeans=True,

              meanprops={"marker": "+",

                       "markeredgecolor": "black",

                       "markersize": 10})

plt.title('')

plt.xlabel('')

plt.ylabel('Steps\n')

plt.tight_layout()

 

plt.savefig('BoxplotMean.png', format='png', dpi=300)

 

plt.show()

 

Plot007

 

 

Box plot for one-way data

 

Simple seaborn call

 

sns.boxplot(data=Data, x='Gender', y='Steps', width=0.3)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 4))

 

sns.boxplot(data=Data, x='Gender', y='Steps', width=0.5, gap=.2)

plt.title('')

plt.xlabel('\nGender')

plt.ylabel('Steps\n')

plt.tight_layout()

 

plt.savefig('BoxplotOneWay.png', format='png', dpi=300)

 

plt.show()

 

Plot008

 

 

Box plot for two-way data

 

Simple seaborn call

 

sns.boxplot(data=Data, x='Gender', y='Steps', hue='Teacher', width=0.7, gap=.2)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 4))

 

sns.boxplot(data=Data, x='Gender', y='Steps', hue='Teacher', width=0.7, gap=.2)

plt.title('')

plt.xlabel('\nGender')

plt.ylabel('Steps\n')

plt.tight_layout()

 

plt.savefig('BoxplotTwoWay.png', format='png', dpi=300)

 

plt.show()

 

Plot009

 

 

Plot of means and interaction plots

 

A variety of types of plots can be made with seaborn.catplot(). These include:

 

•  strip plot, kind = 'strip' (the default)

•  swarm plot, kind = 'swarm'

•  box plot, kind = 'box'

•  violin plot, kind = 'violin'

•  boxen plot, or letter-value plot, kind = 'boxen'

•  point plot, kind = 'point'

•  bar plot, kind = 'bar'

•  count plot, kind = 'count'

 

Error bars are displayed by default in seaborn.catplot() for point plots and bar plots.  The default is a 95% confidence interval by bootstrap.  Other choices can be set with the errorbar option:

 

•  standard deviation, errorbar = ('sd')

•  two times the standard deviation, errorbar = ('sd', 2)

•  standard error, errorbar = ('se')

•  traditional 95% confidence interval of the mean, errorbar = ('se', 1.96)

•  confidence interval by bootstrap, errorbar = ('ci')

•  confidence interval by bootstrap with 5000 resamples, errorbar = ('ci'), n_boot = 5000

•  confidence interval of the median by bootstrap, errorbar = ('ci'), estimator = 'median'
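To make the ('se', 1.96) option concrete, the following sketch computes the mean, standard error, and the mean plus or minus 1.96 standard errors for each gender by hand with pandas. To my understanding, these limits correspond to the error bars that errorbar = ('se', 1.96) would draw, though the code below is an independent calculation and not seaborn's own implementation. The data frame is rebuilt from the Gender and Steps columns of this chapter's data so that the block runs on its own.

```python
import numpy as np
import pandas as pd

### Gender and Steps columns from the chapter's data, in the original row order
Data = pd.DataFrame({
    'Gender': (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 +
               ['Male'] * 3 + ['Female'] * 5 + ['Male'] * 4),
    'Steps':  [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000,
               7000, 8000, 9000, 9000, 8000, 6000, 8000, 7000, 10000,
               9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]})

### Count, mean, and sample standard deviation of Steps for each gender
Summary = Data.groupby('Gender')['Steps'].agg(['count', 'mean', 'std'])

### Standard error of the mean, and limits at 1.96 standard errors
Summary['se']    = Summary['std'] / np.sqrt(Summary['count'])
Summary['lower'] = Summary['mean'] - 1.96 * Summary['se']
Summary['upper'] = Summary['mean'] + 1.96 * Summary['se']

print(Summary.round(1))
```

The multiplier 1.96 comes from the normal distribution, so with small samples these limits are somewhat narrower than a confidence interval based on the t distribution.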

 

Means for one-way data

 

Simple seaborn call

 

sns.catplot(data=Data, x='Gender', y='Steps', kind="point",

            errorbar=('se', 1.96), capsize=0.12, linestyles='none')

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind="point",

                   errorbar=('se', 1.96), capsize=0.12, linestyles='none',

                   height=4, aspect=1.33)

 

Plot.set_titles('')

Plot.set_xlabels('\nGender')

Plot.set_ylabels('Steps\n')

 

Plot.tight_layout()

 

Plot.savefig('StepsMeanGender.png', format='png', dpi=300)

 

Plot010

 

 

Means for two-way data

 

Simple seaborn call

 

sns.catplot(data=Data, x='Gender', y='Steps', hue='Teacher', kind="point",

            errorbar=('se', 1.96), capsize=0.12, linestyles='none',

            dodge=0.3)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

Plot = sns.catplot(data=Data, x='Gender', y='Steps', hue='Teacher', kind="point",

                   errorbar=('se', 1.96), capsize=0.12, linestyles='none',

                   dodge=0.3, height=4, aspect=1.33)

 

Plot.set_titles('')

Plot.set_xlabels('\nGender')

Plot.set_ylabels('Steps\n')

 

Plot.tight_layout()

 

Plot.savefig('StepsMeanGenderTeacher.png', format='png', dpi=300)

 

Plot011

 

 

Bar plot of means

 

These plots will also be made with seaborn.catplot(). See notes above on the options for plot types and error bars.

 

One-way data

 

Simple seaborn call

 

sns.catplot(data=Data, x='Gender', y='Steps', kind='bar',

                   errorbar=('se', 1.96), capsize=0.12, dodge=True)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind='bar',

                   errorbar=('se', 1.96), capsize=0.12, dodge=True, height=4,

                   aspect=1)

 

Plot.set_titles('')

Plot.set_xlabels('\nGender')

Plot.set_ylabels('Steps\n')

 

Plot.tight_layout()

 

Plot.savefig('StepsBarGender.png', format='png', dpi=300)

 

Plot012

 

 

Two-way data

 

Simple seaborn call

 

sns.catplot(data=Data, x='Teacher',  y='Steps', hue='Gender', kind='bar',

                   errorbar=('se', 1.96), capsize=0.12, dodge=True)

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

Plot = sns.catplot(data=Data, x='Teacher', y='Steps', hue='Gender', kind='bar',

                   errorbar=('se', 1.96), capsize=0.12, dodge=True, height=4,

                   aspect=1)

 

Plot.set_titles('')

Plot.set_xlabels('\nTeacher')

Plot.set_ylabels('Steps\n')

 

Plot.tight_layout()

 

Plot.savefig('StepsBarGenderTeacher.png', format='png', dpi=300)

 

Plot013

 

 

Scatter plot

 

Simple scatter plot

 

### Manually jitter the Steps values so that they’re visible on the plot

 

Data['StepsJitter'] = np.array(Data['Steps'] +

                        np.random.normal(0, 200, Data['Steps'].shape))
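Because the jitter is random, the plot will differ slightly each time the code is run. If a reproducible plot is wanted, one option is a seeded random generator, as in the sketch below. The seed value 1234 is an arbitrary, hypothetical choice, and only the first five Steps values from the data are used to keep the example short.

```python
import numpy as np
import pandas as pd

### First five Steps values from the chapter's data
Steps = pd.Series([8000, 9000, 10000, 7000, 6000], name='Steps')

### Seeded generator, so the same jitter is produced on every run
Rng = np.random.default_rng(seed=1234)

### Add normally distributed noise with a standard deviation of 200 steps
StepsJitter = Steps + Rng.normal(loc=0, scale=200, size=Steps.shape)

print(StepsJitter.round(0))
```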

 

 

Simple seaborn call

 

sns.scatterplot(data=Data, x='StepsJitter', y='Rating')

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 3.75))

 

sns.scatterplot(data=Data, x='StepsJitter', y='Rating')

 

plt.title('')

plt.xlabel("\nSteps (jittered)")

plt.ylabel('Rating (1 - 10)\n')

plt.tight_layout()

 

plt.savefig('StepsRatingScatter.png', format='png', dpi=300)

 

Plot014

 

 

Examples of basic plots for nominal data

 

Bar plot for counts of a nominal variable

 

The following plots show the count of observations in each level of the categorical variables.  The plots can be modified to display counts, proportions, percents, or probabilities with stat='count', stat='proportion', stat='percent', or stat='probability'.
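As a check on what the bars represent, the following sketch tabulates the counts, proportions, and percents for Gender directly with pandas. These are my own calculations of the quantities named above, not output from seaborn. The series is rebuilt from the Gender counts in this chapter's data so that the block runs on its own.

```python
import pandas as pd

### Gender values from the chapter's data: 15 Female and 11 Male
Gender = pd.Series(['Female'] * 15 + ['Male'] * 11, name='Gender')

Counts      = Gender.value_counts()                  # number in each level
Proportions = Gender.value_counts(normalize=True)    # counts divided by the total
Percents    = 100 * Proportions

print(Counts)
print(Proportions.round(3))
print(Percents.round(1))
```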

 

One-way data

 

Simple seaborn call

 

sns.countplot(data=Data, x='Gender')

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 3.75))

 

sns.countplot(data=Data, x='Gender')

 

plt.title('')

plt.xlabel("\nGender")

plt.ylabel('Count of observations\n')

plt.tight_layout()

 

plt.savefig('GenderCount.png', format='png', dpi=300)

 

Plot015

 

 

Two-way data

 

Simple seaborn call

 

sns.countplot(data=Data, x='Teacher', hue='Gender')

 

 

Formatting and export as file

 

sns.set_theme(style='darkgrid')

 

plt.figure(figsize=(5, 3.75))

 

sns.countplot(data=Data, x='Teacher', hue='Gender')

 

plt.title('')

plt.xlabel("\nTeacher")

plt.ylabel('Count of observations\n')

plt.tight_layout()

 

plt.savefig('GenderCountTwoWay.png', format='png', dpi=300)

 

Plot016

 

 

Confidence intervals for proportions

 

At the time of writing, I don’t know an easy way to ask seaborn to display confidence intervals for proportions of counts.
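One workaround, offered only as a sketch, is to compute the intervals outside of seaborn and draw the bars with pyplot. The example below uses proportion_confint() from statsmodels with the Wilson method, which is my choice for illustration and not something seaborn does. It treats each proportion as a separate binomial proportion. The series is rebuilt from the Gender counts in this chapter's data, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; omit to display plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

### Gender values from the chapter's data: 15 Female and 11 Male
Gender = pd.Series(['Female'] * 15 + ['Male'] * 11, name='Gender')

Counts = Gender.value_counts().sort_index()
Total  = int(Counts.sum())
Props  = (Counts / Total).to_numpy()

### 95% Wilson confidence limits for each proportion
Lower, Upper = proportion_confint(Counts.to_numpy(), Total,
                                  alpha=0.05, method='wilson')

### pyplot expects error bar lengths below and above each bar
YErr = np.vstack([Props - Lower, Upper - Props])

plt.figure(figsize=(5, 3.75))
plt.bar(list(Counts.index), Props, yerr=YErr, capsize=6)
plt.xlabel('\nGender')
plt.ylabel('Proportion of observations\n')
plt.tight_layout()

plt.savefig('GenderProportionCI.png', format='png', dpi=300)
```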

 

Mosaic plot

 

Simple statsmodels call

 

mosaic(Data, ['Teacher', 'Gender'])

 

 

Formatting and export as file

 

Fig, Ax = plt.subplots(figsize=(5, 3.75))

 

### Pass the axes to mosaic() so that the plot is drawn in this figure

mosaic(Data, ['Teacher', 'Gender'], ax=Ax)

plt.title('')

plt.xlabel('\nTeacher')

plt.ylabel('Gender\n')

plt.tight_layout()

 

plt.savefig('Mosaic.png', format='png', dpi=300)

 

Plot017

 

 

References

 

Waskom, M. 2024. seaborn: User guide and tutorial. seaborn.pydata.org/tutorial.html.