Basic plots in SAEPER
For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/C_04.html).
• The need to understand plots
• Some advice on producing plots
• Describing histogram shapes
• Misleading and disorienting plots
Importing packages in this chapter
The following commands will import required packages used in this chapter from libraries and assign them common aliases. You may need to install these libraries first.
import io
import os
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.mosaicplot import mosaic
Setting your working directory
You may wish to set your working directory for exported plots.
os.chdir("C:/Users/Sal Mangiafico/Desktop")
print(os.getcwd())
Themes and formatting in seaborn and pyplot
When using the seaborn library, the basic theme for the plots can be set with seaborn.set_theme(), for example sns.set_theme(style='white'). The available theme styles are “white”, “dark”, “whitegrid”, “darkgrid”, and “ticks”.
Further formatting, such as defining and formatting the axis labels, can be achieved with functions in the pyplot module of matplotlib. Some of this formatting is shown in the examples below.
Examples of basic plots for interval/ratio and ordinal data
Data = pd.read_table(sep="\\s+",
filepath_or_buffer=io.StringIO("""
Student Gender Teacher Steps Rating
a female Catbus 8000 7
b female Catbus 9000 10
c female Catbus 10000 9
d female Catbus 7000 5
e female Catbus 6000 4
f female Catbus 8000 8
g male Catbus 7000 6
h male Catbus 5000 5
i male Catbus 9000 10
j male Catbus 7000 8
k female Satsuki 8000 7
l female Satsuki 9000 8
m female Satsuki 9000 8
n female Satsuki 8000 9
o male Satsuki 6000 5
p male Satsuki 8000 9
q male Satsuki 7000 6
r female Totoro 10000 10
s female Totoro 9000 10
t female Totoro 8000 8
u female Totoro 8000 7
v female Totoro 6000 7
w male Totoro 6000 8
x male Totoro 8000 10
y male Totoro 7000 7
z male Totoro 7000 7
"""))
### Convert Gender and Teacher to category type
Data['Gender'] = Data['Gender'].astype('category')
Data['Teacher'] = Data['Teacher'].astype('category')
### Order Teacher by its order in the original data
TeacherLevels = ['Catbus', 'Satsuki', 'Totoro']
Data['Teacher'] = Data['Teacher'].cat.reorder_categories(TeacherLevels)
### Change the level of Gender to be capitalized
GenderNames = {'female': 'Female', 'male': 'Male'}
Data['Gender'] = Data['Gender'].cat.rename_categories(GenderNames)
### Display the column types and non-null counts for the data frame
### (Data.info() prints directly and returns None, so print() is not needed)
Data.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Student 26 non-null object
1 Gender 26 non-null category
2 Teacher 26 non-null category
3 Steps 26 non-null int64
4 Rating 26 non-null int64
print(Data['Gender'].cat.categories)
Index(['Female', 'Male'], dtype='object')
print(Data['Teacher'].cat.categories)
Index(['Catbus', 'Satsuki', 'Totoro'], dtype='object')
Histograms
Simple histogram
Simple seaborn call
sns.histplot(data=Data, x='Steps')
Formatting and export as file
sns.set_theme(style='white')
plt.figure(figsize=(5, 3.75))
sns.histplot(data=Data, x='Steps')
plt.title('')
plt.xlabel('\nSteps')
plt.ylabel('')
plt.tight_layout()
plt.savefig('StepsHist.png', format='png', dpi=300)
plt.show()
Histogram with density line
Simple seaborn call
sns.histplot(data=Data, x='Steps', kde=True)
Formatting and export as file
sns.set_theme(style='white')
plt.figure(figsize=(5, 3.75))
sns.histplot(data=Data, x='Steps', kde=True)
plt.title('')
plt.xlabel('\nSteps')
plt.ylabel('')
plt.tight_layout()
plt.savefig('StepsHistDensity.png', format='png', dpi=300)
plt.show()
Histogram with normal curve
Simple seaborn call
Plot = sns.histplot(x=Data['Steps'], stat='density')
Sum = Data['Steps'].describe()
Curve = np.linspace(Sum['min'], Sum['max'])
Plot.plot(Curve, stats.norm.pdf(Curve, Sum['mean'], Sum['std']))
Formatting and export as file
sns.set_theme(style='white')
plt.figure(figsize=(5, 3.75))
sns.histplot(x=Data['Steps'], stat='density')
Sum = Data['Steps'].describe()
Curve = np.linspace(Sum['min'], Sum['max'])
plt.plot(Curve, stats.norm.pdf(Curve, Sum['mean'], Sum['std']))
plt.title('')
plt.xlabel('\nSteps')
plt.ylabel('')
plt.tight_layout()
plt.savefig('StepsHistNormal.png', format='png', dpi=300)
plt.show()
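The normal curve only lines up with the bars because stat='density' rescales the histogram so that the total area of the bars is 1, the same scale as a probability density function. The following sketch checks this with numpy.histogram(density=True), which performs the analogous rescaling. It assumes 'auto' bins, which I believe is also the histplot default, and types in the Steps values so it runs on its own.

```python
import numpy as np
import scipy.stats as stats

# Steps values from the example data, typed in for this sketch
Steps = np.array([8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
                  8000, 9000, 9000, 8000, 6000, 8000, 7000,
                  10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000])

# Density-scaled histogram: bar height = count / (n * bin width)
Heights, Edges = np.histogram(Steps, bins='auto', density=True)
BarArea = np.sum(Heights * np.diff(Edges))  # total area of the bars

# Normal curve from the sample mean and standard deviation (ddof=1),
# matching the 'mean' and 'std' values returned by describe()
Mean, SD = Steps.mean(), Steps.std(ddof=1)
Curve = np.linspace(Steps.min(), Steps.max())  # 50 points by default
Density = stats.norm.pdf(Curve, Mean, SD)

# To overlay the curve on a count-scaled histogram instead,
# multiply the density by n times the (equal) bin width
CountCurve = Density * len(Steps) * np.diff(Edges)[0]

print(round(BarArea, 6))
```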
Histograms for one-way data
Simple seaborn call
Plot = sns.FacetGrid(data=Data, row='Gender',
margin_titles=True, height=2, aspect=2)
Plot.map(sns.histplot, 'Steps')
Formatting and export as file
sns.set_theme(style='white')
Plot = sns.FacetGrid(data=Data, row='Gender',
margin_titles=True, height=2, aspect=2)
Plot.map(sns.histplot, 'Steps')
Plot.tight_layout()
Plot.savefig('StepsHistFacetOne.png', format='png', dpi=300)
Histograms for two-way data
Simple seaborn call
Plot = sns.FacetGrid(data=Data, row='Gender', col='Teacher',
margin_titles=True, height=2, aspect=2)
Plot.map(sns.histplot, 'Steps')
Formatting and export as file
sns.set_theme(style='white')
Plot = sns.FacetGrid(data=Data, row='Gender', col='Teacher',
margin_titles=True, height=2, aspect=1.5)
Plot.map(sns.histplot, 'Steps')
Plot.tight_layout()
Plot.savefig('StepsHistFacetTwo.png', format='png', dpi=300)
Box plots
Box plots can also be made with seaborn.catplot() using the option kind='box'; this is the same function used for the plots of means below. Some options would need to be changed.
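For example, a one-way box plot could be drawn through catplot() roughly as follows. This is a sketch rather than a drop-in replacement: catplot() returns a FacetGrid, so the figure size is set with height and aspect instead of plt.figure(), and labels are set with set_xlabels() and set_ylabels(). The Gender and Steps values are typed in from the example data.

```python
import pandas as pd
import seaborn as sns

# Columns from the example data, typed in for this sketch (students a to z)
Steps = [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
         8000, 9000, 9000, 8000, 6000, 8000, 7000,
         10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]
Gender = (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 + ['Male'] * 3
          + ['Female'] * 5 + ['Male'] * 4)
Data = pd.DataFrame({'Gender': Gender, 'Steps': Steps})

sns.set_theme(style='darkgrid')
# Figure-level box plot; height and aspect control the figure size
Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind='box',
                   height=4, aspect=1.25)
Plot.set_xlabels('\nGender')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()
```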
Simple box plot
Simple seaborn call
sns.boxplot(data=Data, y='Steps', orient='v', width=0.2)
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.boxplot(y=Data['Steps'], width=0.2)
plt.title('')
plt.xlabel('')
plt.ylabel('Steps\n')
plt.tight_layout()
plt.savefig('Boxplot.png', format='png', dpi=300)
plt.show()
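What a box plot draws can be checked by hand. As I understand it, seaborn's boxplot() follows matplotlib's rule. The box spans the first to third quartile, the line inside it marks the median, and the whiskers reach the most extreme observations within 1.5 times the interquartile range (IQR) of the box. Anything beyond that is drawn as an outlier. A sketch of that computation, cross-checked against matplotlib.cbook.boxplot_stats(), with the Steps values typed in from the example data:

```python
import numpy as np
from matplotlib import cbook

# Steps values from the example data, typed in for this sketch
Steps = np.array([8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
                  8000, 9000, 9000, 8000, 6000, 8000, 7000,
                  10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000])

# Quartiles with numpy's default (linear) interpolation
Q1, Median, Q3 = np.percentile(Steps, [25, 50, 75])
IQR = Q3 - Q1
LowerFence, UpperFence = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Whiskers stop at the most extreme data points inside the fences
Inside = Steps[(Steps >= LowerFence) & (Steps <= UpperFence)]
LowerWhisker, UpperWhisker = Inside.min(), Inside.max()
Outliers = Steps[(Steps < LowerFence) | (Steps > UpperFence)]

# Cross-check against matplotlib's own box plot statistics
Stats = cbook.boxplot_stats(Steps, whis=1.5)[0]

print(Q1, Median, Q3, LowerWhisker, UpperWhisker, len(Outliers))
# 7000.0 8000.0 8750.0 5000 10000 0
```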
Box plot with mean
Simple seaborn call
sns.boxplot(y=Data['Steps'], width=0.2, showmeans=True)
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.boxplot(y=Data['Steps'], width=0.2, showmeans=True,
meanprops={"marker": "+",
"markeredgecolor": "black",
"markersize": 10})
plt.title('')
plt.xlabel('')
plt.ylabel('Steps\n')
plt.tight_layout()
plt.savefig('BoxplotMean.png', format='png', dpi=300)
plt.show()
Box plot for one-way data
Simple seaborn call
sns.boxplot(data=Data, x='Gender', y='Steps', width=0.3)
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 4))
sns.boxplot(data=Data, x='Gender', y='Steps', width=0.5, gap=.2)
plt.title('')
plt.xlabel('\nGender')
plt.ylabel('Steps\n')
plt.tight_layout()
plt.savefig('BoxplotOneWay.png', format='png', dpi=300)
plt.show()
Box plot for two-way data
Simple seaborn call
sns.boxplot(data=Data, x='Gender', y='Steps', hue='Teacher', width=0.7, gap=.2)
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 4))
sns.boxplot(data=Data, x='Gender', y='Steps', hue='Teacher', width=0.7, gap=.2)
plt.title('')
plt.xlabel('\nGender')
plt.ylabel('Steps\n')
plt.tight_layout()
plt.savefig('BoxplotTwoWay.png', format='png', dpi=300)
plt.show()
Plots of means and interaction plots
A variety of types of plots can be made with seaborn.catplot(). These include:
• strip plot, kind = 'strip' (the default)
• swarm plot, kind = 'swarm'
• box plot, kind = 'box'
• violin plot, kind = 'violin'
• boxen plot, or letter-value plot, kind = 'boxen'
• point plot, kind = 'point'
• bar plot, kind = 'bar'
• count plot, kind = 'count'
For point plots and bar plots, seaborn.catplot() displays error bars by default, using a 95% confidence interval by bootstrap. The errorbar option (together with n_boot and estimator for bootstrapped intervals) can change this:
• standard deviation, errorbar = 'sd'
• two times the standard deviation, errorbar = ('sd', 2)
• standard error, errorbar = 'se'
• traditional 95% confidence interval of the mean, errorbar = ('se', 1.96)
• 95% confidence interval by bootstrap, errorbar = ('ci', 95)
• confidence interval by bootstrap with 5000 resamples, errorbar = 'ci', n_boot = 5000
• confidence interval of the median by bootstrap, errorbar = 'ci', estimator = 'median'
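The scaled options are simple arithmetic on the group summaries. As a sketch, the intervals for errorbar=('se', 1.96) and errorbar=('sd', 2) can be reproduced with pandas, which may help when reporting the values shown in a plot. I believe seaborn computes sd and se with pandas' default ddof=1, as here. The last two columns, a t-based 95% interval, are an addition for comparison and are not something seaborn draws. Gender and Steps are typed in from the example data.

```python
import pandas as pd
import scipy.stats as stats

# Columns from the example data, typed in for this sketch (students a to z)
Steps = [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
         8000, 9000, 9000, 8000, 6000, 8000, 7000,
         10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]
Gender = (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 + ['Male'] * 3
          + ['Female'] * 5 + ['Male'] * 4)
Data = pd.DataFrame({'Gender': Gender, 'Steps': Steps})

# Mean, standard deviation, standard error, and n for each group (ddof=1)
Summary = Data.groupby('Gender')['Steps'].agg(['mean', 'std', 'sem', 'count'])

# errorbar=('se', 1.96): mean plus or minus 1.96 standard errors
Summary['se_lower'] = Summary['mean'] - 1.96 * Summary['sem']
Summary['se_upper'] = Summary['mean'] + 1.96 * Summary['sem']

# errorbar=('sd', 2): mean plus or minus 2 standard deviations
Summary['sd_lower'] = Summary['mean'] - 2 * Summary['std']
Summary['sd_upper'] = Summary['mean'] + 2 * Summary['std']

# For comparison only: t-based 95% confidence interval of the mean
TCrit = stats.t.ppf(0.975, Summary['count'] - 1)
Summary['t_lower'] = Summary['mean'] - TCrit * Summary['sem']
Summary['t_upper'] = Summary['mean'] + TCrit * Summary['sem']

print(Summary.round(1))
```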
Means for one-way data
Simple seaborn call
sns.catplot(data=Data, x='Gender', y='Steps', kind="point",
errorbar=('se', 1.96), capsize=0.12, linestyles='none')
Formatting and export as file
sns.set_theme(style='darkgrid')
Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind="point",
errorbar=('se', 1.96), capsize=0.12, linestyles='none',
height=4, aspect=1.33)
Plot.set_titles('')
Plot.set_xlabels('\nGender')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()
Plot.savefig('StepsMeanGender.png', format='png', dpi=300)
Means for two-way data
Simple seaborn call
sns.catplot(data=Data, x='Gender', y='Steps', hue='Teacher', kind="point",
errorbar=('se', 1.96), capsize=0.12, linestyles='none',
dodge=0.3)
Formatting and export as file
sns.set_theme(style='darkgrid')
Plot = sns.catplot(data=Data, x='Gender', y='Steps', hue='Teacher', kind="point",
errorbar=('se', 1.96), capsize=0.12, linestyles='none',
dodge=0.3, height=4, aspect=1.33)
Plot.set_titles('')
Plot.set_xlabels('\nGender')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()
Plot.savefig('StepsMeanGenderTeacher.png', format='png', dpi=300)
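In an interaction plot, what matters is whether the difference between groups changes across the levels of the other factor; roughly parallel lines suggest little interaction. The cell means behind the plot can be tabulated with pandas.pivot_table(), as in this sketch. The data are typed in from the example, and the Difference column is only an illustrative addition.

```python
import pandas as pd

# Columns from the example data, typed in for this sketch (students a to z)
Steps = [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
         8000, 9000, 9000, 8000, 6000, 8000, 7000,
         10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]
Gender = (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 + ['Male'] * 3
          + ['Female'] * 5 + ['Male'] * 4)
Teacher = ['Catbus'] * 10 + ['Satsuki'] * 7 + ['Totoro'] * 9
Data = pd.DataFrame({'Gender': Gender, 'Teacher': Teacher, 'Steps': Steps})

# One row per Teacher, one column per Gender, cells are mean Steps
Means = Data.pivot_table(index='Teacher', columns='Gender',
                         values='Steps', aggfunc='mean')

# Female minus Male for each teacher; equal values would mean parallel lines
Means['Difference'] = Means['Female'] - Means['Male']
print(Means)
```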
Bar plot of means
These plots will also be made with seaborn.catplot(). See notes above on the options for plot types and error bars.
One-way data
Simple seaborn call
sns.catplot(data=Data, x='Gender', y='Steps', kind='bar',
errorbar=('se', 1.96), capsize=0.12, dodge=True)
Formatting and export as file
sns.set_theme(style='darkgrid')
Plot = sns.catplot(data=Data, x='Gender', y='Steps', kind='bar',
errorbar=('se', 1.96), capsize=0.12, dodge=True, height=4,
aspect=1)
Plot.set_titles('')
Plot.set_xlabels('\nGender')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()
Plot.savefig('StepsBarGender.png', format='png', dpi=300)
Two-way data
Simple seaborn call
sns.catplot(data=Data, x='Teacher', y='Steps', hue='Gender', kind='bar',
errorbar=('se', 1.96), capsize=0.12, dodge=True)
Formatting and export as file
sns.set_theme(style='darkgrid')
Plot = sns.catplot(data=Data, x='Teacher', y='Steps', hue='Gender', kind='bar',
errorbar=('se', 1.96), capsize=0.12, dodge=True, height=4,
aspect=1)
Plot.set_titles('')
Plot.set_xlabels('\nTeacher')
Plot.set_ylabels('Steps\n')
Plot.tight_layout()
Plot.savefig('StepsBarGenderTeacher.png', format='png', dpi=300)
Scatter plot
Simple scatter plot
### Manually jitter the Steps values so that they’re visible on the plot
Data['StepsJitter'] = np.array(Data['Steps'] +
np.random.normal(0, 200, Data['Steps'].shape))
Simple seaborn call
sns.scatterplot(data=Data, x='StepsJitter', y='Rating')
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.scatterplot(data=Data, x='StepsJitter', y='Rating')
plt.title('')
plt.xlabel("\nSteps (jittered)")
plt.ylabel('Rating (1 - 10)\n')
plt.tight_layout()
plt.savefig('StepsRatingScatter.png', format='png', dpi=300)
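Because np.random.normal() draws new noise each time, the jittered plot changes from run to run. One option, not part of the original code, is to use a seeded numpy Generator so the exported figure can be reproduced exactly. The seed value is arbitrary, the helper name jitter_steps is hypothetical, and the Steps and Rating values are typed in from the example data.

```python
import numpy as np
import pandas as pd

# Columns from the example data, typed in for this sketch (students a to z)
Steps = [8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
         8000, 9000, 9000, 8000, 6000, 8000, 7000,
         10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000]
Rating = [7, 10, 9, 5, 4, 8, 6, 5, 10, 8,
          7, 8, 8, 9, 5, 9, 6,
          10, 10, 8, 7, 7, 8, 10, 7, 7]
Data = pd.DataFrame({'Steps': Steps, 'Rating': Rating})

def jitter_steps(values, seed, sd=200):
    """Hypothetical helper: add normal noise with standard deviation sd,
    reproducibly for a given seed."""
    Rng = np.random.default_rng(seed)
    return np.asarray(values, dtype=float) + Rng.normal(0, sd, size=len(values))

# Same seed, same jitter, every time the script is run
Data['StepsJitter'] = jitter_steps(Data['Steps'], seed=2024)
```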
Examples of basic plots for nominal data
Bar plot for counts of a nominal variable
The following plots show the count of observations in each level of the categorical variables. The plots can be modified to display counts, proportions, percents, or probability with stat='count', stat='proportion', stat='percent', or stat='probability'.
One-way data
Simple seaborn call
sns.countplot(data=Data, x='Gender')
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.countplot(data=Data, x='Gender')
plt.title('')
plt.xlabel("\nGender")
plt.ylabel('Count of observations\n')
plt.tight_layout()
plt.savefig('GenderCount.png', format='png', dpi=300)
Two-way data
Simple seaborn call
sns.countplot(data=Data, x='Teacher', hue='Gender')
Formatting and export as file
sns.set_theme(style='darkgrid')
plt.figure(figsize=(5, 3.75))
sns.countplot(data=Data, x='Teacher', hue='Gender')
plt.title('')
plt.xlabel("\nTeacher")
plt.ylabel('Count of observations\n')
plt.tight_layout()
plt.savefig('GenderCountTwoWay.png', format='png', dpi=300)
Confidence intervals for proportions
At the time of writing, I don’t know an easy way to ask seaborn to display confidence intervals for proportions of counts.
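One workaround, not from the original chapter, is to compute the intervals with statsmodels and draw them with matplotlib. The sketch below uses statsmodels.stats.proportion.proportion_confint() with method='wilson' (Wilson score intervals); that choice of method is an assumption, and the function offers others. Gender is typed in from the example data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportion_confint

# Gender column from the example data, typed in for this sketch (students a to z)
Gender = (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 + ['Male'] * 3
          + ['Female'] * 5 + ['Male'] * 4)

# Counts per level, in a fixed order, and the proportion of the total
Counts = pd.Series(Gender).value_counts().reindex(['Female', 'Male'])
CountArray = Counts.to_numpy()
Total = CountArray.sum()
Proportion = CountArray / Total

# 95% Wilson score interval for each proportion
Lower, Upper = proportion_confint(CountArray, Total, alpha=0.05, method='wilson')

# Bars for the proportions, error bars for the intervals (asymmetric)
Positions = np.arange(len(Counts))
Fig, Ax = plt.subplots(figsize=(5, 3.75))
Ax.bar(Positions, Proportion, tick_label=list(Counts.index))
Ax.errorbar(Positions, Proportion,
            yerr=np.vstack([Proportion - Lower, Upper - Proportion]),
            fmt='none', ecolor='black', capsize=6)
Ax.set_xlabel('\nGender')
Ax.set_ylabel('Proportion of observations\n')
Fig.tight_layout()
```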
Mosaic plot
Simple statsmodels call
mosaic(Data, ['Teacher', 'Gender'])
Formatting and export as file
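### Pass an axes to mosaic(); otherwise it opens its own figure and ignores figsize
Fig, Ax = plt.subplots(figsize=(5, 3.75))
mosaic(Data, ['Teacher', 'Gender'], ax=Ax)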
plt.title('')
plt.xlabel('\nTeacher')
plt.ylabel('Gender\n')
plt.tight_layout()
plt.savefig('Mosaic.png', format='png', dpi=300)
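The tiles of the mosaic are proportional to the counts in the two-way table, so it can help to print that table alongside the plot. The following is a sketch with pandas.crosstab(). With index ['Teacher', 'Gender'], the widths of the Teacher columns should correspond to each teacher's share of students, and the heights within each column to the Gender proportions (apart from the small gaps statsmodels draws between tiles). Teacher and Gender are typed in from the example data.

```python
import pandas as pd

# Columns from the example data, typed in for this sketch (students a to z)
Gender = (['Female'] * 6 + ['Male'] * 4 + ['Female'] * 4 + ['Male'] * 3
          + ['Female'] * 5 + ['Male'] * 4)
Teacher = ['Catbus'] * 10 + ['Satsuki'] * 7 + ['Totoro'] * 9

# Two-way table of counts
Table = pd.crosstab(pd.Series(Teacher, name='Teacher'),
                    pd.Series(Gender, name='Gender'))

# Column widths: each teacher's share of all students
Widths = Table.sum(axis=1) / Table.to_numpy().sum()

# Tile heights: Gender proportions within each teacher
Heights = Table.div(Table.sum(axis=1), axis=0)

print(Table)
print(Widths.round(3))
print(Heights.round(3))
```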
References
Waskom, M. 2024. seaborn: User guide and tutorial. seaborn.pydata.org/tutorial.html.