[banner]

A Python Companion to Extension Program Evaluation

Salvatore S. Mangiafico

Confidence Intervals

Confidence intervals in SAEPER

 

For a discussion of the following topics, see the corresponding chapter in Summary and Analysis of Extension Program Evaluation in R (rcompanion.org/handbook/C_03.html). 

 

•  Understanding confidence intervals

 

Importing packages in this chapter

 

The following commands will import required packages used in this chapter from libraries and assign them common aliases.  You may need install these libraries first.

 

import io

 

import os

 

import numpy as np

 

import scipy.stats as stats

 

import pandas as pd

 

import statsmodels.stats.api as sms

 

import researchpy as rp

 

import seaborn as sns

 

import matplotlib.pyplot as plt

 

 

Setting your working directory

 

You may wish to set your working directory for exported plots.

 

os.chdir("C:/Users/Sal Mangiafico/Desktop")

 

print(os.getcwd())

 

 

Example for confidence intervals

 

For this example, extension educators had students wear pedometers to count their number of steps over the course of a day.  The following data are the result.  Rating is the rating each student gave about the usefulness of the program, on a 1-to-10 scale.


Data = pd.read_table(sep="\\s+", filepath_or_buffer=io.StringIO("""

 

Student  Gender  Teacher  Steps  Rating

a        female  Catbus    8000   7

b        female  Catbus    9000  10

c        female  Catbus   10000   9

d        female  Catbus    7000   5

e        female  Catbus    6000   4

f        female  Catbus    8000   8

g        male    Catbus    7000   6

h        male    Catbus    5000   5

i        male    Catbus    9000  10

j        male    Catbus    7000   8

k        female  Satsuki   8000   7

l        female  Satsuki   9000   8

m        female  Satsuki   9000   8

n        female  Satsuki   8000   9

o        male    Satsuki   6000   5

p        male    Satsuki   8000   9

q        male    Satsuki   7000   6

r        female  Totoro   10000  10

s        female  Totoro    9000  10

t        female  Totoro    8000   8

u        female  Totoro    8000   7

v        female  Totoro    6000   7

w        male    Totoro    6000   8

x        male    Totoro    8000  10

y        male    Totoro    7000   7

z        male    Totoro    7000   7

"""))

 

 

print(Data.info())

 

 #   Column   Non-Null Count  Dtype

---  ------   --------------  -----

 0   Student  26 non-null     object

 1   Gender   26 non-null     object

 2   Teacher  26 non-null     object

 3   Steps    26 non-null     int64

 4   Rating   26 non-null     int64

dtypes: int64(2), object(3)

 

 

print(Data.describe())

 

              Steps     Rating

count     26.000000  26.000000

mean    7692.307692   7.615385

std     1289.006773   1.745323

min     5000.000000   4.000000

25%     7000.000000   7.000000

50%     8000.000000   8.000000

75%     8750.000000   9.000000

max    10000.000000  10.000000

 

 

print(Data.describe(include='object'))

 

       Student  Gender Teacher

count       26      26      26

unique      26       2       3

top          a  female  Catbus

freq         1      15      10

 

 

Confidence intervals for means

 

The traditional method for the confidence interval for the mean, using the t distribution, can be performed with scipy.stats, statsmodels.stats, or the researchpy package.    

 

Bootstrapped confidence intervals for the mean can be calculated with the bootstrap function in scipy.stats.

 

One-sample data – traditional method

 

Using scipy.stats

 

Steps = Data['Steps']

 

stats.t.interval(0.95, len(Steps)-1, loc=np.mean(Steps), scale=stats.sem(Steps))

 

(7171.6665892025285, 8212.948795412856)

 

 

Using statsmodels.stats

 

Steps = Data['Steps']

 

sms.DescrStatsW(Steps).tconfint_mean()

 

(7171.666589202528, 8212.948795412856)

 

 

Using researchpy.rp

 

Steps = Data['Steps']

 

rp.summary_cont(Data['Steps'], conf = 0.95, decimals = 0)

 

  Variable     N    Mean      SD     SE  95% Conf.  Interval

0    Steps  26.0  7692.0  1289.0  253.0     7172.0    8213.0

 

 

One-sample data – bootstrap method

 

Using scipy.stats

 

The variable Steps will be extracted from the data frame and converted to a tuple so that it can be passed to the bootstrap function.  Here, the BCa method is used. Other methods are percentile and basic.

 

Steps = (Data['Steps'],)

 

print(Steps)

 

(0      8000

 1      9000

 2     10000

 3      7000

 4      6000

 < snip >

 

 

Boot = stats.bootstrap(Steps, np.mean, confidence_level=0.95, method='BCa')

 

print(Boot.confidence_interval)

 

ConfidenceInterval(low=7192.307692307692, high=8153.846153846154)

 

round(Boot.confidence_interval[0], 0)

 

7192.0

 

round(Boot.confidence_interval[1], 0)

 

8192.0

 

### Note that results by bootstrap may vary

 

 

plt.figure(figsize=(5, 3.75))

 

sns.histplot(x=Boot.bootstrap_distribution)

plt.title('')

plt.xlabel('\nBootstrapped means')

plt.ylabel('')

plt.grid(False)

plt.tight_layout()

 

plt.savefig('StepsRatingPlot.png', format='png', dpi=300)

 

plt.show()

 

Histogram of bootstrapped means

 

 

Using pingouin

 

Steps = np.array(Data['Steps'])

 

stat = np.mean(Steps)

 

ci = pg.compute_bootci(Steps, func='mean', decimals=0)

 

print(round(stat, 0), ci)

 

7692.0 [7192. 8115.]

 

 

### Note that results by bootstrap may vary

 

 

One-way data – traditional method

 

Using researchpy.rp

 

rp.summary_cont(Data.groupby(Data['Gender'])['Steps'], conf = 0.95, decimals = 0)

 

         N    Mean      SD     SE  95% Conf.  Interval

Gender                                               

female  15  8200.0  1207.0  312.0     7532.0    8868.0

male    11  7000.0  1095.0  330.0     6264.0    7736.0

 

 

Two-way data – traditional method

 

Using researchpy.rp

 

rp.summary_cont(Data.groupby(['Teacher', 'Gender'])['Steps'], conf = 0.95, decimals = 0)

 

                N    Mean      SD     SE  95% Conf.  Interval

Teacher Gender                                              

Catbus  female  6  8000.0  1414.0  577.0     6516.0    9484.0

        male    4  7000.0  1633.0  816.0     4402.0    9598.0

Satsuki female  4  8500.0   577.0  289.0     7581.0    9419.0

        male    3  7000.0  1000.0  577.0     4516.0    9484.0

Totoro  female  5  8200.0  1483.0  663.0     6358.0   10042.0

        male    4  7000.0   816.0  408.0     5701.0    8299.0

 

 

One-way data – bootstrap method

 

At the time of writing, I don’t know how to pass groupby results to the bootstrap function.  However, the groups can be extracted manually and passed to the bootstrap function.