Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico



Confidence Intervals



Packages used in this chapter


The packages used in this chapter include:

•  Rmisc

•  DescTools

•  plyr

•  boot

•  rcompanion


The following commands will install these packages if they are not already installed:
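
A minimal sketch of such commands, assuming the five packages listed above are installed from CRAN with install.packages:

```r
# Install each package only if it cannot already be loaded.
# Package names are taken from the list above; a CRAN mirror is assumed.
if(!require(Rmisc)){install.packages("Rmisc")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
if(!require(rcompanion)){install.packages("rcompanion")}
```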


Understanding confidence intervals


Confidence intervals are used to indicate how accurate a calculated statistic is likely to be.  Confidence intervals can be calculated for a variety of statistics, such as the mean, median, or slope of a linear regression.  This chapter will focus on confidence intervals for means.  This book contains a separate chapter, Confidence Intervals for Medians, which addresses confidence intervals for medians.  There is also a chapter Confidence Intervals for Proportions in this book.


The Statistics Learning Center video in the Required Readings below gives a good explanation of the meaning of confidence intervals.


Populations and samples

Most of the statistics we use assume we are analyzing a sample which we are using to represent a larger population.  If extension educators want to know about the caloric intake of 7th graders, they would be hard-pressed to get the resources to have every 7th grader in the U.S. keep a food diary.  Instead they might collect data from one or two classrooms, and then treat the data sample as if it represents a larger population of students.


The mean caloric intake could be calculated for this sample, but this mean will not be exactly the same as the mean for the larger population.  If we collect a large sample and the values aren’t too variable, then the sample mean should be close to the population mean.  But if we have few observations, or the values are highly variable, we are less confident our sample mean is close to the population mean.
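
The effect of sample size can be illustrated with a small simulation.  This is only a sketch: the population below is hypothetical (simulated caloric intakes with an assumed mean of 2000 and standard deviation of 300), and the sample sizes of 10 and 500 are arbitrary choices.

```r
set.seed(1)   # make the simulation reproducible

# Hypothetical population: daily caloric intake for 100,000 students
population <- rnorm(100000, mean = 2000, sd = 300)

# Draw many samples of size n and return the mean of each sample
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n)))
}

means_small <- sample_means(10)
means_large <- sample_means(500)

# How much the sample means vary from sample to sample
spread_small <- sd(means_small)
spread_large <- sd(means_large)

print(c(n10 = spread_small, n500 = spread_large))
```

The means of the larger samples cluster much more tightly around the population mean than the means of the smaller samples do.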


We will use confidence intervals to give a sense of this confidence.


It’s best not to overthink the discussion on populations and samples.  We aren’t necessarily actually extending our statistics to a larger population.  That is, we shouldn’t think our measurements from two classrooms are actually indicative of the whole country.  There are likely many factors that would change the result school to school and region to region.  But even if we are thinking about just the 7th graders in just these two classrooms, most of our statistics will still be based on the assumption that there is a larger population of 7th graders, and we are sampling just a subset.


Statistics and parameters

When we calculate the sample mean, the result is a statistic.  It’s an estimate of the population mean, but our calculated sample mean would vary depending on our sample.  In theory, there is a mean for the population of interest, and we consider this population mean a parameter.  Our goal in calculating the sample mean is estimating the population parameter.


Point estimates and confidence intervals

Our sample mean is a point estimate for the population parameter.  A point estimate is a useful approximation for the parameter, but considering the confidence interval for the estimate gives us more information.


As a definition of confidence intervals, if we were to sample the same population many times and calculate a sample mean and a 95% confidence interval each time, then 95% of those intervals would contain the actual population mean.
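
This definition can be checked by simulation.  The sketch below assumes a hypothetical normally distributed population with a known mean of 50, draws many samples from it, and counts how often the 95% confidence interval from t.test contains that mean.

```r
set.seed(42)

true_mean <- 50      # hypothetical population mean
n_samples <- 5000    # number of repeated samples
n         <- 25      # observations in each sample

# For each sample, record whether its 95% interval contains true_mean
covered <- replicate(n_samples, {
  x  <- rnorm(n, mean = true_mean, sd = 10)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals containing the population mean
coverage <- mean(covered)
print(coverage)
```

The reported proportion should be close to 0.95, though it will vary slightly with the random seed.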


If this definition of confidence intervals doesn’t make much intuitive sense to you at this point, don’t worry about it.  Working through some of the examples in this book will help you understand their usefulness.   


One use of confidence intervals is to give a sense of how accurate our calculated statistic is relative to the population parameter.


An example

Imagine we have a rule of thumb that we consider a town with a mean household income of greater than $100,000 to be high-income. 


For Town A we sample some households, and calculate the mean household income and the 95% confidence interval for this statistic.  The mean is $125,000, but the data are quite variable, and the 95% confidence interval is from $75,000 to $175,000.  In this case, we don’t have much confidence that Town A is actually a high-income town.  The point estimate for the population mean is greater than $100,000, but the confidence interval extends considerably lower than this threshold.


For Town B, we also get a mean of $125,000, so the point estimate is the same as for Town A.  But the 95% confidence interval is from $105,000 to $145,000.  Here, we have some confidence that Town B is actually a high-income town, because the whole 95% confidence interval lies higher than the $100,000 threshold.
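
The comparison for the two towns can be written out as a short sketch.  The data frame and its column names are hypothetical; the values are those given in the example above.

```r
threshold <- 100000   # rule-of-thumb cutoff for a high-income town

# Means and 95% confidence limits from the Town A and Town B example
towns <- data.frame(
  Town  = c("A", "B"),
  Mean  = c(125000, 125000),
  Lower = c(75000, 105000),
  Upper = c(175000, 145000)
)

# The whole interval lies above the threshold only if its lower limit does
towns$Above.threshold <- towns$Lower > threshold

print(towns)
```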


Confidence intervals as an alternative to some tests

Most of the statistical tests in this book will calculate a p-value, the probability of obtaining data at least as extreme as those observed if the null hypothesis is true, and draw a conclusion from this p-value.  John McDonald, in the Optional Readings below, describes how confidence intervals can be used as an alternative approach.


For example, if we want to compare the means of two groups to see if they are statistically different, we will use a t-test, or similar test, calculate a p-value, and draw a conclusion.  An alternative approach would be to construct 95% or 99% confidence intervals about the mean for each group.  If the confidence intervals of the two means don’t overlap, we are justified in calling them statistically different.


Likewise, if the 95% confidence interval for some statistic includes zero, we can conclude that the statistic is not significantly different from zero.

As a technical note, non-overlapping confidence intervals for means do not equate exactly to a t-test with a p-value of 0.05.  They are different methods to assess similar questions.  The article by Cumming and Finch in the “References” section gives more details on the relationship between overlapping confidence intervals and p-values from statistical tests.
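
As an illustration of this technical note, the sketch below uses two small hypothetical groups whose 95% confidence intervals overlap, even though a t-test comparing them gives a p-value below 0.05.  The data are invented for the purpose and are not from this chapter.

```r
# Hypothetical measurements for two groups
a <- c(1, 2, 3, 4, 5)
b <- a + 2.5

# Traditional 95% confidence interval for each group mean
ci_a <- t.test(a, conf.level = 0.95)$conf.int
ci_b <- t.test(b, conf.level = 0.95)$conf.int

# TRUE if the two intervals share any values
overlap <- ci_a[2] >= ci_b[1] && ci_b[2] >= ci_a[1]

# Welch two-sample t-test comparing the two means
p_value <- t.test(a, b)$p.value

print(ci_a)
print(ci_b)
print(overlap)
print(p_value)
```

Here the overlap rule is the more conservative of the two approaches.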


Example for confidence intervals


For this example, extension educators had students wear pedometers to count their number of steps over the course of a day.  The following data are the result.  Rating is the rating each student gave about the usefulness of the program, on a 1-to-10 scale.

Input = ("
Student  Sex     Teacher  Steps  Rating
a        female  Catbus    8000   7
b        female  Catbus    9000  10
c        female  Catbus   10000   9
d        female  Catbus    7000   5
e        female  Catbus    6000   4
f        female  Catbus    8000   8
g        male    Catbus    7000   6
h        male    Catbus    5000   5
i        male    Catbus    9000  10
j        male    Catbus    7000   8
k        female  Satsuki   8000   7
l        female  Satsuki   9000   8
m        female  Satsuki   9000   8
n        female  Satsuki   8000   9
o        male    Satsuki   6000   5
p        male    Satsuki   8000   9
q        male    Satsuki   7000   6
r        female  Totoro   10000  10
s        female  Totoro    9000  10
t        female  Totoro    8000   8
u        female  Totoro    8000   7
v        female  Totoro    6000   7
w        male    Totoro    6000   8
x        male    Totoro    8000  10
y        male    Totoro    7000   7
z        male    Totoro    7000   7
")

Data = read.table(textConnection(Input),header=TRUE)

###  Check the data frame

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


Recommended procedures for confidence intervals for means

Confidence intervals for means can be calculated by various methods. 


The traditional method is the most commonly encountered, and is appropriate for normally distributed data or with large sample sizes.  It produces an interval that is symmetric about the mean.
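
As a sketch of the traditional method, the interval is the mean plus and minus a critical t value multiplied by the standard error of the mean, where the standard error is the standard deviation divided by the square root of the number of observations.  The code below works this through in base R for the Steps values from the example data, and compares the result with t.test.  The variable names are chosen for illustration only.

```r
# Steps values from the example data in this chapter
Steps <- c(8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
           8000, 9000, 9000, 8000, 6000, 8000, 7000,
           10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000)

n      <- length(Steps)
se     <- sd(Steps) / sqrt(n)      # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided critical value for 95%

lower <- mean(Steps) - t_crit * se
upper <- mean(Steps) + t_crit * se

print(c(mean = mean(Steps), lower = lower, upper = upper))

# t.test reports the same interval
print(t.test(Steps, conf.level = 0.95)$conf.int)
```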


For skewed data, confidence intervals by bootstrapping may be more reliable.


For routine use, I recommend using bootstrapped confidence intervals, particularly the BCa or percentile methods.  For further discussion, see Optional Analyses: confidence intervals for the mean by bootstrapping, below.


groupwiseMean function for grouped and ungrouped data

The groupwiseMean function in the rcompanion package can produce confidence intervals both by traditional and bootstrap methods, for grouped and ungrouped data. 


The data must be housed in a data frame.  By default, the function reports confidence intervals by the traditional method.


In the groupwiseMean function, the measurement and grouping variables can be indicated with formula notation, with the measurement variable on the left side of the tilde (~), and grouping variables on the right.


The confidence level is indicated by, e.g., the conf = 0.95 argument.  The digits option indicates the number of significant digits to which the output is rounded.  Note that in the output, the means and other statistics are rounded to 3 significant figures.
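
As a sketch of what rounding to 3 significant figures does, the base R function signif can be applied to two unrounded values that appear later in this chapter.  That groupwiseMean rounds in the same way as signif is an assumption here, though it is consistent with the output shown below.

```r
# Mean of Steps for all students, unrounded and rounded
signif(7692.308, digits = 3)     # 7690

# Upper confidence limit for Totoro females, unrounded and rounded
signif(10041.685, digits = 3)    # 10000
```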


Ungrouped data

Ungrouped data is indicated with a 1 on the right side of the formula, or the group = NULL argument.


library(rcompanion)

groupwiseMean(Steps ~ 1,
              data   = Data,
              conf   = 0.95,
              digits = 3)

   .id  n Mean Conf.level Trad.lower Trad.upper
1 <NA> 26 7690       0.95       7170       8210

   ### Trad.lower and Trad.upper indicate the confidence interval
   ###   for the mean by traditional method.

One-way data


groupwiseMean(Steps ~ Sex,
              data   = Data,
              conf   = 0.95,
              digits = 3)

     Sex  n Mean Conf.level Trad.lower Trad.upper
1 female 15 8200       0.95       7530       8870
2   male 11 7000       0.95       6260       7740

   ### Trad.lower and Trad.upper indicate the confidence interval
   ###   for the mean by traditional method.

Two-way data



groupwiseMean(Steps ~ Teacher + Sex,
              data = Data,
              conf = 0.95,
              digits = 3)

  Teacher    Sex n Mean Conf.level Trad.lower Trad.upper
1  Catbus female 6 8000       0.95       6520       9480
2  Catbus   male 4 7000       0.95       4400       9600
3 Satsuki female 4 8500       0.95       7580       9420
4 Satsuki   male 3 7000       0.95       4520       9480
5  Totoro female 5 8200       0.95       6360      10000
6  Totoro   male 4 7000       0.95       5700       8300

   ### Trad.lower and Trad.upper indicate the confidence interval
   ###   for the mean by traditional method.

Bootstrapped means by group

In the groupwiseMean function, the type of confidence interval is requested by setting certain options to TRUE.  These options are traditional, normal, basic, percentile and bca.  The boot option reports an optional statistic, the mean by bootstrap.  The R option indicates the number of iterations to calculate each bootstrap statistic.


groupwiseMean(Steps ~ Sex,
              data   = Data,
              conf   = 0.95,
              digits = 3,
              R      = 10000,
              boot        = TRUE,
              traditional = FALSE,
              normal      = FALSE,
              basic       = FALSE,
              percentile  = FALSE,
              bca         = TRUE)

     Sex  n Mean Boot.mean Conf.level Bca.lower Bca.upper
1 female 15 8200      8200       0.95      7470      8670
2   male 11 7000      7000       0.95      6270      7550


groupwiseMean(Steps ~ Teacher + Sex,
              data   = Data,
              conf   = 0.95,
              digits = 3,
              R      = 10000,
              boot        = TRUE,
              traditional = FALSE,
              normal      = FALSE,
              basic       = FALSE,
              percentile  = FALSE,
              bca         = TRUE)

  Teacher    Sex n Mean Boot.mean Conf.level Bca.lower Bca.upper
1  Catbus female 6 8000      8000       0.95      6830      8830
2  Catbus   male 4 7000      6990       0.95      5500      8000
3 Satsuki female 4 8500      8500       0.95      8000      8750
4 Satsuki   male 3 7000      7000       0.95      6000      7670
5  Totoro female 5 8200      8190       0.95      6800      9000
6  Totoro   male 4 7000      7000       0.95      6250      7500

Optional: Other functions for traditional confidence intervals for means

Functions that produce confidence intervals for means by the traditional method include t.test, CI in Rmisc, and MeanCI in DescTools.


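t.test(Data$Steps,
       conf.level = 0.95)

   One Sample t-test

95 percent confidence interval:
 7171.667 8212.949

mean of x
 7692.308


library(DescTools)

MeanCI(Data$Steps,
       conf.level = 0.95)

    mean   lwr.ci   upr.ci
7692.308 7171.667 8212.949


library(Rmisc)

CI(Data$Steps,
   ci = 0.95)

   upper     mean    lower
8212.949 7692.308 7171.667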


group.CI(Steps ~ Sex,
         data = Data,
         ci = 0.95)

      Sex Steps.upper Steps.mean Steps.lower
1 female    8868.482       8200    7531.518
2   male    7735.930       7000    6264.070

group.CI(Steps ~ Teacher + Sex,
         data = Data,
         ci = 0.95)

  Teacher    Sex Steps.upper Steps.mean Steps.lower
1  Catbus female    9484.126       8000    6515.874
2 Satsuki female    9418.693       8500    7581.307
3  Totoro female   10041.685       8200    6358.315
4  Catbus   male    9598.457       7000    4401.543
5 Satsuki   male    9484.138       7000    4515.862
6  Totoro   male    8299.228       7000    5700.772

Optional Analyses: confidence intervals for the mean by bootstrapping

Bootstrapping is a method that resamples the data many times with replacement, each time calculating a statistic, and then determines a confidence interval or other statistic from these iterations.
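
The idea can be sketched in base R without any packages.  The code below resamples the Steps values from the example data with replacement, records the mean of each resample, and takes the middle 95% of those means as a percentile interval.  This is a simplified illustration rather than the procedure used by the boot package; the seed and the number of resamples are arbitrary choices.

```r
set.seed(123)

# Steps values from the example data in this chapter
Steps <- c(8000, 9000, 10000, 7000, 6000, 8000, 7000, 5000, 9000, 7000,
           8000, 9000, 9000, 8000, 6000, 8000, 7000,
           10000, 9000, 8000, 8000, 6000, 6000, 8000, 7000, 7000)

R <- 10000   # number of resamples

# Resample with replacement and record the mean each time
boot_means <- replicate(R, mean(sample(Steps, replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles of the means
perc_ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(mean(boot_means))
print(perc_ci)
```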


Bootstrapped confidence intervals can be more reliable than those determined by the traditional method for certain data sets.  For more details on the different types of bootstrapped confidence intervals, see the Carpenter and Bithell article in the “References” section of this chapter.  The BCa method (bias corrected, accelerated) is often cited as the best method, and the percentile method is also cited as typically good.


The boot package can calculate confidence intervals for means by bootstrap.  In the boot function, R indicates the number of re-samplings.


The function groupwiseMean in the rcompanion package allows for calculating confidence intervals for means for grouped data, using the bootstrap procedures from the boot package.


Note that for bootstrap procedures, your results may vary slightly from the results reported here.


library(boot)

Mboot = boot(Data$Steps,
             function(x,i) mean(x[i]),
             R = 10000)     # R assumed here; matches the groupwiseMean examples above

mean(Mboot$t[,1])

   ### The mean based on the bootstrap method.

boot.ci(Mboot,
        conf = 0.95,
        type = c("norm", "basic" ,"perc", "bca")
        )

Intervals :
Level      Normal              Basic        
95%   (7208, 8174 )   (7192, 8154 ) 

Level     Percentile            BCa         
95%   (7231, 8192 )   (7154, 8115 ) 
Calculations and Intervals on Original Scale

### Other information

Mboot

hist(Mboot$t[,1],
     col = "darkgray")

Required readings


[Video]  “Understanding Confidence Intervals: Statistics Help” from Statistics Learning Center. (Dr. Nic). 2013. www.youtube.com/watch?v=tFWsuO9f74o.


Optional readings


“Confidence limits” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/confidence.html.


[Video]  “Calculating the Confidence interval for a mean using a formula” from Statistics Learning Center. (Dr. Nic). 2013. www.youtube.com/watch?v=s4SRdaTycaw.


“Confidence Intervals”, Chapter 8 in Openstax. 2013. Introductory Statistics. openstaxcollege.org/textbooks/introductory-statistics.


“Confidence intervals”, Chapter 4.2 in Diez, D.M., C.D. Barr, and M. Çetinkaya-Rundel. 2012. OpenIntro Statistics, 2nd ed. www.openintro.org/.




References


Carpenter, J. and J. Bithell. 2000. “Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians”.  Statistics in Medicine 19:1141–1164. www.tau.ac.il/~saharon/Boot/


Cumming, G. and S. Finch. 2005. “Inference by Eye: Confidence Intervals and How to Read Pictures of Data”. American Psychologist 60:170–180.


Optional analyses


Confidence intervals for geometric mean

The geometric mean was discussed in the previous chapter.  Here, the CI function in Rmisc is used to construct confidence intervals for the geometric mean.


library(Rmisc)

Bacteria = c(20, 40, 50, 60, 100, 120, 150, 200, 1000)


exp(CI(log(Bacteria), ci=0.95))

    upper      mean     lower
233.85448  98.38887  41.39484
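
The same calculation can be sketched in base R, which makes the steps explicit: the values are log-transformed, the mean and traditional confidence interval are found on the log scale, and the results are back-transformed with exp.  The variable names here are chosen for illustration only.

```r
Bacteria <- c(20, 40, 50, 60, 100, 120, 150, 200, 1000)

log_values <- log(Bacteria)

# Geometric mean: back-transform the mean of the logs
geo_mean <- exp(mean(log_values))

# Back-transform the traditional 95% interval for the mean of the logs
geo_ci <- exp(t.test(log_values, conf.level = 0.95)$conf.int)

print(geo_mean)
print(geo_ci)
```

Note that the resulting interval is not symmetric about the geometric mean on the original scale.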

Confidence intervals for geometric means for groups

The function groupwiseGeometric in the rcompanion package produces the geometric mean and limits for the geometric mean plus and minus the standard deviation, standard error, and confidence interval.

Input = ("
Site  Bacteria
 A        20
 A        40
 A        50
 A        60
 A       100
 A       120
 A       150
 A       200
 A      1000

 B       100
 B       120
 B       210
 B       300
 B       420
 B       400
 B       500
 B       800
 B      4000
 C        10
 C        30
 C        40
 C        60
 C       110
 C       100
 C       160
 C       210
 C      1200
")

Data = read.table(textConnection(Input),header=TRUE)


library(rcompanion)

groupwiseGeometric(Bacteria ~ Site,
                   data   = Data,
                   digits = 3,
                   na.rm  = TRUE)

  Site n Geo.mean sd.lower sd.upper se.lower se.upper ci.lower ci.upper
1    A 9     98.4     31.9      303     67.6      143     41.4      234
2    B 9    389.0    129.0     1170    269.0      561    167.0      906
3    C 9     88.1     22.7      341     56.1      138     31.1      249
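
As a rough cross-check, the geometric means and confidence limits by site can be sketched in base R.  The helper function geo_summary is hypothetical and written for this illustration; it is not part of rcompanion, and it reproduces only the Geo.mean, ci.lower, and ci.upper columns.

```r
# Bacteria counts for the three sites, from the data above
Site     <- rep(c("A", "B", "C"), each = 9)
Bacteria <- c(20, 40, 50, 60, 100, 120, 150, 200, 1000,
              100, 120, 210, 300, 420, 400, 500, 800, 4000,
              10, 30, 40, 60, 110, 100, 160, 210, 1200)

# Hypothetical helper: geometric mean and back-transformed 95% limits
geo_summary <- function(x) {
  logs <- log(x)
  ci   <- exp(t.test(logs, conf.level = 0.95)$conf.int)
  c(Geo.mean = exp(mean(logs)), ci.lower = ci[1], ci.upper = ci[2])
}

# Apply the helper to each site; one row per site in the result
result <- t(sapply(split(Bacteria, Site), geo_summary))

print(signif(result, 3))
```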

Exercises D


1. Considering the Catbus, Satsuki, and Totoro data,

What was the mean of Steps?

What is the 95% confidence interval for Steps (traditional method)?


2. Considering the Catbus, Satsuki, and Totoro data,

What was the mean of Steps for females?

What is the 95% confidence interval for Steps for females (traditional method)?


3. Looking at the 95% confidence intervals for Steps for males and females, are we justified in claiming that the mean Steps was statistically different for females and males?  Why?

4.  As part of a nutrition education program, extension educators had students keep diaries of what they ate for a day and then calculated the calories students consumed. 

Student  Teacher  Sex     Calories  Rating
a        Tetsuo   male    2300      3
b        Tetsuo   female  1800      3
c        Tetsuo   male    1900      4
d        Tetsuo   female  1700      5
e        Tetsuo   male    2200      4
f        Tetsuo   female  1600      3
g        Tetsuo   male    1800      3
h        Tetsuo   female  2000      3
i        Kaneda   male    2100      4
j        Kaneda   female  1900      5
k        Kaneda   male    1900      4
l        Kaneda   female  1600      4
m        Kaneda   male    2000      4
n        Kaneda   female  2000      5
o        Kaneda   male    2100      3
p        Kaneda   female  1800      4


For each of the following, answer the question, and show the output from the analyses you used to answer the question.


a.  What is the mean of Calories?

b. What is the 95% confidence interval for Calories (traditional method)?


c.  What is the mean of Calories for females?

d. What is the 95% confidence interval for Calories for females (traditional method)?


e. Looking at the 95% confidence intervals for Calories for males and females, are we justified in claiming that the mean Calories was statistically different for females and males? Why?