[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Confidence Intervals for Medians

Confidence intervals are discussed in the Confidence Intervals chapter.

 

The methods for determining confidence intervals for medians are distinct from the traditional method for means.  There are different methods for calculating confidence intervals for the median, and there are few different methods are presented in this chapter.

 

When data distributions are normal or uniform in distribution, and the number of observations is large (say, n ≥ 100), any of the methods should work reasonably well.  When distributions are heavily skewed or the number of observations is relatively small (say, n < 20), results from some methods can differ notably from others.

 

For routine use, I recommend the groupwiseMedian function in the rcompanion package, with either the BCa or the percentile method. These are bootstrap methods.

 

The BCa (bias corrected, accelerated) is often cited as the best for theoretical reasons.  The percentile method is also cited as typically good.  However, if you get the “extreme order statistics used as endpoints” warning message, use a different test.  For small data sets, the interval from BCa may be wider than for some other methods.

 

For a description of the bootstrap confidence interval methods, see Carpenter and Bithell (2000) in the “References” section below.

 

Technical note:  Bootstrapped confidence intervals may not be reliable for discreet data, such as the ordinal Likert data used in these examples, especially for small samples.  My understanding is that the determination of the confidence intervals assumes a continuous and bell-shaped distribution of the statistic (the median, in this case) from the bootstrapped samples.  This concern is not considered in the examples in this book.

 

Other methods used in the groupwiseMedian function include the basic, normal, exact, and wilcox methods.

 

Other functions that calculate a confidence interval for a median are the wilcox.test function, and the MedianCI function in the DescTools package.


Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  FSA

•  boot

•  DescTools

•  plyr

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(boot)){install.packages("boot")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(plyr)){install.packages("plyr")}
if(!require(rcompanion)){install.packages("rcompanion")}


Medians, quantiles, and confidence intervals for one-sample data

 

For one-sample data, the median and quantiles can be calculated with the median function, the summary function, and the Summarize function from the FSA package.

 

The function groupwiseMedian in the rcompanion package produces medians and confidence intervals for medians.  It can also calculate these statistics for grouped data (one-way or multi-way).

 

This example will use some theoretical data for Lisa Simpson, rated on a 10-point Likert item.


Input =("
 Speaker          Rater  Likert
 'Lisa Simpson'   1         8
 'Lisa Simpson'   2        10
 'Lisa Simpson'   3         9
 'Lisa Simpson'   4         10
 'Lisa Simpson'   5         6
 'Lisa Simpson'   6         5
 'Lisa Simpson'   7         3
 'Lisa Simpson'   8         7
 'Lisa Simpson'   9         8
 'Lisa Simpson'  10         5
 'Lisa Simpson'  11        10
 'Lisa Simpson'  12         4
 'Lisa Simpson'  13         8
 'Lisa Simpson'  14         6
 'Lisa Simpson'  15         9
 'Lisa Simpson'  16         8
 'Lisa Simpson'  17         7
 'Lisa Simpson'  18         5
 'Lisa Simpson'  19         8
 'Lisa Simpson'  20         7
 'Lisa Simpson'  21         9
 'Lisa Simpson'  22         8
 'Lisa Simpson'  23         7
 'Lisa Simpson'  24         5
 'Lisa Simpson'  25        10
 'Lisa Simpson'  26         7
")

Data = read.table(textConnection(Input),header=TRUE)

###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


Produce median with median, summary, and Summarize functions


median(Data$Likert)


[1] 7.5


summary(Data)


         Speaker       Rater           Likert     
 Lisa Simpson:26   Min.   : 1.00   Min.   : 3.000 
                   1st Qu.: 7.25   1st Qu.: 6.000 
                   Median :13.50   Median : 7.500 
                   Mean   :13.50   Mean   : 7.269 
                   3rd Qu.:19.75   3rd Qu.: 8.750 
                   Max.   :26.00   Max.   :10.000


The Summarize function in the FSA package

Note that the 1 in the formula on the right side of the tilde (~) indicates that the function should treat the data as one-sample data


library(FSA)

Summarize(Likert ~ 1,
          data=Data,
          digits=3)


 n nvalid  mean    sd min Q1 median   Q3 max percZero
26     26 7.269 1.951   3  6    7.5 8.75  10        0


groupwiseMedian function to produce medians and confidence intervals

 

In the groupwiseMedian function, the basic, normal, percentile, bca, wilcox, and exact options determine which types of confidence intervals will be calculated.  By default, only the bca option is set to TRUE, so the other options usually don’t need to be included.

 

Note that results for any statistic derived from an iterative process like bootstrapping may be slightly different.

 

Note that the 1 in the formula on the right side of the tilde (~) indicates that the function should treat the data as one-sample data.

 

See library(rcompanion); ?groupwiseMedian for more information on this function.

 

If the function takes too long to complete, you can decrease the R= value.


library(rcompanion)

groupwiseMedian(Likert ~ 1,
                data       = Data,
                conf       = 0.95,
                R          = 5000,
                percentile = TRUE,
                bca        = FALSE,
                basic      = FALSE,
                normal     = FALSE,
                wilcox     = FALSE,
                digits     = 3)


   .id  n Median Conf.level Percentile.lower Percentile.upper
1 <NA> 26    7.5       0.95              6.5                8


Medians, quantiles, and confidence intervals for grouped data

 

The following example revisits the Pooh, Piglet and Tigger data from the Descriptive Statistics with the likert Package chapter.

 

The Summarize function in the FSA package will produce medians and quantiles for grouped data including one-way and multi-way data.

 

The function groupwiseMedian will produce medians and confidence intervals for grouped data, including one-way and multi-way data.


Input =("
 Speaker  Likert
 Pooh      3
 Pooh      5
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      4
 Pooh      5
 Pooh      5
 Piglet    2
 Piglet    4
 Piglet    2
 Piglet    2
 Piglet    1
 Piglet    2
 Piglet    3
 Piglet    2
 Piglet    2
 Piglet    3
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    4
 Tigger    5
 Tigger    3
 Tigger    5
 Tigger    4
 Tigger    4
 Tigger    3
")

Data = read.table(textConnection(Input),header=TRUE)


Summarize function in FSA package for grouped data


library(FSA)

Summarize(Likert ~ Speaker,
          data=Data,
          digits=3)


  Speaker  n nvalid mean    sd min Q1 median   Q3 max percZero
1  Piglet 10     10  2.3 0.823   1  2      2 2.75   4        0
2    Pooh 10     10  4.2 0.632   3  4      4 4.75   5        0
3  Tigger 10     10  4.0 0.667   3  4      4 4.00   5        0


groupwiseMedian function for grouped data

 

library(rcompanion)

groupwiseMedian(Likert ~ Speaker,
                data       = Data,
                conf       = 0.95,
                R          = 5000,
                percentile = TRUE,
                bca        = FALSE,
                digits     = 3)


  Speaker  n Median Conf.level Percentile.lower Percentile.upper
1  Piglet 10      2       0.95              2.0              3.0
2    Pooh 10      4       0.95              4.0              5.0
3  Tigger 10      4       0.95              3.5              4.5


Optional methods and discussion on confidence intervals for medians


Input =("
 Speaker          Rater  Likert
 'Lisa Simpson'   1         8
 'Lisa Simpson'   2        10
 'Lisa Simpson'   3         9
 'Lisa Simpson'   4         10
 'Lisa Simpson'   5         6
 'Lisa Simpson'   6         5
 'Lisa Simpson'   7         3
 'Lisa Simpson'   8         7
 'Lisa Simpson'   9         8
 'Lisa Simpson'  10         5
 'Lisa Simpson'  11        10
 'Lisa Simpson'  12         4
 'Lisa Simpson'  13         8
 'Lisa Simpson'  14         6
 'Lisa Simpson'  15         9
 'Lisa Simpson'  16         8
 'Lisa Simpson'  17         7
 'Lisa Simpson'  18         5
 'Lisa Simpson'  19         8
 'Lisa Simpson'  20         7
 'Lisa Simpson'  21         9
 'Lisa Simpson'  22         8
 'Lisa Simpson'  23         7
 'Lisa Simpson'  24         5
 'Lisa Simpson'  25        10
 'Lisa Simpson'  26         7
")

Data = read.table(textConnection(Input),header=TRUE)


Optional: Confidence interval for medians with the wilcox.test function

For one-sample data, the wilcox.test function will produce a confidence interval for the median.  It will also produce a “(pseudo)median”.  For details on the calculations of these statistics, see ?wilcox.test.

 

Note that the conf.int=TRUE option must be used to produce the confidence interval, and the conf.level=0.95 option indicates that a 95% confidence interval should be calculated.

 

wilcox.test(Data$Likert,
            alternative="two.sided",
            correct=TRUE,
            conf.int=TRUE,
            conf.level=0.95)


95 percent confidence interval:
 6.499961 8.000009

 

Optional: Median and confidence interval with the DescTools package

With the MedianCI function in the DescTools package, the method=exact option uses the confidence interval from the SignTest function in DescTools.  The method=boot option uses the basic method from the boot package.


library(DescTools)

MedianCI(Data$Likert,
         conf.level = 0.95,
         na.rm = FALSE,
         method = "exact",
         R = 10000)


median lwr.ci upr.ci
   7.5    6.0    8.0


Optional: Confidence interval for median by bootstrap

Bootstrapping is a method by which a statistic is calculated by repeated sampling the given data to better estimate the distribution of values in the population from which the sample was taken.

 

The boot package in R can derive various statistics with a bootstrap process.  Note that the grammar of the function is somewhat complicated, but here Data$Likert is the variable we wish to get the statistics for, and R=10000 indicates the number of bootstrap replicates to use.  Mboot here is defined as the result of the bootstrap.  The boot.ci function produces four types of confidence intervals for the median.  Note also that the displaying the result Mboot gives a standard error for the estimated median.  See ?boot.ci for more details on these methods.

 

If the function takes too long to complete, you can decrease the R= value.

 

Note that results for any statistic derived from an iterative process like bootstrapping may be slightly different if the process is re-run.

 


library(boot)

Mboot = boot(Data$Likert,
             function(x,i) median(x[i]),
             R=10000)

boot.ci(Mboot,
        conf = 0.95,
        type = c("norm", "basic" ,"perc", "bca")
        )

Intervals :
Level      Normal              Basic        
95%   ( 6.510,  8.535 )   ( 7.000,  8.500 ) 

Level     Percentile            BCa         
95%   ( 6.5,  8.0 )   ( 6.0,  8.0 ) 


### Other information

Mboot

hist(Mboot$t[,1],
     col = "darkgray")


References

 

Carpenter, J. and J. Bithel. 2000. “Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians”.  Statistics in Medicine 19:1141–1164. www.tau.ac.il/~saharon/Boot/10.1.1.133.8405.pdf.

 

Exercises H

 

1. Considering Lisa Simpson's data,

 

a.  What was her median Likert score?

b.  Minimum?  Maximum?   25th percentile?  75th percentile?


2. For Lisa Simpson's data, what is the 95% confidence interval for the median using the following methods?

 

a.  wilcox

b.  normal

c.  basic

d.  percentile

e.  BCa

 

3. Considering the Pooh, Piglet, and Tigger data,

 

a.  For Piglet, what was his median score and 95% confidence interval using the percentile method?

4.  Bart Simpson and Milhouse Van Houten were each evaluated on a Likert scale.  Determine the median score for each, and determine the 95% confidence interval for these median scores.
 

•  Be sure to state which method for the confidence interval you used.

•  How can you interpret the results?  Include the practical interpretation.  For example, are the differences in scores large enough to be meaningful?


Speaker    Likert
Bart       7
Bart       7
Bart       8
Bart       9
Bart       5
Bart       6
Bart       6
Bart       7
Bart       8
Bart       7
Bart       7
Bart       7
Milhouse   8
Milhouse   8
Milhouse   7
Milhouse   7
Milhouse   9 
Milhouse   9 
Milhouse  10
Milhouse   6
Milhouse   7
Milhouse   8
Milhouse   7
Milhouse   8