
An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

Statistics of Central Tendency

Most common statistics of central tendency can be calculated with functions in base R.  The psych and DescTools packages add functions for the geometric mean and the harmonic mean.  The describe function in the psych package reports the mean, median, and trimmed mean along with other common statistics.  In base R, summary is a quick way to see the mean, median, and quartiles for numeric variables in a data frame.  The mode is not commonly calculated, but can be found with the Mode function in DescTools.

 

Many functions that compute common statistics of central tendency or dispersion return NA if the analyzed data contain any missing values (NA’s).  In most cases this behavior can be changed with the na.rm=TRUE option, which excludes the NA’s before the statistic is calculated.  The functions shown here either exclude NA’s by default or use the na.rm=TRUE option.
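As a minimal sketch of this behavior, the following uses a small hypothetical vector (Fish.na, not taken from the Handbook) with one missing value.  Only base R functions are used.

```r
### Hypothetical counts with one missing value (not from the Handbook)

Fish.na = c(76, 102, 12, NA, 55)

mean(Fish.na)                  # NA, because one value is missing

mean(Fish.na, na.rm=TRUE)      # 61.25, the mean of the four observed values

median(Fish.na, na.rm=TRUE)    # 65.5
```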

 

Examples in Summary and Analysis of Extension Program Evaluation


SAEPER: Descriptive Statistics

 

Packages used in this chapter

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}

 

Introduction

The normal distribution

See the Handbook for information on these topics.

 

Different measures of central tendency

Methods are described in the “Example” section below.

 

 Example

 

### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------

Input =("
Stream                      Fish
 Mill_Creek_1                76
 Mill_Creek_2               102
 North_Branch_Rock_Creek_1   12
 North_Branch_Rock_Creek_2   39
 Rock_Creek_1                55
 Rock_Creek_2                93
 Rock_Creek_3                98
 Rock_Creek_4                53
 Turkey_Branch              102
")

Data = read.table(textConnection(Input),header=TRUE)

 

 

Arithmetic mean

 

mean(Data$ Fish, na.rm=TRUE)

 

[1] 70

 

 

Geometric mean

 

library(psych)

geometric.mean(Data$ Fish)

 

[1] 59.83515

 

library(DescTools)

Gmean(Data$ Fish)

[1] 59.83515
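For readers who want to see what these functions compute, the geometric mean can also be sketched in base R as the exponential of the mean of the logged values.  This assumes all values are positive; the vector Fish below simply repeats the counts from the example data.

```r
### Geometric mean by hand; assumes all values are greater than zero

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

GM = exp(mean(log(Fish)))

GM                             # about 59.835, matching the output above
```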

 

 

Harmonic mean

 

library(psych)

harmonic.mean(Data$ Fish)

 

[1] 45.05709

 

library(DescTools)

Hmean(Data$ Fish)

[1] 45.05709
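The harmonic mean is the reciprocal of the mean of the reciprocals, so it can be sketched in base R as follows.  This assumes no value is zero; the vector Fish repeats the counts from the example data.

```r
### Harmonic mean by hand; assumes no value is zero

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

HM = 1 / mean(1 / Fish)

HM                             # about 45.057, matching the output above
```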

 

 

 

Median

 

median(Data$ Fish, na.rm=TRUE)

 

[1] 76

 

 

Mode


library(DescTools)

Mode(Data$ Fish)

[1] 102
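If DescTools is not available, one possible base R approach is to tabulate the values and pick the most frequent one.  The object names Counts and Modes are arbitrary, and this sketch returns every value tied for the highest frequency.

```r
### Mode by tabulation in base R

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

Counts = table(Fish)           # frequency of each distinct value

Modes = as.numeric(names(Counts)[Counts == max(Counts)])

Modes                          # 102, the only value that occurs twice
```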

 

 

 

Summary and describe functions for means, medians, and other statistics

The interquartile range (IQR) is the third quartile (3rd Qu.) minus the first quartile (1st Qu.) in the summary output.

 

summary(Data$ Fish)          # Also works on whole data frames
                             # Will also report count of NA’s

 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

     12      53      76      70      98     102
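As a short illustration, the interquartile range can be computed from the quartiles directly or with the IQR function in base R.  The vector Fish repeats the counts from the example data.

```r
### Interquartile range from the quartiles, and with IQR()

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

Quartiles = quantile(Fish, probs=c(0.25, 0.75))

Quartiles                      # 25% is 53 and 75% is 98

unname(Quartiles[2] - Quartiles[1])    # 45

IQR(Fish)                      # 45
```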

 

 

library(psych)
           
describe(Data$ Fish,          # Also works on whole data frames
               type=2)        # Type of skew and kurtosis

 

 

  vars n mean    sd median trimmed  mad min max range  skew kurtosis   se

1    1 9   70 32.09     76      70 34.1  12 102    90 -0.65    -0.69 10.7
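The trimmed mean reported above is the same as the ordinary mean.  A likely explanation, assuming describe uses a default trim of 0.1 and the same rule as the trim argument of base mean, is that 10% of nine observations rounds down to zero observations dropped from each end.  The sketch below uses only base R to show the effect of a larger trim.

```r
### Trimmed means with base R mean()

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

mean(Fish, trim=0.1)           # 70; floor(9 * 0.1) = 0 values dropped per end

mean(Fish, trim=0.2)           # about 73.71; drops the lowest and highest value
```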

 

 

Histogram

 

hist(Data$ Fish,   
    col="gray", 
    main="Maryland Biological Stream Survey",
    xlab="Fish count")    

 

#     #     #

 


 

 

 

DescTools to produce summary statistics and plots

The Desc function in the package DescTools produces summary information for individual variables or whole data frames.  It has custom output for factor, numeric, integer, and date variables.

 

### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------

Input =("
Stream                      Fish
 Mill_Creek_1                76
 Mill_Creek_2               102
 North_Branch_Rock_Creek_1   12
 North_Branch_Rock_Creek_2   39
 Rock_Creek_1                55
 Rock_Creek_2                93
 Rock_Creek_3                98
 Rock_Creek_4                53
 Turkey_Branch              102
")

Data = read.table(textConnection(Input),header=TRUE)


### Add a numeric variable with the same values as Fish

Data$Fish.num = as.numeric(Data$Fish)


### Produce summary statistics and plots

library(DescTools)

Desc(Data,
     plotit=TRUE)
    

----------------------------------------------------------------------------

1 - Stream (factor)

 

  length      n    NAs levels unique  dupes

       9      9      0      9      9      n

 

                      level freq  perc cumfreq cumperc

1              Mill_Creek_1    1  .111       1    .111

2              Mill_Creek_2    1  .111       2    .222

3 North_Branch_Rock_Creek_1    1  .111       3    .333

4 North_Branch_Rock_Creek_2    1  .111       4    .444

5              Rock_Creek_1    1  .111       5    .556

6              Rock_Creek_2    1  .111       6    .667

7              Rock_Creek_3    1  .111       7    .778

8              Rock_Creek_4    1  .111       8    .889

9             Turkey_Branch    1  .111       9   1.000

----------------------------------------------------------------------------

.

.

< results snipped >

.

.

----------------------------------------------------------------------------

3 - Fish.num (numeric)

 

  length      n    NAs unique     0s   mean meanSE

       9      9      0      8      0     70 10.695

 

     .05    .10    .25 median    .75    .90    .95

  22.800 33.600     53     76     98    102    102

 

     rng     sd  vcoef    mad    IQR   skew   kurt

      90 32.086  0.458 34.100     45 -0.448 -1.389

 

lowest : 12, 39, 53, 55, 76

highest: 55, 76, 93, 98, 102 (2)

 

Shapiro-Wilks normality test  p.value : 0.23393
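The normality test at the end of the Desc output can presumably be reproduced with shapiro.test in the native stats package.  This is a sketch under that assumption; the object name Result is arbitrary.

```r
### Shapiro-Wilk normality test in base R

Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

Result = shapiro.test(Fish)

Result$p.value                 # compare with the p.value in the Desc output
```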

 

 

 

 

 

 

DescTools with grouped data

 

### --------------------------------------------------------------
###  Summary statistics with grouped data, hypothetical data
### --------------------------------------------------------------

Input =("
Stream                     Animal  Count
 Mill_Creek_1               Fish     76
 Mill_Creek_2               Fish    102
 North_Branch_Rock_Creek_1  Fish     12
 North_Branch_Rock_Creek_2  Fish     39
 Rock_Creek_1               Fish     55
 Rock_Creek_2               Fish     93
 Rock_Creek_3               Fish     98
 Rock_Creek_4               Fish     53
 Turkey_Branch              Fish    102
 
 Mill_Creek_1               Insect   28
 Mill_Creek_2               Insect   85
 North_Branch_Rock_Creek_1  Insect   17
 North_Branch_Rock_Creek_2  Insect   20
 Rock_Creek_1               Insect   33
 Rock_Creek_2               Insect   75
 Rock_Creek_3               Insect   78
 Rock_Creek_4               Insect   25
 Turkey_Branch              Insect   87
")

D2 = read.table(textConnection(Input),header=TRUE)


library(DescTools)
   
Desc(Count ~ Animal,
     D2,
     digits=1,
     plotit=TRUE)    

----------------------------------------------------------------------------

Count ~ Animal

 

Summary:

n pairs: 18, valid: 18 (100%), missings: 0 (0%), groups: 2

 

 

          Fish  Insect

mean      70.0"   49.8'

median    76.0"   33.0'

sd        32.1    30.4

IQR       45.0    53.0

n            9       9

np       0.500   0.500

NAs          0       0

0s           0       0

 

' min, " max

 

Kruskal-Wallis rank sum test:

  Kruskal-Wallis chi-squared = 2.125, df = 1, p-value = 0.1449
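The group means and medians in this table can also be obtained in base R with tapply.  The sketch below rebuilds the hypothetical Count and Animal columns as a small data frame so that it runs on its own; the Stream column is omitted because it is not needed here.

```r
### Grouped means and medians with tapply()

D2 = data.frame(
   Animal = rep(c("Fish", "Insect"), each=9),
   Count  = c(76, 102, 12, 39, 55, 93, 98, 53, 102,
              28,  85, 17, 20, 33, 75, 78, 25,  87))

Means = tapply(D2$Count, D2$Animal, mean)

Medians = tapply(D2$Count, D2$Animal, median)

Means                          # Fish 70, Insect about 49.8

Medians                        # Fish 76, Insect 33
```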

 

 

 

 

How to calculate the statistics

Methods are described in the “Example” section above.