## An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

# Statistics of Central Tendency

Most common statistics of central tendency can be calculated with functions included in a standard R installation, such as mean and median.  The psych and DescTools packages add functions for the geometric mean and the harmonic mean.  The describe function in the psych package reports the mean, median, and trimmed mean along with other common statistics.  The summary function, also part of a standard installation, is a quick way to see the mean, median, and quartiles for numeric variables in a data frame.  The mode is not commonly calculated, but a Mode function is available in DescTools.

Many functions that calculate common statistics of central tendency or dispersion return NA if the analyzed data contain any missing values (NA’s).  In most cases this behavior can be changed with the na.rm=TRUE option, which excludes any NA’s before the statistic is calculated.  The functions shown here either exclude NA’s by default or use the na.rm=TRUE option.
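As a minimal sketch of this behavior, the following uses a small made-up vector with one missing value.  The name x and its values are illustrative only and do not come from the Handbook.

```r
### Hypothetical vector with one missing value (illustration only)
x = c(2, 4, NA, 6)

mean(x)                   # NA, because a missing value is present
mean(x, na.rm=TRUE)       # 4, the mean of 2, 4, and 6
median(x, na.rm=TRUE)     # 4
```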

### Packages used in this chapter

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}

### Introduction

### The normal distribution

See the Handbook for information on these topics.

### Different measures of central tendency

Methods are described in the “Example” section below.

### Example

### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------

Input =("
Stream                      Fish
Mill_Creek_1                76
Mill_Creek_2               102
North_Branch_Rock_Creek_1   12
North_Branch_Rock_Creek_2   39
Rock_Creek_1                55
Rock_Creek_2                93
Rock_Creek_3                98
Rock_Creek_4                53
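Turkey_Branch              102
")

### Read the text above into a data frame named Data

Data = read.table(textConnection(Input),header=TRUE)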

#### Arithmetic mean

mean(Data$Fish, na.rm=TRUE)

[1] 70

#### Geometric mean

library(psych)

geometric.mean(Data$Fish)

[1] 59.83515

library(DescTools)

Gmean(Data$Fish)

[1] 59.83515
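The same value can be cross-checked without extra packages.  The sketch below assumes the usual definition of the geometric mean as the back-transformed mean of the logged values, which requires all values to be positive.  The vector Fish simply repeats the counts from the example.

```r
### Fish counts copied from the example above
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Geometric mean: back-transform the mean of the natural logs
exp(mean(log(Fish)))      # about 59.835
```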

#### Harmonic mean

library(psych)

harmonic.mean(Data$Fish)

[1] 45.05709

library(DescTools)

Hmean(Data$Fish)

[1] 45.05709
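A base-R cross-check is sketched below, assuming the usual definition of the harmonic mean as the reciprocal of the mean of the reciprocals.  It is defined only when no value is zero.  The vector Fish repeats the counts from the example.

```r
### Fish counts copied from the example above
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Harmonic mean: reciprocal of the mean of the reciprocals
1 / mean(1 / Fish)        # about 45.057
```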

#### Median

median(Data$Fish, na.rm=TRUE)

[1] 76

#### Mode

library(DescTools)

Mode(Data$Fish)

[1] 102
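If DescTools is not available, the mode can be found by tabulating the values, as in the sketch below.  This is one simple approach rather than the method DescTools uses internally, and the names Counts and Modes are illustrative.  It returns every value tied for the highest frequency.

```r
### Fish counts copied from the example above
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Count how often each value occurs
Counts = table(Fish)

### Keep the value or values with the highest count
Modes = as.numeric(names(Counts)[Counts == max(Counts)])

Modes                     # 102, which occurs twice
```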

#### Summary and describe functions for means, medians, and other statistics

The interquartile range (IQR) is the third quartile (3rd Qu.) minus the first quartile (1st Qu.) reported by summary.

summary(Data$Fish)          # Also works on whole data frames
                            # Will also report count of NA’s

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     12      53      76      70      98     102
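The interquartile range can also be computed directly.  The sketch below uses the base quantile and IQR functions with their default quantile method; other quantile methods can give slightly different quartiles.  The vector Fish repeats the counts from the example.

```r
### Fish counts copied from the example above
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

quantile(Fish, c(0.25, 0.75))     # 1st and 3rd quartiles: 53 and 98

IQR(Fish)                         # 98 - 53 = 45
```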

library(psych)

describe(Data$Fish,          # Also works on whole data frames
         type=2)             # Type of skew and kurtosis

  vars n mean    sd median trimmed  mad min max range  skew kurtosis   se
1    1 9   70 32.09     76      70 34.1  12 102    90 -0.65    -0.69 10.7
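Several columns of this output can be reproduced with base functions, which may help in interpreting them.  The sketch below assumes that describe uses its default trim of 0.1 for the trimmed mean and reports the scaled median absolute deviation in the mad column.  The vector Fish repeats the counts from the example.

```r
### Fish counts copied from the example above
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

n = length(Fish)

sd(Fish)                  # standard deviation, about 32.09
sd(Fish) / sqrt(n)        # standard error of the mean, about 10.7
mad(Fish)                 # scaled median absolute deviation, about 34.1
mean(Fish, trim=0.1)      # 70; with 9 values, a 10% trim drops no observations
```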

#### Histogram

hist(Data$Fish,
     col="gray",
     main="Maryland Biological Stream Survey",
     xlab="Fish count")

#     #     #

#### DescTools to produce summary statistics and plots

The Desc function in the package DescTools produces summary information for individual variables or whole data frames.  It has custom output for factor, numeric, integer, and date variables.
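Because Desc chooses its output according to each variable’s class, it can help to check the classes before calling it.  The sketch below uses two rows of the example data; the data frame name Check is hypothetical.  It shows that read.table reads a whole-number column as integer, and that as.numeric produces a numeric copy of it.

```r
### Two rows from the example data, for illustration only
Input =("
Stream          Fish
Mill_Creek_1      76
Mill_Creek_2     102
")

Check = read.table(text=Input, header=TRUE, stringsAsFactors=TRUE)

### Add a numeric copy of the integer variable
Check$Fish.num = as.numeric(Check$Fish)

sapply(Check, class)      # Stream: factor; Fish: integer; Fish.num: numeric
```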

### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------

Input =("
Stream                      Fish
Mill_Creek_1                76
Mill_Creek_2               102
North_Branch_Rock_Creek_1   12
North_Branch_Rock_Creek_2   39
Rock_Creek_1                55
Rock_Creek_2                93
Rock_Creek_3                98
Rock_Creek_4                53
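Turkey_Branch              102
")

### Read the text above into a data frame named Data
### stringsAsFactors=TRUE reads Stream as a factor,
###   which is not the default in R 4.0.0 and later

Data = read.table(textConnection(Input),header=TRUE,
                  stringsAsFactors=TRUE)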

### Add a numeric variable with the same values as Fish

Data$Fish.num = as.numeric(Data$Fish)

### Produce summary statistics and plots

library(DescTools)

Desc(Data,
     plotit=TRUE)

----------------------------------------------------------------------------

1 - Stream (factor)

  length      n    NAs levels unique  dupes
       9      9      0      9      9      n

                      level freq  perc cumfreq cumperc
1              Mill_Creek_1    1  .111       1    .111
2              Mill_Creek_2    1  .111       2    .222
3 North_Branch_Rock_Creek_1    1  .111       3    .333
4 North_Branch_Rock_Creek_2    1  .111       4    .444
5              Rock_Creek_1    1  .111       5    .556
6              Rock_Creek_2    1  .111       6    .667
7              Rock_Creek_3    1  .111       7    .778
8              Rock_Creek_4    1  .111       8    .889
9             Turkey_Branch    1  .111       9   1.000

----------------------------------------------------------------------------

< results snipped >

----------------------------------------------------------------------------

3 - Fish.num (numeric)

  length      n    NAs unique     0s   mean meanSE
       9      9      0      8      0     70 10.695

     .05    .10    .25 median    .75    .90    .95
  22.800 33.600     53     76     98    102    102

     rng     sd  vcoef    mad    IQR   skew   kurt
      90 32.086  0.458 34.100     45 -0.448 -1.389

lowest : 12, 39, 53, 55, 76
highest: 55, 76, 93, 98, 102 (2)

Shapiro-Wilks normality test  p.value : 0.23393

#### DescTools with grouped data

### --------------------------------------------------------------
###  Summary statistics with grouped data, hypothetical data
### --------------------------------------------------------------

Input =("
Stream                     Animal  Count
Mill_Creek_1               Fish     76
Mill_Creek_2               Fish    102
North_Branch_Rock_Creek_1  Fish     12
North_Branch_Rock_Creek_2  Fish     39
Rock_Creek_1               Fish     55
Rock_Creek_2               Fish     93
Rock_Creek_3               Fish     98
Rock_Creek_4               Fish     53
Turkey_Branch              Fish    102

Mill_Creek_1               Insect   28
Mill_Creek_2               Insect   85
North_Branch_Rock_Creek_1  Insect   17
North_Branch_Rock_Creek_2  Insect   20
Rock_Creek_1               Insect   33
Rock_Creek_2               Insect   75
Rock_Creek_3               Insect   78
Rock_Creek_4               Insect   25
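Turkey_Branch              Insect   87
")

### Read the text above into a data frame named D2

D2 = read.table(textConnection(Input),header=TRUE,
                stringsAsFactors=TRUE)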

library(DescTools)

Desc(Count ~ Animal,
     D2,
     digits=1,
     plotit=TRUE)

----------------------------------------------------------------------------

Count ~ Animal

Summary:
n pairs: 18, valid: 18 (100%), missings: 0 (0%), groups: 2

          Fish  Insect
mean     70.0"   49.8'
median   76.0"   33.0'
sd       32.1    30.4
IQR      45.0    53.0
n           9       9
np      0.500   0.500
NAs         0       0
0s          0       0

' min, " max

Kruskal-Wallis rank sum test:
  Kruskal-Wallis chi-squared = 2.125, df = 1, p-value = 0.1449
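The group statistics and the Kruskal-Wallis test can also be obtained with base functions.  The sketch below rebuilds the grouped data directly as a data frame, with the counts copied from the example, and uses tapply and kruskal.test.  It is offered as a cross-check, not as a description of how Desc computes its results.

```r
### Counts copied from the grouped example above
D2 = data.frame(
  Animal = factor(rep(c("Fish", "Insect"), each=9)),
  Count  = c(76, 102, 12, 39, 55, 93, 98, 53, 102,
             28,  85, 17, 20, 33, 75, 78, 25,  87))

tapply(D2$Count, D2$Animal, mean)        # group means
tapply(D2$Count, D2$Animal, median)      # group medians

### Kruskal-Wallis rank sum test comparing the two groups
kruskal.test(Count ~ Animal, data=D2)
```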

### How to calculate the statistics

Methods are described in the “Example” section above.