Most common statistics of central tendency can be calculated with functions in the native stats package. The psych and DescTools packages add functions for the geometric mean and the harmonic mean. The describe function in the psych package includes the mean, median, and trimmed mean along with other common statistics. In the native stats package, summary is a quick way to see the mean, median, and quantiles for numeric variables in a data frame. The mode is not commonly calculated, but can be found in DescTools.
Many functions that calculate common statistics of central tendency or dispersion return NA if the analyzed data contain any missing values (NA’s). In most cases this behavior can be changed with the na.rm=TRUE option, which excludes any NA’s before the statistic is calculated. The functions shown here either exclude NA’s by default or use the na.rm=TRUE option.
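As a minimal sketch of this behavior, the following uses a small made-up vector (x is hypothetical and is not part of the stream data) to show how base R functions respond to a missing value with and without na.rm=TRUE.

```r
### Hypothetical vector with one missing value, for illustration only
x = c(4, 8, NA, 12)

mean(x)                   # Returns NA because of the missing value
mean(x, na.rm=TRUE)       # Mean of the three non-missing values
median(x, na.rm=TRUE)     # Median of the three non-missing values
sum(is.na(x))             # Count of missing values
```

Counting the NA’s first, as in the last line, is a reasonable habit before deciding to drop them.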
Examples in Summary and Analysis of Extension Program Evaluation
SAEPER: Descriptive Statistics
Packages used in this chapter
The following commands will install these packages if they are not already installed:
if(!require(psych)){install.packages("psych")}
if(!require(DescTools)){install.packages("DescTools")}
Introduction
The normal distribution
See the Handbook for information on these topics.
Different measures of central tendency
Methods are described in the “Example” section below.
Example
### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------
Input =("
Stream Fish
Mill_Creek_1 76
Mill_Creek_2 102
North_Branch_Rock_Creek_1 12
North_Branch_Rock_Creek_2 39
Rock_Creek_1 55
Rock_Creek_2 93
Rock_Creek_3 98
Rock_Creek_4 53
Turkey_Branch 102
")
Data = read.table(textConnection(Input),header=TRUE)
Arithmetic mean
mean(Data$Fish, na.rm=TRUE)
[1] 70
Geometric mean
library(psych)
geometric.mean(Data$Fish)
[1] 59.83515
library(DescTools)
Gmean(Data$Fish)
[1] 59.83515
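If neither package is available, the geometric mean can also be sketched in base R as the exponential of the mean of the logarithms. The Fish vector below simply re-enters the counts from the example so that the block runs on its own; this approach assumes all values are positive.

```r
### Fish counts re-entered from the example data
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Geometric mean: exponentiate the mean of the natural logs
### (valid only when every value is greater than zero)
GeoMean = exp(mean(log(Fish)))

GeoMean
```

The result should agree with the values reported by geometric.mean and Gmean above.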
Harmonic mean
library(psych)
harmonic.mean(Data$Fish)
[1] 45.05709
library(DescTools)
Hmean(Data$Fish)
[1] 45.05709
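The harmonic mean can likewise be sketched in base R as the reciprocal of the mean of the reciprocals. The Fish vector re-enters the counts from the example; the calculation assumes no value is zero.

```r
### Fish counts re-entered from the example data
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Harmonic mean: reciprocal of the mean of the reciprocals
### (undefined if any value is zero)
HarmMean = 1 / mean(1 / Fish)

HarmMean
```

The result should agree with the values reported by harmonic.mean and Hmean above.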
Median
median(Data$Fish, na.rm=TRUE)
[1] 76
Mode
library(DescTools)
Mode(Data$Fish)
[1] 102
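Base R has no built-in function for the statistical mode (the base function mode reports an object's storage type instead). One possible sketch tabulates the values and keeps those with the highest count; the names Counts and ModeValue are illustrative only.

```r
### Fish counts re-entered from the example data
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### Tabulate how often each value occurs
Counts = table(Fish)

### Keep the value or values with the highest frequency
### (more than one value is returned if there is a tie)
ModeValue = as.numeric(names(Counts)[Counts == max(Counts)])

ModeValue
```

Here 102 is the only value that occurs twice, so it is the single mode.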
Summary and describe functions for means, medians, and other statistics
The interquartile range (IQR) is 3rd Qu. minus 1st Qu.
summary(Data$Fish)     # Also works on whole data frames
                       # Will also report count of NA’s
Min. 1st Qu. Median Mean 3rd Qu. Max.
12 53 76 70 98 102
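The interquartile range can be checked directly from the quartiles, as in this short sketch. It relies on the default quantile method in R, which is the same one summary uses; the Fish vector re-enters the counts from the example.

```r
### Fish counts re-entered from the example data
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

### First and third quartiles (default quantile method)
Q = quantile(Fish, c(0.25, 0.75))

### IQR by subtraction, and with the built-in function
IQR.manual = unname(Q[2] - Q[1])
IQR.manual
IQR(Fish)
```

Both approaches give 98 minus 53, or 45, for these data.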
library(psych)
describe(Data$Fish,    # Also works on whole data frames
         type=2)       # Type of skew and kurtosis
vars n mean sd median trimmed mad min max range skew kurtosis se
1 1 9 70 32.09 76 70 34.1 12 102 90 -0.65 -0.69 10.7
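Several of the columns reported by describe can be reproduced in base R, which may help in interpreting them. The sketch below assumes the describe defaults of a 10% trimmed mean and the scaled median absolute deviation returned by the mad function; the names SE, MAD, and Trimmed are illustrative only.

```r
### Fish counts re-entered from the example data
Fish = c(76, 102, 12, 39, 55, 93, 98, 53, 102)

n = length(Fish)

SE      = sd(Fish) / sqrt(n)      # Standard error of the mean (se)
MAD     = mad(Fish)               # Median absolute deviation (mad)
Trimmed = mean(Fish, trim=0.1)    # Trimmed mean (trimmed)

SE
MAD
Trimmed
```

With only nine observations, trimming 10% from each end removes no values, which is why the trimmed mean equals the ordinary mean here.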
Histogram
hist(Data$Fish,
col="gray",
main="Maryland Biological Stream Survey",
xlab="Fish count")
# # #
DescTools to produce summary statistics and plots
The Desc function in the package DescTools produces summary information for individual variables or whole data frames. It has custom output for factor, numeric, integer, and date variables.
### --------------------------------------------------------------
### Central tendency example, pp. 105–106
### --------------------------------------------------------------
Input =("
Stream Fish
Mill_Creek_1 76
Mill_Creek_2 102
North_Branch_Rock_Creek_1 12
North_Branch_Rock_Creek_2 39
Rock_Creek_1 55
Rock_Creek_2 93
Rock_Creek_3 98
Rock_Creek_4 53
Turkey_Branch 102
")
Data = read.table(textConnection(Input),header=TRUE)
### Add a numeric variable with the same values as Fish
Data$Fish.num = as.numeric(Data$Fish)
### Produce summary statistics and plots
library(DescTools)
Desc(Data,
plotit=TRUE)
----------------------------------------------------------------------------
1 - Stream (factor)
length n NAs levels unique dupes
9 9 0 9 9 n
level freq perc cumfreq cumperc
1 Mill_Creek_1 1 .111 1 .111
2 Mill_Creek_2 1 .111 2 .222
3 North_Branch_Rock_Creek_1 1 .111 3 .333
4 North_Branch_Rock_Creek_2 1 .111 4 .444
5 Rock_Creek_1 1 .111 5 .556
6 Rock_Creek_2 1 .111 6 .667
7 Rock_Creek_3 1 .111 7 .778
8 Rock_Creek_4 1 .111 8 .889
9 Turkey_Branch 1 .111 9 1.000
----------------------------------------------------------------------------
.
.
< results snipped >
.
.
----------------------------------------------------------------------------
3 - Fish.num (numeric)
length n NAs unique 0s mean meanSE
9 9 0 8 0 70 10.695
.05 .10 .25 median .75 .90 .95
22.800 33.600 53 76 98 102 102
rng sd vcoef mad IQR skew kurt
90 32.086 0.458 34.100 45 -0.448 -1.389
lowest : 12, 39, 53, 55, 76
highest: 55, 76, 93, 98, 102 (2)
Shapiro-Wilks normality test p.value : 0.23393
DescTools with grouped data
### --------------------------------------------------------------
### Summary statistics with grouped data, hypothetical data
### --------------------------------------------------------------
Input =("
Stream Animal Count
Mill_Creek_1 Fish 76
Mill_Creek_2 Fish 102
North_Branch_Rock_Creek_1 Fish 12
North_Branch_Rock_Creek_2 Fish 39
Rock_Creek_1 Fish 55
Rock_Creek_2 Fish 93
Rock_Creek_3 Fish 98
Rock_Creek_4 Fish 53
Turkey_Branch Fish 102
Mill_Creek_1 Insect 28
Mill_Creek_2 Insect 85
North_Branch_Rock_Creek_1 Insect 17
North_Branch_Rock_Creek_2 Insect 20
Rock_Creek_1 Insect 33
Rock_Creek_2 Insect 75
Rock_Creek_3 Insect 78
Rock_Creek_4 Insect 25
Turkey_Branch Insect 87
")
D2 = read.table(textConnection(Input),header=TRUE)
library(DescTools)
Desc(Count ~ Animal,
D2,
digits=1,
plotit=TRUE)
----------------------------------------------------------------------------
Count ~ Animal
Summary:
n pairs: 18, valid: 18 (100%), missings: 0 (0%), groups: 2
Fish Insect
mean 70.0" 49.8'
median 76.0" 33.0'
sd 32.1 30.4
IQR 45.0 53.0
n 9 9
np 0.500 0.500
NAs 0 0
0s 0 0
' min, " max
Kruskal-Wallis rank sum test:
Kruskal-Wallis chi-squared = 2.125, df = 1, p-value = 0.1449
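For comparison, group-wise statistics can also be sketched with base R functions. The data frame below re-enters the Animal and Count columns of the hypothetical data so that the block runs on its own; Means and Medians are illustrative names.

```r
### Animal and Count columns re-entered from the hypothetical data
D2 = data.frame(
       Animal = rep(c("Fish", "Insect"), each=9),
       Count  = c(76, 102, 12, 39, 55, 93, 98, 53, 102,
                  28,  85, 17, 20, 33, 75, 78, 25,  87))

### Mean and median of Count within each level of Animal
Means   = tapply(D2$Count, D2$Animal, mean)
Medians = tapply(D2$Count, D2$Animal, median)

Means
Medians

### The same grouping with a formula interface
aggregate(Count ~ Animal, data=D2, FUN=mean)

### Kruskal-Wallis rank sum test from the native stats package
kruskal.test(Count ~ Animal, data=D2)
```

The formula interface of aggregate mirrors the Count ~ Animal notation used with Desc, so the same grouping can be reused across functions.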
How to calculate the statistics
Methods are described in the “Example” section above.