### Packages used in this chapter

The packages used in this chapter include:

• psych

• rcompanion

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}

if(!require(rcompanion)){install.packages("rcompanion")}

### When to use parametric tests

Parametric statistical tests are among the most common
you’ll encounter. They include *t*-test, analysis of variance, and linear
regression.

They are used when the dependent variable is an interval/ratio data variable. This might include variables measured in science such as fish length, child height, crop yield weight, or pollutant concentration in water.

One advantage of using parametric statistical tests is that your audience will likely be familiar with the techniques and interpretation of the results. These tests are also often more flexible and more powerful than their nonparametric analogues.

Their major drawback is that all parametric tests assume something about the distribution of the underlying data. If these assumptions are violated, the resultant test statistics will not be valid, and the tests will not be as powerful as for cases when assumptions are met.

#### Count data may not be appropriate for common parametric tests

A frequent error is to use common parametric models and tests
with count data for the dependent variable. Instead, count data could be analyzed
either by using tests for nominal data or by using regression methods
appropriate for count data. These include Poisson regression, negative
binomial regression, and zero-inflated Poisson regression. See the *Hermite and
Poisson Regression for Count Data* chapter.

##### When to use parametric tests for count data

It is sometimes permissible to use common parametric tests for count data or other discrete data. They can be used in cases where counts are used as a type of measurement of some property of subjects, provided that 1) the distribution of data or residuals from the analysis meet test assumptions; and 2) there are few or no counts at or close to zero, or close to a maximum, if one exists. Permissible examples might include test scores, age, or number of steps taken during the day. Technically, each of these measurements is bound by zero, and are discrete rather than continuous measurements. However, if other conditions are met, it is reasonable to handle them as if they were continuous measurement variables.

This kind of count data will sometimes need to be transformed to meet the assumptions of parametric analysis. Square root and logarithmic transformations are common. However, if there are many counts at or near zero, transformation is unlikely to help. It is usually not worth the effort to attempt to force count data to meet the assumptions of parametric analysis with transformations, since there are more appropriate methods available.

#### Percentage and proportion data

Percentage and proportion data are often inappropriate for parametric statistics for the same reasons given for count data. They will often not meet the assumptions of the tests, and are particularly problematic if there are some or many observations close to 0 or 1. The arcsine transformation was used traditionally, but logistic regression or beta regression may be more appropriate.

### Assumptions in parametric statistics

All parametric analyses have assumptions about the underlying data, and these assumptions should always be confirmed when using these tests. If these assumptions are violated, the resulting statistics and conclusions will not be valid, and the tests may lack power relative to alternative tests.

The assumptions vary among tests, but a general discussion follows.

For examples in this section, we’ll revisit the Catbus data.

Input = ("

Student Sex Teacher Steps Rating

a female Catbus 8000 7

b female Catbus 9000 10

c female Catbus 10000 9

d female Catbus 7000 5

e female Catbus 6000 4

f female Catbus 8000 8

g male Catbus 7000 6

h male Catbus 5000 5

i male Catbus 9000 10

j male Catbus 7000 8

k female Satsuki 8000 7

l female Satsuki 9000 8

m female Satsuki 9000 8

n female Satsuki 8000 9

o male Satsuki 6000 5

p male Satsuki 8000 9

q male Satsuki 7000 6

r female Totoro 10000 10

s female Totoro 9000 10

t female Totoro 8000 8

u female Totoro 8000 7

v female Totoro 6000 7

w male Totoro 6000 8

x male Totoro 8000 10

y male Totoro 7000 7

z male Totoro 7000 7

")

Data = read.table(textConnection(Input),header=TRUE)

Data

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

#### Random sampling

All statistical tests assume that the data captured in the sample are randomly chosen from the population as a whole. Selection bias will obviously affect the validity of the outcome of the analysis.

#### Independent observations

Tests will also assume that observations are independent of one another, except when the analysis takes non-independence into account. One common case of non-independent observations is in repeated measures experiments, in which the same subject is observed over time. If you were measuring test scores of students over time, you might expect students with a high test score on one date to have a high test score on subsequent dates. In this case the observation on one date would not be independent of observations on other dates.

The independence of observation is often assumed from good experimental design. Also, data or residuals can be plotted, for example to see if observations from one date are correlated to those for another date.

#### Normal distribution of data or residuals

Parametric tests assume that the data come from a population
of known distribution. In particular, the tests discussed in this section
assume that the distribution of the data are *conditionally* normal in
distribution. That is, the data are normally distributed *once the effects
of the variables in the model are taken into account*. Practically
speaking, this means that the residuals from the analysis should be normally
distributed. This will usually be assessed with a histogram of residuals, a
density plot as shown below, or with quantile–quantile plot.

A select number of tests will require that data itself be
normally distributed. This will be limited to one-sample *t*-test,
two-sample *t*-test, and paired *t*-test. For other tests, the
distribution of the residuals will be investigated.

Residuals from an analysis are also commonly called *errors*.
They are the difference between the observations and the value predicted by the
model. For example, if the calculated mean of a sample is 10, and one
observation is 12, the residual for this observation is 2. If another
observation is 7, the residual for this observation is –3.

Be careful not to get confused about this assumption. You
may see discussion about how “data” should be normally distributed for
parametric tests. This is usually wrong-headed. The *t*-test assumes
that the observations for *each group* are normally distributed, but if
there is a difference in the groups, we might expect a bi-modal distribution,
not a simple normal distribution, for the combined data. This is why in most
cases we look at the distribution of the residuals, not the raw data.

##### Optional: Considering the distributions of data for groups and their residuals

The following code is just to illustrate this principle, and isn’t
normally used in analysis. First, the distributions of *Steps* by *Sex
*in the *Catbus*
data set are examined, and then the distribution of residuals — group means subtracted
from each observation — is examined.

### Density plot of observations by sex for
the Catbus data
set

ggplot(Data,

aes(Steps, fill = Sex)) +

geom_density(position="dodge",

alpha = 0.6)

The following code creates a variable *Mean* for *Steps*
for each *Sex* in the Catbus data set, and then subtracts *Steps*
from each mean, calling the result *Residual*. This is for illustrative
purposes only; you won’t normally need to manipulate data in this manner.

M1 = mean(Data$Steps[Data$Sex=="female"])

M2 = mean(Data$Steps[Data$Sex=="male"])

Data$Mean[Data$Sex=="female"] = M1

Data$Mean[Data$Sex=="male"] = M2

Data$Residual = Data$Steps - Data$Mean

### Density plot of residuals for mean for each sex for the Catbus data set

ggplot(Data,

aes(Residual)) +

geom_density(fill = "gray")

#### Homogeneity of variance

Parametric analyses will also assume a homogeneity of
variance among groups. That is, for Student's *t*-test comparing two groups, each
group should have the same variance.

A good approach to assess this assumption is to plot residuals vs. predicted values. The residuals should have approximately the same spread across all predicted values.

Homogeneity of variance is also called *homoscedasticity*.
The opposite is *heteroscedasticity*.

### Residuals from mean for each sex vs.
mean for Catbus data set

plot(jitter(Residual) ~ Mean,

data = Data)

As a technical note, by default R conducts a variant of the *t*-test called Welch’s *t*-test. This test does not assume homogeneity of variance and so can be used to compare two groups with unequal variances.

#### Additivity of treatment effects

Models for two-way analysis of variance and similar analyses are constructed as linear models in which the dependent variable is predicted as a linear combination of the independent variables.

A violation of this assumption is sometimes indicated when a plot of residuals versus predicted values exhibits a curved pattern.

#### Outliers

Outliers are observations whose value is far outside what is expected. They can play havoc with parametric analyses since they affect the distribution of the data and strongly influence the mean.

There are a variety of formal tests for detecting outliers, but they will not be discussed here. The best approach is one that looks at residuals after an analysis. Good tools are the “Residuals vs. leverage” plot and other plots in the “Other diagnostic plots” section below.

It’s my opinion that outliers should not be removed from data unless there is a good reason, usually when a value is impossible or a measurement error of some kind is suspected.

#### Parametric tests are somewhat robust

Some parametric tests are somewhat robust to violations of
certain assumptions. For example, the *t*-test is reasonably robust to
violations of normality for symmetric distributions, but not to samples having
unequal variances (unless Welch's *t*-test is used). A one-way analysis of variance is likewise reasonably
robust to violations in normality.

The upshot is that model assumptions should always be checked, but you may be able to tolerate small violations in the distribution of residuals or homoscedasticity. Large violations will make the test invalid, though. It is important to be honest with your assessments when checking model assumptions. It is better to transform data, change your model, use a robust method, or use a nonparametric test than to not have confidence in your analysis.

### Assessing model assumptions

For this example, we’ll revisit the Catbus data. We’ll then
define a linear model where *Steps* is the dependent variable and *Sex*
and *Teacher* are the independent variables.

Input = ("

Student Sex Teacher Steps Rating

a female Catbus 8000 7

b female Catbus 9000 10

c female Catbus 10000 9

d female Catbus 7000 5

e female Catbus 6000 4

f female Catbus 8000 8

g male Catbus 7000 6

h male Catbus 5000 5

i male Catbus 9000 10

j male Catbus 7000 8

k female Satsuki 8000 7

l female Satsuki 9000 8

m female Satsuki 9000 8

n female Satsuki 8000 9

o male Satsuki 6000 5

p male Satsuki 8000 9

q male Satsuki 7000 6

r female Totoro 10000 10

s female Totoro 9000 10

t female Totoro 8000 8

u female Totoro 8000 7

v female Totoro 6000 7

w male Totoro 6000 8

x male Totoro 8000 10

y male Totoro 7000 7

z male Totoro 7000 7

")

Data = read.table(textConnection(Input),header=TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

###### Define a linear model

model = lm(Steps ~ Sex + Teacher,

data = Data)

#### Using formal tests to assess normality of residuals

There are formal tests to assess the normality of residuals. Common tests include Shapiro-Wilk, Anderson–Darling, Kolmogorov–Smirnov, and D’Agostino–Pearson. These are presented in the “Optional analyses: formal tests for normality” section.

In general, I don’t recommend using these tests because their results are dependent on sample size. When the sample size is large, the tests may indicate a statistically significant departure from normality, even if that departure is small. And when sample sizes are small, they won’t detect departures from normality.

The article from the *Fells Stats* blog, in the
“References” section has pretty convincing examples of these problems.

#### Skew and kurtosis

There are no definitive guidelines as to what range of skew or kurtosis are acceptable for considering residuals to be normally distributed.

In general, I would not recommend relying on skew and kurtosis calculations, but instead use histograms and other plots.

If I were forced to give advice for skewness calculations, I might say, be cautious if the absolute value is > 0.5, and consider it not normally distributed if the absolute value is > 1.0. Some authors use 2.0 as a cutoff for normality, and others use a higher limit for kurtosis.

library(psych)

x = residuals(model)

describe(x,

type=2) # Type of skew and kurtosis

vars n mean sd median
trimmed mad min max range skew kurtosis

1 1 26 0 1132.26 -39.58 11.07 1114.18 -2202.4 2123.25 4325.65 -0.13
-0.11

#### Using visual inspection to assess the normality of residuals

Usually, the best method to see if model residuals meet the assumptions of normal distribution and homoscedasticity are to plot them and inspect the plots visually.

##### Histogram with normal curve

A histogram of the residuals should be approximately normal, without excessive skew or kurtosis. Adding a normal curve with the same mean and standard deviation as the data helps to assess the histogram.

x = residuals(model)

library(rcompanion)

normal.histogram(x)

##### Kernel density plot with normal curve

A kernel density plot is similar to a histogram, but is smoothed into a curve. Sometimes a density plot gives a better representation of the distribution of data, because the appearance of the histogram depends upon how many bins are used.

The *plotNormalDensity* function will produce this
plot. Options include those for the *plot* function, as well as *adjust*,
*bw*, and *kernel* which are passed to the *density* function. *col1*,
*col2*, and *col3* change plot colors, and *lwd* changes line
thickness.

library(rcompanion)

x = residuals(model)

plotNormalDensity(x,

adjust = 1) ### Decrease this
number

### to make line less smooth

##### Plot of residuals vs. fitted values

Patterns in the plot of residuals versus fitted values can indicate a lack of homoscedasticity or that errors are not independent of fitted values.

plot(fitted(model),

residuals(model))

##### Examples of residual plots and histograms

In the four plots below, A) *Residuals.a* show normally
distributed and homoscedastic residuals, suggesting model assumptions were met.
B) *Residuals.b* show a non-normal distribution of residuals. C) *Residuals.c*
show that the residuals are not independent of the fitted values. In this
case, the model needs to be modified in order to describe the data well. D) *Residuals.d*
show heteroscedasticity, since variability in the residuals is greater for
large fitted values than for small fitted values. (Adapted from similar plots
in Tabachnick, 2001*).*

The following plots are histograms of the same residuals
shown in the previous plots. A) *Residuals.a* are reasonably-close to
normally distributed. B) *Residuals.b* are highly skewed (left, or
negative). C) *Residuals.c* are moderately negatively skewed. This
distribution would probably not cause too much havoc with most parametric
tests, but, depending on the circumstances, I would probably try to transform a
variable or find a better-fitting model. D) *Residuals.d* are symmetric,
but leptokurtic. I probably wouldn’t be too concerned with the distribution.

Results from the *describe* function in the *psych*
package are shown below the histograms.

### describe(Residuals.a)

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 200 0 6.82 0.63 0.07 6.91 -20 18.51 38.51 -0.14 0.03 0.48

### describe(Residuals.b)

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 200 0 3.77 1.38 0.86 1.62 -20 2.62 22.62 -2.53 7.2 0.27

### describe(Residuals.c)

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 200 0 6.79 1.25 0.48 7.75 -20 12.88 32.88 -0.54 -0.42 0.48

### describe(Residuals.d)

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 200 0 5.53 0.11 0.21 3.83 -20 16.57 36.57 -0.48 1.94 0.39

Other diagnostic plots

Using the *plot* function for most common types of
models will show diagnostic plots or other useful plots for the model. For
linear models, the *plot* function produces plots of 1) residuals vs.
fitted values; 2) normal quantile–quantile plot; 3) square root of standardized
residuals vs. fitted values; and 4) standardized residuals vs. leverage.

While these are useful diagnostic plots, their interpretation will not be discussed here.

plot(model)

#### Steps to handle violations of assumptions

If residuals show heteroscedasticity or a non-normal distribution, the analysis must be modified in some way to get valid results. Some options include the following.

__1. Select the correct parametric test and check
assumptions__

__2. Change your model__

Problems with distribution of residuals is sometimes related
to the model not describing the data well. Adding or removing terms from the
model may correct this. As an example, imagine you had plotted bivariate data
and they formed a shape like a parabola. If you fit a straight line to this
data, you might get residuals that look like *Residuals.c* above. Adding an
x^{2} term, or other term to add curvature to the fitted model, might
correct this. A similar situation can be encountered in analysis of variance
and similar tests; it is sometimes helpful to add other terms or covariates.

__3. Transform data and repeat your original analysis__

The dependent variable, or other variables, can be
transformed so that model assumptions are met. The tests are then performed on
these transformed data. See the *Transforming Data* chapter for details.

__4. Use nonparametric tests__

Nonparametric tests include those discussed in the *Traditional
Nonparametric Tests* section of this book and those chapters on *permutation
tests*. Quantile regression and generalized additive model may be
options as well.

__5. Use robust methods__

There are other statistical methods which are robust to
violations of parametric assumptions or are nonparametric. See Mangiafico (2015a) and (2015b) in the “References”*
*section for examples.

### Descriptive Statistics for Parametric Statistics

Descriptive statistics for interval/ratio data are discussed
in the “Descriptive statistics for interval/ratio data” section in the *Descriptive
Statistics* chapter.

Descriptive plots for interval/ratio data are discussed in
the “Examples of basic plots for interval/ratio and ordinal data” section in
the *Basic Plots* chapter.

### Optional readings

** “Normality”** in McDonald, J.H. 2014.

*Handbook of Biological Statistics*. www.biostathandbook.com/normality.html.

** “Independence”** in McDonald, J.H. 2014.

*Handbook of Biological Statistics*. www.biostathandbook.com/independence.html.

** “Homoscedasticity and heteroscedasticity”** in
McDonald, J.H. 2014.

*Handbook of Biological Statistics*. www.biostathandbook.com/homoscedasticity.html.

### References

“One-way Analysis with Permutation Test” in Mangiafico, S.S.
2015a. *An R Companion for the Handbook of Biological Statistics*, version
1.09. rcompanion.org/rcompanion/d_06a.html.

“Two-way Anova with Robust Estimation” in Mangiafico, S.S.
2015b. *An R Companion for the Handbook of Biological Statistics*, version
1.09. rcompanion.org/rcompanion/d_08a.html.

Tabachnick, B.G. and S.F. Fidell. 2001. *Using
multivariate statistics*. 4th Edition. Allyn and

Bacon, Boston, MA.

“Normality tests don't do what you think they do.” 2012. *Fells
Stats*. blog.fellstat.com/?p=61.

### Optional analyses: formal tests for normality

Code for conducting formal tests of normality are included here. I don’t recommend using them, but they may be instructive in training the eye to interpret histograms and Q-Q plots, or to respond to a less-savvy reviewer.

In each case, the null hypothesis is that the data
distribution is not different from normal. That is, a significant *p*-value
(*p* < 0.05) suggests that data are not normally distributed.

As mentioned previously, a limitation to using these tests
is that as the number of observations are increased, the ability of the test to
return a significant *p*-value increases, even for small deviations from a
normal distribution.

Input = ("

Student Sex Teacher Steps Rating

a female Catbus 8000 7

b female Catbus 9000 10

c female Catbus 10000 9

d female Catbus 7000 5

e female Catbus 6000 4

f female Catbus 8000 8

g male Catbus 7000 6

h male Catbus 5000 5

i male Catbus 9000 10

j male Catbus 7000 8

k female Satsuki 8000 7

l female Satsuki 9000 8

m female Satsuki 9000 8

n female Satsuki 8000 9

o male Satsuki 6000 5

p male Satsuki 8000 9

q male Satsuki 7000 6

r female Totoro 10000 10

s female Totoro 9000 10

t female Totoro 8000 8

u female Totoro 8000 7

v female Totoro 6000 7

w male Totoro 6000 8

x male Totoro 8000 10

y male Totoro 7000 7

z male Totoro 7000 7

")

Data = read.table(textConnection(Input),header=TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

###### Define a linear model

model = lm(Steps ~ Sex + Teacher,

data = Data)

#### Shapiro–Wilk normality test

x = residuals(model)

shapiro.test(x)

Shapiro-Wilk normality test

W = 0.96256, p-value = 0.4444

#### Anderson-Darling normality test

if(!require(nortest)){install.packages("nortest")}

library(nortest)

x = residuals(model)

ad.test(x)

Anderson-Darling normality test

A = 0.39351, p-value = 0.3506

#### One-sample Kolmogorov-Smirnov test

x = residuals(model)

ks.test(x,

"pnorm",

mean = mean(x),

sd = sd(x))

One-sample Kolmogorov-Smirnov test

D = 0.1399, p-value = 0.6889

#### D'Agostino Normality Test

if(!require(fBasics)){install.packages("fBasics")}

library(fBasics)

x = residuals(model)

dagoTest(x)

Title:

D'Agostino Normality Test

Test Results:

STATISTIC:

Chi2 | Omnibus: 0.1071

Z3 | Skewness: -0.3132

Z4 | Kurtosis: 0.0948

P VALUE:

Omnibus Test: 0.9479

Skewness Test: 0.7541

Kurtosis Test: 0.9245

### Optional analyses: formal tests for homogeneity of variance

Code for formal tests of homogeneity of variance among groups is presented here. I don’t recommend using them, but instead recommend using diagnostic plots.

In each case, the null hypothesis is that the variance among
groups is not different. That is, a significant *p*-value (*p* <
0.05) suggests that the variance among groups is different.

Input = ("

Student Sex Teacher Steps Rating

a female Catbus 8000 7

b female Catbus 9000 10

c female Catbus 10000 9

d female Catbus 7000 5

e female Catbus 6000 4

f female Catbus 8000 8

g male Catbus 7000 6

h male Catbus 5000 5

i male Catbus 9000 10

j male Catbus 7000 8

k female Satsuki 8000 7

l female Satsuki 9000 8

m female Satsuki 9000 8

n female Satsuki 8000 9

o male Satsuki 6000 5

p male Satsuki 8000 9

q male Satsuki 7000 6

r female Totoro 10000 10

s female Totoro 9000 10

t female Totoro 8000 8

u female Totoro 8000 7

v female Totoro 6000 7

w male Totoro 6000 8

x male Totoro 8000 10

y male Totoro 7000 7

z male Totoro 7000 7

")

Data = read.table(textConnection(Input),header=TRUE)

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)

###### Define a linear model

model = lm(Steps ~ Sex + Teacher,

data = Data)

Bartlett’s test for homogeneity of variance

Bartlett’s test is known to be sensitive to non-normality in samples. That is, non-normal samples can result in a significant test due to the non-normality.

x = residuals(model)

bartlett.test(x ~ interaction(Sex, Teacher),

data=Data)

Bartlett test of homogeneity of variances

Bartlett's K-squared = 3.7499, df = 5, p-value = 0.586

#### Levene’s test for homogeneity of variance

Levene’s test is an alternative to Bartlett’s that is supposedly less sensitive to departures from normality in the data.

if(!require(car)){install.packages("car")}

library(car)

x = residuals(model)

leveneTest(x ~ Sex * Teacher

data=Data,

center=mean) ### Use the original
Levene’s test

Levene's Test for Homogeneity of Variance (center = mean)

Df F value Pr(>F)

group 5 0.4457 0.8113

20

#### Brown–Forsythe or robust Levene’s test

The Brown–Forsythe modification of Levene’s test makes it more robust to departures in normality of the data.

if(!require(car)){install.packages("car")}

library(car)

x = residuals(model)

leveneTest(x ~ Sex * Teacher, data=Data)

Levene's Test for Homogeneity of Variance (center = median)

Df F value Pr(>F)

group 5 0.4015 0.842

20

if(!require(lawstat)){install.packages("lawstat")}

library(lawstat)

x = residuals(model)

levene.test(x, interaction(Data$Sex, Data$Teacher))

modified robust Brown-Forsythe Levene-type test based on the absolute
deviations from the median

Test Statistic = 0.4015, p-value = 0.842

#### Fligner-Killeen test

The Fligner-Killeen test is another test for homogeneity of variances that is robust to departures in normality of the data.

x = residuals(model)

fligner.test(x ~ interaction(Sex, Teacher), data=Data)

Fligner-Killeen test of homogeneity of variances

Fligner-Killeen:med chi-squared = 2.685, df = 5, p-value = 0.7484