The paired *t*-test is commonly used to compare the
means of two populations of paired observations. It tests whether the mean of the differences
between pairs is statistically different from zero or another specified value.

##### Appropriate data

• Two-sample data. That is, one measurement variable in two groups or samples

• Dependent variable is interval/ratio, and is continuous

• Independent variable is a factor with two levels. That is, two groups

• Data are paired. That is, the measurement for each observation in one group can be paired logically or by subject to a measurement in the other group

• The differences between paired measurements are normally distributed

• Moderate skewness is permissible if the data distribution is unimodal without outliers

##### Hypotheses

• Null hypothesis: The mean difference between paired observations is equal to zero.

• Alternative hypothesis (two-sided): The mean difference between paired observations is not equal to zero.
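
In symbols, the hypotheses and the test statistic can be sketched as follows. The notation is introduced here for illustration and is not from the original text: $d_i$ is the difference for pair $i$, $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, $n$ is the number of pairs, and $\mu_d$ is the population mean difference.

$$
H_0: \mu_d = 0, \qquad H_A: \mu_d \neq 0, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1
$$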

##### Interpretation

Reporting significant results as “Mean of variable Y for group A was different than that for group B.” is acceptable.

##### Other notes and alternative tests

• The nonparametric analogue for this test is the two-sample paired signed-rank test.

• Power analysis for the paired *t*-test can be found in Mangiafico (2015), listed in the “References”
section.
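
As a rough sketch of the nonparametric alternative noted above, the Wilcoxon signed-rank test is available in base R through `wilcox.test` with `paired = TRUE`. The scores below are copied from the example later in this chapter. The setting `exact = FALSE` is an assumption made here: the differences contain ties, so an exact p-value cannot be computed and the normal approximation is requested explicitly.

```r
# Scores copied from the workshop example in this chapter,
# listed in student order (a through j)
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

# Paired signed-rank test; exact = FALSE requests the normal
# approximation, since tied differences rule out an exact p-value
Result = wilcox.test(After, Before,
                     paired = TRUE,
                     exact  = FALSE)

Result
```

The vectors must be in the same student order for the pairing to be meaningful.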

### Packages used in this chapter

The packages used in this chapter include:

• psych

• rcompanion

The following commands will install these packages if they are not already installed:

if(!require(psych)){install.packages("psych")}

if(!require(rcompanion)){install.packages("rcompanion")}

### Paired *t*-test example

In the following example, Dumbland Extension had adult students fill out a financial literacy knowledge questionnaire both before and after completing a home financial management workshop. Each student’s score before and after was paired by student.

Note in the following data that the students’ names are
repeated, so that there is a before score for student *a* and an after
score for student *a*.

Since the data is in long form, we’ll order by *Time*,
then *Student* to be sure the first observation for *Before* is
student *a* and the first observation for *After* is student *a*,
and so on.

Input = ("

Time Student Score

Before a 65

Before b 75

Before c 86

Before d 69

Before e 60

Before f 81

Before g 88

Before h 53

Before i 75

Before j 73

After a 77

After b 98

After c 92

After d 77

After e 65

After f 77

After g 100

After h 73

After i 93

After j 75

")

Data = read.table(textConnection(Input),header=TRUE)

### Order data by Time and Student

Data = Data[order(Data$Time, Data$Student),]

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)
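
Because the analysis relies on the *Before* and *After* values lining up student by student, it can be worth confirming the pairing after sorting. The following is a minimal sketch using a small made-up data frame. The object `Check` and the helper names are hypothetical, and the sketch assumes each student appears exactly once per time point.

```r
# Hypothetical long-form data, deliberately out of order
Check = data.frame(
   Time    = c("After", "Before", "Before", "After", "Before", "After"),
   Student = c("b", "c", "a", "a", "b", "c"),
   Score   = c(98, 86, 65, 77, 75, 92))

# Order by Time, then Student, referring to columns through the data frame
Check = Check[order(Check$Time, Check$Student),]

# Student labels within each time point
Before.names = Check$Student[Check$Time == "Before"]
After.names  = Check$Student[Check$Time == "After"]

# TRUE only if both time points list the same students in the same order
Paired.ok = identical(as.character(Before.names),
                      as.character(After.names))

Paired.ok
```

If the result is `FALSE`, the two sets of scores should not be subtracted or tested as pairs until the mismatch is resolved.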

#### Histogram of difference data

A histogram with a normal curve superimposed will be used to check whether the paired differences are approximately normally distributed.

First, two new variables, *Before* and *After*, are created by
extracting the values of *Score* for observations with the *Time*
variable equal to *Before* or *After*, respectively.

Before = Data$Score[Data$Time=="Before"]

After = Data$Score[Data$Time=="After"]

Difference = After - Before

x = Difference

library(rcompanion)

plotNormalHistogram(x,

xlab="Difference (After - Before)")
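
As a supplement to the histogram, a numeric check is sometimes used as well. The sketch below applies the Shapiro-Wilk test from base R (`shapiro.test`) to the same differences. This test is an addition for illustration rather than part of the original example, and with only ten pairs it has limited power, so the plot remains the primary guide. The scores are retyped in student order so the block runs on its own.

```r
# Scores from the example, in student order (a through j)
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

Difference = After - Before

# Summary of the differences
mean(Difference)
sd(Difference)

# Shapiro-Wilk test; a small p-value would suggest non-normality
Normality = shapiro.test(Difference)

Normality
```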

#### Plot the paired data

##### Scatter plot with one-to-one line

Paired data can be visualized with a scatter plot of the paired
cases. In the plot below, points that fall above and to the left of the blue
line indicate cases for which the value for *After* was greater than for *Before*.

Note that the points in the plot are jittered slightly so that points that would fall directly on top of one another can be seen.

First, two new variables, *Before* and *After*,
are created by extracting the values of *Score* for observations with the *Time*
variable equal to *Before* or *After*, respectively.

A variable *Names* is also created for point labels.

Before = Data$Score[Data$Time=="Before"]

After = Data$Score[Data$Time=="After"]

Names = Data$Student[Data$Time=="Before"]

plot(Before, jitter(After), # jitter offsets points so you can see them all

pch = 16, # shape of points

cex = 1.0, # size of points

xlim=c(50, 110), # limits of x-axis

ylim=c(50, 110), # limits of y-axis

xlab="Before", # label for x-axis

ylab="After" # label for y-axis

)

text(Before, After, labels=Names, # label location and text

pos=3, cex=1.0) # label text position and size

abline(0,1, col="blue", lwd=2) # line with intercept of 0 and slope of 1

##### Bar plot of differences

Paired data can also be visualized with a bar chart of
differences. In the plot below, bars with a value greater than zero indicate
cases for which the value for *After* was greater than for *Before*.

New variables are first created for *Before*, *After*,
and their *Difference*.

A variable *Names* is also created for bar labels.

Before = Data$Score[Data$Time=="Before"]

After = Data$Score[Data$Time=="After"]

Difference = After - Before

Names = Data$Student[Data$Time=="Before"]

barplot(Difference, # variable to plot

col="dark gray", # color of bars

xlab="Observation", # x-axis label

ylab="Difference (After - Before)", # y-axis label

names.arg=Names # labels for bars

)

#### Paired t-test

t.test(Score ~ Time,

data=Data,

paired = TRUE,

conf.level = 0.95)

Paired t-test

t = 3.8084, df = 9, p-value = 0.004163

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.141247 16.258753

sample estimates:

mean of the differences

10.2
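
The statistic reported above can be reproduced by hand from the paired differences, which may help clarify what the test is doing. The sketch below also calls `t.test` with two vectors instead of a formula. Recent versions of R may reject the `paired` argument in the formula interface, so the vector form is a reasonable fallback. The scores are retyped here in student order so that the block runs on its own, and the object names are chosen for illustration.

```r
# Scores from the example, in student order (a through j)
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

Difference = After - Before

# Hand calculation: mean difference divided by its standard error
n       = length(Difference)
Mean    = mean(Difference)
SE      = sd(Difference) / sqrt(n)
t.stat  = Mean / SE

# Two-sided p-value and 95% confidence interval with n - 1 df
p.value = 2 * pt(-abs(t.stat), df = n - 1)
CI      = Mean + c(-1, 1) * qt(0.975, df = n - 1) * SE

t.stat
p.value
CI

# Same test using two vectors rather than a formula
Result = t.test(After, Before,
                paired     = TRUE,
                conf.level = 0.95)

Result
```

Because the vectors are passed as `After, Before`, the reported mean difference is *After* minus *Before*, so a positive value indicates higher scores after the workshop.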

### Optional readings

**“Paired t–test”** in McDonald, J.H. 2014.
*Handbook of Biological Statistics*. www.biostathandbook.com/pairedttest.html.

### References

**“Paired t–test”** in Mangiafico, S.S. 2015.
*An R Companion for the Handbook of Biological Statistics*, version 1.09. rcompanion.org/rcompanion/d_09.html.

### Exercises Q

1. Considering the Dumbland Extension data,

What was the mean difference in score before and after the
training?

Was this an increase or a decrease?

What is the 95% confidence interval for this difference?

Is the data distribution for the paired differences reasonably
normal?

Was the mean score significantly different before and after the
training?

2. Residential properties in Dougal County rarely need phosphorus for good
turfgrass growth. As part of an extension education program, Early and Rusty
Cuyler asked homeowners to report their phosphorus fertilizer use, in pounds of
P₂O₅ per acre, before the program and then one year
later.

Date Homeowner P2O5

'2014-01-01' a 0.81

'2014-01-01' b 0.86

'2014-01-01' c 0.79

'2014-01-01' d 0.59

'2014-01-01' e 0.71

'2014-01-01' f 0.88

'2014-01-01' g 0.63

'2014-01-01' h 0.72

'2014-01-01' i 0.76

'2014-01-01' j 0.58

'2015-01-01' a 0.67

'2015-01-01' b 0.83

'2015-01-01' c 0.81

'2015-01-01' d 0.50

'2015-01-01' e 0.71

'2015-01-01' f 0.72

'2015-01-01' g 0.67

'2015-01-01' h 0.67

'2015-01-01' i 0.48

'2015-01-01' j 0.68

For each of the following, answer the question, and **show
the output from the analyses you used to answer the question**.

What was the mean difference in P₂O₅
before and after the training?

Is this an increase or a decrease?

What is the 95% confidence interval for this difference?

Is the data distribution for the paired differences reasonably
normal?

Was the mean P₂O₅ use significantly
different before and after the training?