An assumption of many statistical tests is that
observations are *independent* of one another. This means
that the value for one observation is unlikely to be influenced by the
value of another observation. If we pick students at random from
a class and measure their height, we can assume the height of the first
student will not affect the height of the next student. These
observations would be independent. If, however, we measured the
height of the same students across years, we would expect that a student
who is tall this year would likely be tall the next, and so on.
These observations would not be independent. We might call this
second set of observations *non-independent*, *paired*,
*dependent*, or *correlated*.
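The contrast between the two height examples can be illustrated with a small simulation. This is only a sketch: the heights are randomly generated, and the means, standard deviations, growth of about 5 cm per year, and the seed are all assumptions chosen for illustration, not values from any real class.

```r
### Simulated heights (cm); all numbers here are illustrative assumptions

set.seed(1)   # arbitrary seed so the simulation is reproducible

n = 30

### Same students measured in two consecutive years
height_year1 = rnorm(n, mean = 150, sd = 8)
height_year2 = height_year1 + rnorm(n, mean = 5, sd = 1)

### A different, unrelated group of students
other_class = rnorm(n, mean = 155, sd = 8)

### Repeated measurements on the same students are strongly correlated
cor_repeated = cor(height_year1, height_year2)

### Measurements on unrelated students are not expected to be
cor_unrelated = cor(height_year1, other_class)

cor_repeated
cor_unrelated
```

The correlation for the repeated measurements should be close to 1, while the correlation between unrelated students should be much weaker.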

Dependent samples commonly arise in a few situations.
One is *repeated measures*, in which the same subject is measured
on multiple dates. This is like the student height example described
above.

A second is when we are taking multiple measurements
of the *same individual*. An example of this might be if
we are testing students on multiple concepts; we might suspect that
if a student scores well in one section, she is likely to score
well in the other sections. Another example would be measuring
the length of people’s hands. We would suspect that someone with
a large left hand is likely to have a large right hand. A final
example would be if student raters were measuring multiple instructors.
We might suspect that a rater who scores one instructor low might be
likely to score another instructor low.

A related concept is that of *blocks*. If
observations can be broken into meaningful groups where values are likely
to be different, this should be taken into account. For example,
suppose we are measuring students’ scores from two classes, and we suspect
scores would be lower for one class than the other. If we were
testing instructional methods, we may care about the effect of the instructional
methods, and not care at all about the classes *per se*, but we would
still want to take differences due to the different classes into account.
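One common way to take a blocking variable into account is to include it as an additional term in a linear model. The sketch below assumes this approach; the data frame `Scores`, its column names, and all of its values are made up purely for illustration.

```r
### Hypothetical scores: two classes (blocks), two instructional methods
### All values are invented for illustration only

Scores = data.frame(
  Class  = rep(c("ClassA", "ClassB"), each = 6),
  Method = rep(c("Old", "New"), times = 6),
  Score  = c(70, 75, 72, 78, 68, 74,
             82, 88, 85, 90, 80, 87))

### Model ignoring the block
model_no_block = lm(Score ~ Method, data = Scores)

### Model including Class as a block
model_block = lm(Score ~ Method + Class, data = Scores)

### Residual standard error of each model
sigma_no_block = summary(model_no_block)$sigma
sigma_block    = summary(model_block)$sigma

sigma_no_block
sigma_block

anova(model_block)
```

With these invented values, including *Class* in the model absorbs the difference between classes, so the residual variation used to assess the effect of *Method* is smaller.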

### Packages used in this chapter

The packages used in this chapter include:

• FSA

• rcompanion

The following commands will install these packages if they are not already installed:

if(!require(FSA)){install.packages("FSA")}

if(!require(rcompanion)){install.packages("rcompanion")}

### An example of paired and unpaired data

In this example we measure the length in
centimeters of both the left hand and the right hand for each of 16
individuals.

Input = ("
Individual  Hand   Length
A           Left   17.5
B           Left   18.4
C           Left   16.2
D           Left   14.5
E           Left   13.5
F           Left   18.9
G           Left   19.5
H           Left   21.1
I           Left   17.8
J           Left   16.8
K           Left   18.4
L           Left   17.3
M           Left   18.9
N           Left   16.4
O           Left   17.5
P           Left   15.0
A           Right  17.6
B           Right  18.5
C           Right  15.9
D           Right  14.9
E           Right  13.7
F           Right  18.9
G           Right  19.5
H           Right  21.5
I           Right  18.5
J           Right  17.1
K           Right  18.9
L           Right  17.5
M           Right  19.5
N           Right  16.5
O           Right  17.4
P           Right  15.6
")

Data = read.table(textConnection(Input),header=TRUE)

### Note: for the paired test below, data must be ordered so that
###   the first observation of Group 1
###   is the same subject as the first observation of Group 2

### The following will order the data frame by Hand, and then by Individual

Data = Data[order(Data$Hand, Data$Individual),]

### Check the data frame

Data

str(Data)

summary(Data)

### Remove unnecessary objects

rm(Input)
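Because the paired test relies on the ordering described above, it can be worth confirming that the two groups line up subject by subject. The following is a minimal sketch of such a check. It rebuilds the same data with `data.frame` so that it runs on its own; the object names `Left_id` and `Right_id` are introduced here for illustration.

```r
### Rebuild the hand data so this check is self-contained

Data = data.frame(
  Individual = rep(LETTERS[1:16], times = 2),
  Hand       = rep(c("Left", "Right"), each = 16),
  Length     = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1,
                 17.8, 16.8, 18.4, 17.3, 18.9, 16.4, 17.5, 15.0,
                 17.6, 18.5, 15.9, 14.9, 13.7, 18.9, 19.5, 21.5,
                 18.5, 17.1, 18.9, 17.5, 19.5, 16.5, 17.4, 15.6))

Data = Data[order(Data$Hand, Data$Individual),]

### Individuals in each group, in the order they appear

Left_id  = as.character(Data$Individual[Data$Hand == "Left"])
Right_id = as.character(Data$Individual[Data$Hand == "Right"])

### TRUE if every Left observation is matched to the same
###   individual's Right observation

pairs_match = length(Left_id) == length(Right_id) &&
              all(Left_id == Right_id)

pairs_match
```

If `pairs_match` is `FALSE`, the data should be re-ordered or inspected for missing observations before running a paired test.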

#### Box plot and summary statistics by group

Below, the descriptive
statistics suggest that left hands and right hands had similar means,
medians, and standard deviations for *Length*.

library(FSA)

Summarize(Length ~ Hand,
          data=Data,
          digits=3)

boxplot(Length ~ Hand,
        data=Data,
        ylab="Length, cm")

   Hand  n   mean    sd  min    Q1 median    Q3  max percZero
1  Left 16 17.356 1.948 13.5 16.35  17.50 18.52 21.1        0
2 Right 16 17.594 1.972 13.7 16.35  17.55 18.90 21.5        0

#### Bar plot to show paired differences

The previous summary statistics, however, do not
capture the paired nature of the data. Instead, we want to investigate
the difference between left hand and right hand for each individual.
We can calculate this difference, and use a bar plot to visualize the
difference. For most observations, the right hand was larger,
with *Right – Left* being greater than zero.

Left_hand = Data$Length[Data$Hand=="Left"]

Right_hand = Data$Length[Data$Hand=="Right"]

Difference = Right_hand - Left_hand

barplot(Difference,
        col="dark gray",
        xlab="Observation",
        ylab="Difference (Right – Left)")
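The pattern in the bar plot can also be summarized numerically. The sketch below enters the lengths from the data set above directly as vectors so that it runs on its own, then counts how many differences are positive, negative, or zero. The names `n_positive`, `n_negative`, and `n_zero` are introduced here for illustration.

```r
### Lengths from the example data, in order of Individual A through P

Left_hand  = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1,
               17.8, 16.8, 18.4, 17.3, 18.9, 16.4, 17.5, 15.0)

Right_hand = c(17.6, 18.5, 15.9, 14.9, 13.7, 18.9, 19.5, 21.5,
               18.5, 17.1, 18.9, 17.5, 19.5, 16.5, 17.4, 15.6)

Difference = Right_hand - Left_hand

### Count the direction of each paired difference

n_positive = sum(Difference > 0)   # right hand longer
n_negative = sum(Difference < 0)   # left hand longer
n_zero     = sum(Difference == 0)  # hands equal in length

c(Positive = n_positive, Negative = n_negative, Zero = n_zero)

### Mean of the paired differences, in cm

mean(Difference)
```

For these data, the right hand is longer in 12 of the 16 individuals, the left hand is longer in 2, and the two hands are equal in 2.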

#### Paired t-test and unpaired t-test

*t*-tests are discussed later in this book.
It isn’t important that you understand the test fully at this point.
In this example, a *t*-test that ignores the pairing of observations
found no significant difference between the mean lengths of left and right
hands, whereas the *t*-test that accounts for the paired observations
found a significant difference. On average, the right hands were
about 0.2 cm longer than their paired left hands.

t.test(Length ~ Hand,
       data   = Data,
       paired = FALSE)

Welch Two Sample t-test

t = -0.3427, df = 29.996, p-value = 0.7342

### No difference between left hand and right if length treated as not paired

t.test(Length ~ Hand,
       data   = Data,
       paired = TRUE)

Paired t-test

t = -3.3907, df = 15, p-value = 0.004034

mean of the differences
                -0.2375

### Significant difference between left hand and right
###   if length treated as paired
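It may help to see why the pairing matters: a paired *t*-test is equivalent to a one-sample *t*-test on the within-pair differences. The sketch below shows this using two vectors passed to `t.test` rather than a formula. Recent versions of R may not accept the `paired` argument with the formula interface, and the two-vector form avoids depending on that. The vectors repeat the lengths from the data set above so that the code runs on its own.

```r
### Lengths from the example data, in order of Individual A through P

Left_hand  = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1,
               17.8, 16.8, 18.4, 17.3, 18.9, 16.4, 17.5, 15.0)

Right_hand = c(17.6, 18.5, 15.9, 14.9, 13.7, 18.9, 19.5, 21.5,
               18.5, 17.1, 18.9, 17.5, 19.5, 16.5, 17.4, 15.6)

### Paired t-test, Left minus Right

paired_test = t.test(Left_hand, Right_hand, paired = TRUE)

### One-sample t-test on the differences, testing mean = 0

one_sample = t.test(Left_hand - Right_hand, mu = 0)

### The two approaches give the same statistic and p-value

paired_test$statistic
one_sample$statistic

paired_test$p.value
one_sample$p.value
```

Because each individual serves as their own comparison, the large variation in hand length between individuals drops out of the differences, which is why the paired test detects a difference that the unpaired test does not.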

#### Histogram of differences with normal curve

To be sure our paired *t*-test was valid, we’ll plot
the differences in hands to be sure their distribution is approximately
normal.

Left_hand = Data$Length[Data$Hand=="Left"]

Right_hand = Data$Length[Data$Hand=="Right"]

Difference = Right_hand - Left_hand

library(rcompanion)

plotNormalHistogram(Difference,
                    xlab = "Difference")

### Distribution of differences is probably close enough to normal
###   for paired t-test
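As a supplement to the histogram, the differences could also be examined with a normal quantile plot and a Shapiro–Wilk test, both available in base R. This is an optional sketch and not part of the analysis above; with only 16 differences, such checks should be read cautiously and alongside the plots.

```r
### Lengths from the example data, in order of Individual A through P

Left_hand  = c(17.5, 18.4, 16.2, 14.5, 13.5, 18.9, 19.5, 21.1,
               17.8, 16.8, 18.4, 17.3, 18.9, 16.4, 17.5, 15.0)

Right_hand = c(17.6, 18.5, 15.9, 14.9, 13.7, 18.9, 19.5, 21.5,
               18.5, 17.1, 18.9, 17.5, 19.5, 16.5, 17.4, 15.6)

Difference = Right_hand - Left_hand

### Normal quantile plot: points near the line suggest
###   an approximately normal distribution

qqnorm(Difference)
qqline(Difference)

### Shapiro-Wilk test: a small p-value would suggest
###   a departure from normality

normality_check = shapiro.test(Difference)

normality_check
```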