
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Paired t-test

The paired t-test is commonly used.  It compares the means of two populations of paired observations by testing whether the mean of the differences between pairs is statistically different from zero or another number.

 

Appropriate data

•  Two-sample data.  That is, one measurement variable in two groups or samples

•  Dependent variable is interval/ratio, and is continuous

•  Independent variable is a factor with two levels.  That is, two groups

•  Data are paired.  That is, the measurement for each observation in one group can be paired logically or by subject to a measurement in the other group

•  The differences between paired measurements are normally distributed

•  Moderate skewness is permissible if the data distribution is unimodal without outliers

 

Hypotheses

•  Null hypothesis:  The mean difference between paired observations is equal to zero.

•  Alternative hypothesis (two-sided): The mean difference between paired observations is not equal to zero.
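
The hypotheses above amount to a one-sample t-test on the paired differences.  As a minimal sketch, using the Before and After scores from the worked example later in this chapter, the test statistic can be computed by hand and compared with t.test applied to the differences.  The object names t.stat, p.val, and One are introduced here for illustration only.

```r
# Scores from the worked example in this chapter
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

Difference = After - Before

# t statistic: mean difference divided by its standard error
n      = length(Difference)
t.stat = mean(Difference) / (sd(Difference) / sqrt(n))

# Two-sided p-value with n - 1 degrees of freedom
p.val  = 2 * pt(-abs(t.stat), df = n - 1)

# The same test as a one-sample t-test on the differences
One = t.test(Difference, mu = 0)

t.stat
p.val
One$statistic
```

The hand-computed statistic and the one reported by t.test should agree, which shows that the paired test depends only on the differences and not on the two groups separately.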

 

Interpretation

Reporting significant results as “Mean of variable Y for group A was different than that for group B.” is acceptable.

 

Other notes and alternative tests

•  The nonparametric analogue for this test is the two-sample paired signed-rank test (Wilcoxon signed-rank test).

•  Power analysis for the paired t-test can be found at Mangiafico (2015) in the “References” section.
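
Both notes above can be sketched with functions from base R.  This is an illustration rather than the procedure of the cited reference: wilcox.test with paired = TRUE runs the paired nonparametric analogue, and power.t.test with type = "paired" estimates the number of pairs needed.  The values of delta and sd passed to power.t.test are assumptions chosen only for the example; for the paired case, sd is the standard deviation of the differences.

```r
# Scores from the worked example in this chapter
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

# Nonparametric analogue of the paired t-test.
# exact = FALSE requests the normal approximation, because
# tied differences rule out an exact p-value.
W = wilcox.test(After, Before,
                paired = TRUE,
                exact  = FALSE)
W

# Number of pairs needed for 80% power.
# delta and sd are illustrative assumptions, not values from this chapter.
P = power.t.test(delta     = 5,     # assumed mean difference to detect
                 sd        = 10,    # assumed sd of the paired differences
                 sig.level = 0.05,
                 power     = 0.80,
                 type      = "paired")
P
```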

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(rcompanion)){install.packages("rcompanion")}


Paired t-test example

 

In the following example, Dumbland Extension had adult students fill out a financial literacy knowledge questionnaire both before and after completing a home financial management workshop.  Each student’s score before and after was paired by student.

 

Note in the following data that the students’ names are repeated, so that there is a before score for student a and an after score for student a.

 

Because the data are in long form, we’ll order by Time and then by Student, so that the first observation for Before is student a, the first observation for After is student a, and so on.

 

Input = ("
Time    Student  Score
Before  a         65
Before  b         75
Before  c         86
Before  d         69
Before  e         60
Before  f         81
Before  g         88
Before  h         53
Before  i         75
Before  j         73
After   a         77
After   b         98
After   c         92
After   d         77
After   e         65
After   f         77
After   g        100
After   h         73
After   i         93
After   j         75
")

Data = read.table(textConnection(Input),header=TRUE)


###  Order data by Time and Student

Data = Data[order(Data$Time, Data$Student),]


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)
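
Because the later steps pair the first Before score with the first After score, and so on, it can be worth confirming that sorting left the students in the same order in both groups.  A minimal sketch follows.  It rebuilds the same data frame so that it runs on its own, and the names StudentBefore, StudentAfter, and Aligned are introduced here for illustration only.

```r
# Same data as the example, built directly as a data frame
Data = data.frame(
  Time    = rep(c("Before", "After"), each = 10),
  Student = rep(letters[1:10], times = 2),
  Score   = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73,
              77, 98, 92, 77, 65, 77, 100, 73, 93, 75))

# Order by Time, then Student
Data = Data[order(Data$Time, Data$Student), ]

StudentBefore = Data$Student[Data$Time == "Before"]
StudentAfter  = Data$Student[Data$Time == "After"]

# TRUE only if both groups list the same students in the same order
Aligned = length(StudentBefore) == length(StudentAfter) &&
          all(StudentBefore == StudentAfter)

Aligned
```

If Aligned is FALSE, subtracting the two score vectors would pair scores from different students, so the data should be corrected before testing.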


Histogram of difference data

A histogram with a normal curve superimposed will be used to check whether the paired differences between the two populations are approximately normal in distribution.


First, two new variables, Before and After, are created by extracting the values of Score for observations with the Time variable equal to Before or After, respectively.


Before = Data$Score[Data$Time=="Before"]

After  = Data$Score[Data$Time=="After"]

Difference = After - Before


x = Difference

library(rcompanion)

plotNormalHistogram(x,
                 xlab="Difference (After - Before)")
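
As a supplement to the histogram, a normal quantile plot and a Shapiro–Wilk test of the differences are sometimes used as well.  These are not part of the original example.  The sketch below uses only base R functions (qqnorm, qqline, shapiro.test) and redefines the scores so that it runs on its own.  The object name S is introduced for illustration.

```r
# Scores from the worked example in this chapter
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

Difference = After - Before

# Normal quantile plot: points near the line suggest approximate normality
qqnorm(Difference,
       ylab="Difference (After - Before)")
qqline(Difference, col="blue", lwd=2)

# Shapiro-Wilk test: a small p-value suggests departure from normality
S = shapiro.test(Difference)
S
```

With only ten pairs, a formal test has limited power to detect non-normality, so the plots are usually the more informative check.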




Plot the paired data

 

Scatter plot with one-to-one line

Paired data can be visualized with a scatter plot of the paired cases.  In the plot below, points that fall above and to the left of the blue line indicate cases for which the value for After was greater than for Before.

 

Note that the points in the plot are jittered slightly so that points that would fall directly on top of one another can be seen.

 

First, two new variables, Before and After, are created by extracting the values of Score for observations with the Time variable equal to Before or After, respectively.

 

A variable Names is also created for point labels.


Before = Data$Score[Data$Time=="Before"]

After  = Data$Score[Data$Time=="After"]

Names  = Data$Student[Data$Time=="Before"]

                      
plot(Before, jitter(After),    # jitter offsets points so you can see them all
     pch = 16,                 # shape of points
     cex = 1.0,                # size of points
     xlim=c(50, 110),          # limits of x-axis
     ylim=c(50, 110),          # limits of y-axis
     xlab="Before",            # label for x-axis
     ylab="After"              # label for y-axis
     )

text(Before, After, labels=Names,  # Label location and text

     pos=3, cex=1.0)               # Label text position and size

abline(0,1, col="blue", lwd=2)     # line with intercept of 0 and slope of 1




Bar plot of differences

Paired data can also be visualized with a bar chart of differences.  In the plot below, bars with a value greater than zero indicate cases for which the value for After was greater than for Before.

 

New variables are first created for Before, After, and their Difference.

 

A variable Names is also created for bar labels.

 

Before = Data$Score[Data$Time=="Before"]

After  = Data$Score[Data$Time=="After"]

Difference = After - Before

Names = Data$Student[Data$Time=="Before"]


barplot(Difference,                             # variable to plot
        col="dark gray",                        # color of bars
        xlab="Observation",                     # x-axis label
        ylab="Difference (After - Before)",     # y-axis label
        names.arg=Names                         # labels for bars
        )




Paired t-test

 

t.test(Score ~ Time,
       data=Data,
       paired = TRUE,
       conf.level = 0.95)


Paired t-test

t = 3.8084, df = 9, p-value = 0.004163
alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:
  4.141247 16.258753

sample estimates:
mean of the differences
                   10.2
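
Recent R releases may reject the paired argument in the formula interface of t.test (as far as is known, from version 4.4.0 onward).  In that case, one option is to pass the two score vectors directly.  The sketch below assumes the vectors are aligned by student, as arranged earlier in the chapter.  The object name Result is introduced for illustration.

```r
# Scores from the worked example, aligned by student (a through j)
Before = c(65, 75, 86, 69, 60, 81, 88, 53, 75, 73)
After  = c(77, 98, 92, 77, 65, 77, 100, 73, 93, 75)

# Paired t-test on the two vectors; differences are After - Before
Result = t.test(After, Before,
                paired = TRUE,
                conf.level = 0.95)

Result

# Individual pieces of the result
Result$estimate    # mean of the differences
Result$conf.int    # 95 percent confidence interval
Result$p.value     # two-sided p-value
```

Because After is given first, a positive mean difference indicates that scores were higher after the workshop.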


Optional readings

 

“Paired t–test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/pairedttest.html.

 

References

 

“Paired t–test” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.09. rcompanion.org/rcompanion/d_09.html.

 

Exercises Q

 

1. Considering the Dumbland Extension data,

What was the mean difference in score before and after the training?

Was this an increase or a decrease?

What is the 95% confidence interval for this difference?

Is the data distribution for the paired differences reasonably normal?

Was the mean score significantly different before and after the training?


2. Residential properties in Dougal County rarely need phosphorus for good turfgrass growth.  As part of an extension education program, Early and Rusty Cuyler asked homeowners to report their phosphorus fertilizer use, in pounds of P2O5 per acre, before the program and then one year later.


Date              Homeowner  P2O5
'2014-01-01'      a          0.81
'2014-01-01'      b          0.86
'2014-01-01'      c          0.79
'2014-01-01'      d          0.59
'2014-01-01'      e          0.71
'2014-01-01'      f          0.88
'2014-01-01'      g          0.63
'2014-01-01'      h          0.72
'2014-01-01'      i          0.76
'2014-01-01'      j          0.58
'2015-01-01'      a          0.67
'2015-01-01'      b          0.83
'2015-01-01'      c          0.81
'2015-01-01'      d          0.50
'2015-01-01'      e          0.71
'2015-01-01'      f          0.72
'2015-01-01'      g          0.67
'2015-01-01'      h          0.67
'2015-01-01'      i          0.48
'2015-01-01'      j          0.68

For each of the following, answer the question, and show the output from the analyses you used to answer the question.

 

What was the mean difference in P2O5 before and after the training?

Is this an increase or a decrease?

What is the 95% confidence interval for this difference?

Is the data distribution for the paired differences reasonably normal?

Was the mean P2O5 use significantly different before and after the training?