
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Choosing a Statistical Test

Choosing a statistical test can be a daunting task for those starting out in the analysis of experiments.  This chapter provides a table of tests and models covered in this book, as well as some general advice for approaching the analysis of your data.

 

Plan your experimental design before you collect data

 

It is important to have an experimental design planned out before you start collecting data, and to have some idea of how you plan to analyze the data.  One of the most common mistakes people make in doing research is collecting a bunch of data without having thought through what questions they are trying to answer, what specific hypotheses they want to test, and what statistical tests they can use to test these hypotheses.

 

What is the hypothesis?

 

The most important consideration in choosing a statistical test is determining what hypothesis you want to test or, more generally, what question you are trying to answer.

 

Often people have a notion about the purpose of the research they are conducting, but haven’t formulated a specific hypothesis.  It is possible to begin with exploratory data analysis, to see what interesting patterns the data reveal.  But ultimately, choosing a statistical test relies on having in mind a specific hypothesis to test.

 

For example, we may know that our goal is to determine if one curriculum works better than another.  But then we must be more specific in our hypothesis.  Perhaps we wish to compare the mean scores that students get on an exam across the different curricula.  Then a specific null hypothesis is: There is no difference in mean student scores among curricula.

 

In this example, we identified the dependent variable as Student scores, and the independent variable as Curriculum.
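As a minimal sketch of how this hypothesis could be tested in R, the code below fits a one-way ANOVA, which is one candidate for an interval/ratio dependent variable and a single nominal independent variable.  The data frame Data, the column names Score and Curriculum, and all of the values are hypothetical, simulated only for illustration.

```r
# Hypothetical data, simulated for illustration only
set.seed(42)
Data <- data.frame(
  Curriculum = factor(rep(c("A", "B", "C"), each = 12)),
  Score      = c(rnorm(12, mean = 70, sd = 8),
                 rnorm(12, mean = 75, sd = 8),
                 rnorm(12, mean = 80, sd = 8))
)

# Mean score for each curriculum
aggregate(Score ~ Curriculum, data = Data, FUN = mean)

# Null hypothesis: the mean Score is the same for every level of Curriculum
model <- aov(Score ~ Curriculum, data = Data)
summary(model)
```

The formula Score ~ Curriculum puts the dependent variable on the left and the independent variable on the right, mirroring the way the hypothesis was stated.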

 

Of course, we might make things more complicated.  For example, if the curricula were used in different classrooms, we might want to include Classroom as an independent blocking variable.
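A blocking variable of this kind can be sketched by adding it as a second term in the model formula.  The example assumes, hypothetically, that every curriculum was used in every classroom with four students each; the names Data, Score, Curriculum, and Classroom and all values are invented for illustration.

```r
# Hypothetical design: 3 curricula x 4 classrooms x 4 students per cell
set.seed(7)
Data <- expand.grid(
  Student    = 1:4,
  Curriculum = c("A", "B", "C"),
  Classroom  = c("Room1", "Room2", "Room3", "Room4")
)

# Simulated scores; the curriculum effect is invented for illustration
Data$Score <- rnorm(nrow(Data),
                    mean = 70 + 3 * as.integer(Data$Curriculum),
                    sd   = 8)

# Classroom enters the model as a fixed blocking variable
model <- aov(Score ~ Curriculum + Classroom, data = Data)
summary(model)
```

Here Classroom is treated as a fixed block.  Treating it as a random block would call for a mixed-effects model, which is not shown in this sketch.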

 

What number and type of variables do you have?

 

To a large extent, the appropriate statistical test for your data will depend upon the number and types of variables you wish to include in the analysis. 

 

Consider the type of dependent variable you wish to include.
 

•  If it is of interval/ratio type, you can consider parametric statistics or nonparametric statistics.
 

•  However, if it is an ordinal variable, you would look toward nonparametric tests and ordinal regression models. 

•  Nominal variables arranged in contingency tables can be analyzed with chi-square and similar tests.  Nominal dependent variables can be related to independent variables with logistic regression.

•  Count data dependent variables can be related to independent variables with Poisson regression and related models.
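The list above can be illustrated with one base R call per type of dependent variable.  This is only a sketch: all data are hypothetical, the variable names are invented, and each call is just one of several options for that type.  Ordinal regression requires an add-on package and is not shown.

```r
# Hypothetical data, invented for illustration only
set.seed(1)
group <- factor(rep(c("A", "B"), each = 10))

# Interval/ratio DV: a parametric or a nonparametric test
score <- c(rnorm(10, mean = 70, sd = 8), rnorm(10, mean = 76, sd = 8))
t_res <- t.test(score ~ group)
w_res <- wilcox.test(score ~ group)

# Ordinal DV (for example, 5-point Likert responses): a nonparametric test
likert <- c(2, 3, 3, 4, 2, 3, 1, 4, 3, 2,
            4, 5, 3, 4, 5, 4, 3, 5, 4, 4)
mw_res <- wilcox.test(likert ~ group, exact = FALSE)

# Nominal variables arranged in a contingency table: chi-square test
tab <- matrix(c(12,  8,
                 5, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"), Passed = c("Yes", "No")))
chi_res <- chisq.test(tab)

# Nominal DV with two levels: logistic regression
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 2, 4, 6, 8)
passed <- c(0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
logit_fit <- glm(passed ~ hours, family = binomial())

# Count DV: Poisson regression
visits   <- rpois(20, lambda = ifelse(group == "A", 2, 4))
pois_fit <- glm(visits ~ group, family = poisson())
```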

 

The number and type of independent variables will also be taken into account, as will whether there are paired observations or random blocking variables.
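To sketch why pairing matters, the example below analyzes the same hypothetical before-and-after scores for ten students three ways: with a paired t-test, with a paired signed-rank test, and with an unpaired t-test that ignores the pairing.  The vectors before and after are invented for illustration.

```r
# Hypothetical scores for the same ten students, measured twice
before <- c(62, 70, 58, 75, 66, 80, 71, 64, 69, 73)
after  <- c(68, 74, 57, 81, 70, 85, 70, 72, 75, 78)

# Paired analyses work on the within-student differences
paired_t <- t.test(after, before, paired = TRUE)
paired_w <- wilcox.test(after, before, paired = TRUE, exact = FALSE)

# Unpaired analysis of the same numbers, shown only for comparison
unpaired_t <- t.test(after, before)

paired_t$p.value
unpaired_t$p.value
```

The two t-tests use identical numbers but answer different questions, so the design of the study, not convenience, should determine which one is used.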

 

The table below lists the tests in this book according to their number and types of variables.

 

Note that each test has its own set of assumptions for appropriate data, which should be assessed before proceeding with the analysis.
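As one hedged illustration of assessing assumptions, the sketch below applies two common checks to a one-way ANOVA: a Shapiro–Wilk test on the model residuals and Bartlett's test for homogeneity of variance.  The data are hypothetical and simulated, and these particular checks are an assumption of this example rather than the only or preferred approach; many analysts rely on plots of residuals instead of formal tests.

```r
# Hypothetical data, simulated for illustration only
set.seed(42)
Data <- data.frame(
  Curriculum = factor(rep(c("A", "B", "C"), each = 12)),
  Score      = c(rnorm(12, mean = 70, sd = 8),
                 rnorm(12, mean = 75, sd = 8),
                 rnorm(12, mean = 80, sd = 8))
)

model <- aov(Score ~ Curriculum, data = Data)

# Normality of the residuals
shapiro_res <- shapiro.test(residuals(model))

# Homogeneity of variance among the groups
bartlett_res <- bartlett.test(Score ~ Curriculum, data = Data)

shapiro_res
bartlett_res
```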

 

Also note that the tests in this book cover cases with a single dependent variable only.  There are other statistical tests, included under the umbrella of multivariate statistics, that can analyze multiple dependent variables simultaneously.  These include multivariate analysis of variance (MANOVA), canonical correlation, and discriminant function analysis.

 

The “References” and “Optional readings” sections of this chapter include a few other guides to choosing statistical tests.

 

 

| Test | DV type, or variable type when there is no DV | DV observations | IV type | Number of IV | Levels in IV | Test type |
|---|---|---|---|---|---|---|
| One-sample Wilcoxon | Ordinal or interval/ratio | Independent | Single default value | N/A | N/A | Nonparametric |
| Sign test for one-sample | Ordinal or interval/ratio | Independent | Single default value | N/A | N/A | Nonparametric |
| Two-sample Mann–Whitney | Ordinal or interval/ratio | Independent | Nominal | 1 | 2 | Nonparametric |
| Mood’s median test for two-sample | Ordinal or interval/ratio | Independent | Nominal | 1 | 2 | Nonparametric |
| Two-sample paired signed-rank | Ordinal or interval/ratio | Paired | Nominal | 1, or 2 when one is blocking | 2 | Nonparametric |
| Sign test for two-sample paired | Ordinal or interval/ratio | Paired | Nominal | 1, or 2 when one is blocking | 2 | Nonparametric |
| Kruskal–Wallis | Ordinal or interval/ratio | Independent | Nominal | 1 | 2 or more | Nonparametric |
| Mood’s median | Ordinal or interval/ratio | Independent | Nominal | 1 | 2 or more | Nonparametric |
| Friedman | Ordinal or interval/ratio | Independent blocked, or paired | Nominal | 2 when one is blocking, in unreplicated complete block design | 2 or more | Nonparametric |
| Quade | Ordinal or interval/ratio | Independent blocked, or paired | Nominal | 2 when one is blocking, in unreplicated complete block design | 2 or more | Nonparametric |
| One-way Permutation Test of Independence | Ordinal or interval/ratio | Independent | Nominal | 1 | 2 or more | Permutation |
| One-way Permutation Test of Symmetry | Ordinal or interval/ratio | Independent blocked, or paired | Nominal | 2 when one is blocking | 2 or more | Permutation |
| Two-sample CLM | Ordinal | Independent | Nominal | 1 | 2 | Ordinal regression |
| Two-sample paired CLMM | Ordinal | Paired | Nominal | 2 when one is blocking | 2 | Ordinal regression |
| One-way ordinal ANOVA CLM | Ordinal | Independent | Nominal | 1 | 2 or more | Ordinal regression |
| One-way repeated ordinal ANOVA CLMM | Ordinal | Independent | Nominal | 2 when one is blocking | 2 or more | Ordinal regression |
| Two-way ordinal ANOVA CLM | Ordinal | Independent | Nominal | 2 | 2 or more | Ordinal regression |
| Two-way repeated ordinal ANOVA CLMM | Ordinal | Independent | Nominal | 3 when one is blocking | 2 or more | Ordinal regression |
| Goodness-of-fit tests for nominal variables: • binomial test • multinomial test • G-test goodness-of-fit • Chi-square test goodness-of-fit | Nominal | Independent | Expected counts | N/A | Overall: vector of counts and expected proportions | Nominal |
| Association tests for nominal variables: • Fisher exact test of association • G-test of association • Chi-square test of association | Nominal | Independent | Nominal | N/A | Overall: 2-way contingency table | Nominal |
| Tests for paired nominal data: • McNemar • McNemar–Bowker | Nominal | Paired | Nominal | N/A | Overall: 2-way marginal contingency table | Nominal |
| Cochran–Mantel–Haenszel | Nominal | Independent | Nominal | N/A | Overall: 3-way contingency table | Nominal |
| Cochran’s Q | Nominal (2 levels only) | Paired | Nominal | 2 when one is blocking | 2 or more | Nominal |
| Linear-by-linear | Ordered nominal (ordinal) | Independent | Ordered nominal (ordinal) | N/A | Overall: 2-way or 3-way contingency table | Nominal |
| Cochran–Armitage (extended) | Ordered nominal (ordinal) | Independent | Nominal | N/A | Overall: 2-way or 3-way contingency table | Nominal |
| Log-linear model (multiway frequency analysis) | Nominal | Independent | Nominal | N/A | Overall: contingency table with 2 or more dimensions | Generalized linear model |
| Logistic regression (standard) | Nominal with 2 levels | Independent | Interval/ratio or nominal | 1 or more | 2 or more | Generalized linear model |
| Multinomial logistic regression | Nominal with 2 or more levels | Independent | Interval/ratio or nominal | 1 or more | 2 or more | Generalized linear model |
| Mixed-effects logistic regression | Nominal with 2 levels | Independent or paired | Interval/ratio or nominal | 1 or more when one is blocking or random | 2 or more | Generalized linear model |
| One-sample t-test | Interval/ratio | Independent | Single default value | N/A | N/A | Parametric |
| Two-sample t-test | Interval/ratio | Independent | Nominal | 1 | 2 | Parametric |
| Paired t-test | Interval/ratio | Paired | Nominal | 1, or 2 when one is blocking | 2 | Parametric |
| One-way ANOVA | Interval/ratio | Independent | Nominal | 1 | 2 or more | Parametric |
| One-way ANOVA with blocks | Interval/ratio | Independent | Nominal | 2 when one is blocking | 2 or more | Parametric |
| One-way ANOVA with random blocks | Interval/ratio | Independent | Nominal | 2 when one is blocking | 2 or more | Parametric |
| Two-way ANOVA | Interval/ratio | Independent | Nominal | 2 | 2 or more | Parametric |
| Repeated measures ANOVA | Interval/ratio | Paired across time | Nominal | 2 or more when one is time effect | 2 or more | Parametric |
| Multiple correlation | Interval/ratio or ordinal, depending on type selected | Independent | Interval/ratio or ordinal, depending on type selected | 1 or more | Overall: multiple vectors of interval/ratio or ordinal data | Parametric or nonparametric depending on type selected |
| Pearson correlation | Interval/ratio | Independent | Interval/ratio | 1 | Overall: two vectors of interval/ratio data | Parametric |
| Kendall correlation | Interval/ratio or ordinal | Independent | Interval/ratio or ordinal | 1 | Overall: two vectors of interval/ratio or ordinal data | Nonparametric |
| Spearman correlation | Interval/ratio or ordinal | Independent | Interval/ratio or ordinal | 1 | Overall: two vectors of interval/ratio or ordinal data | Nonparametric |
| Linear regression | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Parametric |
| Polynomial regression | Interval/ratio | Independent | Interval/ratio | 2 or more that are polynomial terms | N/A | Parametric |
| Nonlinear regression and curvilinear regression | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Parametric |
| Multiple regression | Interval/ratio | Independent | Interval/ratio | 2 or more | N/A | Parametric |
| Robust linear regression | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Robust parametric |
| Kendall–Theil regression | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Nonparametric |
| Linear plateau and quadratic plateau models | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Parametric |
| Cate–Nelson analysis | Interval/ratio | Independent | Interval/ratio | 1 | N/A | Mostly nonparametric |
| Hermite and Poisson regression: • Hermite regression • Poisson regression • Negative binomial regression • Zero-inflated regression | Count | Independent | Interval/ratio or nominal | 1 or more | 2 or more | Generalized linear model |
| Beta regression | Proportion or percentage | Independent | Interval/ratio or nominal | 1 or more | 2 or more | Generalized linear model |

 

 

References

 

[IDRE] Institute for Digital Research and Education. 2015. What statistical analysis should I use?  UCLA.  www.ats.ucla.edu/stat/stata/whatstat/.

 

“Choosing a statistical test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/testchoice.html.

 

Optional readings

 

 [Video]  “Choosing which statistical test to use” from Statistics Learning Center (Dr. Nic). 2014. www.youtube.com/watch?v=rulIUAN0U3w.