Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Likert Data

Likert data (properly pronounced like “LICK-ert”) are ordered responses to questions or ratings.  Responses could be descriptive words, such as “agree”, “neutral”, or “disagree”, or numerical, such as “On a scale of 1 to 5, where 1 is ‘not interested’ and 5 is ‘very interested’…”  Likert data are commonly collected from surveys evaluating education programs, as well as in a variety of opinion surveys and social science surveys.

Likert data

Number of response options

Most commonly, a 5- or 7-point scale is used for Likert items.  It is believed that most people can easily think about or visualize 5 or 7 ordered options.  Younger children, however, may do better with a 3-point scale or a simple dichotomous question.  On the other hand, if the audience is knowledgeable about a subject and trained in the evaluation, a 10-point scale could be used.

Symmetry

The response options for Likert items are usually symmetrical.  That is, if there are options for “agree” and “strongly agree”, there should be options for “disagree” and “strongly disagree”.

Neutral responses

Responses to Likert items also tend to include a neutral option, such as “neutral” or “neither agree nor disagree”.  Neutral responses may also be terms like “sometimes” or “occasionally” when “never” and “rarely” on one side are balanced by “often” and “always” on the other.

Form of responses

Numbered responses are typically labeled with descriptive terms, either for every number, for just the end points, or for the end points and the middle point, for example:

Strongly     Agree      Neutral     Disagree    Strongly
agree                                           disagree

1           2           3           4           5

———————————————————————————————

Strongly                                       Strongly
agree                                          disagree

1           2           3           4           5

———————————————————————————————

Strongly                Neutral                Strongly
agree                                          disagree

1           2           3           4           5

Other options for Likert responses include faces (smiley face, neutral face, frowny face), and a line on which respondents mark their response.

Questions may also include opt-out responses, like “Don’t know” or “Not applicable”.  These are placed outside the ordered Likert responses, for example:

Strongly                Neutral                Strongly
agree                                          disagree

1           2           3           4           5           Not applicable

I am in favor of including opt-out responses, as they tend to encourage more honest responses.  It seems to me better to allow a respondent to opt out of answering a question than to force an inauthentic response.  It is possible that a respondent has no opinion or doesn’t understand a question, or that a question is not applicable to them.

That being said, the opt-out answer “Don’t know” may not be a great choice, simply because respondents and researchers may interpret “Don’t know” as a “Neutral” answer.  It may be better to choose less ambiguous opt-out answers like “Not applicable”.
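In R, one way to keep opt-out answers out of the ordered responses is to list only the Likert categories as factor levels, so that any other answer becomes NA.  A minimal sketch, assuming responses are stored as text labels; the data below are hypothetical:

```r
# Hypothetical raw responses, including an opt-out answer
raw <- c("Agree", "Strongly agree", "Not applicable", "Neutral",
         "Disagree", "Agree", "Not applicable", "Strongly disagree")

likert_levels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Values not listed in 'levels' (here "Not applicable") become NA,
# so opt-outs are kept off the ordered scale instead of being read as "Neutral"
resp <- factor(raw, levels = likert_levels, ordered = TRUE)

table(resp, useNA = "ifany")
n_opt_out <- sum(is.na(resp))   # number of opt-out answers, worth reporting separately
n_opt_out
```

Counting the NA values separately keeps a record of how many respondents opted out, rather than silently dropping them.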

Examples of Likert item responses

See:

Vagias, W.M. (2006). Likert-type scale response anchors. Clemson International Institute for Tourism & Research Development, Department of Parks, Recreation and Tourism Management.  Clemson University. www.clemson.edu/centers-institutes/tourism/documents/sample-scales.pdf.

Brown, S. Likert Scale Examples for Surveys. Iowa State University Extension. www.extension.iastate.edu/ag/staff/info/likertscaleexamples.pdf.

Likert items and Likert scales

Technically, a Likert item is a single question with Likert responses, whereas a Likert scale is a group of items viewed together as a single measure.  For example, one could have several Likert items with various questions about religious attitudes or behaviors, and then combine those items into a single Likert scale on religiosity.

When presenting methods and results, it is important to be clear whether data were handled as Likert item data or Likert scale data.

This book will treat Likert data as individual Likert items, and will not create Likert scales.

Analysis of Likert item data

Likert data should be treated as ordinal data

There is some agreement that Likert item data should generally be treated as ordinal and not treated as interval/ratio data.

One consideration is that values in interval/ratio data need to be equally spaced.  That is, 2 is equally distant from 1 and 3, and averaging 1 and 3 gives 2.  But it is not clear that “agree” is equally spaced between “strongly agree” and “neutral”.  Nor is it clear that “strongly agree” and “neutral” could be averaged for a result of “agree”.  Simply numbering the response levels does not make the responses interval/ratio data.
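A small hypothetical example shows why the spacing matters.  Below, the same ordered responses are numbered in two ways that both preserve their order.  The comparison of group means flips between the two codings, while a rank-based statistic is unchanged.  The second coding (“strongly agree” scored as 10) is an arbitrary assumption chosen only for illustration.

```r
likert_levels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Hypothetical responses from two small groups
a <- factor(c("Strongly disagree", "Strongly agree"), levels = likert_levels)
b <- factor(c("Agree", "Agree"), levels = likert_levels)

# Two order-preserving ways of numbering the same categories
scores_equal  <- c(1, 2, 3, 4, 5)    # equally spaced
scores_uneven <- c(1, 2, 3, 4, 10)   # arbitrary alternative spacing

to_score <- function(x, s) s[as.integer(x)]

# Means depend on the spacing: group a is lower under one coding, higher under the other
mean(to_score(a, scores_equal))  < mean(to_score(b, scores_equal))    # 3 vs 4
mean(to_score(a, scores_uneven)) > mean(to_score(b, scores_uneven))   # 5.5 vs 4

# The Mann-Whitney statistic depends only on the order of the values
w_equal  <- suppressWarnings(
  wilcox.test(to_score(a, scores_equal), to_score(b, scores_equal))$statistic)
w_uneven <- suppressWarnings(
  wilcox.test(to_score(a, scores_uneven), to_score(b, scores_uneven))$statistic)
unname(w_equal) == unname(w_uneven)
```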

This book will treat Likert data as ordinal data.  It will avoid using parametric tests, such as t-test and ANOVA, with Likert data.  Likert data typically do not meet the assumptions of those parametric tests.

Instead, we will use nonparametric tests, permutation tests, and ordinal regression.

A couple of other properties suggest that Likert data should not be treated as interval/ratio data.  Likert data are not continuous; that is, there typically aren’t any decimal points in Likert responses.  Also, the responses in Likert data are constrained at their ends; that is, on a five-point scale, the responses cannot be below 1 or above 5.

Where it is useful, this book will treat Likert data as nominal data for certain types of summaries.  In general, it is better not to treat ordinal data as nominal data in statistical analyses.  One reason is that when the data are treated as nominal, the information about the ordered nature of the response categories is lost.  However, sometimes it is useful to collapse Likert responses into categories; for example, grouping “strongly agree” and “agree” together as one category and reporting its frequency as a percentage of responses.
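A minimal sketch of this kind of collapsing in R, with hypothetical responses to a single item:

```r
likert_levels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Hypothetical responses to one Likert item
resp <- factor(c("Agree", "Strongly agree", "Neutral", "Agree", "Disagree",
                 "Strongly agree", "Agree", "Neutral", "Strongly disagree", "Agree"),
               levels = likert_levels, ordered = TRUE)

# Percentage in each ordered category
round(100 * prop.table(table(resp)), 1)

# Collapse to two nominal categories: "agree or strongly agree" vs. everything else
agree_any <- resp %in% c("Agree", "Strongly agree")
pct_agree <- 100 * mean(agree_any)
pct_agree   # 60
```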

Some people treat Likert data as interval/ratio data

Not everyone agrees that Likert item data should never be treated as interval/ratio data.  A quick search of the internet will produce plenty of examples of people defending this practice.

Cases in which treating Likert responses as interval/ratio data may be reasonable include:

•  When there is a large number of response options per question (say 10)

•  When only the endpoints of the responses are indicated with text descriptors

•  When response options are assumed to be equally spaced

•  When respondents mark their answer on a line so that the precise location of the mark can be measured

Analysis of Likert scale data

When several Likert items are combined into a scale, so that there are many possible numeric outcomes, the results are often treated as interval/ratio data.

This is not entirely permissible from a theoretical point of view, since Likert scales are made up of Likert items and so share their properties.  But it is often a reasonable approach if the data meet the assumptions of the analysis, particularly if the scale data take on many values.
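For reference, a minimal sketch of forming a scale by summing items, which is one common approach (averaging the items is another).  The items and responses are hypothetical, and this book does not itself create Likert scales.  The point is the number of possible values: three 5-point items already allow 13 different totals.

```r
# Hypothetical responses of five respondents to three Likert items (coded 1-5)
items <- data.frame(
  item1 = c(4, 5, 3, 2, 4),
  item2 = c(5, 4, 3, 1, 4),
  item3 = c(4, 4, 2, 2, 5)
)

# Summed scale score for each respondent
scale_score <- rowSums(items)
scale_score   # 13 13 8 5 13

# Possible totals run from 3 (all 1s) to 15 (all 5s)
min_total <- ncol(items) * 1
max_total <- ncol(items) * 5
possible_totals <- seq(min_total, max_total)
length(possible_totals)   # 13, versus 5 for a single item
```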

Analysis of Likert data

Ordinal regression

Ordinal regression is probably the best tool for the analysis of experiments with Likert item data as the dependent variable.  The ordinal package in R provides a powerful and flexible framework for ordinal regression.  It can handle a wide variety of experimental designs, including those with paired or repeated observations.  Ordinal regression is relatively easy to perform in R, but may be somewhat challenging for those new to statistical analysis.  Occasionally there are problems with fitting models or checking model assumptions, which can be frustrating for the novice user.
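The text points to the ordinal package, whose clm() function fits cumulative link models.  The sketch below instead uses polr() from the MASS package, which ships with standard R installations and fits the same kind of proportional-odds (cumulative logit) model; the data are hypothetical.  A likelihood ratio test against an intercept-only model gives a p-value for the group effect.

```r
library(MASS)   # provides polr()

likert_levels <- c("Strongly disagree", "Disagree", "Neutral",
                   "Agree", "Strongly agree")

# Hypothetical responses (coded 1-5) for two groups
scores_a <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 2)
scores_b <- c(2, 3, 3, 4, 4, 4, 5, 5, 5, 3)

dat <- data.frame(
  group    = factor(rep(c("A", "B"), each = 10)),
  response = factor(likert_levels[c(scores_a, scores_b)],
                    levels = likert_levels, ordered = TRUE)
)

# Proportional-odds model with group, and an intercept-only null model
model      <- polr(response ~ group, data = dat, Hess = TRUE)
model_null <- polr(response ~ 1,     data = dat)

# Likelihood ratio test for the effect of group (1 df: one coefficient added)
lr_stat <- deviance(model_null) - deviance(model)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

coef(model)   # positive: group B tends toward higher response categories
model$zeta    # the four thresholds between the five ordered categories
p_value
```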

Tests for ordinal tables

Tests for ordinal data arranged in contingency table form are another appropriate tool for the analysis of Likert item data.  These include the linear-by-linear test, which is a test of association between two ordinal variables, and the Cochran-Armitage test, which is a test of association between an ordinal variable and a nominal variable.  The major limitation of these tests is that they apply only to data arranged in a two-dimensional table.  Also, these tests require the spacing between ordinal categories to be specified.  By default the tests assume that the categories are equally spaced, but the functions in R allow other spacing patterns to be used.
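A minimal base-R sketch of the linear-by-linear test, using Agresti's statistic M^2 = (N - 1) r^2, where r is the Pearson correlation between the row scores and column scores over all N observations, compared with a chi-squared distribution with 1 degree of freedom.  The function name lbl_test_m2 and the table are hypothetical; the coin package's lbl_test() is one existing implementation.

```r
# Linear-by-linear association test for a table whose rows and columns
# are both ordinal. Scores default to equal spacing: 1, 2, 3, ...
lbl_test_m2 <- function(tab, u = seq_len(nrow(tab)), v = seq_len(ncol(tab))) {
  counts <- as.vector(tab)                              # column-major order
  x <- rep(u[as.vector(row(tab))], times = counts)      # row score per observation
  y <- rep(v[as.vector(col(tab))], times = counts)      # column score per observation
  n <- sum(counts)
  r <- cor(x, y)                                        # Pearson correlation of scores
  m2 <- (n - 1) * r^2
  list(statistic = m2,
       p.value   = pchisq(m2, df = 1, lower.tail = FALSE))
}

# Hypothetical table: education level (rows) by a 5-point Likert item (columns)
tab <- matrix(c(8, 6, 4, 2, 1,
                4, 6, 6, 4, 3,
                1, 3, 5, 7, 8),
              nrow = 3, byrow = TRUE,
              dimnames = list(Education = c("Low", "Middle", "High"),
                              Response  = as.character(1:5)))

lbl_test_m2(tab)

# Unequal spacing of the Likert categories can be supplied through v
lbl_test_m2(tab, v = c(1, 2, 3, 4, 6))
```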

Permutation tests

Permutation tests are another tool appropriate for the analysis of Likert item data.  The coin package in R provides a relatively powerful and flexible framework for permutation tests with ordinal dependent variables.  It can handle models analogous to a one-way analysis of variance with stratification blocks, including paired or repeated observations.  This covers all the designs that can be handled by the common traditional nonparametric tests, and more.
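The idea behind such a permutation test can be sketched in base R rather than with coin.  Responses are scored 1 to 5 (equal spacing, which this chapter notes is the default assumption for these tests), the difference in mean scores between groups is computed, and group labels are shuffled many times to build the null distribution.  The function name and data are hypothetical.

```r
# Two-sample permutation test on mean Likert scores (equally spaced scores 1-5)
perm_test_scores <- function(x, y, n_perm = 10000) {
  pooled   <- c(x, y)
  n_x      <- length(x)
  observed <- mean(x) - mean(y)
  # Reassign observations to groups at random and recompute the difference
  perm_diffs <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Two-sided p-value; the +1 counts the observed arrangement itself
  (sum(abs(perm_diffs) >= abs(observed) - 1e-12) + 1) / (n_perm + 1)
}

set.seed(1)
# Hypothetical responses coded 1-5
group_a <- c(2, 3, 3, 4, 2, 3, 1, 3, 2, 4)
group_b <- c(4, 4, 5, 3, 4, 5, 3, 4, 5, 4)
perm_test_scores(group_a, group_b)
```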

Traditional nonparametric tests

Traditional nonparametric tests are generally considered appropriate for analyses with ordinal dependent variables.  They have the advantages of being widely used and likely to be familiar to readers.  One disadvantage of these tests is that the variety of designs they can handle is limited.  The Kruskal–Wallis test can analyze a model analogous to a one-way analysis of variance.  The Friedman and Quade tests can analyze data in an unreplicated complete block design with paired or repeated observations.

As a technical note, some authors have questioned using traditional nonparametric tests with Likert item data.  One consideration is that the underlying statistics for some tests are based on the dependent variable being continuous in nature.  Another consideration is that, while these tests have provisions to handle tied values, some authors worry that they may not behave well when there are many ties, as is likely for Likert data.

However, the results of the simulation studies below show that traditional nonparametric tests approximate ordinal regression reasonably well in most cases.
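The traditional tests mentioned above are all available in base R's stats package.  A minimal sketch with hypothetical data: kruskal.test() for independent groups, and friedman.test() and quade.test() for an unreplicated complete block design, where each block (here, a rater) contributes exactly one response per treatment.

```r
# Hypothetical 5-point responses from three independent groups
dat <- data.frame(
  group    = factor(rep(c("A", "B", "C"), each = 8)),
  response = c(2, 3, 3, 2, 4, 3, 1, 2,
               3, 4, 3, 4, 5, 3, 4, 4,
               4, 5, 5, 4, 5, 3, 5, 4)
)

# Kruskal-Wallis: analogous to a one-way analysis of variance
kruskal.test(response ~ group, data = dat)

# Unreplicated complete block design: rows are raters (blocks),
# columns are treatments, one response per cell
ratings <- matrix(c(3, 4, 5,
                    2, 3, 5,
                    3, 3, 4,
                    1, 2, 4,
                    2, 4, 4,
                    3, 4, 5),
                  nrow = 6, byrow = TRUE,
                  dimnames = list(Rater = paste0("R", 1:6),
                                  Treatment = c("A", "B", "C")))

friedman.test(ratings)
quade.test(ratings)
```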

Optional:  Simulated comparisons of traditional nonparametric tests and ordinal regression

These simulations used results from a 5-point Likert item as the dependent variable.  Here, the results from the ordinal regression are used as the preferred standard.
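A comparison of this kind can be sketched as follows.  The author's simulation settings are not given here, so the response probabilities, the way group B is shifted, and the number of replicates are all assumptions.  The ordinal regression uses MASS::polr() with a likelihood ratio test as a stand-in for a cumulative link model from the ordinal package.

```r
library(MASS)   # polr()

set.seed(2024)
n_per_group <- 25
n_reps      <- 200
base_probs  <- c(0.10, 0.20, 0.40, 0.20, 0.10)   # hypothetical response profile

sim_once <- function() {
  shift   <- runif(1, 0, 1)                        # random effect size
  probs_b <- base_probs * exp(shift * (1:5))       # tilt group B toward higher levels
  a <- sample(1:5, n_per_group, replace = TRUE, prob = base_probs)
  b <- sample(1:5, n_per_group, replace = TRUE, prob = probs_b)

  # Mann-Whitney (normal approximation, since ties are present)
  p_mw <- suppressWarnings(wilcox.test(a, b)$p.value)

  # Ordinal regression: likelihood ratio test of group in a proportional-odds model
  y     <- factor(c(a, b), ordered = TRUE)          # levels = observed values only
  group <- factor(rep(c("A", "B"), each = n_per_group))
  p_ord <- tryCatch({
    m1 <- polr(y ~ group)
    m0 <- polr(y ~ 1)
    pchisq(deviance(m0) - deviance(m1), df = 1, lower.tail = FALSE)
  }, error = function(e) NA_real_)                  # skip replicates that fail to fit

  c(mann_whitney = p_mw, ordinal = p_ord)
}

results <- t(replicate(n_reps, sim_once()))

plot(results[, "ordinal"], results[, "mann_whitney"],
     xlab = "p-value, ordinal regression", ylab = "p-value, Mann-Whitney")
abline(0, 1, col = "blue")                          # 1:1 line
abline(h = 0.05, v = 0.05, col = "red")             # p = 0.05 on each axis
```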

Mann–Whitney test

When sample sizes were reasonably large and equal between groups (n per group = 25), p-values from Mann–Whitney were closely related to those from ordinal regression, with the Mann–Whitney test only slightly underpowered.

[Figure: p-values from Mann–Whitney test compared to those from ordinal regression (cumulative link model) with simulated data.  Dependent variable is 5-point Likert data.  Both groups have equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Small sample size

When sample sizes were small (n per group = 8), p-values from Mann–Whitney were still closely related to those from ordinal regression, but the Mann–Whitney test was underpowered compared with ordinal regression.

[Figure: p-values from Mann–Whitney test compared to those from ordinal regression (cumulative link model) with simulated data.  Dependent variable is 5-point Likert data.  Both groups have equal sample sizes (n per group = 8).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Kruskal–Wallis test

When there were more than two groups (here k = 5, with n per group = 25), p-values from Kruskal–Wallis were more dispersed relative to ordinal regression than were those from Mann–Whitney.  Results from Kruskal–Wallis approximated those from ordinal regression relatively well for most cases in the region around p = 0.05 and below.  In some cases, Kruskal–Wallis was underpowered in this region.

[Figure: p-values from Kruskal–Wallis test compared to those from ordinal regression (cumulative link model) with simulated data.  Dependent variable is 5-point Likert data.  All groups have equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Cochran–Armitage and permutation tests

Cochran–Armitage and permutation tests for ordinal data reasonably approximated results from ordinal regression for most cases in the region around p = 0.05 and below when the threshold = “equidistant” option was used for the cumulative link model.  This option constrains the model's thresholds to be equally spaced, which parallels the Cochran–Armitage and permutation tests, since these assume equal spacing of ordinal categories by default.

p-values for the methods matched less well when the threshold = “equidistant” option was not used, or when at least one group had few observations (not shown).

Results from Cochran–Armitage and permutation tests were very similar to each other in the region around p = 0.05 and below (not shown).

[Figure: p-values from Cochran–Armitage test compared to those from ordinal regression (cumulative link model) assuming equally spaced categories in the ordinal variable.  Dependent variable is 5-point Likert data.  Two groups with equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]
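For two groups, base R's prop.trend.test() performs a chi-squared test for trend in proportions, which corresponds to the Cochran–Armitage trend test (implementations can differ slightly in their variance formula).  A minimal sketch with hypothetical counts, where the score argument sets the spacing of the Likert categories and defaults to equal spacing:

```r
# Hypothetical counts of each response (1-5) in two groups
counts_a <- c(2, 5, 8, 6, 4)
counts_b <- c(6, 8, 6, 3, 2)

# Is the proportion of group A observations linearly related
# to the equally spaced scores of the Likert categories?
prop.trend.test(x = counts_a, n = counts_a + counts_b, score = 1:5)

# Unequal spacing can be specified through 'score'
prop.trend.test(x = counts_a, n = counts_a + counts_b, score = c(1, 2, 3, 4, 6))
```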

Optional:  Simulated comparisons of traditional nonparametric tests and exact tests and Monte Carlo approaches

These simulations used results from a 5-point Likert item as the dependent variable.  Here, the results from the exact test are assumed to be the preferred standard.  The exact tests were conducted with the exactRankTests package.  The results from this package appear to agree with those from the coin package for exact tests.  Monte Carlo simulations used 10,000 iterations.
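The exact conditional p-value for a two-group rank test can also be computed directly when samples are small, by enumerating every way of assigning the pooled observations to the two groups.  A base-R sketch under these assumptions: midranks for ties, and a two-sided p-value based on distance from the expected rank sum.  The function name and data are hypothetical, and this is not the exactRankTests implementation, though for small untied samples it reproduces the exact p-value from wilcox.test().

```r
# Exact two-sided Wilcoxon-Mann-Whitney test by complete enumeration.
# Feasible only for small samples (n = 8 per group gives 12,870 assignments).
exact_rank_sum_p <- function(x, y) {
  pooled   <- c(x, y)
  n_x      <- length(x)
  r        <- rank(pooled)                      # midranks for tied values
  expected <- n_x * (length(pooled) + 1) / 2    # mean rank sum under the null
  observed <- sum(r[seq_len(n_x)])
  # Rank sum of group x for every possible assignment of n_x observations
  all_sums <- combn(length(pooled), n_x, FUN = function(idx) sum(r[idx]))
  mean(abs(all_sums - expected) >= abs(observed - expected) - 1e-9)
}

# Hypothetical 5-point responses, n = 8 per group
x <- c(2, 3, 3, 4, 2, 3, 1, 3)
y <- c(4, 4, 5, 3, 4, 5, 3, 2)

exact_rank_sum_p(x, y)                          # exact p-value
suppressWarnings(wilcox.test(x, y)$p.value)     # normal approximation, for comparison
```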

Mann–Whitney test

When sample sizes were reasonably large and equal between groups (n per group = 25), p-values from Mann–Whitney were closely related to those from the exact test in the p = 0.05 region, with some variability.

[Figure: p-values from Mann–Whitney test compared to those from exact test with simulated data.  Dependent variable is 5-point Likert data.  Both groups have equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Small sample size

When sample sizes were small (n per group = 8), p-values from Mann–Whitney were still related to those from the exact test, but with more scatter.

[Figure: p-values from Mann–Whitney test compared to those from exact test with simulated data.  Dependent variable is 5-point Likert data.  Both groups have equal sample sizes (n per group = 8).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Mann–Whitney test and Monte Carlo

When sample sizes were reasonably large and equal between groups (n per group = 25), p-values from Mann–Whitney were closely related to those from Monte Carlo simulations of the Mann–Whitney test, with some variability.

[Figure: p-values from Mann–Whitney test compared to those from Monte Carlo simulated Mann–Whitney test with simulated data.  Dependent variable is 5-point Likert data.  Both groups have equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]

Kruskal–Wallis test and Monte Carlo

When there were more than two groups (here k = 3, with n per group = 25), p-values from Kruskal–Wallis were similar to those from Monte Carlo simulated Kruskal–Wallis, with some variability.

[Figure: p-values from Kruskal–Wallis test compared to those from Monte Carlo simulated Kruskal–Wallis with simulated data.  Dependent variable is 5-point Likert data.  All groups have equal sample sizes (n per group = 25).  The blue line is the 1:1 line.  The red lines indicate a p-value of 0.05 on each axis.]
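A Monte Carlo p-value of this kind can be sketched in base R by shuffling group labels and recomputing the Kruskal–Wallis statistic.  With two groups the same approach gives a Monte Carlo version of the Mann–Whitney test, since the two-group Kruskal–Wallis statistic is the square of the normal-approximation Mann–Whitney statistic without continuity correction.  The function name, response probabilities and seed are hypothetical; the default of 10,000 iterations matches the number stated above.

```r
# Monte Carlo p-value for the Kruskal-Wallis test: shuffle group labels,
# recompute the statistic, and count how often it is at least as large
kw_monte_carlo_p <- function(response, group, n_iter = 10000) {
  observed <- kruskal.test(response, group)$statistic
  sims <- replicate(n_iter, kruskal.test(response, sample(group))$statistic)
  (sum(sims >= observed - 1e-9) + 1) / (n_iter + 1)
}

set.seed(123)
# Hypothetical 5-point responses, k = 3 groups with n = 25 per group
group    <- factor(rep(c("A", "B", "C"), each = 25))
response <- c(sample(1:5, 25, replace = TRUE, prob = c(0.15, 0.25, 0.30, 0.20, 0.10)),
              sample(1:5, 25, replace = TRUE, prob = c(0.10, 0.20, 0.30, 0.25, 0.15)),
              sample(1:5, 25, replace = TRUE, prob = c(0.05, 0.15, 0.30, 0.30, 0.20)))

kruskal.test(response, group)$p.value   # chi-squared approximation
kw_monte_carlo_p(response, group)       # Monte Carlo, 10,000 iterations
```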