[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Descriptive Statistics for Likert Data

Median

For Likert data, the primary measure of location we will use is the median.  For a set of numbers, the median is the middle value when the data are arranged in numerical order.  For example, for the set of values 1, 2, 3, 4, 5, 5, 5, the median value is 4.  If there are an even number of values, the two middle values are averaged. 

 

In concept, 50% of observations are less than the median, and 50% are greater than the median (ignoring the median itself in cases of an odd number of observations).  You could also find the median of ordinal data consisting just of text descriptors.  For example, for the set of values:

 

 “high school”, “associate degree”, “bachelor’s degree”, “master’s degree”, “doctorate”,

 

the median is “bachelor’s degree”.

 

Using medians makes sense for ordinal data, whereas using means would in general not be appropriate.

 

Count of responses and bar plot

In order to visualize the location and variation of Likert data, we will look at the distribution of the responses.  The distribution of responses can be enumerated by counting the number of responses for each response level, and can be visualized with a bar plot of these counts.  This is similar to a histogram of the responses.

 

Quartiles and percentiles

As a measure of the variation of data, it is also useful to look at the quartiles of the data set.  The first quartile indicates the value below which 25% of the values fall.  It is equivalent to the 25th percentile.  The second quartile is equivalent to the median.  The third quartile indicates the value below which 75% of the values fall.  It is equivalent to the 75th percentile. 

 

Packages used in this chapter

 

The packages used in this chapter include:

•  psych

•  FSA

•  lattice

•  ggplot2

•  plyr

•  boot

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
if(!require(lattice)){install.packages("lattice")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
if(!require(rcompanion)){install.packages("rcompanion")}


Descriptive statistics for one-sample data

 

Imagine a scenario in which 10 respondents are evaluating a single speaker on a single Likert item.  The following example will create a data frame called Data consisting of three variables:  Speaker, Rater, and Likert.  The speaker is the same for all observations, and Likert represents the response on a five-point Likert item. 

 

One-sample data

One-sample data refers to a single set of values that is not broken into groups.  In this example, we have only one speaker with 10 ratings, hence one set of ratings values.

 

Variables and functions in this example

A new variable, Likert.f, will be created in the data frame.  It will have the same values as the Likert variable, but we will tell R to treat it as an ordered factor variable, whereas Likert is a numeric variable.  When summarizing data as a nominal variable, Likert.f will be used.  But when summarizing data as a numeric variable, Likert will be used.

 

Note that the str function reports that Speaker is a factor variable, Rater and Likert are integer variables, and Likert.f is an ordered factor variable.  The levels of the factor variables are shown with the level function.

 

Recall that Data$Likert tells R to use the Likert variable in the data frame Data.


Input =("
  Speaker          Rater  Likert
 'Maggie Simpson'   1         3
 'Maggie Simpson'   2         4
 'Maggie Simpson'   3         5
 'Maggie Simpson'   4         4
 'Maggie Simpson'   5         4
 'Maggie Simpson'   6         4
 'Maggie Simpson'   7         4
 'Maggie Simpson'   8         3
 'Maggie Simpson'   9         2
 'Maggie Simpson'  10         5
")

Data = read.table(textConnection(Input),header=TRUE)


### Create a new variable which is the Likert scores as an ordered factor


Data$Likert.f = factor(Data$Likert,
                       ordered = TRUE,
                       levels = c("1", "2", "3", "4", "5")
                       )


### Double check our data frame


library(psych)

headTail(Data)


           Speaker Rater Likert Likert.f
1   Maggie Simpson     1      3        3
2   Maggie Simpson     2      4        4
3   Maggie Simpson     3      5        5
4   Maggie Simpson     4      4        4
...           <NA>   ...    ...     <NA>
7   Maggie Simpson     7      4        4
8   Maggie Simpson     8      3        3
9   Maggie Simpson     9      2        2
10  Maggie Simpson    10      5        5


str(Data)


'data.frame':     10 obs. of  4 variables:
 $ Speaker : Factor w/ 1 level "Maggie Simpson": 1 1 1 1 1 1 1 1 1 1
 $ Rater   : int  1 2 3 4 5 6 7 8 9 10
 $ Likert  : int  3 4 5 4 4 4 4 3 2 5
 $ Likert.f: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 3 4 5 4 4 4 4 3 2 5


levels(Data$Speaker)


[1] "Maggie Simpson"


levels(Data$Likert.f)


[1] "1" "2" "3" "4" "5"


summary(Data)


### Remove unnecessary objects

rm(Input)


Summary treating Likert data as nominal data

The number of responses for each response level of Likert.f  can be shown with the summary function, either on the Likert.f variable itself or as part of the whole data frame Data.  The xtabs function can produce a similar summary. 

 

This same summary is then visualized with a bar plot.

 

summary(Data$Likert.f)


1 2 3 4 5
0 1 2 5 2

   ### The top row is the value of Likert.f,
   ###  and the next row is the counts for each value
.


summary(Data)


           Speaker       Rater           Likert     Likert.f
 Maggie Simpson:10   Min.   : 1.00   Min.   :2.00   1:0    
                     1st Qu.: 3.25   1st Qu.:3.25   2:1    
                     Median : 5.50   Median :4.00   3:2    
                     Mean   : 5.50   Mean   :3.80   4:5    
                     3rd Qu.: 7.75   3rd Qu.:4.00   5:2    
                     Max.   :10.00   Max.   :5.00  


xtabs and bar plot


XT = xtabs(~ Likert.f,
           data=Data)

XT


Likert.f
1 2 3 4 5
0 1 2 5 2

   ###  Counts for each level of Likert.f


sum(XT)


[1] 10

   ### Sum of counts for whole table


prop.table(XT)


  1   2   3   4   5
0.0 0.1 0.2 0.5 0.2

  ### Proportion of total for each count


barplot(XT,  
        col="dark gray",
        xlab="Maggie's Likert",
        ylab="Frequency")


image


Summary treating Likert data as numeric data

In this code the Likert response data is treated as numeric data.  The summary function produces the median, as well as the minimum, maximum, first quartile, and third quartile for a variable.  The summary function can be used for either a single variable or a whole data frame.  The Summarize function in the FSA package produces those statistics as well as the number of observations (n).

 

These statistics are visualized with a box plot.  Note that the heavy line in the box is the median, and the ends of the box are the first quartile and third quartile.  The extent of the whiskers in the box plot defaults to being the most extreme values, but no more than 1.5 times the length of the box itself.  Values beyond the extent of the whiskers will be indicated with circles.  Note that in the box plot for this example that the whiskers extend to the most extreme values in the data (2 and 5), and that there are no circles in the plot.


summary(Data$Likert)


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00    3.25    4.00    3.80    4.00    5.00


summary(Data)


           Speaker       Rater           Likert     Likert.f
 Maggie Simpson:10   Min.   : 1.00   Min.   :2.00   1:0    
                     1st Qu.: 3.25   1st Qu.:3.25   2:1    
                     Median : 5.50   Median :4.00   3:2     
                     Mean   : 5.50   Mean   :3.80   4:5    
                     3rd Qu.: 7.75   3rd Qu.:4.00   5:2    
                     Max.   :10.00   Max.   :5.00  


library(FSA)

Summarize(Data$Likert,
          digits=3)


       n     mean       sd      min       Q1   median       Q3      max percZero
  10.000    3.800    0.919    2.000    3.250    4.000    4.000    5.000    0.000


Box plot


boxplot(Data$Likert,
        ylab="Likert scores",
        xlab="Maggie's results")


image


Descriptive statistics for one-way or multi-way data

 

For this example, imagine that there are two speakers, each of which is being evaluated on each of three questions on 10-point Likert items.  There are six raters, each of whom evaluates each speaker on each of these items.

 

One-way data

One-way data refers to a data set with a single measured variable, but divided into groups.  In this example looking at values of Likert for Spongebob and for Patrick would be an example of one-way data.  Note that for one-way data there is a single dependent variable (Likert) and a single independent factor variable (Speaker).  

 

Looking at Likert for the three levels of Question (Information, Presentation, and Questions) would also be an example of one-way data.

 

Multi-way data

Multi-way data refers to looking at a single measured variable divided into groups, where the groups are defined by at least two factor variables.  If we look at values of Likert for each of the two speakers for each of the three questions, we are looking at two-way data.

 

Variables and functions in this example

The following example will create a data frame called Data consisting of four variables:  Speaker, Question, Rater, and Likert.  An additional variable named Likert.f will be created as an ordered factor variable with the same value as Likert for each observation.

 

Note that the str function reports that Speaker, Question, and Rater are factor variables, Likert is an integer variable, and Likert.f is an ordered factor variable.  The levels of the factor variables are shown with the level function.

 

Recall that Data$Likert tells R to use the Likert variable in the data frame Data.


Input =("
 Speaker     Question        Rater  Likert
 Spongebob   Information     a      5
 Spongebob   Information     b      6
 Spongebob   Information     c      7
 Spongebob   Information     d      8
 Spongebob   Information     e      6
 Spongebob   Information     f      5
 Spongebob   Presentation    a      8
 Spongebob   Presentation    b      7
 Spongebob   Presentation    c      8
 Spongebob   Presentation    d      9
 Spongebob   Presentation    e     10
 Spongebob   Presentation    f      7
 Spongebob   Questions       a      3
 Spongebob   Questions       b      4
 Spongebob   Questions       c      5
 Spongebob   Questions       d      4
 Spongebob   Questions       e      6
 Spongebob   Questions       f      5
 Patrick     Information     a      6
 Patrick     Information     b      7
 Patrick     Information     c      8
 Patrick     Information     d      9       
 Patrick     Information     e      7 
 Patrick     Information     f      6 
 Patrick     Presentation    a      9
 Patrick     Presentation    b      9
 Patrick     Presentation    c     10
 Patrick     Presentation    d     10      
 Patrick     Presentation    e      8
 Patrick     Presentation    f      8
 Patrick     Questions       a      5
 Patrick     Questions       b      6
 Patrick     Questions       c      6
 Patrick     Questions       d      7       
 Patrick     Questions       e      5 
 Patrick     Questions       f      7           
")

Data = read.table(textConnection(Input),header=TRUE)


### Order factor levels, otherwise R will alphabetize them

Data$Speaker = factor(Data$Speaker,
                      levels=unique(Data$Speaker))
                     
Data$Question = factor(Data$Question,
                      levels=unique(Data$Question))


### Create an ordered factor of Likert data

Data$Likert.f = factor(Data$Likert,
                          ordered=TRUE,
                          levels = c("1", "2", "3", "4", "5",
                                     "6", "7", "8", "9", "10")
                          )


### Examine data frame

library(psych)

headTail(Data)


      Speaker    Question Rater Likert Likert.f
1   Spongebob Information     a      5        5
2   Spongebob Information     b      6        6
3   Spongebob Information     c      7        7
4   Spongebob Information     d      8        8
...      <NA>        <NA>  <NA>    ...     <NA>
33    Patrick   Questions     c      6        6
34    Patrick   Questions     d      7        7
35    Patrick   Questions     e      5        5
36    Patrick   Questions     f      7        7


str(Data)


'data.frame':     36 obs. of  5 variables:
 $ Speaker : Factor w/ 2 levels "Spongebob","Patrick": 1 1 1 1 1 1 1 1 1 1 ...
 $ Question: Factor w/ 3 levels "Information",..: 1 1 1 1 1 1 2 2 2 2 ...
 $ Rater   : Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6 1 2 3 4 ...
 $ Likert  : int  5 6 7 8 6 5 8 7 8 9 ...
 $ Likert.f: Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 6 7 8 6 5 8 7 8 9 ...


levels(Data$Speaker)


[1] "Spongebob" "Patrick" 


levels(Data$Question)


[1] "Information"  "Presentation" "Questions" 


levels(Data$Rater)


[1] "a" "b" "c" "d" "e" "f"


levels(Data$Likert.f)


[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"


summary(Data)


### Remove unnecessary objects

rm(Input)


Summary treating Likert data as nominal data

The number of responses for each response level of Likert.f is shown with the summary function, but note that these counts are for the entire data frame.

The xtabs function can be used to generate counts across groups.  This process is known as cross-tabulation, or cross-tabs.  The prop.table function, with the margin=1 option produces the proportions of observations for each row.

This same summary is then visualized with a bar plot.


summary(Data)


      Speaker           Question  Rater     Likert          Likert.f
 Spongebob:18   Information :12   a:6   Min.   : 3.000   6      :7 
 Patrick  :18   Presentation:12   b:6   1st Qu.: 5.750   7      :7 
                Questions   :12   c:6   Median : 7.000   5      :6 
                                  d:6   Mean   : 6.833   8      :6 
                                  e:6   3rd Qu.: 8.000   9      :4 
                                  f:6   Max.   :10.000   10     :3 
                                                         (Other):3



XT = xtabs(~ Speaker + Likert.f, data=Data)

XT


           Likert.f
Speaker     1 2 3 4 5 6 7 8 9 10
  Spongebob 0 0 1 2 4 3 3 3 1  1
  Patrick   0 0 0 0 2 4 4 3 3  2


prop.table(XT,
           margin = 1)


           Likert.f
Speaker          1      2      3      4      5      6      7      8      9     10
  Spongebob 0.0000 0.0000 0.0556 0.1111 0.2222 0.1667 0.1667 0.1667 0.0556 0.0556
  Patrick   0.0000 0.0000 0.0000 0.0000 0.1111 0.2222 0.2222 0.1667
0.1667 0.1111


sum(XT)


[1] 36

   ### Sum of counts in the table


rowSums(XT)


Spongebob   Patrick
       18        18

   ### Sum of counts for each row


colSums(XT)


 1  2  3  4  5  6  7  8  9 10
 0  0  1  2  6  7  7  6  4  3

   ### Sum of counts for each column

Bar plot by group


XT = xtabs(~ Speaker + Likert.f, data=Data)

XT

barplot(XT,
        beside=TRUE,
        legend=TRUE,
        ylim=c(0, 5),  # adjust to remove legend overlap
        xlab="Likert score",
        ylab="Frequency")


image


xtabs to produce counts for two-way data


### Note that the dependent variable (Likert.f)
###   is the middle variable in the formula


XT2 = xtabs(~ Speaker + Likert.f + Question,
            data=Data)

XT2


, , Question = Information

           Likert.f
Speaker     1 2 3 4 5 6 7 8 9 10
  Spongebob 0 0 0 0 2 2 1 1 0  0
  Patrick   0 0 0 0 0 2 2 1 1  0

, , Question = Presentation

           Likert.f
Speaker     1 2 3 4 5 6 7 8 9 10
  Spongebob 0 0 0 0 0 0 2 2 1  1
  Patrick   0 0 0 0 0 0 0 2 2  2

, , Question = Questions

           Likert.f
Speaker     1 2 3 4 5 6 7 8 9 10
  Spongebob 0 0 1 2 2 1 0 0 0  0


Summary treating Likert data as numeric data

In this code the Likert response data is treated as numeric data.  The summary function produces the median, quartiles, and extremes.  But note that these statistics are for all values for the whole data frame.

 

The Summary function in the FSA package can produce these summary statistics across groups for one-way data or multi-way data.

 

Histograms are then produced to visualize the distribution of responses across groups.  Histograms for both one-way and two-way data can be produced with the histogram function in the lattice package.

 

Finally, box plots of the two-way data are produced.


summary(Data)


      Speaker           Question  Rater     Likert          Likert.f
 Spongebob:18   Information :12   a:6   Min.   : 3.000   6      :7 
 Patrick  :18   Presentation:12   b:6   1st Qu.: 5.750   7      :7 
                Questions   :12   c:6   Median : 7.000   5      :6 
                                  d:6   Mean   : 6.833   8      :6 
                                  e:6   3rd Qu.: 8.000   9      :4 
                                  f:6   Max.   :10.000   10     :3 
                                                         (Other):3


library(FSA)
 
Summarize(Likert ~ Question,
          data=Data,
          digits=3)


      Question  n  mean    sd min   Q1 median   Q3 max percZero
1  Information 12 6.667 1.231   5 6.00    6.5 7.25   9        0
2 Presentation 12 8.583 1.084   7 8.00    8.5 9.25  10        0
3    Questions 12 5.250 1.215   3 4.75    5.0 6.00   7        0


library(FSA)

Summarize(Likert ~ Speaker + Question,
          data=Data,
          digits=3)


    Speaker     Question n  mean    sd min   Q1 median   Q3 max percZero
1 Spongebob  Information 6 6.167 1.169   5 5.25    6.0 6.75   8        0
2   Patrick  Information 6 7.167 1.169   6 6.25    7.0 7.75   9        0
3 Spongebob Presentation 6 8.167 1.169   7 7.25    8.0 8.75  10        0
4   Patrick Presentation 6 9.000 0.894   8 8.25    9.0 9.75  10        0
5 Spongebob    Questions 6 4.500 1.049   3 4.00    4.5 5.00   6        0
6   Patrick    Questions 6 6.000 0.894   5 5.25    6.0 6.75   7        0


Bar plots by group

The histogram function in the lattice package will produce bar plots if the counted variable is ordinal in nature, as Likert.f is.


library(lattice)

histogram(~ Likert.f | Speaker,
          data=Data,
          layout=c(1,2)      #  columns and rows of individual plots
          )


image


library(lattice)

histogram(~ Likert.f | Question,
          data=Data,
          layout=c(1,3)      #  columns and rows of individual plots
          )


image


Bar plots for two-way data


library(lattice)

histogram(~ Likert.f | Speaker + Question,
          data=Data,
          layout=c(2,3)      #  columns and rows of individual plots
          )


image


Box plots for two-way data

In general, using the names option is not necessary, but here it is useful because the default group names are long.


boxplot(Likert ~ Speaker + Question,
        data=Data,
        names=c("SB.Inf","P.Inf","SB.Pres", "P.Pres", "SB.Quest", "P.Quest"),
        ylab="Value")


image

   ### Note in this example, SB is Spongebob, P is Patrick,
   ###   Inf is Information, Pres is Presentation,
   ###   and Quest is Question.


Interaction plot using medians and quartiles

Interaction plots are useful to visualize two-way data by plotting the value of a measured variable across the interaction of two factor variables.

 

In this example, the median Likert is calculated for each level of the interaction of two factors, and the first and third quartiles are used to indicate the spread of data about each median.

 

The Summarize function in the FSA package creates a data frame called Sum.  The variables median, Q1, and Q3 in this data frame will then be used in the plot.

 

The interaction plot is produced with the ggplot2 package.  This package is very versatile and powerful for creating plots, but the code can be intimidating at first.  Note that the code indicates that the data frame to use is Sum, and the use of the variables Speaker, median, Question, Q1, Q3 from this data frame.  Also note that the y axis label defined as “Median Likert score”.


### Create a data frame called Sum with median and quartiles


library(FSA)
 
Sum = Summarize(Likert ~ Speaker + Question,
                data=Data,
                digits=3)

Sum


    Speaker     Question n  mean    sd min   Q1 median   Q3 max percZero
1 Spongebob  Information 6 6.167 1.169   5 5.25    6.0 6.75   8        0
2   Patrick  Information 6 7.167 1.169   6 6.25    7.0 7.75   9        0
3 Spongebob Presentation 6 8.167 1.169   7 7.25    8.0 8.75  10        0
4   Patrick Presentation 6 9.000 0.894   8 8.25    9.0 9.75  10        0
5 Spongebob    Questions 6 4.500 1.049   3 4.00    4.5 5.00   6        0
6   Patrick    Questions 6 6.000 0.894   5 5.25    6.0 6.75   7        0


### Produce interaction plot

library(ggplot2)

pd = position_dodge(.2)

ggplot(Sum, aes(x=Speaker,
                y=median,
                color=Question)) +
    geom_errorbar(aes(ymin=Q1,
                      ymax=Q3),
                   width=.2, size=0.7, position=pd) +
    geom_point(shape=15, size=4, position=pd) +
    theme_bw() +
    theme(axis.title = element_text(face = "bold")) +

    ylab("Median Likert score")


image


Interaction plot using medians and confidence intervals

An interaction plot can also use confidence intervals of the medians for its error bars.  Confidence intervals are discussed in the Confidence Intervals chapter, and confidence intervals for medians are discussed in the Confidence Intervals for Medians chapter.

 

The function groupwiseMedian in the rcompanion package creates a data frame, called Sum, here that includes medians and confidence intervals of the medians for one-way or multi-way data.  By default the function uses confidence intervals by the BCa method, but can produce confidence intervals by other methods.

 

Note that the order of speakers has changed from the previous interaction plot.


library(rcompanion)

Sum = groupwiseMedian(data = Data,
                      group = c("Speaker", "Question"),
                      var = "Likert",
                      conf=0.95,
                      R=5000)

Sum


    Speaker     Question n Median Conf.level Bca.lower Bca.upper
1 Spongebob  Information 6    6.0       0.95         5       6.5
2 Spongebob Presentation 6    8.0       0.95         7       8.5
3 Spongebob    Questions 6    4.5       0.95         3       5.0
4   Patrick  Information 6    7.0       0.95         6       7.5
5   Patrick Presentation 6    9.0       0.95         8       9.5
6   Patrick    Questions 6    6.0       0.95         5       6.5


Alternate code:

Sum = groupwiseMedian(data = Data,
                      group = c("Speaker", "Question"),
                      var = "Likert",
                      conf=0.95,
                      R=5000)

Sum


library(ggplot2)

pd = position_dodge(.2)

ggplot(Sum, aes(x=Speaker,
                y=Median,
                color=Question)) +
    geom_errorbar(aes(ymin=Bca.lower,
                      ymax=Bca.upper),
                   width=.2, size=0.7, position=pd) +
    geom_point(shape=15, size=4, position=pd) +
    theme_bw() +
    theme(axis.title = element_text(face = "bold")) +

    ylab("Median Likert score")


image


Descriptive statistics with opt-out responses

Opt-out responses include options like “don’t know” or “not applicable”.  A typical approach to handling opt-out responses in Likert data is to separate them from the numeric or ordered factor responses of the question and report them separately.

 

For example, if responses to a question were 5, 5, 5, 4, 4, 2, ‘Don’t know’, ‘N/A’,  you might report that you had 8 total responses, with 12.5% “Don’t know”, 12.5% “N/A”, and a median response of 4.5.

 

Analysis for vector data

The xtabs function can be used to count the levels of responses with opt-out answers, treating the data as nominal.

 

The Summarize function in the FSA will calculate the median for numeric data.  Note that the numeric summary includes the number of observations and the number of “valid” observations, which excludes the non-numeric values.


Q1 = c("5", "5", "5", "4", "4", "2", "Don't know", "N/A")


Counts of responses


Q1.f = as.factor(Q1)

XT = xtabs(~Q1)

XT


Q1
         2          4          5 Don't know        N/A
         1          2          3          1          1


prop.table(XT)


Q1
         2          4          5 Don't know        N/A
     0.125      0.250      0.375      0.125      0.125


Numerical summary of responses


Q1.n = as.numeric(Q1)

library(FSA)

Summarize(Q1.n,
          digits = 2)


       n   nvalid     mean       sd      min       Q1   median       Q3      max
    8.00     6.00     4.17     1.17     2.00     4.00     4.50     5.00     5.00


 

Analysis for long-form data frame

Similar summaries can be obtained for data in a long-form data frame.

Note that the read.table function by default reads in the values of Likert as a factor variable, since it contains character values, not as a character variable.  Also note that you shouldn’t use NA for a value of “not applicable” since NA is reserved by R for a missing value.


Input =("
  Rater  Question  Likert
 '1'     Q1        5
 '2'     Q1        5
 '3'     Q1        5
 '4'     Q1        4
 '5'     Q1        4
 '6'     Q1        2
 '7'     Q1        DontKnow
 '8'     Q1        NotApp
 '1'     Q2        3
 '2'     Q2        3
 '3'     Q2        3
 '4'     Q2        2
 '5'     Q2        DontKnow
 '6'     Q2        2
 '7'     Q2        DontKnow
 '8'     Q2        DontKnow
 '1'     Q3        1
 '2'     Q3        2
 '3'     Q3        3
 '4'     Q3        2
 '5'     Q3        3
 '6'     Q3        1
 '7'     Q3        2
 '8'     Q3        3
 ")

Data = read.table(textConnection(Input),header=TRUE)


Counts of responses


XT = xtabs(~ Question + Likert,
           data = Data)

XT


        Likert
Question 1 2 3 4 5 DontKnow NotApp
      Q1 0 1 0 2 3        1      1
      Q2 0 2 3 0 0        3      0
      Q3 2 3 3 0 0        0      0

   ### Counts for each question


prop.table(XT,
           margin=1)


        Likert

Question     1     2     3     4     5 DontKnow NotApp
      Q1 0.000 0.125 0.000 0.250 0.375    0.125  0.125
      Q2 0.000 0.250 0.375 0.000 0.000    0.375  0.000
      Q3 0.250 0.375 0.375 0.000 0.000    0.000  0.000

   ### Proportions for each row



Numerical summary of responses


Data$Likert.n = as.numeric(as.character(Data$Likert))


library(FSA)

Summarize(Likert.n ~ Question,
          data = Data)


  Question n nvalid     mean        sd min   Q1 median Q3 max percZero
1       Q1 8      6 4.166667 1.1690452   2 4.00    4.5  5   5        0
2       Q2 8      5 2.600000 0.5477226   2 2.00    3.0  3   3        0
3       Q3 8      8 2.125000 0.8345230   1 1.75    2.0  3   3        0

   ### Numerical summary


Summary for whole data frame


XT = xtabs( ~ Likert,
            data = Data)

XT


Likert
       1        2        3        4        5 DontKnow   NotApp
       2        6        6        2        3        4        1


prop.table(XT)


Likert
         1          2          3          4          5   DontKnow     NotApp
0.08333333 0.25000000 0.25000000 0.08333333 0.12500000 0.16666667 0.04166667


sum(XT)


[1] 24

   ### Sum of counts in table


Summarize( ~ Likert.n,
           data = Data,
           digits = 2)


       n   nvalid     mean       sd      min       Q1   median       Q3      max
   24.00    19.00     2.89     1.24     1.00     2.00     3.00     3.50     5.00


Exercises G

 

1. Considering Maggie Simpson's data,

 

a.  How many Likert responses of "4" were there?

b.  What percentage of responses was that?

c.  What was the median Likert response?

d.  What was the minimum response?

e.  How would you summarize the bar plot of Maggie's responses?



2. Considering Spongebob and Patrick's workshops,

 

a.  How many Likert responses of "5" did Spongebob and Patrick get, respectively?

b.  What percentage of total responses for each of them were these?

c.  How would you summarize the bar plot of their responses?

d.  What were the median scores for Patrick for each of Information, Presentation, and Questions?

e.  What do the histograms for the two-way data (Spongebob and Patrick for each of Information, Presentation, and Questions) suggest to you?

 

f.  Are your impressions borne out by the interaction plots?

g.  Describe these results, including your opinions on the practical implications.


3.  Sandy Cheeks and Squidward each delivered presentations, and were each evaluated on each of the two courses (A and B).


Speaker    Course  Likert

Sandy      A       5

Sandy      A       5

Sandy      A       5

Sandy      A       4

Sandy      A       3

Sandy      A       3

Sandy      B       5

Sandy      B       4

Sandy      B       3

Sandy      B       3

Sandy      B       2

Sandy      B       2

Squidward  A       4

Squidward  A       4

Squidward  A       3

Squidward  A       3      

Squidward  A       2 

Squidward  A       2 

Squidward  B       3

Squidward  B       3

Squidward  B       2

Squidward  B       2      

Squidward  B       1

Squidward  B       1

 

a.  Summarize the data as one-way or two-way data, using at least one plot, and at least one numerical summary.  Interpret your results, and describe your conclusions.  Remember to use statistics that are appropriate for Likert data.  Include your opinion on the practical importance.