
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Tests for Nominal Variables

The tests for nominal variables presented in this book are commonly used to determine if there is an association between two nominal variables or if counts of observations for a nominal variable match a theoretical set of proportions for that variable.
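As a minimal sketch of these two kinds of question, base R's `fisher.test` can test for an association between the two nominal variables in the poet-by-sex table used later in this chapter. `chisq.test`, given a vector of counts and its `p` argument, can compare the poet counts with a theoretical set of proportions. The equal-thirds proportions below are an assumption chosen only for illustration.

```r
### Poet-by-sex counts, entered directly as a matrix
Matrix = matrix(c(6, 3, 2,
                  4, 4, 5),
                nrow = 3,
                dimnames = list(Poet = c("Sappho", "Crane", "Viorst"),
                                Sex  = c("Male", "Female")))

### Association between two nominal variables:
### Fisher's exact test of independence for an r x c table
Assoc = fisher.test(Matrix)
Assoc$p.value

### Goodness of fit: do the poet counts (10, 7, 7) match a theoretical
### set of proportions?  Equal thirds is an illustrative assumption.
Poet.counts = rowSums(Matrix)
GOF = chisq.test(Poet.counts, p = c(1/3, 1/3, 1/3))
GOF$statistic    ### X-squared = 0.75, with 2 degrees of freedom
GOF$p.value
```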

 

Tests of symmetry, or marginal homogeneity, can determine whether frequencies for one nominal variable are greater than those for another, or whether frequencies changed from one sampling time to another.
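For paired nominal data, such as the same respondents answering a yes/no question at two times, base R's `mcnemar.test` tests for symmetry in a 2 x 2 table. The counts below are hypothetical, invented only to show the layout: rows are the response at the first sampling and columns the response at the second, so the off-diagonal cells hold the respondents who changed their answer.

```r
### Hypothetical paired counts: rows = Time 1, columns = Time 2
Paired = matrix(c(20,  5,
                  15, 10),
                nrow  = 2,
                byrow = TRUE,
                dimnames = list(Time1 = c("Yes", "No"),
                                Time2 = c("Yes", "No")))

Paired

### Marginal totals: 25 "Yes" at Time 1 versus 35 "Yes" at Time 2
rowSums(Paired)
colSums(Paired)

### McNemar's test compares the two off-diagonal cells (5 versus 15);
### by default it applies a continuity correction
Sym = mcnemar.test(Paired)
Sym$statistic    ### (|5 - 15| - 1)^2 / (5 + 15) = 4.05
Sym$p.value
```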

 

As a more advanced approach, models can be specified with nominal dependent variables.  A common type of model with a nominal dependent variable is logistic regression.
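A minimal sketch of logistic regression, assuming reader sex is treated as the nominal dependent variable and poet as the predictor, using the counts from the contingency table example in this chapter. Here `glm` with `family = binomial` models the probability of a reader being female. Because the model has one parameter per poet, its fitted probabilities simply reproduce the observed proportions.

```r
### Counts of female and male readers for each poet
Counts = data.frame(Poet   = factor(c("Sappho", "Crane", "Viorst"),
                                    levels = c("Sappho", "Crane", "Viorst")),
                    Female = c(4, 4, 5),
                    Male   = c(6, 3, 2))

### Binomial logistic regression; the response is cbind(successes, failures)
Model = glm(cbind(Female, Male) ~ Poet,
            family = binomial,
            data   = Counts)

summary(Model)

### Fitted probability of a reader being female, by poet:
### 4/10, 4/7, and 5/7
Fitted = fitted(Model)
Fitted
```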

 

Descriptive statistics and plots for nominal data

 

Descriptive statistics for nominal data are discussed in the “Descriptive statistics for nominal data” section in the Descriptive Statistics chapter. 

 

Descriptive plots for nominal data are discussed in the “Examples of basic plots for nominal data” section in the Basic Plots chapter.

 

Contingency tables

 

Nominal data are often arranged in a contingency table of counts of observations for each cell of the table.  For example, if there were 6 males and 4 females reading Sappho, 3 males and 4 females reading Stephen Crane, and 2 males and 5 females reading Judith Viorst, the data could be arranged as:


         Sex
Poet     Male    Female
Sappho   6       4
Crane    3       4
Viorst   2       5

This table can be read into R in the following manner.


Input =("
Poet     Male    Female
Sappho   6       4      
Crane    3       4
Viorst   2       5
")

### Read the table; row.names=1 takes the row names from the Poet column

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5
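The same table can also be entered directly with base R's `matrix` function. This is a sketch of an alternative, not the approach used in the rest of this chapter. Note that `matrix` fills values column by column unless `byrow = TRUE`, and naming the `dimnames` list (`Poet`, `Sex`) labels the two variables in printed output. The object name `Matrix.2` is arbitrary.

```r
Matrix.2 = matrix(c(6, 4,
                    3, 4,
                    2, 5),
                  nrow  = 3,
                  byrow = TRUE,    ### Fill row by row, as the table is laid out
                  dimnames = list(Poet = c("Sappho", "Crane", "Viorst"),
                                  Sex  = c("Male", "Female")))

Matrix.2
```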


It is helpful to look at totals for columns and rows.


colSums(Matrix)


  Male Female
    11     13


rowSums(Matrix)


Sappho  Crane Viorst
    10      7      7
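Base R's `addmargins` can append row and column totals to the table in one step, and `prop.table` converts counts to proportions within rows (`margin = 1`) or within columns (`margin = 2`). A minimal sketch, re-entering the same counts with `matrix` so the example stands on its own:

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Counts with row and column totals appended; the grand total is 24
With.totals = addmargins(Matrix)
With.totals

### Proportion of each poet's readers who are male or female
Row.prop = prop.table(Matrix, margin = 1)
round(Row.prop, 3)
```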


Bar plots

Simple bar charts and mosaic plots are also helpful.


barplot(Matrix,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bar labels
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box




Matrix.t = t(Matrix)      ### Transpose Matrix for the next plot

barplot(Matrix.t,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bar labels
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box





Mosaic plots

Mosaic plots are very useful for visualizing the association between two nominal variables, but can be somewhat tricky to interpret for those unfamiliar with them.  Note that the column width is determined by the number of observations in that category.  In this case, the Sappho column is wider because more students are reading Sappho than either of the other two poets.  Note, too, that the number of observations in each cell is represented by the area of the cell, not its height.  In this case, the Sappho–Female cell and the Crane–Female cell have the same count (4), and so the same area.  The Crane–Female cell is taller than the Sappho–Female cell because it represents a higher proportion of the observations for that poet (4 out of 7 Crane readers compared with 4 out of 10 Sappho readers).

 

mosaicplot(Matrix,
           color=TRUE,
           cex.axis=0.8)
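The geometry described above can be checked numerically. This sketch assumes, as in the default `mosaicplot` layout, that the first dimension of the table (the poets) is split along the horizontal axis. Column widths are then the poets' shares of all readers, heights within a column are the row proportions, and each cell's area (width times height) equals the cell count divided by the total.

```r
Matrix = matrix(c(6, 3, 2, 4, 4, 5),
                nrow = 3,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Column widths: each poet's share of all 24 readers
Widths = rowSums(Matrix) / sum(Matrix)
round(Widths, 3)     ### Sappho 0.417, Crane 0.292, Viorst 0.292

### Heights within each column: proportion of each poet's readers by sex
Heights = prop.table(Matrix, margin = 1)
round(Heights, 3)

### Area = width x height; the length-3 Widths vector recycles down
### each column, so row i of Heights is multiplied by Widths[i]
Areas = Widths * Heights
all.equal(Areas, Matrix / sum(Matrix))   ### TRUE
```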




Optional analyses: converting long-format data to a matrix

 

In R, most simple analyses for nominal data expect the data to be in a matrix format.  However, data may be in a long format, either with each row representing a single observation, or with each row containing a count of observations. 

 

It is relatively easy to convert long-format data to a matrix.  The xtabs function will produce a table that can be used for most functions expecting data to be formatted as a matrix object.

 

Long format with each row as an observation


Input =("
Poet     Sex
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Female
Sappho   Female
Sappho   Female
Sappho   Female
Crane    Male
Crane    Male
Crane    Male
Crane    Female
Crane    Female
Crane    Female
Crane    Female
Viorst   Male
Viorst   Male
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
")

Data = read.table(textConnection(Input),header=TRUE)


###  Order factors by the order in data frame

###  Otherwise, xtabs will alphabetize them

Data$Poet = factor(Data$Poet,
                   levels=unique(Data$Poet))

Data$Sex = factor(Data$Sex,
                  levels=unique(Data$Sex))


Table = xtabs(~ Poet + Sex, data=Data)

Table


        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5
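For data with one observation per row, base R's `table` function gives the same counts as `xtabs`; its `dnn` argument supplies the variable names. In this short sketch, the long-format data are rebuilt with `rep` only as a shortcut for typing out all 24 rows; the name `Table.2` is arbitrary.

```r
### Rebuild the one-row-per-observation data:
### 6 Sappho-Male, 4 Sappho-Female, 3 Crane-Male, 4 Crane-Female, ...
Poet = rep(c("Sappho", "Crane", "Viorst"), times = c(10, 7, 7))
Sex  = rep(c("Male", "Female", "Male", "Female", "Male", "Female"),
           times = c(6, 4, 3, 4, 2, 5))

### Set factor levels explicitly so the table is not alphabetized
Data = data.frame(Poet = factor(Poet, levels = c("Sappho", "Crane", "Viorst")),
                  Sex  = factor(Sex,  levels = c("Male", "Female")))

Table.2 = table(Data$Poet, Data$Sex, dnn = c("Poet", "Sex"))
Table.2
```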


Long format with counts of observations


Input =("
Poet     Sex      Count
Sappho   Male     6
Sappho   Female   4
Crane    Male     3
Crane    Female   4
Viorst   Male     2
Viorst   Female   5
")

Data = read.table(textConnection(Input),header=TRUE)

###  Order factors by the order in data frame
###  Otherwise, xtabs will alphabetize them


Data$Poet = factor(Data$Poet,
                   levels=unique(Data$Poet))

Data$Sex = factor(Data$Sex,
                  levels=unique(Data$Sex))

Table = xtabs(Count ~ Poet + Sex, data=Data)

Table


        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5
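Converting in the other direction can be a useful check that no counts were lost. For a table object, base R's `as.data.frame` returns one row per cell, with the counts in a column named `Freq`. A minimal sketch, building the counts data frame with `data.frame` rather than `read.table`:

```r
Data = data.frame(Poet  = factor(rep(c("Sappho", "Crane", "Viorst"), each = 2),
                                 levels = c("Sappho", "Crane", "Viorst")),
                  Sex   = factor(rep(c("Male", "Female"), times = 3),
                                 levels = c("Male", "Female")),
                  Count = c(6, 4, 3, 4, 2, 5))

Table = xtabs(Count ~ Poet + Sex, data = Data)

### Back to long format: one row per Poet-Sex combination, counts in Freq
Long = as.data.frame(Table)
Long
```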


Optional analyses: obtaining information about a matrix object


Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


class(Matrix)   ### R 4.0.0 and later; earlier versions return only "matrix"


[1] "matrix" "array"


typeof(Matrix)


[1] "integer"


attributes(Matrix)


$dim
[1] 3 2

$dimnames

$dimnames[[1]]
[1] "Sappho" "Crane"  "Viorst"

$dimnames[[2]]
[1] "Male"   "Female"


str(Matrix)


 int [1:3, 1:2] 6 3 2 4 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "Sappho" "Crane" "Viorst"
  ..$ : chr [1:2] "Male" "Female"
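A few other base R functions report on the structure of a matrix directly. A short sketch, re-entering the same counts as integers (the `L` suffix) so that `typeof` matches the matrix read in above:

```r
Matrix = matrix(c(6L, 3L, 2L, 4L, 4L, 5L),
                nrow = 3,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

is.matrix(Matrix)   ### TRUE
dim(Matrix)         ### 3 2 (rows, columns)
nrow(Matrix)        ### 3
ncol(Matrix)        ### 2
rownames(Matrix)    ### "Sappho" "Crane" "Viorst"
colnames(Matrix)    ### "Male" "Female"
```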