
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Introduction to Tests for Nominal Variables

The tests for nominal variables presented in this book are commonly used to determine if there is an association between two nominal variables or if counts of observations for a nominal variable match a theoretical set of proportions for that variable.

 

Tests of symmetry, or marginal homogeneity, can determine if frequencies for one nominal variable are greater than those for another, or if there was a change in frequencies from sampling at one time to another.

 

As a more advanced approach, models can be specified with nominal dependent variables.  A common type of model with a nominal dependent variable is logistic regression.

 

Descriptive statistics and plots for nominal data

 

Descriptive statistics for nominal data are discussed in the “Descriptive statistics for nominal data” section in the Descriptive Statistics chapter. 

 

Descriptive plots for nominal data are discussed in the “Examples of basic plots for nominal data” section in the Basic Plots chapter.

 

Contingency tables and matrices

 

Nominal data are often arranged in a contingency table of counts of observations for each cell of the table.  For example, if there were 6 males and 4 females reading Sappho, 3 males and 4 females reading Stephen Crane, and 2 males and 5 females reading Judith Viorst, the data could be arranged as:


         Sex
         Male    Female
Poet
Sappho   6       4
Crane    3       4
Viorst   2       5

These data can be read into R as a matrix in the following manner.


Input =("
Poet     Male    Female
Sappho   6       4      
Crane    3       4
Viorst   2       5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


It is helpful to look at totals for columns and rows.


colSums(Matrix)


  Male Female
    11     13


rowSums(Matrix)


Sappho  Crane Viorst
    10      7      7
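
Totals are often easier to interpret alongside proportions.  The following is a minimal sketch, assuming the same counts as the matrix above.  It rebuilds the matrix with matrix() so that it runs on its own, and the object name RowProp is only illustrative.

```r
### Rebuild the example matrix so this block is self-contained

Matrix = matrix(c(6, 3, 2, 4, 4, 5),
                nrow = 3,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

sum(Matrix)                                ### Total number of observations

RowProp = prop.table(Matrix, margin = 1)   ### Proportions within each poet

round(RowProp, 3)
```

Using margin = 2 instead would give proportions within each sex.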


Bar plots

Simple bar charts and mosaic plots are also helpful.


barplot(Matrix,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bars
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box




Matrix.t = t(Matrix)      ### Transpose Matrix for the next plot

barplot(Matrix.t,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bars
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box





Mosaic plots

Mosaic plots are very useful for visualizing the association between two nominal variables, but can be somewhat tricky to interpret for those unfamiliar with them.  Note that the column width is determined by the number of observations in that category.  In this case, the Sappho column is wider because more students are reading Sappho than either of the other two poets.  Note, too, that the number of observations in each cell is represented by the area of the cell, not its height.  In this case, the Sappho–Female cell and the Crane–Female cell have the same count (4), and so the same area.  The Crane–Female cell is taller than the Sappho–Female cell because it is a higher proportion of observations for that poet (4 out of 7 Crane readers compared with 4 out of 10 Sappho readers).

 

mosaicplot(Matrix,
           color=TRUE,
           cex.axis=0.8)
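
The geometry described for the mosaic plot can be checked numerically.  This is a sketch, assuming that the plot splits first on the row variable (Poet) and then on the column variable (Sex), and ignoring the small gaps drawn between tiles.  The names Width, Height, and Area are illustrative.

```r
### Rebuild the example matrix so this block is self-contained

Matrix = matrix(c(6, 3, 2, 4, 4, 5),
                nrow = 3,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

Width  = rowSums(Matrix) / sum(Matrix)     ### Share of all readers for each poet

Height = prop.table(Matrix, margin = 1)    ### Share of each sex within a poet

Area   = Height * Width                    ### Each row of Height times its Width

round(Area, 3)
```

Under these assumptions, each tile's area equals its cell count divided by the total count, which is why two cells with the same count can have different heights.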




Optional analyses: converting among matrices, tables, counts, and cases

 

In R, most simple analyses for nominal data expect the data to be in a matrix format.  However, data may be in a long format, either with each row representing a single observation (cases), or with each row containing a count of observations (counts). 

 

It is relatively easy to convert among these different forms of data.

 

Long-format with each row as an observation (cases)


Input =("
Poet     Sex
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Female
Sappho   Female
Sappho   Female
Sappho   Female
Crane    Male
Crane    Male
Crane    Male
Crane    Female
Crane    Female
Crane    Female
Crane    Female
Viorst   Male
Viorst   Male
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
")

Data = read.table(textConnection(Input),header=TRUE)


###  Order factors by the order in data frame
###  Otherwise, xtabs will alphabetize them

Data$Poet = factor(Data$Poet,
                   levels=unique(Data$Poet))

Data$Sex = factor(Data$Sex,
                  levels=unique(Data$Sex))

Cases to table


Table = xtabs(~ Poet + Sex,
              data=Data)

Table


        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5
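
The effect of ordering the factor levels can be seen by building the table both ways.  This is a sketch, assuming the same 24 cases as above.  The data frame is constructed with rep() only to keep the block short, and the names Alphabetical and Ordered are illustrative.

```r
### Rebuild the cases so this block is self-contained

Data = data.frame(
  Poet = rep(c("Sappho", "Crane", "Viorst"), times = c(10, 7, 7)),
  Sex  = c(rep(c("Male", "Female"), times = c(6, 4)),
           rep(c("Male", "Female"), times = c(3, 4)),
           rep(c("Male", "Female"), times = c(2, 5))))

### Without explicit levels, categories are sorted alphabetically

Alphabetical = xtabs(~ Poet + Sex, data = Data)

rownames(Alphabetical)

### With levels set by order of appearance, the original order is kept

Data$Poet = factor(Data$Poet, levels = unique(Data$Poet))
Data$Sex  = factor(Data$Sex,  levels = unique(Data$Sex))

Ordered = xtabs(~ Poet + Sex, data = Data)

rownames(Ordered)
```

The counts are the same in both tables; only the order of rows and columns differs.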


Cases to Counts


Table = xtabs(~ Poet + Sex,
              data=Data)

Counts = as.data.frame(Table)

Counts


    Poet    Sex Freq
1 Sappho   Male    6
2  Crane   Male    3
3 Viorst   Male    2
4 Sappho Female    4
5  Crane Female    4
6 Viorst Female    5


Long-format with counts of observations (counts)


Input =("
Poet     Sex      Freq
Sappho   Male     6
Sappho   Female   4
Crane    Male     3
Crane    Female   4
Viorst   Male     2
Viorst   Female   5
")

Counts = read.table(textConnection(Input),header=TRUE)


###  Order factors by the order in data frame
###  Otherwise, xtabs will alphabetize them


Counts$Poet = factor(Counts$Poet,
                   levels=unique(Counts$Poet))

Counts$Sex = factor(Counts$Sex,
                  levels=unique(Counts$Sex))


Counts to Table


Table = xtabs(Freq ~ Poet + Sex,
              data=Counts)

Table


        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5


Counts to Cases


Long = Counts[rep(row.names(Counts), Counts$Freq), c("Poet", "Sex")]

rownames(Long) = seq_len(nrow(Long))

Long


     Poet    Sex
1  Sappho   Male
2  Sappho   Male
3  Sappho   Male
4  Sappho   Male
5  Sappho   Male
6  Sappho   Male
7  Sappho Female
8  Sappho Female
9  Sappho Female
10 Sappho Female
11  Crane   Male
12  Crane   Male
13  Crane   Male
14  Crane Female
15  Crane Female
16  Crane Female
17  Crane Female
18 Viorst   Male
19 Viorst   Male
20 Viorst Female
21 Viorst Female
22 Viorst Female
23 Viorst Female
24 Viorst Female
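
The conversion from counts to cases relies on repeating each row as many times as its Freq value.  The following sketch, assuming the same counts as above, builds the row index explicitly and then tabulates the cases again to confirm that the original counts are recovered.  The names Index and Back are illustrative.

```r
### Rebuild the counts so this block is self-contained

Counts = data.frame(
  Poet = c("Sappho", "Sappho", "Crane", "Crane", "Viorst", "Viorst"),
  Sex  = c("Male", "Female", "Male", "Female", "Male", "Female"),
  Freq = c(6, 4, 3, 4, 2, 5))

### Row i of Counts appears Freq[i] times in the index

Index = rep(seq_len(nrow(Counts)), times = Counts$Freq)

Long = Counts[Index, c("Poet", "Sex")]

rownames(Long) = NULL                      ### Reset row names to 1, 2, 3, ...

nrow(Long)                                 ### Should equal sum(Counts$Freq)

### Tabulating the cases again should recover the original counts

Back = xtabs(~ Poet + Sex, data = Long)

Back
```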


Matrix form


Input =("
Poet     Male    Female
Sappho   6       4      
Crane    3       4
Viorst   2       5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


Matrix to table


Table = as.table(Matrix)

Table


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


Matrix to counts


Table = as.table(Matrix)

Counts = as.data.frame(Table)

colnames(Counts) = c("Poet", "Sex", "Freq")

Counts


    Poet    Sex Freq
1 Sappho   Male    6
2  Crane   Male    3
3 Viorst   Male    2
4 Sappho Female    4
5  Crane Female    4
6 Viorst Female    5


Matrix to Cases


Table = as.table(Matrix)

Counts = as.data.frame(Table)

colnames(Counts) = c("Poet", "Sex", "Freq")

Long = Counts[rep(row.names(Counts), Counts$Freq), c("Poet", "Sex")]

rownames(Long) = seq_len(nrow(Long))

Long


     Poet    Sex
1  Sappho   Male
2  Sappho   Male
3  Sappho   Male
4  Sappho   Male
5  Sappho   Male
6  Sappho   Male
7   Crane   Male
8   Crane   Male
9   Crane   Male
10 Viorst   Male
11 Viorst   Male
12 Sappho Female
13 Sappho Female
14 Sappho Female
15 Sappho Female
16  Crane Female
17  Crane Female
18  Crane Female
19  Crane Female
20 Viorst Female
21 Viorst Female
22 Viorst Female
23 Viorst Female
24 Viorst Female


Table to matrix


Matrix = as.matrix(Table)

Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


Optional analyses: obtaining information about a matrix object


Matrix


       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5


class(Matrix)


[1] "matrix" "array"     ### R 4.0.0 and later; earlier versions return only "matrix"


typeof(Matrix)


[1] "integer"


attributes(Matrix)


$dim
[1] 3 2

$dimnames

$dimnames[[1]]
[1] "Sappho" "Crane"  "Viorst"

$dimnames[[2]]
[1] "Male"   "Female"


str(Matrix)


 int [1:3, 1:2] 6 3 2 4 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "Sappho" "Crane" "Viorst"
  ..$ : chr [1:2] "Male" "Female"


colnames(Matrix)


[1] "Male"   "Female"


rownames(Matrix)


[1] "Sappho" "Crane"  "Viorst"
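
A few other base R functions are useful for inspecting a matrix or extracting values from it.  This is a brief sketch, assuming the same integer matrix as above, rebuilt here with matrix() so that the block runs on its own.

```r
### Rebuild the example matrix so this block is self-contained

Matrix = matrix(c(6L, 3L, 2L, 4L, 4L, 5L),
                nrow = 3,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

dim(Matrix)                    ### Number of rows, then number of columns

is.matrix(Matrix)              ### Tests for a matrix without relying on class()

Matrix["Crane", "Female"]      ### A single cell, by row and column name

Matrix[ , "Female"]            ### A whole column, as a named vector
```

Testing with is.matrix() avoids depending on the exact value returned by class(), which differs between R versions.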


References

 

Replicate each row of data.frame and specify the number of replications for each row.  Stack Overflow. 2011. stackoverflow.com/questions/2894775/replicate-each-row-of-data-frame-and-specify-the-number-of-replications-for-each.