## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Introduction to Tests for Nominal Variables

The tests for nominal variables presented in this book are in common use.  They can be used to determine whether there is an association between two nominal variables (“association tests”), or whether the counts of observations for a nominal variable match a theoretical set of proportions for that variable (“goodness-of-fit tests”).
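As a rough illustration of these two kinds of tests, the sketch below applies base R functions to the small table of Poet by Sex counts used as the example later in this chapter.  The choice of `fisher.test` for the association test, and the 50:50 theoretical proportions for the goodness-of-fit test, are assumptions made only for this example.

```r
### Counts of readers by Poet (rows) and Sex (columns)
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Association test: is Poet associated with Sex?
Assoc = fisher.test(Matrix)
Assoc$p.value

### Goodness-of-fit test: do the Sex totals match 50:50?
###   (the theoretical proportions are assumed for illustration)
Observed    = colSums(Matrix)
Theoretical = c(0.5, 0.5)

GOF = chisq.test(x = Observed, p = Theoretical)
GOF$statistic
GOF$p.value
```

The association test takes the whole two-way table, whereas the goodness-of-fit test takes a single vector of counts and a vector of theoretical proportions that sums to 1.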

Tests of symmetry, or marginal homogeneity, can determine if frequencies for one nominal variable are greater than those for another, or if there was a change in frequencies from sampling at one time to another.  These are described here as “tests for paired nominal data.”

For tests of association, a measure of association, or effect size, should be reported.
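One commonly reported measure of association for a two-way table of nominal variables is Cramér's V.  The sketch below computes it by hand from the chi-square statistic, as V = sqrt(chi-square / (n × (k − 1))), where n is the total count and k is the smaller of the number of rows and columns.  The counts are the Poet by Sex example used later in this chapter; the use of Cramér's V here is an illustrative choice, not the only appropriate effect size.

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Chi-square statistic without continuity correction
### Warnings about small expected counts are suppressed for this sketch
ChiSq = suppressWarnings(chisq.test(Matrix, correct = FALSE))$statistic

n = sum(Matrix)                    ### Total number of observations
k = min(nrow(Matrix), ncol(Matrix))   ### Smaller table dimension

V = sqrt(unname(ChiSq) / (n * (k - 1)))

V
```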

When contingency tables include one or more ordinal variables, different tests of association are called for (see Association Tests for Ordinal Tables).  Effect sizes are specific to these situations (see Measures of Association for Ordinal Tables).

As a more advanced approach, models can be specified with nominal dependent variables.  A common type of model with a nominal dependent variable is logistic regression.
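A minimal sketch of such a model is shown below, using the Poet by Sex counts from the example later in this chapter.  Treating Sex as the dependent variable and Poet as the independent variable is an assumption made for illustration, as are the object names.

```r
### Counts of Male and Female readers for each Poet
Counts = data.frame(Poet   = factor(c("Sappho", "Crane", "Viorst"),
                                    levels = c("Sappho", "Crane", "Viorst")),
                    Male   = c(6, 3, 2),
                    Female = c(4, 4, 5))

### Logistic regression: the response is a two-column matrix of
###   "successes" (Female) and "failures" (Male)
model = glm(cbind(Female, Male) ~ Poet,
            data   = Counts,
            family = binomial(link = "logit"))

summary(model)

### Fitted probability of Female for each Poet
Fitted = fitted(model)

Fitted
```

Because this model has one parameter per Poet, the fitted probabilities simply reproduce the observed proportion of Female readers for each Poet.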

### Descriptive statistics and plots for nominal data

Descriptive statistics for nominal data are discussed in the “Descriptive statistics for nominal data” section in the Descriptive Statistics chapter.

Descriptive plots for nominal data are discussed in the “Examples of basic plots for nominal data” section in the Basic Plots chapter.

### Contingency tables and matrices

Nominal data are often arranged in a contingency table of counts of observations for each cell of the table. For example, if there were 6 males and 4 females reading Sappho, 3 males and 4 females reading Stephen Crane, and 2 males and 5 females reading Judith Viorst, the data could be arranged as:

               Sex
           Male    Female
Poet
  Sappho   6       4
  Crane    3       4
  Viorst   2       5

These data can be read into R as a matrix in the following manner.

Input =("
Poet     Male    Female
Sappho   6       4
Crane    3       4
Viorst   2       5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix

       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5
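As an alternative sketch, the same matrix could be built directly with the `matrix` function instead of reading it from a text string.  The object name `Matrix2` is used only for this illustration.

```r
### Values are entered row by row, so byrow = TRUE is needed
Matrix2 = matrix(c(6, 4,
                   3, 4,
                   2, 5),
                 nrow  = 3,
                 byrow = TRUE,
                 dimnames = list(c("Sappho", "Crane", "Viorst"),
                                 c("Male", "Female")))

Matrix2
```

Without `byrow = TRUE`, R fills a matrix column by column, which would scramble the counts as they are typed here.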

It is helpful to look at totals for columns and rows.

colSums(Matrix)

  Male Female
    11     13

rowSums(Matrix)

Sappho  Crane Viorst
    10      7      7
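Row and column totals can also be shown together with the counts.  The following sketch uses the base R function `addmargins`, after converting the matrix to a table; the object name `Totals` is hypothetical.

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Append a "Sum" row and a "Sum" column to the table
Totals = addmargins(as.table(Matrix))

Totals

### Grand total of observations
sum(Matrix)
```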

#### Bar plots

Simple bar charts and mosaic plots are also helpful.

barplot(Matrix,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bars
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box

Matrix.t = t(Matrix)      ### Transpose Matrix for the next plot

barplot(Matrix.t,
        beside = TRUE,
        legend = TRUE,
        ylim = c(0, 8),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,  ### Text size for bars
        cex.axis = 0.8,   ### Text size for axis
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove legend box

#### Mosaic plots

Mosaic plots are very useful for visualizing the association between two nominal variables, but they can be somewhat tricky to interpret for those unfamiliar with them.  Note that the width of each column reflects the number of observations in that category.  In this case, the Sappho column is wider because more students are reading Sappho than either of the other two poets.  Note, too, that the number of observations in each cell is represented by the area of the cell, not its height.  In this case, the Sappho–Female cell and the Crane–Female cell have the same count (4), and so the same area.  The Crane–Female cell is taller than the Sappho–Female cell because it is a higher proportion of observations for that author (4 out of 7 Crane readers compared with 4 out of 10 Sappho readers).

mosaicplot(Matrix,
           color=TRUE,
           cex.axis=0.8)
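The arithmetic behind the mosaic plot can be checked with a short sketch.  Ignoring the small gaps drawn between cells, the width of each column is the share of all observations in that Poet, the height of each cell is the proportion of that Sex within the Poet, and the area is their product.  The object names `Width`, `Height`, and `Area` are hypothetical.

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Column width: share of all observations for each Poet
Width = rowSums(Matrix) / sum(Matrix)

### Cell height: proportion of each Sex within each Poet
Height = prop.table(Matrix, margin = 1)

### Cell area: height times width, row by row
Area = Height * Width

Width
Height
Area
```

The Sappho–Female and Crane–Female cells have equal areas because both contain 4 of the 24 observations, even though their heights differ.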

### Working with proportions

It is often useful to look at proportions of counts within nominal tables.

In this example we may want to look at the proportion of each Sex within each Poet.  That is, the proportions in each row of the first table below sum to 1.

Props = prop.table(Matrix, margin = 1)

Props

            Male    Female
Sappho 0.6000000 0.4000000
Crane  0.4285714 0.5714286
Viorst 0.2857143 0.7142857
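The `margin` argument controls which totals are used as the denominator.  As a sketch, `margin = 2` gives the proportion of each Poet within each Sex, so that each column sums to 1, and omitting `margin` gives proportions of the grand total.  The object names `Props.col` and `Props.all` are used only for this example.

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

### Proportions within each column (Sex)
Props.col = prop.table(Matrix, margin = 2)

Props.col

colSums(Props.col)

### Proportions of the overall total
Props.all = prop.table(Matrix)

Props.all

sum(Props.all)
```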

To plot these proportions, we will first transpose the table.

Props.t = t(Props)

Props.t

       Sappho     Crane    Viorst
Male      0.6 0.4285714 0.2857143
Female    0.4 0.5714286 0.7142857

barplot(Props.t,
        beside    = TRUE,
        legend    = TRUE,
        ylim      = c(0, 1),   ### y-axis: used to prevent legend overlapping bars
        cex.names = 0.8,       ### Text size for bars
        cex.axis  = 0.8,       ### Text size for axis
        col       = c("mediumorchid1","mediumorchid4"),
        ylab      = "Proportion within each Poet",
        xlab      = "Poet",
        args.legend = list(x   = "topright",   ### Legend location
                           cex = 0.8,          ### Legend text size
                           bty = "n"))         ### Remove box

### Optional analyses: converting among matrices, tables, counts, and cases

In R, most simple analyses for nominal data expect the data to be in a matrix format.  However, data may be in a long format, either with each row representing a single observation (cases), or with each row containing a count of observations (counts).

It is relatively easy to convert among these different forms of data.

#### Long-format with each row as an observation (cases)

Input =("
Poet     Sex
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Male
Sappho   Female
Sappho   Female
Sappho   Female
Sappho   Female
Crane    Male
Crane    Male
Crane    Male
Crane    Female
Crane    Female
Crane    Female
Crane    Female
Viorst   Male
Viorst   Male
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
Viorst   Female
")

Data = read.table(textConnection(Input), header=TRUE)

###  Order factors by the order in data frame
###  Otherwise, xtabs will alphabetize them

Data$Poet = factor(Data$Poet,
                   levels=unique(Data$Poet))

Data$Sex = factor(Data$Sex,
                  levels=unique(Data$Sex))

##### Cases to table

Table = xtabs(~ Poet + Sex,
data=Data)

Table

        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5
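As an alternative sketch, the base R function `table` produces the same cross-tabulation from two factors.  The data frame below is rebuilt with `rep` only so that the example is self-contained; the object name `Table2` is hypothetical.

```r
### One row per observation (cases)
Data = data.frame(
   Poet = rep(c("Sappho", "Crane", "Viorst"), times = c(10, 7, 7)),
   Sex  = rep(c("Male", "Female", "Male", "Female", "Male", "Female"),
              times = c(6, 4, 3, 4, 2, 5)))

### Order factors by the order in the data frame
Data$Poet = factor(Data$Poet, levels = unique(Data$Poet))
Data$Sex  = factor(Data$Sex,  levels = unique(Data$Sex))

### Cross-tabulate, naming the two dimensions
Table2 = table(Poet = Data$Poet, Sex = Data$Sex)

Table2
```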

##### Cases to Counts

Table = xtabs(~ Poet + Sex,
data=Data)

Counts = as.data.frame(Table)

Counts

    Poet    Sex Freq
1 Sappho   Male    6
2  Crane   Male    3
3 Viorst   Male    2
4 Sappho Female    4
5  Crane Female    4
6 Viorst Female    5

#### Long-format with counts of observations (counts)

Input =("
Poet     Sex      Freq
Sappho   Male     6
Sappho   Female   4
Crane    Male     3
Crane    Female   4
Viorst   Male     2
Viorst   Female   5
")

Counts = read.table(textConnection(Input), header=TRUE)

###  Order factors by the order in data frame
###  Otherwise, xtabs will alphabetize them

Counts$Poet = factor(Counts$Poet,
                     levels=unique(Counts$Poet))

Counts$Sex = factor(Counts$Sex,
                    levels=unique(Counts$Sex))

##### Counts to Table

Table = xtabs(Freq ~ Poet + Sex,
data=Counts)

Table

        Sex
Poet     Male Female
  Sappho    6      4
  Crane     3      4
  Viorst    2      5

##### Counts to Cases

Long = Counts[rep(row.names(Counts), Counts$Freq), c("Poet", "Sex")]

rownames(Long) = seq_len(nrow(Long))

Long

     Poet    Sex
1  Sappho   Male
2  Sappho   Male
3  Sappho   Male
4  Sappho   Male
5  Sappho   Male
6  Sappho   Male
7  Sappho Female
8  Sappho Female
9  Sappho Female
10 Sappho Female
11  Crane   Male
12  Crane   Male
13  Crane   Male
14  Crane Female
15  Crane Female
16  Crane Female
17  Crane Female
18 Viorst   Male
19 Viorst   Male
20 Viorst Female
21 Viorst Female
22 Viorst Female
23 Viorst Female
24 Viorst Female
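The conversion from counts to cases works by repeating each row index as many times as its count.  The sketch below shows that index explicitly; it uses numeric row positions rather than row names, and the object name `Index` is hypothetical.

```r
Counts = data.frame(
   Poet = c("Sappho", "Sappho", "Crane", "Crane", "Viorst", "Viorst"),
   Sex  = c("Male", "Female", "Male", "Female", "Male", "Female"),
   Freq = c(6, 4, 3, 4, 2, 5))

### Repeat each row position Freq times:  1 1 1 1 1 1 2 2 2 2 3 3 3 ...
Index = rep(seq_len(nrow(Counts)), times = Counts$Freq)

Index

### Select the repeated rows, keeping only the Poet and Sex columns
Long = Counts[Index, c("Poet", "Sex")]

rownames(Long) = NULL      ### Reset row names to 1, 2, 3, ...

nrow(Long)
```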

#### Matrix form

Input =("
Poet     Male    Female
Sappho   6       4
Crane    3       4
Viorst   2       5
")

Matrix = as.matrix(read.table(textConnection(Input),
                   header=TRUE,
                   row.names=1))

Matrix

       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5

##### Matrix to table

Table = as.table(Matrix)

Table

       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5

##### Matrix to counts

Table = as.table(Matrix)

Counts = as.data.frame(Table)

colnames(Counts) = c("Poet", "Sex", "Freq")

Counts

    Poet    Sex Freq
1 Sappho   Male    6
2  Crane   Male    3
3 Viorst   Male    2
4 Sappho Female    4
5  Crane Female    4
6 Viorst Female    5

##### Matrix to Cases

Table = as.table(Matrix)

Counts = as.data.frame(Table)

colnames(Counts) = c("Poet", "Sex", "Freq")

Long = Counts[rep(row.names(Counts), Counts$Freq), c("Poet", "Sex")]

rownames(Long) = seq_len(nrow(Long))

Long

     Poet    Sex
1  Sappho   Male
2  Sappho   Male
3  Sappho   Male
4  Sappho   Male
5  Sappho   Male
6  Sappho   Male
7   Crane   Male
8   Crane   Male
9   Crane   Male
10 Viorst   Male
11 Viorst   Male
12 Sappho Female
13 Sappho Female
14 Sappho Female
15 Sappho Female
16  Crane Female
17  Crane Female
18  Crane Female
19  Crane Female
20 Viorst Female
21 Viorst Female
22 Viorst Female
23 Viorst Female
24 Viorst Female

##### Table to matrix

Matrix = as.matrix(Table)

Matrix

       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5
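Depending on how the table was created, the object returned by `as.matrix` may still carry the "table" class, which can be checked with `class`.  If a plain matrix without that class is needed, one option is `unclass`, sketched below.  The object name `Plain` is hypothetical.

```r
Matrix = matrix(c(6, 4,
                  3, 4,
                  2, 5),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

Table = as.table(Matrix)

### Inspect the class of the converted object
class(as.matrix(Table))

### Strip the class attribute, leaving a plain matrix
Plain = unclass(Table)

class(Plain)

Plain
```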

### Optional analyses: obtaining information about a matrix object

Matrix

       Male Female
Sappho    6      4
Crane     3      4
Viorst    2      5

class(Matrix)

[1] "matrix" "array"

###  In R versions before 4.0.0, only "matrix" is returned

typeof(Matrix)

[1] "integer"

attributes(Matrix)

$dim
[1] 3 2

$dimnames
$dimnames[[1]]
[1] "Sappho" "Crane"  "Viorst"

$dimnames[[2]]
[1] "Male"   "Female"

str(Matrix)

 int [1:3, 1:2] 6 3 2 4 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3] "Sappho" "Crane" "Viorst"
  ..$ : chr [1:2] "Male" "Female"

colnames(Matrix)

[1] "Male"   "Female"

rownames(Matrix)

[1] "Sappho" "Crane"  "Viorst"
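A few further base R functions report the dimensions of a matrix and extract individual cells, rows, or columns by name.  The sketch below rebuilds the matrix with integer counts so that it is self-contained.

```r
Matrix = matrix(c(6L, 4L,
                  3L, 4L,
                  2L, 5L),
                nrow  = 3,
                byrow = TRUE,
                dimnames = list(c("Sappho", "Crane", "Viorst"),
                                c("Male", "Female")))

dim(Matrix)                  ### Number of rows, then number of columns

nrow(Matrix)

ncol(Matrix)

Matrix["Crane", "Female"]    ### A single cell, by row and column name

Matrix[, "Male"]             ### A whole column, as a named vector

Matrix["Sappho", ]           ### A whole row, as a named vector
```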

### References

Replicate each row of data.frame and specify the number of replications for each row.  Stack Overflow. 2011. stackoverflow.com/questions/2894775/replicate-each-row-of-data-frame-and-specify-the-number-of-replications-for-each.