Goodness-of-fit tests are used to compare the proportions of the levels of a nominal variable to theoretical or expected proportions. Common goodness-of-fit tests are the G-test, the chi-square test, and binomial or multinomial exact tests.

In general, these tests make no assumptions about the distribution of the data. However, the results of chi-square tests and G-tests can be inaccurate if cell counts are low. A rule of thumb is that all cell counts should be 5 or greater for chi-square tests and G-tests. For details on what constitutes low cell counts, see McDonald in the “Optional readings” section.

One approach is to use exact tests, which are not affected by low cell counts. However, if no cell counts are low, using a G-test or chi-square test is fine. The G-test is probably technically a better test than the chi-square test. The advantage of chi-square tests is that your audience may be more familiar with them.

G-tests are also called likelihood ratio tests, and SAS reports them as “Likelihood Ratio Chi-Square”.

##### Appropriate data

• A nominal variable with two or more levels

• Theoretical, typical, expected, or neutral values for the proportions for this variable are needed for comparison

• G-test and chi-square test may not be appropriate if there are cells with low counts

##### Hypotheses

• Null hypothesis: The proportions for the levels for the nominal variable are not different from the expected proportions.

• Alternative hypothesis (two-sided): The proportions for the levels for the nominal variable are different from the expected proportions.

##### Interpretation

Significant results can be reported as “The proportions for the levels for the nominal variable were statistically different from the expected proportions.”

### Packages used in this chapter

The packages used in this chapter include:

• EMT

• DescTools

• ggplot2

• rcompanion

The following commands will install these packages if they are not already installed:

if(!require(EMT)){install.packages("EMT")}

if(!require(DescTools)){install.packages("DescTools")}

if(!require(ggplot2)){install.packages("ggplot2")}

if(!require(rcompanion)){install.packages("rcompanion")}

### Goodness-of-fit tests for nominal variables example

As part of a demographic survey of students in his environmental issues webinar series, Alucard recorded the race and ethnicity of his students. He wants to compare the data for his class to the demographic data for Cumberland County, New Jersey as a whole.

| Race | Alucard’s class | County proportion |
|---|---|---|
| White | 20 | 0.775 |
| Black | 9 | 0.132 |
| American Indian | 9 | 0.012 |
| Asian | 1 | 0.054 |
| Pacific Islander | 1 | 0.002 |
| Two or more races | 1 | 0.025 |
| Total | 41 | 1.000 |

| Ethnicity | Alucard’s class | County proportion |
|---|---|---|
| Hispanic | 7 | 0.174 |
| Not Hispanic | 34 | 0.826 |
| Total | 41 | 1.000 |
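Before choosing among the tests below, the rule of thumb about low cell counts can be checked directly. The following is a minimal base R sketch. The variable names are illustrative, and flagging a category when either its observed or its expected count is below 5 is an assumption about how to apply the rule, not a prescription from the text.

```r
race = c("White", "Black", "American Indian", "Asian",
         "Pacific Islander", "Two or more races")

observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

# Expected counts under the null hypothesis: total count times proportion
expected.counts = sum(observed) * expected

# Assumption: flag a category if its observed or expected count is below 5
low = observed < 5 | expected.counts < 5

data.frame(race, observed, expected.counts, low)
```

Several categories are flagged for the race data, which is one reason the exact test is presented first in the examples that follow.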

#### Exact tests for goodness-of-fit

##### Race data

observed = c(20, 9, 9, 1, 1, 1)

expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

library(EMT)

multinomial.test(observed, expected)

### This can take a long time!

Exact Multinomial Test, distance measure: p

Events pObs p.value

1370754 0 0
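The “Events” value in the output above matches the number of distinct ways to distribute 41 observations among 6 categories, which suggests why the exact test can be slow. The short base R sketch below computes that count. Reading “Events” as this quantity is an inference from the numbers, not something stated in the output itself.

```r
n = 41   # total number of observations in the race data
k = 6    # number of race categories

# Number of ways to split n observations among k categories
events = choose(n + k - 1, k - 1)
events
```

The count grows quickly as the sample size or the number of categories increases, which is when the Monte Carlo option becomes attractive.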

### A faster, approximate test by Monte Carlo simulation

observed = c(20, 9, 9, 1, 1, 1)

expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

library(EMT)

multinomial.test(observed, expected,

MonteCarlo = TRUE)

Exact Multinomial Test, distance measure: p

Events pObs p.value

1370754 0 0
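The idea behind a Monte Carlo version of the test can be sketched in base R. This is not the code used by the *EMT* package. It is an illustration that assumes the p-value is estimated as the share of simulated tables that are no more probable than the observed table, and the seed and number of trials are arbitrary choices.

```r
observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

set.seed(1234)    # arbitrary seed, used only for reproducibility
trials = 10000    # arbitrary number of simulated tables

# Probability of the observed table under the null hypothesis
p.obs = dmultinom(observed, prob = expected)

# Simulate tables of the same total size from the expected proportions
# (one simulated table per column)
sims = rmultinom(trials, size = sum(observed), prob = expected)

# Probability of each simulated table under the null hypothesis
p.sim = apply(sims, 2, dmultinom, prob = expected)

# Share of simulated tables that are no more probable than the observed one
p.value = mean(p.sim <= p.obs)
p.value
```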

##### Ethnicity data

x = 7

n = 41

expected = 0.174

binom.test(x, n, expected)

Exact binomial test

number of successes = 7, number of trials = 41, p-value = 1
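The two-sided p-value from the exact binomial test can be reproduced by summing the probabilities of all outcomes that are no more likely than the observed one. The sketch below uses only base R. The small tolerance factor is included to guard against floating-point ties and is an implementation detail assumed here.

```r
x = 7
n = 41
expected = 0.174

# Probability of every possible number of successes, 0 through n
prob = dbinom(0:n, size = n, prob = expected)

# Probability of the observed count (index x + 1 because counts start at 0)
prob.obs = prob[x + 1]

# Two-sided p-value: total probability of outcomes no more likely than observed
p.value = sum(prob[prob <= prob.obs * (1 + 1e-7)])
p.value
```

Here the observed count of 7 is the single most likely outcome under the null hypothesis, so every outcome is included in the sum, which is why the reported p-value is 1.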

#### G-test for goodness-of-fit

##### Race data

observed = c(20, 9, 9, 1, 1, 1)

expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

library(DescTools)

GTest(x=observed,

p=expected,

correct="none")

### Options for correct: "none", "williams", "yates"

Log likelihood ratio (G-test) goodness of fit test

data: observed

G = 46.317, X-squared df = 5, p-value = 7.827e-09

##### Ethnicity data

observed = c(7, 34)

expected = c(0.174, 0.826)

library(DescTools)

GTest(x=observed,

p=expected,

correct="none")

### Options for correct: "none", "williams", "yates"

Log likelihood ratio (G-test) goodness of fit test

G = 0.0030624, X-squared df = 1, p-value = 0.9559
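To show what the uncorrected G statistic measures, the race result above can be reproduced in base R. This is a minimal sketch that assumes no correction is applied and that no observed count is zero, since a zero count would need special handling in the logarithm.

```r
observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

# Expected counts under the null hypothesis
expected.counts = sum(observed) * expected

# G statistic: twice the sum of observed * log(observed / expected)
G  = 2 * sum(observed * log(observed / expected.counts))
df = length(observed) - 1

# p-value from the chi-square distribution with df degrees of freedom
p.value = pchisq(G, df = df, lower.tail = FALSE)

c(G = G, df = df, p.value = p.value)
```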

#### Chi-square test for goodness-of-fit

##### Race data

observed = c(20, 9, 9, 1, 1, 1)

expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

chisq.test(x = observed,

p = expected)

Chi-squared test for given probabilities

X-squared = 164.81, df = 5, p-value < 2.2e-16

##### Ethnicity data

observed = c(7, 34)

expected = c(0.174, 0.826)

chisq.test(x = observed,

p = expected)

Chi-squared test for given probabilities

X-squared = 0.0030472, df = 1, p-value = 0.956
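The chi-square statistic for the race data can likewise be computed by hand and compared with the value from *chisq.test*. The sketch below uses base R only. The warning that *chisq.test* issues for small expected counts is suppressed here because the point is only to compare the statistics.

```r
observed = c(20, 9, 9, 1, 1, 1)
expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

# Expected counts under the null hypothesis
expected.counts = sum(observed) * expected

# Chi-square statistic: sum of (observed - expected)^2 / expected
X2 = sum((observed - expected.counts)^2 / expected.counts)
df = length(observed) - 1
p.value = pchisq(X2, df = df, lower.tail = FALSE)

# Statistic from the built-in function, for comparison
X2.builtin = unname(suppressWarnings(chisq.test(x = observed,
                                                p = expected))$statistic)

c(X2 = X2, X2.builtin = X2.builtin, df = df, p.value = p.value)
```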

### Effect size for goodness-of-fit tests

Cramer's *V*, a measure of association used for 2-dimensional contingency tables, can be modified for use in goodness-of-fit tests for nominal variables. In this context, a value of Cramer's *V* of 0 indicates that observed values match expected values perfectly. If expected proportions are equally distributed, the maximum value for Cramer's *V* is 1. However, if expected proportions vary among categories, Cramer's *V* can exceed 1.

#### Examples of effect size for goodness-of-fit tests

##### Race data

observed = c(20, 9, 9, 1, 1, 1)

expected = c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025)

library(rcompanion)

cramerVFit(x = observed,

p = expected)

Cramer V

0.8966

##### Ethnicity data

observed = c(7, 34)

expected = c(0.174, 0.826)

cramerVFit(x = observed,

p = expected)

Cramer V

0.008621
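The values reported above are numerically consistent with taking the square root of the chi-square statistic divided by *n*(*k* – 1), where *n* is the total count and *k* is the number of categories. The sketch below uses a hypothetical helper, *cramer.v.fit*, written for illustration. It is not the *rcompanion* source code, and the formula is an assumption checked only against the outputs shown in this chapter. The last two calls use made-up counts to illustrate the bounds described earlier.

```r
# Hypothetical helper for illustration; not part of rcompanion
cramer.v.fit = function(observed, expected) {
  n  = sum(observed)                 # total count
  k  = length(observed)              # number of categories
  expected.counts = n * expected
  X2 = sum((observed - expected.counts)^2 / expected.counts)
  sqrt(X2 / (n * (k - 1)))
}

# Race data
v.race = cramer.v.fit(c(20, 9, 9, 1, 1, 1),
                      c(0.775, 0.132, 0.012, 0.054, 0.002, 0.025))

# Ethnicity data
v.ethnicity = cramer.v.fit(c(7, 34), c(0.174, 0.826))

# Made-up counts, equal expected proportions: the most extreme table
v.equal = cramer.v.fit(c(10, 0), c(0.5, 0.5))

# Made-up counts, unequal expected proportions: V can exceed 1
v.unequal = cramer.v.fit(c(0, 10), c(0.9, 0.1))

c(race = v.race, ethnicity = v.ethnicity,
  equal = v.equal, unequal = v.unequal)
```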

### Multinomial test example with plot and confidence intervals

This is an example of a multinomial test that includes a bar plot showing confidence intervals. The data is a simple vector of counts.

For a similar example using two-way count data which is
organized into a data frame, see the "Examples of basic plots for nominal
data" section in the *Basic Plots* chapter.

Walking to the store, Jerry Coyne observed the colors of nail polish on women’s toes (Coyne, 2016), presumably because that’s the kind of thing retired professors are apt to do. He concluded that red was a more popular color but didn’t do any statistical analysis to support his conclusion.

| Color of polish | Count |
|---|---|
| Red | 19 |
| None or clear | 3 |
| White | 1 |
| Green | 1 |
| Purple | 2 |
| Blue | 2 |

We will use a multinomial goodness-of-fit test to determine if there is an overall difference in the proportion of colors (*multinomial.test* function in the *EMT* package). The confidence intervals for each proportion can be found with the *MultinomCI* function in the *DescTools* package. The interval limits are then multiplied by the total count so that the plot shows counts rather than proportions.

Note here that the expected proportions are simply *1* divided by the number of categories. In this case the null hypothesis is that the proportions are all the same. In the examples above, the null hypothesis was that the observed proportions were not different from the expected proportions. These are two ways to think of the same null hypothesis.
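That the two statements of the null hypothesis coincide can be checked with base R's *chisq.test*, which assumes equal proportions when *p* is omitted. This sketch uses the chi-square test only for illustration, not the exact test used in this example. The warning about small expected counts is suppressed because only the test statistics are being compared.

```r
observed = c(19, 3, 1, 1, 2, 2)

k = length(observed)       # number of categories
expected = rep(1 / k, k)   # equal expected proportions

# Null hypothesis stated as "all proportions are the same"
default.test = suppressWarnings(chisq.test(observed))

# Null hypothesis stated as "proportions match the expected proportions"
explicit.test = suppressWarnings(chisq.test(observed, p = expected))

all.equal(default.test$statistic, explicit.test$statistic)
```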

The confidence intervals for each proportion can be used as
a post-hoc test, to determine which proportions differ from each other, or to
determine which proportions differ from *0*.

nail.color = c("Red", "None", "White",
"Green", "Purple", "Blue")

observed = c( 19, 3, 1, 1, 2, 2 )

expected = c( 1/6, 1/6, 1/6, 1/6, 1/6, 1/6 )

library(EMT)

multinomial.test(observed,

expected)

### This may take a while. Use Monte Carlo for large numbers.

Exact Multinomial Test, distance measure: p

Events pObs p.value

237336 0 0

library(rcompanion)

cramerVFit(observed)

### Assumes equal proportions for expected values.

Cramer V

0.6178

library(DescTools)

MCI = MultinomCI(observed,

conf.level=0.95,

method="sisonglaz")

MCI

est lwr.ci upr.ci

[1,] 0.67857143 0.5357143 0.8423162

[2,] 0.10714286 0.0000000 0.2708876

[3,] 0.03571429 0.0000000 0.1994590

[4,] 0.03571429 0.0000000 0.1994590

[5,] 0.07142857 0.0000000 0.2351733

[6,] 0.07142857 0.0000000 0.2351733

### Order the levels, otherwise R will alphabetize them

Nail.color = factor(nail.color,

levels=unique(nail.color))

### For plot, create variables of counts, and then wrap them into a data frame

Total = sum(observed)

Count = observed

Lower = MCI[,'lwr.ci'] * Total

Upper = MCI[,'upr.ci'] * Total

Data = data.frame(Count, Lower, Upper)

Data

Count Lower Upper

1 19 15 23.584853

2 3 0 7.584853

3 1 0 5.584853

4 1 0 5.584853

5 2 0 6.584853

6 2 0 6.584853

library(ggplot2)

ggplot(Data, ### The data frame to use

aes(x = Nail.color,

y = Count)) +

geom_bar(stat = "identity",

color = "black",

fill = "gray50",

width = 0.7) +

geom_errorbar(aes(ymin = Lower,

ymax = Upper),

width = 0.2,

size = 0.7,

color = "black"

) +

theme_bw() +

theme(axis.title = element_text(face = "bold")) +

ylab("Count of observations") +

xlab("Nail color")

Bar plot of the count of the color of women’s toenail
polish observed by Jerry Coyne while walking to the store. Error bars indicate
95% confidence intervals (Sison and Glaz method).
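The post-hoc use of the confidence intervals described earlier can be sketched in base R. The interval limits below are copied from the *MultinomCI* output shown above. Treating non-overlapping intervals as evidence of a difference is one informal reading of the intervals, assumed here for illustration.

```r
nail.color = c("Red", "None", "White", "Green", "Purple", "Blue")

# Interval limits copied from the MultinomCI output in this example
lower = c(0.5357143, 0, 0, 0, 0, 0)
upper = c(0.8423162, 0.2708876, 0.1994590,
          0.1994590, 0.2351733, 0.2351733)

# Proportions whose interval excludes 0
excludes.zero = lower > 0

# Proportions whose interval does not overlap the interval for Red
no.overlap.red = upper < lower[1] | lower > upper[1]

data.frame(nail.color, lower, upper, excludes.zero, no.overlap.red)
```

Under this reading, only the interval for red excludes 0, and the interval for red does not overlap that of any other color.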

### Optional readings

**“Small numbers in chi-square and G–tests”** in McDonald, J.H. 2014. *Handbook of Biological Statistics*. www.biostathandbook.com/small.html.

### References

Coyne, J. A. 2016. Why is red nail polish so popular? *Why
Evolution is True*. whyevolutionistrue.wordpress.com/2016/07/02/why-is-red-nail-polish-so-popular/.

“Chi-square Test of Goodness-of-Fit” in Mangiafico, S.S.
2015a. *An R Companion for the Handbook of Biological Statistics*, version
1.09. rcompanion.org/rcompanion/b_03.html.

“Exact Test of Goodness-of-Fit” in Mangiafico, S.S. 2015a. *An
R Companion for the Handbook of Biological Statistics*, version 1.09. rcompanion.org/rcompanion/b_01.html.

“G–test of Goodness-of-Fit” in Mangiafico, S.S. 2015a. *An
R Companion for the Handbook of Biological Statistics*, version 1.09. rcompanion.org/rcompanion/b_04.html.

“Repeated G–tests of Goodness-of-Fit” in Mangiafico, S.S.
2015a. *An R Companion for the Handbook of Biological Statistics*, version
1.09. rcompanion.org/rcompanion/b_09.html.