[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Hermite and Poisson Regression for Count Data

Introduction

 

Count data

In general, common parametric tests like t-test and anova shouldn’t be used for count data.  One reason is technical in nature:  that parametric analyses require continuous data.  Count data is by its nature discrete and is left-censored at zero.  (That is, usually counts can’t be less than zero.)

 

A second reason is more practical in nature.  Count data are often highly skewed, and often produce skewed residuals if a parametric approach is attempted.  In this case, the hypothesis tests will not be accurate.

 

For further discussion, see the “Count data may not be appropriate for common parametric tests” section in the Introduction to Parametric Tests chapter.

 

Regression approaches for count data

The most common regression approach for handling count data is probably Poisson regression.  However, Poisson regression makes assumptions about the distribution of the data that may not be appropriate in all cases.  Hermite regression is a more flexible approach, but at the time of writing doesn’t have a complete set of support functions in R.  Quasi-Poisson regression is also flexible with data assumptions, but also but at the time of writing doesn’t have a complete set of support functions in R.  Negative binomial regression allows for overdispersion in data; and zero-inflated regression is useful when there are a high proportion of zero counts in the data.

 

Cautionary note
Note that model assumptions and pitfalls of these regression techniques are not discussed in depth here.  The reader is urged to understand the assumptions of this kind of modeling before proceeding.

 

Generalized linear regression

Poisson, Hermite, and related regression approaches are a type of generalized linear model.  This should not be confused with general linear model, which is implemented with the lm function.  Generalized linear models are implemented with the glm function or other functions.

 

Generalized linear models are used when the dependent variable is count, binary, multinomial, etc.  More information on using the glm function can be found by using help(glm) and help(family).  For examples of logistic regression, see the chapter Models for Nominal Data; the chapter Beta Regression for Percent and Proportion Data; or Mangiafico (2015) in the “References” section.  For a table of common uses for family and link function in generalized linear models, see the Wikipedia article in the “References” section for this chapter.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  pysch

•  hermite

•  lattice

•  plyr

•  boot

•  DescTools

•  ggplot2

•  car

•  multcompView

•  lsmeans

•  MASS

•  pscl

•  rcompanion

•  robust

 

The following commands will install these packages if they are not already installed:

 

if(!require(psych)){install.packages("psych")}
if(!require(hermite)){install.packages("hermite")}
if(!require(lattice)){install.packages("lattice")}
if(!require(plyr)){install.packages("plyr")}
if(!require(boot)){install.packages("boot")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(car)){install.packages("car")}
if(!require(multcompView)){install.packages("multcompView")}
if(!require(lsmeans)){install.packages("lsmeans")}
if(!require(MASS)){install.packages("MASS")}
if(!require(pscl)){install.packages("pscl")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(robust)){install.packages("robust")}


Hermite regression

 

Hermite regression example

The generalized Hermite distribution is a more general distribution that can handle overdispersion or multimodality (Moriña and others, 2015).  This makes generalized Hermite regression a powerful and flexible tool for modeling count data.  It is implemented with the hermite package.

 

Fitting models with the hermite package can be somewhat difficult.  One issue is that model fitting may fail without some parameters being specified.  Often specifying an appropriate value for the m option will help. 

 

A further difficulty with this approach is that, at the time writing, the package isn’t supported by the anova function to compare models, the Anova function to test effects, or other useful functions like lsmeans for factor effects.  The function nagelkerkeHermite can be used to compare models to test effects.

In this example, extension researchers have set up garden plots with different suites of plants, with each suite identified as a level of the variable Garden below.  In September, they counted the number of monarch butterflies in each garden plot.


Input = ("
Garden   Monarchs
A        0
A        4
A        2
A        2
A        0
A        6
A        0
A        0
B        5
B        9
B        7
B        5
B        7
B        5
B        9
B        5
C       10
C       14
C       12
C       12
C       10
C       16
C       10
C       10
")

Data = read.table(textConnection(Input),header=TRUE)


###  Order factors by the order in data frame

###  Otherwise, R will alphabetize them

Data$Garden = factor(Data$Garden,
                     levels=unique(Data$Garden))


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)


### Remove unnecessary objects

rm(Input)


Hermite regression

Here, the hermite package is used to conduct hermite regression.  The nagelkerkeHermite function is used to test the effect of Garden by likelihood ratio test, comparing the full model to null model with the Garden effect excluded.  It also produces a pseudo R-squared value for the model by comparison to a null model.

 

Here, the m=3 option is specified.  Often the default m=NULL can be used.  In this case, if the m value is not specified, the function cannot complete the model fitting, and errors are produced.  Using m=2 often works.  Here, m=3 was used because it produced a model with a lower AIC than did the m=2 option.


library(hermite)

model = glm.hermite(Monarchs ~ Garden,
                    data = Data,
                    link = "log",
                    m=3)

summary(model)


### Determine p-value and pseudo R-squared for model


null = glm.hermite(Monarchs ~ 1,
                   data = Data,
                   link = "log",
                   m=3)

library(rcompanion)

nagelkerkeHermite(model, null)


$Models
                                                    
Model: "glm.hermite, Monarchs ~ Garden, Data, log, 3"
Null:  "glm.hermite, Monarchs ~ 1, Data, log, 3"    

$Pseudo.R.squared.for.model.vs.null
                             Pseudo.R.squared
McFadden                             0.249943
Cox and Snell (ML)                   0.766548
Nagelkerke (Cragg and Uhler)         0.768829

$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -2     -17.457 34.915 2.6204e-08

$AIC
          AIC
Model: 112.78
Null:  143.69

   ### Note that because the null model is the same as the full model
   ###   without the Garden effect, this test can also be considered a
   ###   test of the Garden effect, and the pseudo R-squared akin to
   ###   a partial pseudo R-squared due to the effect of Garden.



Histograms


library(lattice)

histogram(~ Monarchs | Garden,
          data=Data,
          layout=c(1,3)      #  columns and rows of individual plots
          )


image


Post-hoc analysis: Medians and confidence intervals

At the time of writing, the lsmeans package does not support post-hoc analysis of regressions produced with the hermite package.

 

One imperfect approach for post-hoc analysis would be to examine median counts for treatments and the confidence intervals of these medians.  We can conclude that groups with non-overlapping 95% confidence intervals for their medians are significantly different. 

However, this approach does not represent any information learned from the Hermite regression.

A second issue is that, because the dependent variable is not continuous, the distribution of the bootstrapped confidence intervals is not likely to be continuous, and so is may not be reliable.

 

To get confidence intervals for the medians for each group, we will use the groupwiseMedian function.  Here I used the percentile method for confidence intervals. 


library(rcompanion)

Sum = groupwiseMedian(Monarchs ~ Garden,
                      data=Data,
                      conf=0.95,
                      R=5000,
                      percentile=TRUE,
                      bca=FALSE,
                      digits=3)

Sum


  Garden n Median Conf.level Percentile.lower Percentile.upper
1      A 8      1       0.95                0                4
2      B 8      6       0.95                5                8
3      C 8     11       0.95               10               14

   ###  In this case, none of the confidence intervals overlap.


Plot of medians and confidence intervals

The data frame Sum created above will be passed to ggplot for plotting.  At the end of the code, annotate is used to add text to the plot to indicate which medians are significantly different from one another.


library(ggplot2)

ggplot(Sum,                ### The data frame to use.
       aes(x = Garden,
           y = Median)) +
   geom_errorbar(aes(ymin = Percentile.lower,
                     ymax = Percentile.upper),
                     width = 0.05,
                     size  = 1) +
   geom_point(shape = 15,
              size  = 5) +
   theme_bw() +
   theme(axis.title   = element_text(face  = "bold")) +
    ylab("Median count of monarch butterflies") +

annotate("text",
         x = 1:3,
         y = c(5, 10, 15),
         label = c("Group 3", "Group 2", "Group 1"))


image


Post-hoc analysis: Pairwise models

Another imperfect approach to post-hoc analysis would be to produce additional hermite regression models which isolate groups to compare in a pairwise manner.


Data1 = subset(Data,
               Garden=="A" | Garden=="B")
Data2 = subset(Data,
               Garden=="A" | Garden=="C")
Data3 = subset(Data,
               Garden=="B" | Garden=="C")


### Pairwise regressions

library(hermite)

model1 = glm.hermite(Monarchs ~ Garden,
                     data = Data1,
                     link = "log",
                     m=3)


null1 = glm.hermite(Monarchs ~ 1,
                    data = Data1,
                    link = "log",
                    m=3)

library(rcompanion)

nagelkerkeHermite(model1,
                  null1)


$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -1     -6.7312 13.462 0.00024339


library(hermite)

model2 = glm.hermite(Monarchs ~ Garden,
                     data = Data2,
                     link = "log",
                     m=3)


null2 = glm.hermite(Monarchs ~ 1,
                    data = Data2,
                    link = "log",
                    m=3)

library(rcompanion)

nagelkerkeHermite(model2,
                  null2)


$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -1      -13.42  26.84 2.2099e-07


library(hermite)

model3 = glm.hermite(Monarchs ~ Garden,
                     data = Data3,
                     link = "log",
                     m=3)


null3 = glm.hermite(Monarchs ~ 1,
                    data = Data3,
                    link = "log",
                    m=3)

library(rcompanion)

nagelkerkeHermite(model3,
                  null3)


$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -1     -6.0386 12.077 0.00051042


p.value = c(0.00024339, 2.2099e-07, 0.00051042)

p.adj = p.adjust(p.value,
                 method = "fdr")

p.adj


[1] 3.65085e-04 6.62970e-07 5.10420e-04


Poisson regression example

 

Poisson regression makes certain assumptions about the relationship between the mean and the dispersion of the dependent variable.  Because this assumption may not be met for all data sets, Poisson regression may not be recommended for routine use.  Particularly, classic Poisson regression should be avoided if there is overdispersion in the data or if there are several zero counts in the dependent variable. 

 

An alternate approach for data with overdispersion is negative binomial regression. 

An alternative approach for data with many zeros is zero-inflated Poisson regression.

 

For further discussion, see the “Count data may not be appropriate for common parametric tests” section in the Introduction to Parametric Tests chapter.

 

Note that model assumptions and pitfalls of this approach are not discussed here.  The reader is urged to understand the assumptions of this kind of modeling before proceeding.


model.p = glm(Monarchs ~ Garden,
              data=Data,
              family="poisson")

library(car)

Anova(model.p,
      type="II",
      test="LR")


Analysis of Deviance Table (Type II tests)

       LR Chisq Df Pr(>Chisq)   
Garden   66.463  2  3.697e-15 ***


library(rcompanion)

nagelkerke(model.p)


$Pseudo.R.squared.for.model.vs.null
                             Pseudo.R.squared
McFadden                             0.387929
Cox and Snell (ML)                   0.937293
Nagelkerke (Cragg and Uhler)         0.938037

$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -2     -33.231 66.463 3.6967e-15


library(multcompView)

library(lsmeans)

leastsquare = lsmeans(model.p,
                      pairwise ~ Garden,
                      adjust="tukey")      ### Tukey adjustment for comparisons

leastsquare

cld(leastsquare,
    alpha=0.05,
    Letters=letters,  ### Use lower-case letters for .group
    adjust="tukey")   ### Tukey adjustment for multiple comparisons


Garden    lsmean        SE df   asymp.LCL asymp.UCL .group
 A      0.5596158 0.2672450 NA -0.07849527  1.197727  a   
 B      1.8718022 0.1386750 NA  1.54068251  2.202922   b  
 C      2.4638532 0.1031421 NA  2.21757688  2.710130    c 

Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05

   ### Note that estimates are on log scale


Negative binomial regression example

 

Negative binomial regression is similar in application to Poisson regression, but allows for overdispersion in the dependent count variable. 

 

This example will use the glm.nb function in the MASS package.  The Anova function in the car package will be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value and pseudo R-squared value for the model.  Post-hoc analysis can be conducted with the lsmeans package.

 

Note that model assumptions and pitfalls of this approach are not discussed here.  The reader is urged to understand the assumptions of this kind of modeling before proceeding.


library(MASS)

model.nb = glm.nb(Monarchs ~ Garden,
                  data=Data,
                  control = glm.control(maxit=10000))

library(car)

Anova(model.nb,
      type="II",
      test="LR")


Analysis of Deviance Table (Type II tests)

       LR Chisq Df Pr(>Chisq)   
Garden   66.464  2  3.694e-15 ***


Library(rcompanion)

nagelkerke(model.nb)


$Pseudo.R.squared.for.model.vs.null
                             Pseudo.R.squared
McFadden                             0.255141
Cox and Snell (ML)                   0.776007
Nagelkerke (Cragg and Uhler)         0.778217

$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -2     -17.954 35.907 1.5952e-08


library(multcompView)

library(lsmeans)

leastsquare = lsmeans(model.nb,
                      pairwise ~ Garden,
                      adjust="tukey")      ### Tukey adjustment

leastsquare

cld(leastsquare,
    alpha=0.05,
    Letters=letters,  ### Use lower-case letters for .group
    adjust="tukey")   ### Tukey adjustment for multiple comparisons


Garden    lsmean        SE df asymp.LCL asymp.UCL .group
 A      0.5596158 0.2672612 NA -0.078534  1.197766  a   
 B      1.8718022 0.1386750 NA  1.540683  2.202922   b  
 C      2.4638532 0.1031421 NA  2.217577  2.710130    c 

Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
Tests are performed on the log scale
significance level used: alpha = 0.05

   ### Note that estimates are on log scale


Zero-inflated regression example

 

Zero-inflated regression is similar in application to Poisson regression, but allows for an abundance of zeros in the dependent count variable.

 

This example will use the zeroinfl function in the pscl package.  The Anova function in the car package will be used for an analysis of deviance, and the nagelkerke function will be used to determine a p-value and pseudo R-squared value for the model.  Post-hoc analysis can be conducted with the lsmeans package.


library(pscl)

model.zi = zeroinfl(Monarchs ~ Garden,
                    data = Data,
                    dist = "poisson")

                       ### dist = "negbin" may be used

summary(model.zi)


Call:
zeroinfl(formula = Monarchs ~ Garden | Garden, data = Data, dist = "poisson")

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   1.2182     0.2847   4.278 1.89e-05 ***
GardenB       0.6536     0.3167   2.064    0.039 * 
GardenC       1.2457     0.3029   4.113 3.90e-05 ***

Zero-inflation model coefficients (binomial with logit link):
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.046e-02  7.363e-01  -0.096    0.924
GardenB     -2.057e+01  1.071e+04  -0.002    0.998
GardenC     -2.057e+01  1.071e+04  -0.002    0.998

   ### Note that there are separate coefficients for the
   ###  Poisson part of the analysis and for the zero-inflation part.



library(car)

Anova(model.zi,
      type="II",
      test="Chisq")


Analysis of Deviance Table (Type II tests)

       Df  Chisq Pr(>Chisq)   
Garden  2 23.914  6.414e-06 ***


library(rcompanion)

nagelkerke(model.zi)


$Pseudo.R.squared.for.model.vs.null
                             Pseudo.R.squared
McFadden                             0.284636
Cox and Snell (ML)                   0.797356
Nagelkerke (Cragg and Uhler)         0.800291

$Likelihood.ratio.test
 Df.diff LogLik.diff  Chisq    p.value
      -4     -19.156 38.311 9.6649e-08


library(multcompView)

library(lsmeans)

leastsquare = lsmeans(model.zi,
                      pairwise ~ Garden,
                      adjust="tukey")      ### Tukey adjustment

leastsquare

cld(leastsquare,
    alpha=0.05,
    Letters=letters,  ### Use lower-case letters for .group
    adjust="tukey")   ### Tukey adjustment for multiple comparisons


Garden lsmean        SE df   asymp.LCL asymp.UCL .group
 A        1.75 0.7586301 NA -0.06140972  3.561410  a   
 B        6.50 0.9013877 NA  4.34772229  8.652278   b  
 C       11.75 1.2119199 NA  8.85625304 14.643747    c 

Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05              

   ### Note, lsmeans are on the original measurement scale


Robust Poisson regression example

 

Robust Poisson regression is robust to outliers in the dependent variable.

 

This example uses the glmRob function in the robust package.  The anova function can be used to conduct an analysis of deviance.  The p-value for the model can be found by comparing the model to a null model. However, at the time of writing, I don’t know of any way to determine AIC or pseudo R-squared for the model.

 

At the time of writing, the glmRob function can only use the Poisson and binomial families of models.

 

An alternate method is the glmrob function in the robustbase package.


library(robust)

model.rob = glmRob(Monarchs ~ Garden,
                   data = Data,
                   family = "poisson")

anova(model.rob, test="Chisq")


       Df Deviance Resid. Df Resid. Dev     Pr(>Chi)
NULL   NA       NA        23  430.19850           NA
Garden  2 400.9221        21   29.27641 3.567693e-63


model.rob.null = glmRob(Monarchs ~ 1,
                        data = Data,
                        family = "poisson")

anova(model.rob.null, model.rob, test="Chisq")


   Terms Resid. Df Resid. Dev Test Df Deviance     Pr(>Chi)
1      1        23   95.12606      NA       NA           NA
2 Garden        21   29.27641       2 65.84965 5.536815e-11


Quasi-Poisson regression

 

Quasi-Poisson regression is useful since it has a variable dispersion parameter, so that it can model over-dispersed data.   It may be better than negative binomial regression in some circumstances (Verhoef and Boveng. 2007).

 

At the time of writing, Quasi-Poisson regression doesn’t have complete set of support functions in R.   Using the quasipoisson family option in the glm function, the results will have the same parameter coefficients as with the poisson option, but the inference statistics are adjusted in the summary function.  The Anova function in the car package can be used for an analysis of deviance table, and the lsmeans package can be used for post-hoc comparisons.  Since the model doesn’t produce a log-likelihood value, I don’t know a way to produce a p-value for the mode, for a pseudo R-squared value for the model.

.


model.qp = glm(Monarchs ~ Garden,
               data=Data,
               family="quasipoisson")

library(car)

Anova(model.qp,
      type="II",
      test="LR")


Analysis of Deviance Table (Type II tests)

Response: Monarchs
       LR Chisq Df Pr(>Chisq)   
Garden   52.286  2  4.429e-12 ***


library(multcompView)

library(lsmeans)

leastsquare = lsmeans(model.qp,
                      pairwise ~ Garden,
                      adjust="tukey")

leastsquare

cld(leastsquare,
    alpha=0.05,
    Letters=letters,
    adjust="tukey")


 A      0.5596158 0.3013057 NA -0.1598233  1.279055  a   
 B      1.8718022 0.1563493 NA  1.4984809  2.245123   b  
 C      2.4638532 0.1162877 NA  2.1861887  2.741518    c 

Results are given on the log (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: sidak method for 3 estimates
P value adjustment: tukey method for comparing a family of 3 estimates
significance level used: alpha = 0.05


Optional code for chi-square goodness-of-fit test

 

An alternative approach to handling count data is to sum up the counts for treatments, and use a chi-square test or related test.  Here, a chi-square goodness-of-fit test is used to see if counts differ from “expected” equal proportions.

 

Omnibus test


Tabla = xtabs(Monarchs ~ Garden,
              data = Data)

Tabla


Garden
 A  B  C
14 52 94


chisq.test(Tabla)


Chi-squared test for given probabilities

X-squared = 60.05, df = 2, p-value = 9.127e-14


Post-hoc chi-square tests


Garden.A = sum(Data$Monarchs[Data$Garden=="A"])
Garden.B = sum(Data$Monarchs[Data$Garden=="B"])
Garden.C = sum(Data$Monarchs[Data$Garden=="C"])

observed = c(Garden.A, Garden.B)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)


Chi-squared test for given probabilities

X-squared = 21.879, df = 1, p-value = 2.904e-06


observed = c(Garden.B, Garden.C)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)


Chi-squared test for given probabilities

X-squared = 12.082, df = 1, p-value = 0.0005091


observed = c(Garden.A, Garden.C)   # observed frequencies
expected = c(1/2, 1/2)             # expected proportions

chisq.test(x = observed,
           p = expected)


Chi-squared test for given probabilities

X-squared = 59.259, df = 1, p-value = 1.382e-14


Optional analysis: Vuong test to compare Poisson, negative binomial, and zero-inflated models

 

The Vuong test, implemented by the pscl package, can test two non-nested models.  It works with negbin, zeroinfl, and some glm model objects which are fitted to the same data.

 

The null hypothesis is that there is no difference in models.  The function produces three tests, a “Raw” test, an AIC-corrected, and a BIC-corrected, any of which could be used.

 

It has been suggested that the Vuong test not be used to test for zero-inflation (Wilson, 2015).

 

Define models


model.p = glm(Monarchs ~ Garden,
              data=Data,
              family="poisson")

library(MASS)
 
model.nb = glm.nb(Monarchs ~ Garden,
                  data=Data,
                  control = glm.control(maxit=10000))
 
library(pscl)

model.zi = zeroinfl(Monarchs ~ Garden,
                    data = Data,
                    dist = "poisson")

Vuong test


library(pscl)

vuong(model.p,
      model.nb,
      digits = 4)


Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                  0.03324988 model1 > model2 0.48674
AIC-corrected        0.03324988 model1 > model2 0.48674
BIC-corrected        0.03324988 model1 > model2 0.48674

   ### Positive Vuong z-statistic suggests that model 1 is superior,
   ###   but, in this case, the difference is not significant,
   ###   and the value of the statistic is probably too tiny to be meaningful.



vuong(model.p,
      model.zi,
      digits = 4)


Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A  p-value
Raw                  -1.4424725 model2 > model1 0.074585
AIC-corrected        -0.4335210 model2 > model1 0.332318
BIC-corrected         0.1607786 model1 > model2 0.436134

   ### Negative Vuong z-statistic suggests that model 2 is superior.
   ### If the Raw statistic is used, p = 0.07 gives some evidence
   ###   that zi model is superior.



vuong(model.nb,
      model.zi,
      digits = 4)


Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A  p-value
Raw                  -1.4424725 model2 > model1 0.074585
AIC-corrected        -0.4335210 model2 > model1 0.332318
BIC-corrected         0.1607786 model1 > model2 0.436134

   ### Negative Vuong z-statistic suggests that model 2 is superior.
   ### If the Raw statistic is used, p = 0.07 gives some evidence
   ###   that zi model is superior.



References

 

Moriña, D., M. Higueras, P. Puig, and M. Oliveira. 2015. Generalized Hermite Distribution

Modelling with the R Package hermite. The R Journal 7(2):263–274. journal.r-project.org/archive/2015-2/morina-higueras-puig-etal.pdf.

 

help(package="hermite")

 

library(hermite); ?glm.hermite

 

library(MASS); ?glm.nb

 

library(pscl); ?zeroinfl

 

library(pscl); ?vuong

 

 “Simple Logistic Regression” in Mangiafico, S.S. 2015. An R Companion for the Handbook of Biological Statistics, version 1.09. rcompanion.org/rcompanion/e_06.html.

 

"Generalized linear model: Link function". No date. Wikipedia. Retrieved 31 Jan. 2016. en.wikipedia.org/wiki/Generalized_linear_model#Link_function.

 

Verhoef, J.M. and P.L. Boveng. 2007. Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data? Ecology 88(11) 2766–2772. http://fisher.utstat.toronto.edu/reid/sta2201s/QUASI-POISSON.pdf.

 

 

Wilson, P. 2015. The Misuse of the Vuong Test for Non-Nested Models to Test for Zero-Inflation. Economic Letters 127: 51–53. cybermetrics.wlv.ac.uk/paperdata/misusevuong.pdf

 

References for count data

 

Grace-Martin, K. No date. "Regression Models for Count Data". The Analysis Factor. www.theanalysisfactor.com/regression-models-for-count-data/.

 

Grace-Martin, K. No date. " Zero-Inflated Poisson Models for Count Outcomes". The Analysis Factor. www.theanalysisfactor.com/zero-inflated-poisson-models-for-count-outcomes/.

 

[IDRE] Institute for Digital Research and Education. 2015. “R Data Analysis Examples: Poisson Regression”. UCLA.  www.ats.ucla.edu/stat/r/dae/poissonreg.htm.

 

[IDRE] Institute for Digital Research and Education. 2015. “R Data Analysis Examples: Negative Binomial Regression”. UCLA.  www.ats.ucla.edu/stat/r/dae/nbreg.htm.

 

[IDRE] Institute for Digital Research and Education. 2015. “R Data Analysis Examples: Zero-Inflated Poisson Regression”. UCLA.  www.ats.ucla.edu/stat/r/dae/zipoisson.htm.

 

[IDRE] Institute for Digital Research and Education. 2015. “R Data Analysis Examples: Zero-Truncated Poisson Regression”. UCLA.  www.ats.ucla.edu/stat/r/dae/ztp.htm.