
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

What are Estimated Marginal Means?

 

Estimated marginal means are group means that are adjusted for the other factors in the model.  They are also referred to as least-squares means.  In practice, these values can be determined for a wide variety of models.

 

Imagine a case where you are measuring the height of 7th-grade students in two classrooms and want to see if there is a difference between the two classrooms.  You are also recording the gender of the students, and at this age girls tend to be taller than boys.  Say classroom B happens to have far more girls than boys.  If you were to look at the mean height in the classrooms, you might find that classroom B had a higher mean, but this may not be an effect of the classrooms themselves, but rather of the different counts of boys and girls in each.  In this case, reporting estimated marginal means for the classrooms may give a more representative result.  Reporting estimated marginal means is sometimes recommended for studies that do not have equal numbers of observations for each combination of treatments.  The design of such studies is said to be unbalanced.

 

The following example works through this hypothetical case.  Looking at the means from the Summarize function in the FSA package, we might think there is a meaningful difference between the classrooms, with mean heights of 153.5 cm for classroom A and 155.0 cm for classroom B.  But in the estimated marginal means (emmeans), which are adjusted for the different numbers of boys and girls in each classroom, this difference disappears.  Each classroom has an estimated marginal mean of 153.5 cm, indicating that the arithmetic mean of classroom B was inflated by its higher proportion of girls.

 

Note that the following example uses a linear model with the lm function.  Here, Height is being treated as an interval/ratio variable. 

 

This kind of analysis makes certain assumptions about the distribution of the data, but for simplicity, this example will ignore the need to determine that the data meet these assumptions.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  FSA

•  psych

•  emmeans

•  car

 

The following commands will install these packages if they are not already installed:


if(!require(FSA)){install.packages("FSA")}
if(!require(psych)){install.packages("psych")}
if(!require(emmeans)){install.packages("emmeans")}
if(!require(car)){install.packages("car")}


Estimated marginal means example

 

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

 Classroom  Gender    Height
 A          Male    151
 A          Male    150
 A          Male    152
 A          Male    149
 A          Female  155
 A          Female  156
 A          Female  157
 A          Female  158
 B          Male    151
 B          Male    150
 B          Female  155
 B          Female  156
 B          Female  157
 B          Female  158
 B          Female  156
 B          Female  157
")


###  Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)



Arithmetic means


library(FSA)

Summarize(Height ~ Classroom,
          data=Data,
          digits=3)

 

  Classroom n nvalid  mean    sd min    Q1 median    Q3 max percZero
1         A 8      8 153.5 3.423 149 150.8  153.5 156.2 158        0
2         B 8      8 155.0 2.928 150 154.0  156.0 157.0 158        0
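
The gap between these two arithmetic means can be traced to how boys and girls are distributed across the classrooms.  The following is a minimal sketch in base R, not part of the original analysis.  It re-enters the same observations in compact form so that it runs on its own, and the object names Counts and Cell.means are chosen here only for illustration.

```r
###  Re-enter the example data compactly so this block runs on its own

Data = data.frame(
  Classroom = factor(rep(c("A", "B"), each = 8)),
  Gender    = factor(c(rep("Male", 4), rep("Female", 4),
                       rep("Male", 2), rep("Female", 6))),
  Height    = c(151, 150, 152, 149, 155, 156, 157, 158,
                151, 150, 155, 156, 157, 158, 156, 157))

###  Number of students in each Classroom x Gender cell

Counts = tapply(Data$Height, list(Data$Classroom, Data$Gender), length)

Counts

###  Mean height in each Classroom x Gender cell

Cell.means = tapply(Data$Height, list(Data$Classroom, Data$Gender), mean)

Cell.means

###  Weighting each cell mean by its count recovers the arithmetic means

rowSums(Cell.means * Counts) / rowSums(Counts)
```

If the cell means turn out to be the same in both classrooms, as they should for these data, then the difference in arithmetic means comes entirely from the unequal counts.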


Estimated marginal means


model = lm(Height ~ Classroom + Gender + Classroom:Gender,
           data = Data)

library(emmeans)

emmeans(model,
        ~ Classroom)


 Classroom emmean    SE df lower.CL upper.CL
 A            154 0.408 12      153      154
 B            154 0.471 12      152      155


In this particular case, the emmeans output rounds the displayed values to only a few significant digits, so we’ll convert it to a data frame to see the values more precisely.


marginal = emmeans(model, ~ Classroom)

as.data.frame(marginal)


 Classroom emmean        SE df lower.CL upper.CL
 A          153.5 0.4082483 12 152.6105 154.3895
 B          153.5 0.4714045 12 152.4729 154.5271
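
For this model, which includes both factors and their interaction, the estimated marginal mean for a classroom can be read as the simple average of the model's predictions for boys and for girls in that classroom, with each gender given equal weight regardless of how many students it contains.  The sketch below checks that reading using base R only.  It assumes equal weighting of the Gender levels, re-enters the data so that it runs independently, and uses Grid as an illustrative name rather than anything from the emmeans package.

```r
###  Re-enter the example data compactly so this block runs on its own

Data = data.frame(
  Classroom = factor(rep(c("A", "B"), each = 8)),
  Gender    = factor(c(rep("Male", 4), rep("Female", 4),
                       rep("Male", 2), rep("Female", 6))),
  Height    = c(151, 150, 152, 149, 155, 156, 157, 158,
                151, 150, 155, 156, 157, 158, 156, 157))

model = lm(Height ~ Classroom + Gender + Classroom:Gender,
           data = Data)

###  One row for every combination of Classroom and Gender

Grid = expand.grid(Classroom = levels(Data$Classroom),
                   Gender    = levels(Data$Gender))

###  Model prediction for each combination

Grid$Predicted = predict(model, newdata = Grid)

Grid

###  Average the predictions within each classroom,
###    giving each gender equal weight

tapply(Grid$Predicted, Grid$Classroom, mean)
```

The averages from the last line can be compared with the emmean column in the data frame above.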


Note that an analysis of variance also would have told us that there is a difference between levels of Gender, but not between levels of Classroom.


library(car)

Anova(model)


Anova Table (Type II tests)

              Sum Sq Df F value    Pr(>F)   
Classroom          0  1     0.0         1   
Gender           126  1    94.5 4.857e-07 ***
Classroom:Gender   0  1     0.0         1   
Residuals         16 12
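
The model assumptions that this example set aside could be examined with standard residual diagnostics.  The following is only a sketch using base R functions, assuming the usual linear-model checks of approximately normal residuals and similar spread across the fitted values.  It is not part of the original analysis, and it re-enters the data so that it runs independently.

```r
###  Re-enter the example data compactly so this block runs on its own

Data = data.frame(
  Classroom = factor(rep(c("A", "B"), each = 8)),
  Gender    = factor(c(rep("Male", 4), rep("Female", 4),
                       rep("Male", 2), rep("Female", 6))),
  Height    = c(151, 150, 152, 149, 155, 156, 157, 158,
                151, 150, 155, 156, 157, 158, 156, 157))

model = lm(Height ~ Classroom + Gender + Classroom:Gender,
           data = Data)

Res = residuals(model)

###  Histogram of residuals, to judge rough symmetry

hist(Res,
     main = "",
     xlab = "Residuals")

###  Residuals vs. fitted values, to judge whether spread is similar

plot(fitted(model), Res,
     xlab = "Fitted values",
     ylab = "Residuals")

abline(h = 0, lty = 2)

###  Shapiro-Wilk test of normality of the residuals

shapiro.test(Res)
```

With only 16 observations, plots and tests like these have limited power, so they are best treated as a rough check rather than a verdict.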