*Estimated marginal means* are group means that have been
adjusted for the other factors in the model. They are also
called *least-squares means*. In practice, these values can be
computed for a wide variety of models.

Imagine a case where you are measuring the height of
7th-grade students in two classrooms and want to see if there is a difference
between the two classrooms. You are also recording the gender of the students,
and at this age girls tend to be taller than boys. Say classroom *B*
happens to have far more girls than boys. If you were to look at the mean
height in the classrooms, you might find that classroom *B* had a higher
mean. This may not be an effect of the classrooms themselves, but a result of
the difference in the counts of boys and girls in each. In this case,
reporting estimated marginal means for the classrooms may give a more
representative result. When a study does not have an equal number of
observations for each combination of treatments, we say its design is
*unbalanced*, and reporting estimated marginal means is sometimes recommended
for such studies.

The example below works through this hypothetical case.
Looking at the means from the *Summarize* function in *FSA*, we might
think there is a meaningful difference between the classrooms, with a mean
height of 153.5 cm vs. 155.0 cm. But looking at the estimated marginal means (*emmeans*),
which are adjusted for the difference in boys and girls in each classroom, this
difference disappears. Each classroom has an estimated marginal mean of 153.5
cm, indicating the mean of classroom *B* was inflated due to the higher
proportion of girls.

Note that the following example uses a linear model with the
*lm* function. Here, *Height* is being treated as an interval/ratio
variable.

This kind of analysis makes certain assumptions about the distribution of the data, such as normally distributed residuals with constant variance. For simplicity, this example does not check whether the data meet these assumptions.

### Packages used in this chapter

The packages used in this chapter include:

• FSA

• psych

• emmeans

• car

The following commands will install these packages if they are not already installed:

if(!require(FSA)){install.packages("FSA")}

if(!require(psych)){install.packages("psych")}

if(!require(emmeans)){install.packages("emmeans")}

if(!require(car)){install.packages("car")}

### Estimated marginal means example

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Classroom Gender Height

A Male 151

A Male 150

A Male 152

A Male 149

A Female 155

A Female 156

A Female 157

A Female 158

B Male 151

B Male 150

B Female 155

B Female 156

B Female 157

B Female 158

B Female 156

B Female 157

")

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

#### Arithmetic means

library(FSA)

Summarize(Height ~ Classroom,

data=Data,

digits=3)

Classroom n nvalid mean sd min Q1 median Q3 max percZero

1 A 8 8 153.5 3.423 149 150.8 153.5 156.2 158 0

2 B 8 8 155.0 2.928 150 154.0 156.0 157.0 158 0
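The gap between these two arithmetic means can be traced to the unequal counts of boys and girls. The following is a minimal sketch in base R, not part of the original example. It rebuilds the same data under the illustrative name `HeightData` so that it runs on its own (`Counts`, `CellMeans`, and `Weighted` are likewise assumed names), and it shows that each classroom's arithmetic mean is the average of its gender cell means weighted by the cell counts.

```r
# Rebuild the example data (same values as Data above; HeightData is an
# illustrative name, not part of the original example)
HeightData = data.frame(
  Classroom = factor(rep(c("A", "B"), each = 8)),
  Gender    = factor(c(rep("Male", 4), rep("Female", 4),
                       rep("Male", 2), rep("Female", 6))),
  Height    = c(151, 150, 152, 149, 155, 156, 157, 158,
                151, 150, 155, 156, 157, 158, 156, 157))

# Number of observations in each Classroom x Gender cell
Counts = table(HeightData$Classroom, HeightData$Gender)
Counts

# Mean height in each Classroom x Gender cell
CellMeans = tapply(HeightData$Height,
                   list(HeightData$Classroom, HeightData$Gender),
                   mean)
CellMeans

# Arithmetic mean per classroom: cell means weighted by cell counts
Weighted = rowSums(CellMeans * Counts) / rowSums(Counts)
Weighted
```

The cell means are the same in both classrooms for each gender. Only the counts differ, so the count-weighted average is pulled toward the girls' mean in classroom *B*.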

#### Estimated marginal means

model = lm(Height ~ Classroom + Gender + Classroom:Gender,

data = Data)

library(emmeans)

emmeans(model,

~ Classroom)

Classroom emmean SE df lower.CL upper.CL

A 154 0.408 12 153 154

B 154 0.471 12 152 155

In this particular case, the *emmeans* output rounds
the displayed values, so we'll convert it to a data frame to see the
values more precisely.

marginal = emmeans(model, ~ Classroom)

as.data.frame(marginal)

Classroom emmean SE df lower.CL upper.CL

A 153.5 0.4082483 12 152.6105 154.3895

B 153.5 0.4714045 12 152.4729 154.5271
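To see where these values come from, the estimated marginal means can be reproduced by hand for this model: predict the height for every combination of *Classroom* and *Gender*, then average those predictions within each classroom, giving each gender equal weight. The sketch below uses only base R and rebuilds the data under the illustrative name `HeightData`; the names `FitCheck`, `Grid`, and `ByHand` are also assumptions made for illustration. It mirrors the equal weighting that *emmeans* uses by default, but it is not the package's implementation.

```r
# Rebuild the example data (same values as Data above; HeightData is an
# illustrative name, not part of the original example)
HeightData = data.frame(
  Classroom = factor(rep(c("A", "B"), each = 8)),
  Gender    = factor(c(rep("Male", 4), rep("Female", 4),
                       rep("Male", 2), rep("Female", 6))),
  Height    = c(151, 150, 152, 149, 155, 156, 157, 158,
                151, 150, 155, 156, 157, 158, 156, 157))

# Same model formula as in the example above
FitCheck = lm(Height ~ Classroom + Gender + Classroom:Gender,
              data = HeightData)

# Every combination of the factor levels (a reference grid)
Grid = expand.grid(Classroom = levels(HeightData$Classroom),
                   Gender    = levels(HeightData$Gender))

# Model prediction for each combination
Grid$Predicted = predict(FitCheck, newdata = Grid)

# Equal-weight average over Gender within each Classroom
ByHand = tapply(Grid$Predicted, Grid$Classroom, mean)
ByHand
```

Because each gender counts equally in this average, the larger number of girls in classroom *B* no longer raises its mean.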

Note that an analysis of variance would also have shown
that there is a difference between levels of *Gender*, but not between
levels of *Classroom*.

library(car)

Anova(model)

Anova Table (Type II tests)

Sum Sq Df F value Pr(>F)

Classroom 0 1 0.0 1

Gender 126 1 94.5 4.857e-07 ***

Classroom:Gender 0 1 0.0 1

Residuals 16 12