Measures of accuracy and error can be used to determine how well a given model fits the data. They are distinct from the R-squared and pseudo R-squared measures discussed in the last chapter.
These statistics are useful to compare a wide variety of models where the dependent variable is continuous. In general, one would want to compare only among models with the same data for the dependent variable.
Measure |
Units |
Interpretation |
Min-max accuracy |
Unitless |
1 is perfect fit |
Mean absolute error |
Same as variable |
0 is perfect fit |
Mean absolute percent error |
Unitless |
0 is perfect fit |
Mean square error |
Square of variable units |
0 is perfect fit |
Root mean square error |
Same as variable |
0 is perfect fit |
Normalized root mean square error |
Unitless |
0 is perfect fit |
Efron’s pseudo R-squared |
Unitless |
1 is perfect fit |
Packages used in this chapter
The packages used in this chapter include:
• rcompanion
• betareg
• nlme
• lme4
• quantreg
• mgcv
• MASS
• robust
The following commands will install these packages if they are not already installed:
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(betareg)){install.packages("betareg")}
if(!require(nlme)){install.packages("nlme")}
if(!require(lme4)){install.packages("lme4")}
if(!require(quantreg)){install.packages("quantreg")}
if(!require(mgcv)){install.packages("mgcv")}
if(!require(MASS)){install.packages("MASS")}
if(!require(robust)){install.packages("robust")}
Examples of accuracy and error
The following example fits various models to the BrendonSmall data set in the rcompanion package, and then uses the accuracy function to produce accuracy and error statistics for these models.
Note that model.5 has a different dependent variable than the others.
### Prepare data set
library(rcompanion)
data(BrendonSmall)
BrendonSmall$Calories = as.numeric(BrendonSmall$Calories)
BrendonSmall$Sodium = as.numeric(BrendonSmall$Sodium)
BrendonSmall$Calories2 = BrendonSmall$Calories ^ 2
BrendonSmall$Sodium.ratio = BrendonSmall$Sodium / 1500
BrendonSmall$Grade = factor(BrendonSmall$Grade)
head(BrendonSmall)
Instructor Grade Weight Calories Sodium Score Calories2
Sodium.ratio
1 Brendon Small 6 43 2069 1287 77 4280761 0.8580000
2 Brendon Small 6 41 1990 1164 76 3960100 0.7760000
3 Brendon Small 6 40 1975 1177 76 3900625 0.7846667
4 Brendon Small 6 44 2116 1262 84 4477456 0.8413333
5 Brendon Small 6 45 2161 1271 86 4669921 0.8473333
6 Brendon Small 6 44 2091 1222 87 4372281 0.8146667
### Fit various models
model.1 = lm(Sodium ~ Calories, data = BrendonSmall)
model.2 = lm(Sodium ~ Calories + Calories2, data = BrendonSmall)
model.3 = glm(Sodium ~ Calories, data = BrendonSmall, family="Gamma")
quadplat = function(x, a, b, clx) {
ifelse(x < clx, a + b * x + (-0.5*b/clx) * x * x,
a + b * clx + (-0.5*b/clx) * clx * clx)}
model.4 = nls(Sodium ~ quadplat(Calories, a, b, clx),
data = BrendonSmall,
start = list(a = 519, b = 0.359, clx = 2300))
library(betareg)
model.5 = betareg(Sodium.ratio ~ Calories, data = BrendonSmall)
library(nlme)
model.6 = gls(Sodium ~ Calories, data = BrendonSmall, method="REML")
model.7 = lme(Sodium ~ Calories,
random = ~1|Instructor,
data = BrendonSmall,
method="REML")
library(lme4)
model.8 = lmer(Sodium ~ Calories + (1|Instructor),
data=BrendonSmall,
REML=TRUE)
library(lme4)
model.9 = suppressWarnings(glmer(Sodium ~ Calories + (1|Instructor),
data=BrendonSmall,
family = gaussian))
library(quantreg)
model.10 = rq(Sodium ~ Calories,
data = BrendonSmall,
tau = 0.5)
model.11 = loess(Sodium ~ Calories,
data = BrendonSmall,
span = 0.75,
degree=2,
family="gaussian")
library(mgcv)
model.12 = gam(Sodium ~ s(Calories),
data = BrendonSmall,
family=gaussian())
library(MASS)
model.13 = glm.nb(Sodium ~ Grade,
data = BrendonSmall)
library(robust)
model.14 = glmRob(Sodium ~ Grade,
data = BrendonSmall,
family = "poisson")
### Produce accuracy and error statisitcs
library(rcompanion)
accuracy(list(model.1, model.2, model.3, model.4, model.5, model.6,
model.7, model.8, model.9, model.10, model.11, model.12,
model.13, model.14),
plotit=TRUE)
$Fit.criteria
Min.max.accuracy MAE MedAE MAPE MSE RMSE NRMSE.mean
NRMSE.median Efron.r.squared CV.prcnt
1 0.976 32.7000 30.0000 0.0244 1.44e+03 38.0000 0.0282
0.0275 0.721 2.82
2 0.983 22.6000 20.9000 0.0170 7.13e+02 26.7000 0.0198
0.0194 0.862 1.98
3 0.975 34.4000 31.6000 0.0257 1.63e+03 40.4000 0.0300
0.0293 0.684 3.00
4 0.984 22.0000 20.0000 0.0166 7.00e+02 26.5000 0.0196
0.0192 0.865 1.96
5 0.980 0.0179 0.0177 0.0201 4.09e-04 0.0202 0.0225
0.0220 0.822 2.25
6 0.976 32.7000 30.0000 0.0244 1.44e+03 38.0000 0.0282
0.0275 0.721 2.82
7 0.979 28.5000 25.2000 0.0213 1.15e+03 33.9000 0.0252
0.0246 0.778 2.52
8 0.979 28.5000 25.2000 0.0213 1.15e+03 33.9000 0.0252
0.0246 0.778 2.52
9 0.979 28.5000 25.2000 0.0213 1.15e+03 33.9000 0.0252
0.0246 0.778 2.52
10 0.976 32.6000 29.3000 0.0243 1.50e+03 38.7000 0.0287
0.0280 0.710 2.87
11 0.987 17.9000 15.1000 0.0135 5.03e+02 22.4000 0.0167
0.0163 0.903 1.67
12 0.987 17.1000 15.7000 0.0129 4.53e+02 21.3000 0.0158
0.0154 0.913 1.58
13 0.979 27.7000 18.4000 0.0213 1.61e+03 40.1000 0.0298
0.0290 0.690 2.98
14 0.728 365.0000 283.0000 0.2720 2.03e+05 450.0000 0.3340
0.3260 -38.200 33.40
Note in the previous results that model.14, the glmRob model, did not fit the data well. In this case, fitting a Poisson regression model is probably not appropriate for the data here, but is included since this type of model is accepted by the accuracy function. Also remember that model.5 has a different dependent variable than the other models, and so some statistics may not be comparable with other those from other models.
It is useful to examine plots of the predicted values vs. the actual values to see how well the model reflects the actual values, and to see if patterns in the plots suggest another model may be better.
The accuracy measures produced here—except for Efron’s R-squared—are different in type than R-squared or pseudo R-squared measures. Different statistics may lead to different conclusions. The following plots and captions show some examples.
plotPredy(data = BrendonSmall,
x = Calories,
y = Sodium,
model = model.1,
xlab = "Calories",
ylab = "Sodium")
summary(model.1)
Plot of model.1, a linear OLS model. R-squared = 0.721.
Accuracy with RMSE / median = 0.972.
plotPredy(data = BrendonSmall,
x = Calories,
y = Sodium,
model = model.4,
xlab = "Calories",
ylab = "Sodium")
nullfunct = function(x, m){m}
m.ini = 1300
null = nls(Sodium ~ nullfunct(Calories, m),
data = BrendonSmall,
start = list(m = m.ini),
trace = FALSE,
nls.control(maxiter = 1000))
nagelkerke(model.4, null)
Plot of model.4, a nonlinear regression model. Nagelkerke pseudo R-squared
= 0.865. Efron’s pseudo R-squared = 0.865. MAE = 22.0. Median
absolute error = 20.0.
plotPredy(data = BrendonSmall,
x = Calories,
y = Sodium,
model = model.11,
xlab = "Calories",
ylab = "Sodium")
Plot of model.11, a loess model. Efron’s pseudo R-squared =
0.903. MAE = 17.9. Median absolute error = 15.1.