This book covers measures and tests of association for numeric, ordinal, binary, and nominal variables. The purpose of this chapter is to present a function that will report the correlation or bivariate association across these types of variables.
Packages used in this chapter
The packages used in this chapter include:
• rcompanion
The following commands will install these packages if they are not already installed:
if(!require(rcompanion)){install.packages("rcompanion")}
Hypothetical example
Length   = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating   = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(3,3,4)))
Color    = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag     = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))
Answer   = factor(rep(c("Yes", "No", "Yes"), c(4,3,3)),
                  levels=c("Yes", "No"))
Location = factor(rep(c("Home", "Away", "Other"), c(2,4,4)))
Distance = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(5,2,3)))
Start    = seq(as.Date("2024-01-01"), by = "month", length.out = 10)
Data = data.frame(Length, Rating, Color, Flag, Answer, Location,
                  Distance, Start)
Data
   Length Rating Color  Flag Answer Location Distance      Start
1    0.29    Low   Red  TRUE    Yes     Home      Low 2024-01-01
2    0.25    Low   Red  TRUE    Yes     Home      Low 2024-02-01
3      NA    Low   Red  TRUE    Yes     Away      Low 2024-03-01
4    0.40 Medium   Red  TRUE    Yes     Away      Low 2024-04-01
5    0.50 Medium Green  TRUE     No     Away      Low 2024-05-01
6    0.57 Medium Green FALSE     No     Away   Medium 2024-06-01
7    0.62   High Green FALSE     No    Other   Medium 2024-07-01
8    0.88   High Green FALSE    Yes    Other     High 2024-08-01
9    0.99   High  Blue FALSE    Yes    Other     High 2024-09-01
10   0.90   High  Blue  TRUE    Yes    Other     High 2024-10-01
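Because the measure reported for each pair depends on the class of each variable, it can help to inspect and repair classes before computing associations. The following is a minimal sketch in base R. The data frame Survey and its columns are hypothetical and are not part of the example data.

```r
# Hypothetical data frame for illustration only; both columns start as character
Survey = data.frame(Size   = c("Small", "Large", "Medium", "Small"),
                    Colour = c("Red", "Blue", "Red", "Green"),
                    stringsAsFactors = FALSE)

sapply(Survey, class)

# Nominal variable: convert to a plain factor
Survey$Colour = factor(Survey$Colour)

# Ordinal variable: ordered factor with levels listed from lowest to highest
Survey$Size = factor(Survey$Size, ordered = TRUE,
                     levels = c("Small", "Medium", "Large"))

# Report the first class of each column, then confirm the level order
sapply(Survey, function(x) class(x)[1])
levels(Survey$Size)
```

Without the explicit levels argument, the levels of Size would be sorted alphabetically (Large, Medium, Small), which would not reflect the intended order.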
The following uses the default measures of association and requests the bootstrap confidence intervals where available. The printClasses=TRUE option requests a separate table displaying the classes for the variables.
It is important that each variable is assigned the correct class so that an appropriate measure of association is chosen. That is, nominal variables should have the class factor, not character, and ordinal variables should be ordered factors with their levels in the correct order. Note that dates are treated as numeric.
library(rcompanion)
library(DescTools)
library(coin)
source("http://rcompanion.org/r_script/correlation.r")
correlation(Data, ci=TRUE, printClasses=TRUE)
  Variable   Class Treatment
1   Length numeric   Numeric
2   Rating ordered   Ordinal
3    Color  factor   Nominal
4     Flag  factor    Binary
5   Answer  factor    Binary
6 Location  factor   Nominal
7 Distance ordered   Ordinal
8    Start    Date   Numeric
Var1      Var2      Type                N  Measure     Statistic  Lower.CL  Upper.CL  Test              p.value  Signif
Length    Rating    Numeric x Ordinal   9  Spearman        0.935     0.716     0.987  cor.test           0.0002  ***
Length    Color     Numeric x Nominal   9  Eta             0.913     0.812     1.000  Anova              0.0047  **
Length    Flag      Numeric x Binary    9  Pearson        -0.576    -0.897     0.142  cor.test           0.1044  n.s.
Length    Answer    Numeric x Binary    9  Pearson        -0.101    -0.717     0.603  cor.test           0.7955  n.s.
Length    Location  Numeric x Nominal   9  Eta             0.919     0.827     1.000  Anova              0.0037  **
Length    Distance  Numeric x Ordinal   9  Spearman        0.935     0.716     0.987  cor.test           0.0002  ***
Length    Start     Numeric x Numeric   9  Pearson         0.959     0.812     0.992  cor.test           0.0000  ****
Rating    Color     Ordinal x Nominal  10  Freeman         0.812     0.581     1.000  Cochran-Armitage   0.0239  *
Rating    Flag      Ordinal x Binary   10  Glass rank     -0.333     0.000    -1.000  wilcox.test        0.0708  n.s.
Rating    Answer    Ordinal x Binary   10  Glass rank      0.667     1.000     0.000  wilcox.test        0.7172  n.s.
Rating    Location  Ordinal x Nominal  10  Freeman         0.938     0.733     1.000  Cochran-Armitage   0.0116  *
Rating    Distance  Ordinal x Ordinal  10  Kendall         0.780     0.641     0.919  Linear by linear   0.0102  *
Rating    Start     Ordinal x Numeric  10  Spearman        0.944     0.775     0.987  cor.test           0.0000  ****
Color     Flag      Nominal x Binary   10  Cramer          0.692     0.408     1.000  chisq.test         0.0911  n.s.
Color     Answer    Nominal x Binary   10  Cramer          0.802     0.500     1.000  chisq.test         0.0402  *
Color     Location  Nominal x Nominal  10  Cramer          0.612     0.532     0.876  chisq.test         0.1117  n.s.
Color     Distance  Nominal x Ordinal  10  Freeman         0.812     0.571     1.000  Cochran-Armitage   0.0251  *
Color     Start     Nominal x Numeric  10  Eta             0.935     0.885     0.982  Anova              0.0007  ***
Flag      Answer    Binary x Binary    10  Phi            -0.356    -1.000     0.327  chisq.test         0.2598  n.s.
Flag      Location  Binary x Nominal   10  Cramer          0.612     0.286     1.000  chisq.test         0.1534  n.s.
Flag      Distance  Binary x Ordinal   10  Glass rank     -0.750    -0.167    -1.000  wilcox.test        0.0491  *
Flag      Start     Binary x Numeric   10  Pearson        -0.569    -0.882     0.095  cor.test           0.0862  n.s.
Answer    Location  Binary x Nominal   10  Cramer          0.408     0.218     1.000  chisq.test         0.4346  n.s.
Answer    Distance  Binary x Ordinal   10  Glass rank     -0.048     0.750    -0.712  wilcox.test        1.0000  n.s.
Answer    Start     Binary x Numeric   10  Pearson         0.111    -0.557     0.692  cor.test           0.7597  n.s.
Location  Distance  Nominal x Ordinal  10  Freeman         0.781     0.500     0.931  Cochran-Armitage   0.0181  *
Location  Start     Nominal x Numeric  10  Eta             0.933     0.883     0.981  Anova              0.0008  ***
Distance  Start     Ordinal x Numeric  10  Spearman        0.921     0.694     0.982  cor.test           0.0002  ***
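To see where two of these statistics come from, the following sketch recomputes them with base R only. It assumes that the Spearman entry is the rank correlation between Length and the integer codes of Rating, and that the Cramer entry is Cramér's V from an uncorrected chi-square test. These are assumptions about the function's internals, not a description of its code. The variables are redefined so the block runs on its own.

```r
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating = factor(rep(c("Low", "Medium", "High"), c(3,3,4)),
                levels = c("Low", "Medium", "High"), ordered = TRUE)
Color  = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))

# Numeric x Ordinal: Spearman correlation using the ordered factor's
# integer codes; the observation with a missing Length is dropped
Rho = cor(Length, as.numeric(Rating), method = "spearman",
          use = "complete.obs")

# Nominal x Binary: Cramer's V from the chi-square statistic
# (the small-sample warning from chisq.test is suppressed)
Tab = table(Color, Flag)
Chi = suppressWarnings(chisq.test(Tab, correct = FALSE))
V   = sqrt(unname(Chi$statistic) / (sum(Tab) * (min(dim(Tab)) - 1)))

round(c(Spearman = Rho, Cramer = V, p.Cramer = Chi$p.value), 4)
```

The rounded values should agree with the Length and Rating row (0.935) and the Color and Flag row (0.692, p = 0.0911) in the table above.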
The following uses options for nonparametric methods on a smaller data set.
Data1 = Data[,c("Length", "Rating", "Color", "Flag")]
correlation(Data1, methodNum = "spearman", methodNumNom = "epsilon",
            methodNumBin = "glass", ci=TRUE)
Var1      Var2      Type                N  Measure     Statistic  Lower.CL  Upper.CL  Test              p.value  Signif
Length    Rating    Numeric x Ordinal   9  Spearman        0.935     0.716     0.987  cor.test           0.0002  ***
Length    Color     Numeric x Nominal   9  Epsilon         0.935     0.861     1.000  Anova on ranks     0.0020  **
Length    Flag      Numeric x Binary    9  Glass rank     -0.700     0.111    -1.000  wilcox.test        0.1111  n.s.
Rating    Color     Ordinal x Nominal  10  Freeman         0.812     0.581     0.971  Cochran-Armitage   0.0239  *
Rating    Flag      Ordinal x Binary   10  Glass rank     -0.333     0.000    -1.000  wilcox.test        0.0708  n.s.
Color     Flag      Nominal x Binary   10  Cramer          0.692     0.408     1.000  chisq.test         0.0911  n.s.
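The Eta and Epsilon values reported for Length and Color can be compared with a short base R sketch. It assumes that Eta is the square root of the between-groups proportion of variance from a one-way ANOVA, and that Epsilon is the analogous quantity on ranks, computed here from the Kruskal-Wallis H statistic as sqrt(H / (n - 1)). These formulas are assumptions about what the function reports.

```r
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Color  = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))

# Drop the observation with a missing Length
D = na.omit(data.frame(Length, Color))
n = nrow(D)

# Eta: between-groups sum of squares over total sum of squares
SS  = anova(lm(Length ~ Color, data = D))[["Sum Sq"]]
Eta = sqrt(SS[1] / sum(SS))

# Epsilon: rank-based analogue from the Kruskal-Wallis H statistic
H       = unname(kruskal.test(Length ~ Color, data = D)$statistic)
Epsilon = sqrt(H / (n - 1))

round(c(Eta = Eta, Epsilon = Epsilon), 3)
```

The results should agree with the 0.913 (Eta) and 0.935 (Epsilon) shown for Length and Color in the two tables above.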
Palmer’s penguins example
PalmerPenguins = read.csv("https://rcompanion.org/documents/PalmerPenguins.csv")
PalmerPenguins$species = factor(PalmerPenguins$species)
PalmerPenguins$island = factor(PalmerPenguins$island)
PalmerPenguins$sex = factor(PalmerPenguins$sex)
PalmerPenguins = subset(PalmerPenguins, select = -c(rowid))
correlation(PalmerPenguins, ci=TRUE)
Var1       Var2       Type                 N  Measure  Statistic  Lower.CL  Upper.CL  Test        p.value  Signif
species    island     Nominal x Nominal  344  Cramer       0.660     0.622     0.698  chisq.test   0.0000  ****
species    bill_len   Nominal x Numeric  342  Eta          0.841     0.810     0.871  Anova        0.0000  ****
species    bill_dep   Nominal x Numeric  342  Eta          0.824     0.790     0.857  Anova        0.0000  ****
species    flipper_l  Nominal x Numeric  342  Eta          0.882     0.859     0.905  Anova        0.0000  ****
species    body_mass  Nominal x Numeric  342  Eta          0.818     0.783     0.852  Anova        0.0000  ****
species    sex        Nominal x Binary   333  Cramer       0.012     0.013     0.153  chisq.test   0.9760  n.s.
species    year       Nominal x Numeric  344  Eta          0.051     0.000     0.115  Anova        0.6398  n.s.
island     bill_len   Nominal x Numeric  342  Eta          0.392     0.291     0.473  Anova        0.0000  ****
island     bill_dep   Nominal x Numeric  342  Eta          0.632     0.566     0.692  Anova        0.0000  ****
island     flipper_l  Nominal x Numeric  342  Eta          0.613     0.544     0.675  Anova        0.0000  ****
island     body_mass  Nominal x Numeric  342  Eta          0.627     0.560     0.688  Anova        0.0000  ****
island     sex        Nominal x Binary   333  Cramer       0.013     0.011     0.146  chisq.test   0.9716  n.s.
island     year       Nominal x Numeric  344  Eta          0.083     0.000     0.155  Anova        0.3099  n.s.
bill_len   bill_d     Numeric x Numeric  342  Pearson     -0.235    -0.333    -0.132  cor.test     0.0000  ****
bill_len   flipper_l  Numeric x Numeric  342  Pearson      0.656     0.591     0.713  cor.test     0.0000  ****
bill_len   body_mass  Numeric x Numeric  342  Pearson      0.595     0.522     0.660  cor.test     0.0000  ****
bill_len   sex        Numeric x Binary   333  Pearson      0.344     0.246     0.435  cor.test     0.0000  ****
bill_len   year       Numeric x Numeric  342  Pearson      0.055    -0.052     0.160  cor.test     0.3145  n.s.
bill_dep   flipper_l  Numeric x Numeric  342  Pearson     -0.584    -0.650    -0.509  cor.test     0.0000  ****
bill_dep   body_mass  Numeric x Numeric  342  Pearson     -0.472    -0.550    -0.385  cor.test     0.0000  ****
bill_dep   sex        Numeric x Binary   333  Pearson      0.373     0.276     0.462  cor.test     0.0000  ****
bill_dep   year       Numeric x Numeric  342  Pearson     -0.060    -0.165     0.046  cor.test     0.2657  n.s.
flipper_l  body_mass  Numeric x Numeric  342  Pearson      0.871     0.843     0.895  cor.test     0.0000  ****
flipper_l  sex        Numeric x Binary   333  Pearson      0.255     0.152     0.353  cor.test     0.0000  ****
flipper_l  year       Numeric x Numeric  342  Pearson      0.170     0.065     0.271  cor.test     0.0016  **
body_mass  sex        Numeric x Binary   333  Pearson      0.425     0.333     0.509  cor.test     0.0000  ****
body_mass  year       Numeric x Numeric  342  Pearson      0.042    -0.064     0.148  cor.test     0.4365  n.s.
sex        year       Binary x Numeric   333  Pearson      0.000    -0.108     0.107  cor.test     0.9932  n.s.
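With this many pairs, it can be convenient to filter and sort the results. The sketch below assumes the results are available as a data frame with the column names shown above. Whether correlation() returns such an object is not confirmed here, so a hypothetical data frame named Result is built by hand from five rows of the output.

```r
# Hypothetical data frame copied from five rows of the output above;
# it stands in for a stored result and is not produced by correlation() here
Result = data.frame(
  Var1      = c("species", "species", "bill_dep", "flipper_l", "sex"),
  Var2      = c("island", "sex", "flipper_l", "body_mass", "year"),
  Statistic = c(0.660, 0.012, -0.584, 0.871, 0.000),
  p.value   = c(0.0000, 0.9760, 0.0000, 0.0000, 0.9932))

# Keep pairs with p < 0.05, then order by the absolute size of the statistic
Strong = Result[Result$p.value < 0.05, ]
Strong = Strong[order(abs(Strong$Statistic), decreasing = TRUE), ]
Strong
```

Sorting on the absolute value keeps strong negative associations, such as the one between bill_dep and flipper_l, near the top of the list.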
The following uses options for nonparametric methods.
# conf is the confidence level, so 0.95 requests 95% confidence intervals
correlation(PalmerPenguins, ci=TRUE, conf=0.95,
            methodNum = "spearman", methodNumNom = "epsilon",
            methodNumBin = "glass")