
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Correlation and Association for Different Types of Variables

This book covers measures and tests of association for numeric, ordinal, binary, and nominal variables.  The purpose of this chapter is to present a function that will report the correlation or bivariate association across these types of variables.

 

Packages used in this chapter


The packages used in this chapter include:

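•  rcompanion

•  DescTools

•  coin

 

The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(DescTools)){install.packages("DescTools")}
if(!require(coin)){install.packages("coin")}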


Hypothetical example

 

Length   = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating   = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(3,3,4)))
Color    = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag     = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))
Answer   = factor(rep(c("Yes", "No", "Yes"), c(4,3,3)), levels=c("Yes", "No"))
Location = factor(rep(c("Home", "Away", "Other"), c(2,4,4)))
Distance = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                         x = rep(c("Low", "Medium", "High"), c(5,2,3)))
Start    = seq(as.Date("2024-01-01"), by = "month", length.out = 10)

 

Data = data.frame(Length, Rating, Color, Flag, Answer, Location, Distance, Start) 

Data


   Length Rating Color  Flag Answer Location Distance      Start
1    0.29    Low   Red  TRUE    Yes     Home      Low 2024-01-01
2    0.25    Low   Red  TRUE    Yes     Home      Low 2024-02-01
3      NA    Low   Red  TRUE    Yes     Away      Low 2024-03-01
4    0.40 Medium   Red  TRUE    Yes     Away      Low 2024-04-01
5    0.50 Medium Green  TRUE     No     Away      Low 2024-05-01
6    0.57 Medium Green FALSE     No     Away   Medium 2024-06-01
7    0.62   High Green FALSE     No    Other   Medium 2024-07-01
8    0.88   High Green FALSE    Yes    Other     High 2024-08-01
9    0.99   High  Blue FALSE    Yes    Other     High 2024-09-01
10   0.90   High  Blue  TRUE    Yes    Other     High 2024-10-01


The following call uses the default measures of association and requests bootstrap confidence intervals where available.  The printClasses=TRUE option prints a separate table showing the class of each variable and how it is treated.

 

It’s important that each variable is assigned the correct class so that an appropriate measure of association is chosen.  That is, factor variables should have the class factor, not character.  Ordinal variables should be ordered factors (with their levels in the correct order!).  Note that dates are treated as numeric.
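As a minimal sketch of checking and fixing classes before calling the function, the following uses a small hypothetical data frame.  The name Survey and its columns are invented for illustration, and only base R functions are used.

```r
# Hypothetical data held as character strings, as read.csv() would return them
Survey = data.frame(
  Rating = c("Low", "High", "Medium", "High"),
  Color  = c("Red", "Blue", "Red", "Green"),
  Start  = c("2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01"),
  stringsAsFactors = FALSE)

# Inspect the class of each column; all three are character at this point
sapply(Survey, class)

# Nominal variable: convert to an unordered factor
Survey$Color = factor(Survey$Color)

# Ordinal variable: convert to an ordered factor, stating the level order
#   explicitly (the default order would be alphabetical: High, Low, Medium)
Survey$Rating = factor(Survey$Rating, ordered = TRUE,
                       levels = c("Low", "Medium", "High"))

# Date variable: convert to class Date; its numeric value is the
#   number of days since 1970-01-01
Survey$Start = as.Date(Survey$Start)

# Check the first class of each column, the level order, and the date values
sapply(Survey, function(x) class(x)[1])
levels(Survey$Rating)
as.numeric(Survey$Start)
```

Stating the levels explicitly matters for ordinal variables, since a factor created without them is ordered alphabetically.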


library(rcompanion)
library(DescTools)
library(coin)

source("http://rcompanion.org/r_script/correlation.r")

correlation(Data, ci=TRUE, printClasses=TRUE)


  Variable   Class Treatment
1   Length numeric   Numeric
2   Rating ordered   Ordinal
3    Color  factor   Nominal
4     Flag  factor    Binary
5   Answer  factor    Binary
6 Location  factor   Nominal
7 Distance ordered   Ordinal
8    Start    Date   Numeric


    Var1     Var2              Type  N    Measure Statistic Lower.CL Upper.CL             Test p.value Signif
  Length   Rating Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
  Length    Color Numeric x Nominal  9        Eta     0.913    0.812    1.000            Anova  0.0047     **
  Length     Flag  Numeric x Binary  9    Pearson    -0.576   -0.897    0.142         cor.test  0.1044   n.s.
  Length   Answer  Numeric x Binary  9    Pearson    -0.101   -0.717    0.603         cor.test  0.7955   n.s.
  Length Location Numeric x Nominal  9        Eta     0.919    0.827    1.000            Anova  0.0037     **
  Length Distance Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
  Length    Start Numeric x Numeric  9    Pearson     0.959    0.812    0.992         cor.test  0.0000   ****
  Rating    Color Ordinal x Nominal 10    Freeman     0.812    0.581    1.000 Cochran-Armitage  0.0239      *
  Rating     Flag  Ordinal x Binary 10 Glass rank    -0.333    0.000   -1.000      wilcox.test  0.0708   n.s.
  Rating   Answer  Ordinal x Binary 10 Glass rank     0.667    1.000    0.000      wilcox.test  0.7172   n.s.
  Rating Location Ordinal x Nominal 10    Freeman     0.938    0.733    1.000 Cochran-Armitage  0.0116      *
  Rating Distance Ordinal x Ordinal 10    Kendall     0.780    0.641    0.919 Linear by linear  0.0102      *
  Rating    Start Ordinal x Numeric 10   Spearman     0.944    0.775    0.987         cor.test  0.0000   ****
   Color     Flag  Nominal x Binary 10     Cramer     0.692    0.408    1.000       chisq.test  0.0911   n.s.
   Color   Answer  Nominal x Binary 10     Cramer     0.802    0.500    1.000       chisq.test  0.0402      *
   Color Location Nominal x Nominal 10     Cramer     0.612    0.532    0.876       chisq.test  0.1117   n.s.
   Color Distance Nominal x Ordinal 10    Freeman     0.812    0.571    1.000 Cochran-Armitage  0.0251      *
   Color    Start Nominal x Numeric 10        Eta     0.935    0.885    0.982            Anova  0.0007    ***
    Flag   Answer   Binary x Binary 10        Phi    -0.356   -1.000    0.327       chisq.test  0.2598   n.s.
    Flag Location  Binary x Nominal 10     Cramer     0.612    0.286    1.000       chisq.test  0.1534   n.s.
    Flag Distance  Binary x Ordinal 10 Glass rank    -0.750   -0.167   -1.000      wilcox.test  0.0491      *
    Flag    Start  Binary x Numeric 10    Pearson    -0.569   -0.882    0.095         cor.test  0.0862   n.s.
  Answer Location  Binary x Nominal 10     Cramer     0.408    0.218    1.000       chisq.test  0.4346   n.s.
  Answer Distance  Binary x Ordinal 10 Glass rank    -0.048    0.750   -0.712      wilcox.test  1.0000   n.s.
  Answer    Start  Binary x Numeric 10    Pearson     0.111   -0.557    0.692         cor.test  0.7597   n.s.
Location Distance Nominal x Ordinal 10    Freeman     0.781    0.500    0.931 Cochran-Armitage  0.0181      *
Location    Start Nominal x Numeric 10        Eta     0.933    0.883    0.981            Anova  0.0008    ***
Distance    Start Ordinal x Numeric 10   Spearman     0.921    0.694    0.982         cor.test  0.0002    ***
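
A few of the statistics in the table above can be checked with base R.  The following is a sketch, not the code used inside the correlation function.  It assumes the usual definitions: Spearman's rho on the integer codes of the ordered factor, eta as the square root of the between-group sum of squares over the total sum of squares, and Cramér's V computed from the chi-square statistic.  With these data it returns approximately 0.935, 0.913, and 0.692, in agreement with the Length x Rating, Length x Color, and Color x Flag rows.

```r
# Rebuild the variables from the hypothetical example
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating = factor(rep(c("Low", "Medium", "High"), c(3,3,4)),
                ordered = TRUE, levels = c("Low", "Medium", "High"))
Color  = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))

# Numeric x Ordinal: Spearman correlation on the ordered factor's integer codes.
#   exact = FALSE avoids the warning about ties; pairs with NA are dropped.
Spearman = cor.test(Length, as.numeric(Rating), method = "spearman",
                    exact = FALSE)
rho = unname(Spearman$estimate)

# Numeric x Nominal: eta = sqrt(SS between / SS total) from a one-way anova.
#   lm() omits the row with the missing Length by default.
Model = lm(Length ~ Color)
SS    = anova(Model)[["Sum Sq"]]
eta   = sqrt(SS[1] / sum(SS))

# Nominal x Binary: Cramer's V = sqrt(chi-square / (n * (k - 1))),
#   where k is the smaller of the number of rows and columns.
#   The small-sample warning from chisq.test() is suppressed.
Table = table(Color, Flag)
Chi   = suppressWarnings(chisq.test(Table, correct = FALSE))
V     = sqrt(unname(Chi$statistic) / (sum(Table) * (min(dim(Table)) - 1)))

round(c(Spearman = rho, Eta = eta, Cramer = V), 3)
```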



The following uses options for nonparametric methods on a subset of the variables.


Data1 = Data[,c("Length", "Rating", "Color", "Flag")]

correlation(Data1, methodNum = "spearman", methodNumNom = "epsilon",
            methodNumBin = "glass", ci=TRUE)


  Var1   Var2              Type  N    Measure Statistic Lower.CL Upper.CL             Test p.value Signif
Length Rating Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
Length  Color Numeric x Nominal  9    Epsilon     0.935    0.861    1.000   Anova on ranks  0.0020     **
Length   Flag  Numeric x Binary  9 Glass rank    -0.700    0.111   -1.000      wilcox.test  0.1111   n.s.
Rating  Color Ordinal x Nominal 10    Freeman     0.812    0.581    0.971 Cochran-Armitage  0.0239      *
Rating   Flag  Ordinal x Binary 10 Glass rank    -0.333    0.000   -1.000      wilcox.test  0.0708   n.s.
 Color   Flag  Nominal x Binary 10     Cramer     0.692    0.408    1.000       chisq.test  0.0911   n.s.
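
The Epsilon and Glass rank statistics in this table can also be checked with base R.  This is a sketch under stated assumptions rather than the function's own code: epsilon is taken as the square root of H / (n – 1), where H is the Kruskal-Wallis statistic, and the Glass rank biserial is taken as the proportion of pairs in which the TRUE observation is larger minus the proportion in which the FALSE observation is larger.  The direction of that subtraction is an assumption chosen to match the sign reported above.  With these data the results are approximately 0.935 and -0.700, in agreement with the Length x Color and Length x Flag rows.

```r
# Rebuild the variables from the hypothetical example
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Color  = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))

# Keep only the observations with a non-missing Length
Complete = !is.na(Length)

# Numeric x Nominal: epsilon taken here as sqrt(H / (n - 1)),
#   where H is the Kruskal-Wallis statistic and n counts complete cases
H       = unname(kruskal.test(Length[Complete], Color[Complete])$statistic)
epsilon = sqrt(H / (sum(Complete) - 1))

# Numeric x Binary: Glass rank biserial from all pairs of observations,
#   one taken from each Flag group
GroupTrue  = Length[Complete & Flag == "TRUE"]
GroupFalse = Length[Complete & Flag == "FALSE"]
Diff       = outer(GroupTrue, GroupFalse, "-")   # TRUE minus FALSE, all pairs
glass      = (sum(Diff > 0) - sum(Diff < 0)) / length(Diff)

round(c(Epsilon = epsilon, Glass = glass), 3)
```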



Palmer’s penguins example


PalmerPenguins = read.csv("https://rcompanion.org/documents/PalmerPenguins.csv")

PalmerPenguins$species = factor(PalmerPenguins$species)
PalmerPenguins$island  = factor(PalmerPenguins$island)
PalmerPenguins$sex     = factor(PalmerPenguins$sex)

PalmerPenguins = subset(PalmerPenguins, select = -c(rowid))

correlation(PalmerPenguins, ci=TRUE)


     Var1      Var2              Type   N Measure Statistic Lower.CL Upper.CL       Test p.value Signif
  species    island Nominal x Nominal 344  Cramer     0.660    0.622    0.698 chisq.test  0.0000   ****
  species  bill_len Nominal x Numeric 342     Eta     0.841    0.810    0.871      Anova  0.0000   ****
  species  bill_dep Nominal x Numeric 342     Eta     0.824    0.790    0.857      Anova  0.0000   ****
  species flipper_l Nominal x Numeric 342     Eta     0.882    0.859    0.905      Anova  0.0000   ****
  species body_mass Nominal x Numeric 342     Eta     0.818    0.783    0.852      Anova  0.0000   ****
  species       sex  Nominal x Binary 333  Cramer     0.012    0.013    0.153 chisq.test  0.9760   n.s.
  species      year Nominal x Numeric 344     Eta     0.051    0.000    0.115      Anova  0.6398   n.s.
   island  bill_len Nominal x Numeric 342     Eta     0.392    0.291    0.473      Anova  0.0000   ****
   island  bill_dep Nominal x Numeric 342     Eta     0.632    0.566    0.692      Anova  0.0000   ****
   island flipper_l Nominal x Numeric 342     Eta     0.613    0.544    0.675      Anova  0.0000   ****
   island body_mass Nominal x Numeric 342     Eta     0.627    0.560    0.688      Anova  0.0000   ****
   island       sex  Nominal x Binary 333  Cramer     0.013    0.011    0.146 chisq.test  0.9716   n.s.
   island      year Nominal x Numeric 344     Eta     0.083    0.000    0.155      Anova  0.3099   n.s.
 bill_len  bill_dep Numeric x Numeric 342 Pearson    -0.235   -0.333   -0.132   cor.test  0.0000   ****
 bill_len flipper_l Numeric x Numeric 342 Pearson     0.656    0.591    0.713   cor.test  0.0000   ****
 bill_len body_mass Numeric x Numeric 342 Pearson     0.595    0.522    0.660   cor.test  0.0000   ****
 bill_len       sex  Numeric x Binary 333 Pearson     0.344    0.246    0.435   cor.test  0.0000   ****
 bill_len      year Numeric x Numeric 342 Pearson     0.055   -0.052    0.160   cor.test  0.3145   n.s.
 bill_dep flipper_l Numeric x Numeric 342 Pearson    -0.584   -0.650   -0.509   cor.test  0.0000   ****
 bill_dep body_mass Numeric x Numeric 342 Pearson    -0.472   -0.550   -0.385   cor.test  0.0000   ****
 bill_dep       sex  Numeric x Binary 333 Pearson     0.373    0.276    0.462   cor.test  0.0000   ****
 bill_dep      year Numeric x Numeric 342 Pearson    -0.060   -0.165    0.046   cor.test  0.2657   n.s.
flipper_l body_mass Numeric x Numeric 342 Pearson     0.871    0.843    0.895   cor.test  0.0000   ****
flipper_l       sex  Numeric x Binary 333 Pearson     0.255    0.152    0.353   cor.test  0.0000   ****
flipper_l      year Numeric x Numeric 342 Pearson     0.170    0.065    0.271   cor.test  0.0016     **
body_mass       sex  Numeric x Binary 333 Pearson     0.425    0.333    0.509   cor.test  0.0000   ****
body_mass      year Numeric x Numeric 342 Pearson     0.042   -0.064    0.148   cor.test  0.4365   n.s.
      sex      year  Binary x Numeric 333 Pearson     0.000   -0.108    0.107   cor.test  0.9932   n.s.



The following uses options for nonparametric methods.

 

### conf is the confidence level for the intervals, so 0.95 requests 95% intervals

correlation(PalmerPenguins, ci=TRUE, conf=0.95, methodNum = "spearman",
            methodNumNom = "epsilon", methodNumBin = "glass")