
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Correlation and Association for Different Types of Variables

This chapter covers measures and tests of association for combinations of numeric, ordinal, binary, and nominal variables.  The correlation function in the rcompanion package reports the correlation or other bivariate measure of association for each pair of these variables.

 

Packages used in this chapter


The packages used in this chapter include:

•  rcompanion

 

The following commands will install these packages if they are not already installed:


if(!require(rcompanion)){install.packages("rcompanion")}


Measures of association across different types of variables

 

For the most part, these measures are non-directional measures of association.  That is, the association of ~ A + B is the same as the association of ~ B + A.  There are other measures of association, not discussed here, that may be preferable in some cases.  The following tables include only the measures and tests available in the rcompanion::correlation function.
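As a small check of the non-directional property, the following sketch computes Cramer’s V for two factors in both orders, and a Pearson correlation in both orders, using only base R.  The cramer_v() helper is hypothetical, written for this illustration with the usual formula V = sqrt(chi-square / (n * (k - 1))), where k is the smaller of the number of rows and columns of the table; it is not the rcompanion code.  The two factors use the Color and Answer values from the hypothetical example later in this chapter.

```r
# Hypothetical helper (not rcompanion's implementation):
# Cramer's V from an uncorrected chi-square statistic
cramer_v = function(a, b) {
  tab  = table(a, b)
  # small expected counts trigger a warning, suppressed here
  chi2 = suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k    = min(dim(tab)) - 1
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# Color and Answer from the hypothetical example in this chapter
A = factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))
B = factor(rep(c("Yes", "No", "Yes"), c(4, 3, 3)), levels = c("Yes", "No"))

cramer_v(A, B)   # association of A with B
cramer_v(B, A)   # same value with the order reversed

# Pearson correlation is symmetric as well
x = c(1, 3, 2, 5, 4)
y = c(2, 1, 4, 3, 5)
all.equal(cor(x, y), cor(y, x))
```

Both cramer_v() calls return about 0.802, the value reported for Color and Answer in the output later in this chapter.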

 

Measures of association

 

First variable  Second variable  Default measure                  Alternative measures
Numeric         Numeric          Pearson correlation              Spearman correlation; Kendall tau-b correlation
Numeric         Ordinal          Spearman correlation             Pearson correlation; Kendall tau-b correlation
Numeric         Nominal          eta                              epsilon
Numeric         Binary           Pearson correlation              Glass rank biserial correlation
Ordinal         Ordinal          Kendall tau-c correlation        Spearman correlation
Ordinal         Nominal          Freeman’s theta
Ordinal         Binary           Glass rank biserial correlation
Nominal         Nominal          Cramer’s V
Nominal         Binary           Cramer’s V
Binary          Binary           phi

 

Tests of association

 

First variable  Second variable  Default test                    Alternative tests
Numeric         Numeric          Pearson correlation t-test      Spearman correlation test; Kendall correlation test
Numeric         Ordinal          Spearman correlation test       Pearson correlation t-test; Kendall correlation test
Numeric         Nominal          Anova                           Anova on ranks
Numeric         Binary           Pearson correlation t-test      Wilcoxon rank sum test
Ordinal         Ordinal          Linear-by-linear test           Spearman correlation test
Ordinal         Nominal          Cochran–Armitage test
Ordinal         Binary           Wilcoxon rank sum test
Nominal         Nominal          Chi-square test of association  Fisher’s exact test
Nominal         Binary           Chi-square test of association  Fisher’s exact test
Binary          Binary           Chi-square test of association  Fisher’s exact test
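Several of the tests in this table are standard functions in R’s stats package.  The following sketch applies a few of them to variables from the hypothetical example later in this chapter.  It is meant only to show which base R function corresponds to each entry; rcompanion’s internal calls may differ in details.

```r
# Variables from the hypothetical example in this chapter
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Color  = factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5, 4, 1)))
Answer = factor(rep(c("Yes", "No", "Yes"), c(4, 3, 3)), levels = c("Yes", "No"))

# Numeric x Nominal: one-way anova (the NA is dropped by lm)
p_anova = anova(lm(Length ~ Color))[["Pr(>F)"]][1]

# Numeric x Binary, alternative: Wilcoxon rank sum test
p_wilcox = wilcox.test(Length ~ Flag)$p.value

# Nominal x Binary: chi-square test of association
# (small expected counts trigger a warning, suppressed here)
p_chisq = suppressWarnings(chisq.test(table(Color, Answer))$p.value)

# Fisher's exact test is the usual alternative when counts are small
p_fisher = fisher.test(table(Color, Answer))$p.value

round(c(anova = p_anova, wilcox = p_wilcox,
        chisq = p_chisq, fisher = p_fisher), 4)
```

The anova, Wilcoxon, and chi-square p-values agree with those reported for the corresponding pairs in the outputs later in this chapter (0.0047, 0.1111, and 0.0402).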

 

 

 

Hypothetical example

 

Length   = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating   = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(3,3,4)))
Color    = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag     = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))
Answer   = factor(rep(c("Yes", "No", "Yes"), c(4,3,3)), levels=c("Yes", "No"))
Location = factor(rep(c("Home", "Away", "Other"), c(2,4,4)))
Distance = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(5,2,3)))
Start    = seq(as.Date("2024-01-01"), by = "month", length.out = 10)

 

Data = data.frame(Length, Rating, Color, Flag, Answer, Location, Distance, Start) 

Data

   Length Rating Color  Flag Answer Location Distance      Start
1    0.29    Low   Red  TRUE    Yes     Home      Low 2024-01-01
2    0.25    Low   Red  TRUE    Yes     Home      Low 2024-02-01
3      NA    Low   Red  TRUE    Yes     Away      Low 2024-03-01
4    0.40 Medium   Red  TRUE    Yes     Away      Low 2024-04-01
5    0.50 Medium Green  TRUE     No     Away      Low 2024-05-01
6    0.57 Medium Green FALSE     No     Away   Medium 2024-06-01
7    0.62   High Green FALSE     No    Other   Medium 2024-07-01
8    0.88   High Green FALSE    Yes    Other     High 2024-08-01
9    0.99   High  Blue FALSE    Yes    Other     High 2024-09-01
10   0.90   High  Blue  TRUE    Yes    Other     High 2024-10-01


The following uses the default measures of association and requests bootstrap confidence intervals where available.  The printClasses=TRUE option requests a separate table displaying the class of each variable and how the function treats it.

 

It’s important that variables are assigned the correct class to get an appropriate measure of association.  That is, nominal variables should have the class factor, not character, and ordinal variables should be ordered factors (with their levels in the correct order!).  As the class table below shows, factors with two levels, such as Flag and Answer, are treated as binary.  Note that dates are treated as numeric.
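The following sketch shows one way to check column classes before calling correlation().  The treatment_of() helper is hypothetical and not part of rcompanion; its rules are an assumption inferred from the Class and Treatment columns printed by printClasses=TRUE in the output below (numeric or Date as Numeric, ordered factor as Ordinal, two-level factor as Binary, other factor as Nominal).  It also flags character columns, which should be converted to factors first.

```r
# Hypothetical helper (not part of rcompanion): guesses how a column
# would be treated, following the Class -> Treatment pairs printed
# by printClasses=TRUE for this chapter's example
treatment_of = function(v) {
  if (is.ordered(v))                        return("Ordinal")
  if (is.factor(v) && nlevels(v) == 2)      return("Binary")
  if (is.factor(v))                         return("Nominal")
  if (is.numeric(v) || inherits(v, "Date")) return("Numeric")
  if (is.character(v))                      return("Convert to factor first")
  "Unknown"
}

# Ordinal variable: ordered factor with levels in the intended order
Rating = factor(c("Low", "High", "Medium"),
                levels = c("Low", "Medium", "High"), ordered = TRUE)

# Nominal variable accidentally stored as character
Color = c("Red", "Green", "Blue")

# Dates are treated as numeric
Start = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01"))

# Rating "Ordinal", Color "Convert to factor first", Start "Numeric"
sapply(list(Rating = Rating, Color = Color, Start = Start), treatment_of)

# Fix the class of the nominal variable
Color = factor(Color)
treatment_of(Color)
```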

 

Data is the data frame passed to the function.  It should contain only the variables you wish to include in the correlation analysis.


library(rcompanion)

correlation(Data, ci=TRUE, printClasses=TRUE)


  Variable   Class Treatment
1   Length numeric   Numeric
2   Rating ordered   Ordinal
3    Color  factor   Nominal
4     Flag  factor    Binary
5   Answer  factor    Binary
6 Location  factor   Nominal
7 Distance ordered   Ordinal
8    Start    Date   Numeric


    Var1     Var2              Type  N    Measure Statistic Lower.CL Upper.CL             Test p.value Signif
  Length   Rating Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
  Length    Color Numeric x Nominal  9        Eta     0.913    0.812    1.000            Anova  0.0047     **
  Length     Flag  Numeric x Binary  9    Pearson    -0.576   -0.897    0.142         cor.test  0.1044   n.s.
  Length   Answer  Numeric x Binary  9    Pearson    -0.101   -0.717    0.603         cor.test  0.7955   n.s.
  Length Location Numeric x Nominal  9        Eta     0.919    0.827    1.000            Anova  0.0037     **
  Length Distance Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
  Length    Start Numeric x Numeric  9    Pearson     0.959    0.812    0.992         cor.test  0.0000   ****
  Rating    Color Ordinal x Nominal 10    Freeman     0.812    0.581    1.000 Cochran-Armitage  0.0239      *
  Rating     Flag  Ordinal x Binary 10 Glass rank    -0.333    0.000   -1.000      wilcox.test  0.0708   n.s.
  Rating   Answer  Ordinal x Binary 10 Glass rank     0.667    1.000    0.000      wilcox.test  0.7172   n.s.
  Rating Location Ordinal x Nominal 10    Freeman     0.938    0.733    1.000 Cochran-Armitage  0.0116      *
  Rating Distance Ordinal x Ordinal 10    Kendall     0.780    0.641    0.919 Linear by linear  0.0102      *
  Rating    Start Ordinal x Numeric 10   Spearman     0.944    0.775    0.987         cor.test  0.0000   ****
   Color     Flag  Nominal x Binary 10     Cramer     0.692    0.408    1.000       chisq.test  0.0911   n.s.
   Color   Answer  Nominal x Binary 10     Cramer     0.802    0.500    1.000       chisq.test  0.0402      *
   Color Location Nominal x Nominal 10     Cramer     0.612    0.532    0.876       chisq.test  0.1117   n.s.
   Color Distance Nominal x Ordinal 10    Freeman     0.812    0.571    1.000 Cochran-Armitage  0.0251      *
   Color    Start Nominal x Numeric 10        Eta     0.935    0.885    0.982            Anova  0.0007    ***
    Flag   Answer   Binary x Binary 10        Phi    -0.356   -1.000    0.327       chisq.test  0.2598   n.s.
    Flag Location  Binary x Nominal 10     Cramer     0.612    0.286    1.000       chisq.test  0.1534   n.s.
    Flag Distance  Binary x Ordinal 10 Glass rank    -0.750   -0.167   -1.000      wilcox.test  0.0491      *
    Flag    Start  Binary x Numeric 10    Pearson    -0.569   -0.882    0.095         cor.test  0.0862   n.s.
  Answer Location  Binary x Nominal 10     Cramer     0.408    0.218    1.000       chisq.test  0.4346   n.s.
  Answer Distance  Binary x Ordinal 10 Glass rank    -0.048    0.750   -0.712      wilcox.test  1.0000   n.s.
  Answer    Start  Binary x Numeric 10    Pearson     0.111   -0.557    0.692         cor.test  0.7597   n.s.
Location Distance Nominal x Ordinal 10    Freeman     0.781    0.500    0.931 Cochran-Armitage  0.0181      *
Location    Start Nominal x Numeric 10        Eta     0.933    0.883    0.981            Anova  0.0008    ***
Distance    Start Ordinal x Numeric 10   Spearman     0.921    0.694    0.982         cor.test  0.0002    ***
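To see what two of the rows above report, the following base R sketch reproduces the Spearman correlation for Length and Rating and the Pearson (point-biserial) correlation for Length and Flag.  It assumes the ordinal variable is used through its level codes (1, 2, 3) and the binary factor through its codes (1, 2); this reproduces the printed estimates but is not necessarily how rcompanion computes them internally.

```r
# Variables from the hypothetical example above
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Rating = factor(rep(c("Low", "Medium", "High"), c(3, 3, 4)),
                levels = c("Low", "Medium", "High"), ordered = TRUE)
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5, 4, 1)))

# Numeric x Ordinal: Spearman correlation on the level codes;
# exact = FALSE because the ordinal codes contain ties.
# cor.test drops the incomplete pair (the NA) automatically.
spearman = cor.test(Length, as.numeric(Rating),
                    method = "spearman", exact = FALSE)

# Numeric x Binary: Pearson correlation with the factor coded 1 and 2.
# FALSE is the first level, so the sign is negative when the TRUE
# observations have the smaller values.
pearson = cor.test(Length, as.numeric(Flag))

round(c(rho = unname(spearman$estimate),
        r   = unname(pearson$estimate)), 3)
```

The estimates are about 0.935 and -0.576, matching the Length x Rating and Length x Flag rows above.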


The following uses the options for nonparametric methods on a subset of the variables.


Data1 = Data[,c("Length", "Rating", "Color", "Flag")]

correlation(Data1, methodNum = "spearman", methodNumNom = "epsilon",
            methodNumBin = "glass", methodNumOrd = "spearman", ci=TRUE)


  Var1   Var2              Type  N    Measure Statistic Lower.CL Upper.CL             Test p.value Signif
Length Rating Numeric x Ordinal  9   Spearman     0.935    0.716    0.987         cor.test  0.0002    ***
Length  Color Numeric x Nominal  9    Epsilon     0.935    0.861    1.000   Anova on ranks  0.0020     **
Length   Flag  Numeric x Binary  9 Glass rank    -0.700    0.111   -1.000      wilcox.test  0.1111   n.s.
Rating  Color Ordinal x Nominal 10    Freeman     0.812    0.581    0.971 Cochran-Armitage  0.0239      *
Rating   Flag  Ordinal x Binary 10 Glass rank    -0.333    0.000   -1.000      wilcox.test  0.0708   n.s.
 Color   Flag  Nominal x Binary 10     Cramer     0.692    0.408    1.000       chisq.test  0.0911   n.s.
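The following base R sketch computes the nonparametric statistics for the Length x Color and Length x Flag rows above.  Two formulas are assumptions, chosen because they reproduce the printed values here: "Epsilon" is taken as sqrt(H / (n - 1)), where H is the Kruskal-Wallis statistic, and the Glass rank biserial correlation as 2 * (mean rank of the second group - mean rank of the first group) / n, with groups in factor-level order.  "Anova on ranks" is taken to be an ordinary one-way anova on the ranked values.

```r
# Variables from the hypothetical example above
Length = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
Color  = factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))
Flag   = factor(rep(c(TRUE, FALSE, TRUE), c(5, 4, 1)))

# Keep complete observations only
keep  = !is.na(Length)
x     = Length[keep]
color = Color[keep]
flag  = Flag[keep]
n     = length(x)

# Epsilon from the Kruskal-Wallis H statistic (assumed formula)
H       = unname(kruskal.test(x, color)$statistic)
epsilon = sqrt(H / (n - 1))

# "Anova on ranks": one-way anova on the ranked values
p_ranks = anova(lm(rank(x) ~ color))[["Pr(>F)"]][1]

# Glass rank biserial correlation from mean ranks (assumed sign convention)
r         = rank(x)
mean_rank = tapply(r, flag, mean)   # groups in level order: FALSE, TRUE
glass     = 2 * (mean_rank[[2]] - mean_rank[[1]]) / n

round(c(epsilon = epsilon, p_ranks = p_ranks, glass = glass), 4)
```

This gives about 0.935 for epsilon, 0.0020 for the anova on ranks, and -0.700 for the Glass rank biserial correlation.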


Palmer’s penguins example


PalmerPenguins = read.csv("https://rcompanion.org/documents/PalmerPenguins.csv")

PalmerPenguins$species = factor(PalmerPenguins$species)
PalmerPenguins$island  = factor(PalmerPenguins$island)
PalmerPenguins$sex     = factor(PalmerPenguins$sex)

PalmerPenguins = subset(PalmerPenguins, select = -c(rowid))
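Instead of converting each character column by name, all character columns can be converted to factors in one step.  The following sketch does this on a small hypothetical stand-in data frame with the same kinds of columns; it is not the real penguin file, whose exact columns are not reproduced here.  (Alternatively, read.csv() accepts stringsAsFactors = TRUE, which reads every character column as a factor.)

```r
# Hypothetical stand-in data frame, not the real PalmerPenguins file
Penguins = data.frame(
  rowid     = 1:6,
  species   = c("Adelie", "Adelie", "Gentoo", "Gentoo",
                "Chinstrap", "Chinstrap"),
  body_mass = c(3750, 3800, 5000, 4950, 3500, 3700),
  sex       = c("male", "female", "male", "female", NA, "male")
)

# Convert every character column to a factor; leave other columns as-is
Penguins[] = lapply(Penguins,
                    function(v) if (is.character(v)) factor(v) else v)

# Drop the ID column so it is not treated as a numeric variable
Penguins = subset(Penguins, select = -c(rowid))

sapply(Penguins, class)
```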

library(rcompanion)

correlation(PalmerPenguins, ci=TRUE)


     Var1      Var2              Type   N Measure Statistic Lower.CL Upper.CL       Test p.value Signif
  species    island Nominal x Nominal 344  Cramer     0.660    0.622    0.698 chisq.test  0.0000   ****
  species  bill_len Nominal x Numeric 342     Eta     0.841    0.810    0.871      Anova  0.0000   ****
  species  bill_dep Nominal x Numeric 342     Eta     0.824    0.790    0.857      Anova  0.0000   ****
  species flipper_l Nominal x Numeric 342     Eta     0.882    0.859    0.905      Anova  0.0000   ****
  species body_mass Nominal x Numeric 342     Eta     0.818    0.783    0.852      Anova  0.0000   ****
  species       sex  Nominal x Binary 333  Cramer     0.012    0.013    0.153 chisq.test  0.9760   n.s.
  species      year Nominal x Numeric 344     Eta     0.051    0.000    0.115      Anova  0.6398   n.s.
   island  bill_len Nominal x Numeric 342     Eta     0.392    0.291    0.473      Anova  0.0000   ****
   island  bill_dep Nominal x Numeric 342     Eta     0.632    0.566    0.692      Anova  0.0000   ****
   island flipper_l Nominal x Numeric 342     Eta     0.613    0.544    0.675      Anova  0.0000   ****
   island body_mass Nominal x Numeric 342     Eta     0.627    0.560    0.688      Anova  0.0000   ****
   island       sex  Nominal x Binary 333  Cramer     0.013    0.011    0.146 chisq.test  0.9716   n.s.
   island      year Nominal x Numeric 344     Eta     0.083    0.000    0.155      Anova  0.3099   n.s.
 bill_len  bill_dep Numeric x Numeric 342 Pearson    -0.235   -0.333   -0.132   cor.test  0.0000   ****
 bill_len flipper_l Numeric x Numeric 342 Pearson     0.656    0.591    0.713   cor.test  0.0000   ****
 bill_len body_mass Numeric x Numeric 342 Pearson     0.595    0.522    0.660   cor.test  0.0000   ****
 bill_len       sex  Numeric x Binary 333 Pearson     0.344    0.246    0.435   cor.test  0.0000   ****
 bill_len      year Numeric x Numeric 342 Pearson     0.055   -0.052    0.160   cor.test  0.3145   n.s.
 bill_dep flipper_l Numeric x Numeric 342 Pearson    -0.584   -0.650   -0.509   cor.test  0.0000   ****
 bill_dep body_mass Numeric x Numeric 342 Pearson    -0.472   -0.550   -0.385   cor.test  0.0000   ****
 bill_dep       sex  Numeric x Binary 333 Pearson     0.373    0.276    0.462   cor.test  0.0000   ****
 bill_dep      year Numeric x Numeric 342 Pearson    -0.060   -0.165    0.046   cor.test  0.2657   n.s.
flipper_l body_mass Numeric x Numeric 342 Pearson     0.871    0.843    0.895   cor.test  0.0000   ****
flipper_l       sex  Numeric x Binary 333 Pearson     0.255    0.152    0.353   cor.test  0.0000   ****
flipper_l      year Numeric x Numeric 342 Pearson     0.170    0.065    0.271   cor.test  0.0016     **
body_mass       sex  Numeric x Binary 333 Pearson     0.425    0.333    0.509   cor.test  0.0000   ****
body_mass      year Numeric x Numeric 342 Pearson     0.042   -0.064    0.148   cor.test  0.4365   n.s.
      sex      year  Binary x Numeric 333 Pearson     0.000   -0.108    0.107   cor.test  0.9932   n.s.

 

The following uses options for nonparametric methods.

 

correlation(PalmerPenguins, ci=TRUE, conf=0.95, methodNum = "spearman",
            methodNumNom = "epsilon", methodNumBin = "glass",
            methodNumOrd = "spearman")
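The conf argument sets the confidence level of the intervals, so conf=0.95 requests 95% intervals.  The following sketch of a plain percentile bootstrap for a Pearson correlation shows how a confidence level maps to percentiles of the resampled statistics.  The boot_ci() helper is hypothetical; rcompanion’s own bootstrap settings (number of replicates, interval type) may differ.  The x and y vectors are the non-missing Length values from the hypothetical example and the corresponding month numbers of Start.

```r
# Hypothetical helper: percentile bootstrap interval for Pearson r
boot_ci = function(x, y, conf = 0.95, R = 1000) {
  n     = length(x)
  stats = replicate(R, {
    i = sample(n, replace = TRUE)        # resample pairs of observations
    suppressWarnings(cor(x[i], y[i]))    # NA if a resample has zero variance
  })
  alpha = 1 - conf
  # e.g. conf = 0.95 -> 2.5th and 97.5th percentiles
  quantile(stats, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)
}

x = c(0.29, 0.25, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
y = c(1, 2, 4, 5, 6, 7, 8, 9, 10)

set.seed(1)
boot_ci(x, y, conf = 0.95)
```

With the same random seed, conf = 0.05 would instead return the much narrower interval between the 47.5th and 52.5th percentiles, which is why 0.95 is the appropriate value for a conventional 95% interval.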