
An R Companion for the Handbook of Biological Statistics

Salvatore S. Mangiafico

 


Cate–Nelson Analysis

 


 

Cate–Nelson analysis is used to divide bivariate data into two groups: one where a change in the x variable is likely to correspond to a change in the y variable, and another where a change in x is unlikely to correspond to a change in y.  Traditionally this method was used for soil test calibration in agronomy studies, for example to determine whether a certain level of soil test phosphorus indicates that adding phosphorus to the soil would likely increase crop yield.

 

The method can be used in any case where bivariate data can be separated into two groups, with large x values associated with large y values and small x values associated with small y values, or vice versa.

 

For a fuller description of Cate–Nelson analysis and examples in soil-test and other applications, see Mangiafico (2013) and the references there.

 

Custom function to develop Cate–Nelson models

 

My cateNelson function follows the method of Cate and Nelson (1971).  A critical x value is determined by iteratively breaking the data into two groups and comparing the explained sum of squares of the iterations.  A critical y value is determined by an iterative process that minimizes the number of data points that fall into quadrants I and III (the error quadrants) for data with a positive trend.
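
As a rough illustration of the critical x search, the sketch below splits a small set of hypothetical data at each midpoint between adjacent distinct x values and computes the between-group (explained) sum of squares for each split.  The function name critical_x_sketch and the splitting rule are illustrative assumptions.  This is not the code inside cateNelson, which may treat ties and group sizes differently.

```r
# Hypothetical data with a negative trend (illustration only)
x = c(5, 7, 6, 5, 7, 10, 12, 11, 15, 21, 22, 20, 24)
y = c(55, 110, 120, 130, 120, 55, 60, 110, 50, 55, 60, 70, 55)

# Sketch of a critical-x search; not the rcompanion implementation
critical_x_sketch = function(x, y) {
  ux     = sort(unique(x))
  splits = (head(ux, -1) + tail(ux, -1)) / 2   # midpoints between distinct x values
  ss = sapply(splits, function(s) {
    low  = y[x <  s]
    high = y[x >= s]
    # Between-group sum of squares for this two-group split
    length(low)  * (mean(low)  - mean(y))^2 +
    length(high) * (mean(high) - mean(y))^2
  })
  out = data.frame(critical_x = splits, ss = ss)
  out[order(-out$ss), ]                        # largest sum of squares first
}

result = critical_x_sketch(x, y)
head(result, 2)
```

For these data the two largest sums of squares occur at splits of 11.5 and 8.5, so either could reasonably be considered as a critical x value.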

 

Options in the cateNelson function:

 

·       plotit=TRUE (the default) produces a plot of the data, a plot of the sum of squares of the iterations, a plot of the data points in error quadrants, and a final plot with critical x and critical y drawn as lines on the plot.

·       hollow=TRUE (the default) draws points in the error quadrants as open circles in the final plot.

·       trend="negative" (not the default) needs to be used if the trend of the data is negative.

·       xthreshold and ythreshold determine how many options the function will return for critical x and critical y.  A value of 1 would return all possibilities.  A value of 0.10 returns values in the top 10% of the range of maximum sum of squares.

·       clx and cly determine which of the listed critical x and critical y the function should use to build the final model.  A value of 1 selects the first displayed value, and a value of 2 selects the second.  This is useful when you have more than one critical x that maximizes or nearly maximizes the sum of squares, or if you want to force the critical y value to be close to some value such as 90% of maximum yield.  Note that changing the clx value will also change the list of critical y values that is displayed.  In the second example I set clx=2 to select a critical x that more evenly divides the errors across the quadrants.
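
One possible reading of the threshold options, offered as an assumption rather than a description of the package internals, is that a candidate is kept when its sum of squares lies within the stated fraction of the range (maximum minus minimum) below the maximum.  The sketch below uses made-up sum-of-squares values and a hypothetical helper named keep_top.

```r
# Made-up sums of squares for six candidate critical x values (illustration only)
ss = c(10, 40, 95, 100, 92, 60)

# Hypothetical helper: keep candidates whose sum of squares is within
# `threshold` times the range below the maximum
keep_top = function(ss, threshold) {
  cutoff = max(ss) - threshold * (max(ss) - min(ss))
  which(ss >= cutoff)
}

keep_top(ss, 0.10)   # cutoff is 100 - 0.10 * 90 = 91, so candidates 3, 4, 5
keep_top(ss, 1)      # cutoff equals the minimum, so all six candidates
```

Under this reading, a threshold of 1 returns every candidate, which is consistent with the description of the option above.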

 

Example of Cate–Nelson analysis

 

##--------------------------------------------------------------------
## Cate-Nelson analysis
## Data from Mangiafico, S.S., Newman, J.P., Mochizuki, M.J.,
##   & Zurawski, D. (2008). Adoption of sustainable practices
##   to protect and conserve water resources in container nurseries
##   with greenhouse facilities. Acta horticulturae 797, 367–372.
##--------------------------------------------------------------------


size =  c(68.55,6.45,6.98,1.05,4.44,0.46,4.02,1.21,4.03,
          6.05,48.39,9.88,3.63,38.31,22.98,5.24,2.82,1.61,
          76.61,4.64,0.28,0.37,0.81,1.41,0.81,2.02,20.16,
          4.04,8.47,8.06,20.97,11.69,16.13,6.85,4.84,80.65,1.61,0.10)

proportion =  c(0.850,0.729,0.737,0.752,0.639,0.579,0.594,0.534,
                0.541,0.759,0.677,0.820,0.534,0.684,0.504,0.662,
                0.624,0.647,0.609,0.647,0.632,0.632,0.459,0.684,
                0.361,0.556,0.850,0.729,0.729,0.669,0.880,0.774,
                0.729,0.774,0.662,0.737,0.586,0.316)

library(rcompanion)

cateNelson(x = size,
           y = proportion,
           plotit=TRUE,
           hollow=TRUE,
           xlab="Nursery size in hectares",
           ylab="Proportion of good practices adopted",
           trend="positive",
           clx=1,
           cly=1,
           xthreshold=0.10,
           ythreshold=0.15)


Critical x that maximize sum of squares:

  Critical.x.value Sum.of.squares
1            4.035      0.2254775
2            4.740      0.2046979


Critical y that minimize errors:
 
  Critical.y.value Q.i Q.ii Q.iii Q.iv Q.model Q.err Cramer.V
1           0.6355   3   20     2   13      33     5   0.7289
2           0.6430   3   19     3   13      32     6   0.6761
3           0.6470   3   19     3   13      32     6   0.6761
4           0.6545   2   18     4   14      32     6   0.6854
5           0.6620   2   18     4   14      32     6   0.6854
6           0.6015   6   21     1   10      31     7   0.6309
7           0.6280   5   20     2   11      31     7   0.6209
8           0.6320   5   20     2   11      31     7   0.6209

n         = Number of observations
CLx       = Critical value of x
SS        = Sum of squares for that critical value of x
CLy       = Critical value of y
Q         = Number of observations which fall into quadrants I, II, III, IV
Q.Model   = Total observations which fall into the quadrants predicted by the model
p.Model   = Percent observations which fall into the quadrants predicted by the model
Q.Error   = Observations which do not fall into the quadrants predicted by the model
p.Error   = Percent observations which do not fall into the quadrants predicted by the model
Fisher.p  = p-value from Fisher exact test dividing data into these quadrants
Cramer.V  = Cramer's V statistic from dividing data into these quadrants

Final model:

   n   CLx        SS    CLy Q.I Q.II Q.III Q.IV Q.Model   p.Model Q.Error
1 38 4.035 0.2254775 0.6355   3   20     2   13      33 0.8684211       5

   p.Error Fisher.p.value Cramer.V
 0.1315789   8.532968e-06   0.7289


 

Plots showing the results of Cate–Nelson analysis.  In the final plot, the critical x value is indicated with a vertical blue line, and the critical y value is indicated with a horizontal blue line.  Solid points agree with the model, while hollow points do not.  (Data from Mangiafico, S.S., Newman, J.P., Mochizuki, M.J., & Zurawski, D. (2008). Adoption of sustainable practices to protect and conserve water resources in container nurseries with greenhouse facilities. Acta horticulturae 797, 367–372.)
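
The quadrant counts and Cramér's V in the final model can be checked by hand.  The sketch below assumes the quadrant numbering implied by the output (I upper left, II upper right, III lower right, IV lower left) and computes Cramér's V from the Pearson chi-square without continuity correction.  Both are inferences from the reported numbers, not a statement of how cateNelson computes them.

```r
size =  c(68.55,6.45,6.98,1.05,4.44,0.46,4.02,1.21,4.03,
          6.05,48.39,9.88,3.63,38.31,22.98,5.24,2.82,1.61,
          76.61,4.64,0.28,0.37,0.81,1.41,0.81,2.02,20.16,
          4.04,8.47,8.06,20.97,11.69,16.13,6.85,4.84,80.65,1.61,0.10)

proportion =  c(0.850,0.729,0.737,0.752,0.639,0.579,0.594,0.534,
                0.541,0.759,0.677,0.820,0.534,0.684,0.504,0.662,
                0.624,0.647,0.609,0.647,0.632,0.632,0.459,0.684,
                0.361,0.556,0.850,0.729,0.729,0.669,0.880,0.774,
                0.729,0.774,0.662,0.737,0.586,0.316)

clx = 4.035     # critical x from the final model
cly = 0.6355    # critical y from the final model

# Assign each point to a quadrant (numbering inferred from the output)
quadrant = ifelse(size < clx,
                  ifelse(proportion >= cly, "I",  "IV"),
                  ifelse(proportion >= cly, "II", "III"))
counts = table(factor(quadrant, levels = c("I", "II", "III", "IV")))
counts                               # I = 3, II = 20, III = 2, IV = 13

# Positive trend: quadrants II and IV agree with the model
sum(counts[c("II", "IV")])           # 33

# 2 x 2 table: rows above/below critical y, columns left/right of critical x
tab = matrix(c(counts["I"], counts["IV"], counts["II"], counts["III"]), nrow = 2)
cramer_v = sqrt(unname(chisq.test(tab, correct = FALSE)$statistic) / sum(tab))
round(cramer_v, 4)                   # 0.7289
```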

 

#     #     #

 

Example of Cate–Nelson analysis with negative trend data

 

##--------------------------------------------------------------------
## Cate-Nelson analysis
## Hypothetical data
##--------------------------------------------------------------------


Input =("
  x      y
  5     55
  7    110
  6    120
  5    130
  7    120
 10     55
 12     60
 11    110
 15     50
 21     55
 22     60
 20     70
 24     55
")

Data = read.table(textConnection(Input),header=TRUE)

library(rcompanion)

cateNelson(x = Data$x,
           y = Data$y,
           plotit=TRUE,
           hollow=TRUE,
           xlab="x",
           ylab="y",
           trend="negative",
           clx=2,      # Normally leave as 1 unless you wish to
           cly=1,      # select a specific critical x value
           xthreshold=0.10,
           ythreshold=0.15)


Critical x that maximize sum of squares:
 
  Critical.x.value Sum.of.squares
1             11.5       5608.974
2              8.5       5590.433
.............

Critical y that minimize errors:
 
  Critical.y.value Q.i Q.ii Q.iii Q.iv Q.model Q.err Cramer.V
1               90   4    1     7    1      11     2   0.6750
2              110   4    1     7    1      11     2   0.6750
3              115   3    0     8    2      11     2   0.6928
4              120   3    0     8    2      11     2   0.6928

n         = Number of observations
CLx       = Critical value of x
SS        = Sum of squares for that critical value of x
CLy       = Critical value of y
Q         = Number of observations which fall into quadrants I, II, III, IV
Q.Model   = Total observations which fall into the quadrants predicted by the model
p.Model   = Percent observations which fall into the quadrants predicted by the model
Q.Error   = Observations which do not fall into the quadrants predicted by the model
p.Error   = Percent observations which do not fall into the quadrants predicted by the model
Fisher.p  = p-value from Fisher exact test dividing data into these quadrants
Cramer.V  = Cramer's V statistic from dividing data into these quadrants

Final model:

   n CLx       SS CLy Q.I Q.II Q.III Q.IV Q.Model   p.Model Q.Error
1 13 8.5 5608.974  90   4    1     7    1      11 0.8461538       2

  p.Error Fisher.p.value Cramer.V
0.1538462     0.03185703    0.675


 

Plot showing the final result of Cate–Nelson analysis, for data with a negative trend.
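
The critical y search can be sketched for these data as well.  The code below assumes that candidate critical y values are the midpoints of adjacent sorted y values (ties included), that a point with y equal to the candidate counts as above it, and that the error quadrants for a negative trend are II (upper right) and IV (lower left).  These conventions are inferred from the output above and are not taken from the cateNelson source.

```r
x = c(5, 7, 6, 5, 7, 10, 12, 11, 15, 21, 22, 20, 24)
y = c(55, 110, 120, 130, 120, 55, 60, 110, 50, 55, 60, 70, 55)

clx = 8.5    # critical x selected with clx=2 in the example

# Candidate critical y values: midpoints of adjacent sorted y values
sy         = sort(y)
candidates = unique((head(sy, -1) + tail(sy, -1)) / 2)

# Negative trend: errors are points in quadrant II (upper right)
# and quadrant IV (lower left)
errors = sapply(candidates, function(cy) {
  sum(x > clx & y >= cy) + sum(x < clx & y < cy)
})

# Candidates ordered by number of errors, fewest first
data.frame(critical_y = candidates, errors = errors)[order(errors), ]

best_y = candidates[errors == min(errors)]
best_y       # 90 110 115 120, each with 2 errors
```

The four candidates with the fewest errors match the critical y values listed in the output, where the first (90) is used in the final model because cly=1.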

 

#     #     #

 

Example of Cate–Nelson analysis with a fixed critical y value

 

Often when using a Cate–Nelson analysis, we wish to set the critical y at some pre-determined value, and then find the critical x value that best divides the data.  This is common, for example, in agronomy studies where we might want to set the critical y value at, say, 90% of maximum potential yield for a crop.

 

The following example revisits the sustainable nursery practices data above but sets the critical y value at 0.70 (that is, 70%).

 

Here, the first two critical x values in the results (5.24 and 6.05) both maximize the count of observations fitting the model.  It’s possible to sort the resultant data frame by other statistics, such as the Pearson chi-square value or the effect size statistic phi.

 

##--------------------------------------------------------------------
## Data from Mangiafico, S.S., Newman, J.P., Mochizuki, M.J.,
##   & Zurawski, D. (2008). Adoption of sustainable practices
##   to protect and conserve water resources in container nurseries
##   with greenhouse facilities. Acta horticulturae 797, 367–372.
##--------------------------------------------------------------------


size =  c(68.55,6.45,6.98,1.05,4.44,0.46,4.02,1.21,4.03,
          6.05,48.39,9.88,3.63,38.31,22.98,5.24,2.82,1.61,
          76.61,4.64,0.28,0.37,0.81,1.41,0.81,2.02,20.16,
          4.04,8.47,8.06,20.97,11.69,16.13,6.85,4.84,80.65,1.61,0.10)

proportion =  c(0.850,0.729,0.737,0.752,0.639,0.579,0.594,0.534,
                0.541,0.759,0.677,0.820,0.534,0.684,0.504,0.662,
                0.624,0.647,0.609,0.647,0.632,0.632,0.459,0.684,
                0.361,0.556,0.850,0.729,0.729,0.669,0.880,0.774,
                0.729,0.774,0.662,0.737,0.586,0.316)

library(rcompanion)

cateNelsonFixedY (x         = size,
                  y         = proportion,
                  cly       = 0.70,
                  plotit    = TRUE,
                  hollow    = TRUE,
                  xlab      = "Nursery size in hectares",
                  ylab      = "Proportion of good practices adopted",
                  trend     = "positive",
                  clx       = 1,
                  outlength = 5,
                  sortstat  = "error")



   Critx Crity Q1 Q2 Q3 Q4 Model Error  N   pQ1   pQ2   pQ3   pQ4  pModel  pError
1   5.24   0.7  2 12  5 19    31     7 38 0.053 0.316 0.132 0.500   0.816   0.184
2   6.05   0.7  2 12  5 19    31     7 38 0.053 0.316 0.132 0.500   0.816   0.184
3   4.84   0.7  2 12  6 18    30     8 38 0.053 0.316 0.158 0.474   0.789   0.211
4   6.45   0.7  3 11  5 19    30     8 38 0.079 0.289 0.132 0.500   0.789   0.211
5   4.64   0.7  2 12  7 17    29     9 38 0.053 0.316 0.184 0.447   0.763   0.237



     Fisher.p  Pearson.chisq    Pearson.p      phi
1   0.0001517        12.5500    0.0003972   -0.629
2   0.0001517        12.5500    0.0003972   -0.629
3   0.0005311        10.7500    0.0010420   -0.587
4   0.0007735         9.8400    0.0017080   -0.564
5   0.0018910         9.1610    0.0024730   -0.546



 

 

 

Plots showing the final result of Cate–Nelson analysis, for an analysis with a fixed critical y value.
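
The phi and Pearson chi-square columns can be reproduced from the quadrant counts.  The sketch below uses the counts in the first row of the output and arranges them as a 2 x 2 table.  The arrangement, and the use of the continuity correction that chisq.test applies to 2 x 2 tables by default, are assumptions chosen because they agree with the reported values, not confirmed details of cateNelsonFixedY.

```r
# Quadrant counts from the first row of the output (critical x = 5.24)
Q1 = 2; Q2 = 12; Q3 = 5; Q4 = 19

# Rows: above / below critical y; columns: left / right of critical x
tab = matrix(c(Q1, Q4, Q2, Q3), nrow = 2)

# Phi computed directly from the cell counts and margins
phi = (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
      sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Pearson chi-square with the default continuity correction
chisq = unname(chisq.test(tab)$statistic)

round(phi, 3)      # -0.629
round(chisq, 2)    # 12.55
```

The negative sign of phi here follows from the column order (left of critical x first) and does not indicate a negative trend in the data.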

 

#     #     #

 

References

 

Mangiafico, S.S. (2013). Cate–Nelson analysis for bivariate data using R-project. Journal of Extension 51(5), 5TOT1. tigerprints.clemson.edu/cgi/viewcontent.cgi?article=2547&context=joe.

 

Cate, R.B., & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings 35, 658–660.