Cate–Nelson analysis divides bivariate data into two groups: one in which a change in the x variable is likely to correspond to a change in the y variable, and one in which a change in x is unlikely to correspond to a change in y. Traditionally this method was used for soil test calibration, for example to determine whether a certain level of soil test phosphorus indicates that adding phosphorus to the soil would be likely to increase crop yield.

The method can be used for any case in which bivariate data can be separated into two groups, one in which a small x is associated with a small y and one in which a large x is associated with a large y, or vice versa.

For a fuller description of Cate–Nelson analysis and examples in soil-test and other applications, see Mangiafico (2013) and the references there.

### Custom function to develop Cate–Nelson models

My *cateNelson* function follows the method of Cate and Nelson (1971). A critical x value is determined by iteratively breaking the data into two groups and comparing the explained sum of squares of the iterations. A critical y value is determined by an iterative process that minimizes the number of data points that fall into Quadrants I and III for data with a positive trend.
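To make the critical x step more concrete, the following is a minimal sketch, not the code inside *cateNelson*. It assumes that the candidate critical x values are the midpoints between adjacent unique x values, and that the quantity compared across iterations is the between-group sum of squares. *splitSS* is a hypothetical helper name. The data are the hypothetical values from the negative-trend example in this chapter.

```r
# Hypothetical data: the x and y values from the negative-trend example
x <- c(5, 7, 6, 5, 7, 10, 12, 11, 15, 21, 22, 20, 24)
y <- c(55, 110, 120, 130, 120, 55, 60, 110, 50, 55, 60, 70, 55)

# Candidate critical x values: midpoints between adjacent unique x values
# (an assumption; how cateNelson treats tied x values is not shown here)
ux <- sort(unique(x))
candidates <- (head(ux, -1) + tail(ux, -1)) / 2

# Between-group sum of squares for a split at a given critical x
# (splitSS is a hypothetical helper, not part of rcompanion)
splitSS <- function(clx, x, y) {
  left  <- y[x <  clx]
  right <- y[x >= clx]
  sum(left)^2 / length(left) + sum(right)^2 / length(right) -
    sum(y)^2 / length(y)
}

ss <- sapply(candidates, splitSS, x = x, y = y)

# Rank the candidates from largest to smallest sum of squares
result <- data.frame(Critical.x.value = candidates, Sum.of.squares = ss)
result <- result[order(-result$Sum.of.squares), ]
print(head(result, 2))
```

Under these assumptions, the two largest sums of squares occur at 11.5 and 8.5, which agrees with the critical x values listed in the negative-trend example.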

Options in the *cateNelson* function:

- *plotit=TRUE* (the default) produces a plot of the data, a plot of the sum of squares of the iterations, a plot of the data points in error quadrants, and a final plot with critical x and critical y drawn as lines.
- *hollow=TRUE* (the default) draws points in the error quadrants as open circles in the final plot.
- *trend="negative"* (not the default) needs to be used if the trend of the data is negative.
- *xthreshold* and *ythreshold* determine how many options the function will return for critical x and critical y. A value of 1 would return all possibilities. A value of 0.10 returns values in the top 10% of the range of maximum sum of squares.
- *clx* and *cly* determine which of the listed critical x and critical y values the function should use to build the final model. A value of 1 selects the first displayed value, and a value of 2 selects the second. This is useful when more than one critical x maximizes or nearly maximizes the sum of squares, or if you want to force the critical y value to be close to some value such as 90% of maximum yield. Note that changing the *clx* value will also change the list of critical y values that is displayed. In the second example I set *clx=2* to select a critical x that more evenly divides the errors across the quadrants.

#### Example of Cate–Nelson analysis

    ##--------------------------------------------------------------------
    ## Cate-Nelson analysis
    ## Data from Mangiafico, S.S., Newman, J.P., Mochizuki, M.J.,
    ##   & Zurawski, D. (2008). Adoption of sustainable practices
    ##   to protect and conserve water resources in container nurseries
    ##   with greenhouse facilities. Acta horticulturae 797, 367–372.
    ##--------------------------------------------------------------------

    size = c(68.55,6.45,6.98,1.05,4.44,0.46,4.02,1.21,4.03,
             6.05,48.39,9.88,3.63,38.31,22.98,5.24,2.82,1.61,
             76.61,4.64,0.28,0.37,0.81,1.41,0.81,2.02,20.16,
             4.04,8.47,8.06,20.97,11.69,16.13,6.85,4.84,80.65,1.61,0.10)

    proportion = c(0.850,0.729,0.737,0.752,0.639,0.579,0.594,0.534,
                   0.541,0.759,0.677,0.820,0.534,0.684,0.504,0.662,
                   0.624,0.647,0.609,0.647,0.632,0.632,0.459,0.684,
                   0.361,0.556,0.850,0.729,0.729,0.669,0.880,0.774,
                   0.729,0.774,0.662,0.737,0.586,0.316)

    library(rcompanion)

    cateNelson(x          = size,
               y          = proportion,
               plotit     = TRUE,
               hollow     = TRUE,
               xlab       = "Nursery size in hectares",
               ylab       = "Proportion of good practices adopted",
               trend      = "positive",
               clx        = 1,
               cly        = 1,
               xthreshold = 0.10,
               ythreshold = 0.15)

    Critical x that maximize sum of squares:

      Critical.x.value Sum.of.squares
    1            4.035      0.2254775
    2            4.740      0.2046979

    Critical y that minimize errors:

      Critical.y.value Q.i Q.ii Q.iii Q.iv Q.model Q.err
    1           0.6355   3   20     2   13      33     5
    2           0.6430   3   19     3   13      32     6
    3           0.6470   3   19     3   13      32     6
    4           0.6545   2   18     4   14      32     6
    5           0.6620   2   18     4   14      32     6
    6           0.6015   6   21     1   10      31     7
    7           0.6280   5   20     2   11      31     7
    8           0.6320   5   20     2   11      31     7

    n       = Number of observations
    CLx     = Critical value of x
    SS      = Sum of squares for that critical value of x
    CLy     = Critical value of y
    Q       = Number of observations which fall into quadrants I, II, III, IV
    Q.model = Total observations which fall into the quadrants predicted by the model
    p.model = Percent observations which fall into the quadrants predicted by the model
    Q.Error = Observations which do not fall into the quadrants predicted by the model
    p.Error = Percent observations which do not fall into the quadrants predicted by the model
    Fisher  = p-value from Fisher exact test dividing data into these quadrants

    Final result:

       n   CLx        SS    CLy Q.I Q.II Q.III Q.IV Q.Model   p.Model Q.Error
    1 38 4.035 0.2254775 0.6355   3   20     2   13      33 0.8684211       5

        p.Error Fisher.p.value
      0.1315789   8.532968e-06

Plots showing the results of Cate–Nelson analysis. In the final plot, the critical x value is indicated with a vertical blue line, and the critical y value is indicated with a horizontal blue line. Points agreeing with the model are solid, while hollow points indicate data not agreeing with the model. (Data from Mangiafico, S.S., Newman, J.P., Mochizuki, M.J., & Zurawski, D. (2008). Adoption of sustainable practices to protect and conserve water resources in container nurseries with greenhouse facilities. Acta horticulturae 797, 367–372.)
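The counts in the final result can be checked by hand. The sketch below tallies the quadrants for the critical values reported in the output and passes the counts to *fisher.test* from base R. The quadrant numbering (I upper left, II upper right, III lower right, IV lower left) is inferred from the counts in the output, and treating the reported Fisher p-value as a two-sided test on the 2 x 2 table of counts is an assumption about how the function works.

```r
size <- c(68.55,6.45,6.98,1.05,4.44,0.46,4.02,1.21,4.03,
          6.05,48.39,9.88,3.63,38.31,22.98,5.24,2.82,1.61,
          76.61,4.64,0.28,0.37,0.81,1.41,0.81,2.02,20.16,
          4.04,8.47,8.06,20.97,11.69,16.13,6.85,4.84,80.65,1.61,0.10)

proportion <- c(0.850,0.729,0.737,0.752,0.639,0.579,0.594,0.534,
                0.541,0.759,0.677,0.820,0.534,0.684,0.504,0.662,
                0.624,0.647,0.609,0.647,0.632,0.632,0.459,0.684,
                0.361,0.556,0.850,0.729,0.729,0.669,0.880,0.774,
                0.729,0.774,0.662,0.737,0.586,0.316)

# Critical values taken from the final result in the output
clx <- 4.035
cly <- 0.6355

# Quadrant numbering inferred from the output (an assumption):
# I upper left, II upper right, III lower right, IV lower left
q <- c(Q.I   = sum(size <  clx & proportion >= cly),
       Q.II  = sum(size >= clx & proportion >= cly),
       Q.III = sum(size >= clx & proportion <  cly),
       Q.IV  = sum(size <  clx & proportion <  cly))
print(q)

# For a positive trend, quadrants II and IV agree with the model
p.model <- unname((q["Q.II"] + q["Q.IV"]) / length(size))
print(p.model)

# 2 x 2 table: columns are below and above critical x,
# rows are above and below critical y
counts <- matrix(c(q["Q.I"], q["Q.IV"], q["Q.II"], q["Q.III"]), nrow = 2)
fisher.p <- fisher.test(counts)$p.value
print(fisher.p)
```

If the quadrant assumption holds, the tallies match the Q.I to Q.IV columns of the final result, and *p.model* is 33 of 38 observations.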


#### Example of Cate–Nelson analysis with negative trend data

    ##--------------------------------------------------------------------
    ## Cate-Nelson analysis
    ## Hypothetical data
    ##--------------------------------------------------------------------

    Input =("
    x   y
    5   55
    7   110
    6   120
    5   130
    7   120
    10  55
    12  60
    11  110
    15  50
    21  55
    22  60
    20  70
    24  55
    ")

    Data = read.table(textConnection(Input),header=TRUE)

    library(rcompanion)

    cateNelson(x          = Data$x,
               y          = Data$y,
               plotit     = TRUE,
               hollow     = TRUE,
               xlab       = "x",
               ylab       = "y",
               trend      = "negative",
               clx        = 2,     # Normally leave as 1 unless you wish to
               cly        = 1,     #   select a specific critical x value
               xthreshold = 0.10,
               ythreshold = 0.15)

    Critical x that maximize sum of squares:

      Critical.x.value Sum.of.squares
    1             11.5       5608.974
    2              8.5       5590.433

    Critical y that minimize errors:

      Critical.y.value Q.i Q.ii Q.iii Q.iv Q.model Q.err
    1               90   4    1     7    1      11     2
    2              110   4    1     7    1      11     2
    3              115   3    0     8    2      11     2
    4              120   3    0     8    2      11     2

    n       = Number of observations
    CLx     = Critical value of x
    SS      = Sum of squares for that critical value of x
    CLy     = Critical value of y
    Q       = Number of observations which fall into quadrants I, II, III, IV
    Q.Model = Total observations which fall into the quadrants predicted by the model
    p.Model = Percent observations which fall into the quadrants predicted by the model
    Q.Error = Observations which do not fall into the quadrants predicted by the model
    p.Error = Percent observations which do not fall into the quadrants predicted by the model
    Fisher  = p-value from Fisher exact test dividing data into these quadrants

    Final model:

       n CLx       SS CLy Q.I Q.II Q.III Q.IV Q.Model   p.Model Q.Error
    1 13 8.5 5608.974  90   4    1     7    1      11 0.8461538       2

        p.Error Fisher.p.value
      0.1538462     0.03185703

Plot showing the final result of Cate–Nelson analysis, for data with a negative trend.
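The effect of *clx* on the error quadrants can be seen by tallying the quadrants for both listed critical x values. This is a sketch with a hypothetical helper, *quadrantCounts*, using the same inferred quadrant numbering as the output (I upper left, II upper right, III lower right, IV lower left). It holds critical y at 90 for both cases, which is a simplification, since the function lists a separate set of critical y values for each critical x.

```r
# Hypothetical data from the negative-trend example
x <- c(5, 7, 6, 5, 7, 10, 12, 11, 15, 21, 22, 20, 24)
y <- c(55, 110, 120, 130, 120, 55, 60, 110, 50, 55, 60, 70, 55)

# Count observations in each quadrant for given critical values
# (quadrantCounts is a hypothetical helper, not part of rcompanion)
quadrantCounts <- function(x, y, clx, cly) {
  c(Q.I   = sum(x <  clx & y >= cly),
    Q.II  = sum(x >= clx & y >= cly),
    Q.III = sum(x >= clx & y <  cly),
    Q.IV  = sum(x <  clx & y <  cly))
}

# For a negative trend, quadrants II and IV are the error quadrants
first  <- quadrantCounts(x, y, clx = 11.5, cly = 90)   # clx = 1
second <- quadrantCounts(x, y, clx = 8.5,  cly = 90)   # clx = 2

print(rbind("Critical x = 11.5" = first,
            "Critical x = 8.5"  = second))
```

With critical x at 11.5, both errors fall in Quadrant IV. With critical x at 8.5, one error falls in Quadrant II and one in Quadrant IV, which is the more even division of errors that *clx=2* was chosen to give.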


### References

Mangiafico, S.S. (2013). Cate-Nelson analysis for bivariate data using R-project. Journal of Extension 51(5), 5TOT1. http://www.joe.org/joe/2013october/tt1.php.

Cate, R.B., & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings 35, 658–660.