## Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

# Nonparametric Regression for Time Series

There are specific nonparametric techniques that are commonly used for time series data.  The Mann–Kendall trend test is commonly used to determine if a trend exists, and can handle seasonal patterns within the data.  The slope of the trend is often determined with Sen’s slope, which can also handle seasonality in the data.  Finally, Pettitt’s test is use to determine if there is a point at which the values in the data change.

It is my understanding that the tests in this chapter do not take into consideration any autocorrelation in the data.  Autocorrelation would be for example if an observation with a high value would tend to be followed by another observation with a high value.  For the stage data in this chapter, since the stage of a tidal river is affected by the phase of the moon and by the direction and strength of the wind, it would be reasonable to assume that the average stage for one day would be correlated to the previous day.   Due to this limitation in these tests, there are two conditions under which these they are usually used.  1) When the observations are far enough apart in time that autocorrelation is unlikely to be important.  For example, in the stage data in this chapter, monthly averages are used.  2) When the values are likely to be influenced mostly by non-autocorrelated factors.  For data where autocorrelation is likely to be important, other models, such as autoregressive integrated moving average (ARIMA), could be used.

### Packages used in this chapter

The packages used in this chapter include:

•  mice

•  Kendall

•  trend

The following commands will install these packages if they are not already installed:

if(!require(mice)){install.packages("mice")}
if(!require(Kendall)){install.packages("Kendall")}
if(!require(trend)){install.packages("trend")}

### Nonparametric regression examples

The data used in this chapter is a times series of stage measurements of the tidal Cohansey River in Greenwich, NJ.  Stage is the height of the river, in this case given in feet, with an arbitrary 0 datum.  The data are from U.S. Geology Survey site 01413038, and are monthly averages.

The MannKendall and SeasonalMannKendall functions can handle missing data.  However, the sens.slope, sea.sens.slope, decompose, and stl functions, at the time of writing, cannot handle missing data.

Because the data set has a few missing data points, we will impute those values with the mice function in the mice package.

The function ts converts data into a time series object.  The user needs to be careful using this function, since it does not check for missing lines in the data.  For example, if there is no observation for February, the function will simply take the March the value for February and continue.  However, one could add an observation for February with a value of NA.

Greenwich[125:136,]

Agency Station Parameter TS_ID Year Month  Stage
125   USGS 1413038        65 97255 2010    11  0.107
126   USGS 1413038        65 97255 2010    12 -0.297
127   USGS 1413038        65 97255 2011     1 -0.350
128   USGS 1413038        65 97255 2011     2     NA
129   USGS 1413038        65 97255 2011     3     NA
130   USGS 1413038        65 97255 2011     4     NA
131   USGS 1413038        65 97255 2011     5     NA
132   USGS 1413038        65 97255 2011     6     NA
133   USGS 1413038        65 97255 2011     7     NA
134   USGS 1413038        65 97255 2011     8  0.347
135   USGS 1413038        65 97255 2011     9  0.738
136   USGS 1413038        65 97255 2011    10  0.499

library(mice)

Mousey = mice(Greenwich)

Greenwich = complete(Mousey)

Greenwich[125:136,]

Agency Station Parameter TS_ID Year Month  Stage
125   USGS 1413038        65 97255 2010    11  0.107
126   USGS 1413038        65 97255 2010    12 -0.297
127   USGS 1413038        65 97255 2011     1 -0.350
128   USGS 1413038        65 97255 2011     2 -0.076
129   USGS 1413038        65 97255 2011     3  0.176
130   USGS 1413038        65 97255 2011     4 -0.314
131   USGS 1413038        65 97255 2011     5  0.051
132   USGS 1413038        65 97255 2011     6  0.035
133   USGS 1413038        65 97255 2011     7  0.347
134   USGS 1413038        65 97255 2011     8  0.347
135   USGS 1413038        65 97255 2011     9  0.738
136   USGS 1413038        65 97255 2011    10  0.499

TS = ts(Greenwich\$Stage,
frequency=12,
start=c(2000,4))

TS

Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2000                       0.297  0.219  0.055  0.260  0.293  0.261  0.074 -0.013 -0.523
2001 -0.227 -0.465 -0.076 -0.008  0.183  0.157  0.206  0.198  0.412  0.037 -0.089 -0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120  0.172  0.135  0.253  0.362  0.448 -0.205 -0.276
2003  0.011  0.120  0.350  0.337  0.391  0.212  0.299 -0.132 -0.313 -0.534 -0.395 -0.310
2004 -0.419 -0.314  0.051  0.086  0.039  0.314  0.432 -0.099 -0.408 -0.267 -0.121 -0.140
2005  0.293  0.364  0.168  0.233  0.164  0.100  0.279 -0.375 -0.603 -0.427 -0.544 -0.532
2006 -0.122  0.164 -0.067 -0.106  0.115  0.446  0.034  0.160 -0.623 -0.692 -0.311 -0.731
2007 -0.199 -0.223  0.100 -0.063  0.113 -0.024  0.105 -0.284 -0.456 -0.591 -0.565 -0.381
2008  0.020  0.277 -0.037 -0.107  0.011  0.241 -0.038 -0.222 -0.806 -0.802 -0.654 -0.247
2009 -0.144 -0.119  0.510  0.274  0.097  0.412  0.418  0.350 -0.286  0.047 -0.323  0.393
2010  0.021 -0.049  0.141  0.035  0.318  0.263  0.023  0.107 -0.297 -0.350  0.051  0.289
2011 -0.063  0.164  0.314  0.037  0.347  0.738  0.499  0.087 -0.119 -0.355 -0.277 -0.025
2012  0.088  0.254  0.581  0.415  0.435  0.277  0.524  0.263  0.153 -0.290 -0.330  0.176
2013  0.054  0.023  0.289  0.341  0.392  0.378  0.536 -0.314 -0.265 -0.132 -0.408 -0.134
2014  0.158  0.300  0.401  0.172  0.456  0.492

plot(TS)

### Decomposing time series objects

It is helpful to decompose time series data into seasonal and trend components.  The decompose function in the native stats package uses “classical seasonal decomposition by moving averages”, and the stl function in the native stats package uses “seasonal decomposition of time series by loess”.

plot(decompose(TS))

plot(stl(TS,
s.window="periodic"))

### Mann–Kendall trend test

The Mann–Kendall trend test is completely nonparametric.  The MannKendall function in the Kendall package can be used with a time series object.  The SeasonalMannKendall function performs the test while taking into account the seasonality of the data.

library(Kendall)

MK = MannKendall(TS)

summary(MK)

Score =  1371 , Var(Score) = 560382.3
denominator =  14524
tau = 0.0944, 2-sided pvalue =0.067233

###  tau is a measure of the strength of the trend
###  p-value for trend is also shown

library(Kendall)

SMK = SeasonalMannKendall(TS)

summary(SMK)

Score =  217 , Var(Score) = 4227
denominator =  1133.499
tau = 0.191, 2-sided pvalue =0.00084484

###  tau is a measure of the strength of the trend
###  p-value for trend is also shown

### Sen’s slope for time series data

The sens.slope function in the trend package is used with a time series object.  The sea.sens.slope function performs the test while taking into account the seasonality of the data.

library(trend)

sens.slope(TS)

Sen's slope and intercept

slope:  0.001
95 percent confidence intervall for slope
0.002 -1e-04

intercept: -0.0475
nr. of observations: 171

library(trend)

sea.sens.slope(TS)

The sea.sens.slope function appears to not tolerate incomplete time series.  For these data, we can remove the data for the incomplete years (2000 and 2014).

GW = Greenwich[10:165,]

TS2 = ts(GW\$Stage,
frequency=12,
start=c(2001,1))

TS2

Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2001 -0.227 -0.465 -0.076 -0.008  0.183  0.157  0.206  0.198  0.412  0.037 -0.089 -0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120  0.172  0.135  0.253  0.362  0.448 -0.205 -0.276
2003  0.011  0.120  0.350  0.337  0.391  0.212  0.299 -0.132 -0.313 -0.534 -0.395 -0.310
2004 -0.419 -0.314  0.051  0.086  0.039  0.314  0.432 -0.099 -0.408 -0.267 -0.121 -0.140
2005  0.293  0.364  0.168  0.233  0.164  0.100  0.279 -0.375 -0.603 -0.427 -0.544 -0.532
2006 -0.122  0.164 -0.067 -0.106  0.115  0.446  0.034  0.160 -0.623 -0.692 -0.311 -0.731
2007 -0.199 -0.223  0.100 -0.063  0.113 -0.024  0.105 -0.284 -0.456 -0.591 -0.565 -0.381
2008  0.020  0.277 -0.037 -0.107  0.011  0.241 -0.038 -0.222 -0.806 -0.802 -0.654 -0.247
2009 -0.144 -0.119  0.510  0.274  0.097  0.412  0.418  0.350 -0.286  0.047 -0.323  0.393
2010  0.021 -0.049  0.141  0.035  0.318  0.263  0.023  0.107 -0.297 -0.350 -0.565 -0.314
2011 -0.330  0.233  0.164  0.035  0.347  0.738  0.499  0.087 -0.119 -0.355 -0.277 -0.025
2012  0.088  0.254  0.581  0.415  0.435  0.277  0.524  0.263  0.153 -0.290 -0.330  0.176
2013  0.054  0.023  0.289  0.341  0.392  0.378  0.536 -0.314 -0.265 -0.132 -0.408 -0.134

library(trend)

sea.sens.slope(TS2)

[1] 0.0185

### Pettitt’s test for change in value

The pettitt.test function in the trend package identifies a point at which the values in the data change.  The results here suggest that the stage values were higher after May 2009 than before.

library(trend)

pettitt.test(TS)

Pettitt's test for single change-point detection

K = 2453, p-value = 0.001526
alternative hypothesis: true change point is present in the series

sample estimates:
probable change point at tau
107

Greenwich[107,]

Agency Station Parameter TS_ID Year Month  Stage
107   USGS 1413038        65 97255 2009     5 -0.119