[banner]

Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Nonparametric Regression for Time Series

 

There are specific nonparametric techniques that are commonly used for time series data.  The Mann–Kendall trend test is commonly used to determine if a trend exists and can handle seasonal patterns within the data.  The slope of the trend is often determined with Sen’s slope, which can also handle seasonality in the data.  Finally, Pettitt’s test is used to determine if there is a point at which the values in the data change.

 

It is my understanding that the tests in this chapter do not take into consideration any autocorrelation in the data.  Autocorrelation would be for example if an observation with a high value would tend to be followed by another observation with a high value.  For the stage data in this chapter, since the stage of a tidal river is affected by the phase of the moon and by the direction and strength of the wind, it would be reasonable to assume that the average stage for one day would be correlated to the previous day.   Due to this limitation in these tests, there are two conditions under which these they are usually used.  1) When the observations are far enough apart in time that autocorrelation is unlikely to be important.  For example, in the stage data in this chapter, monthly averages are used.  2) When the values are likely to be influenced mostly by non-autocorrelated factors.  For data where autocorrelation is likely to be important, other models, such as autoregressive integrated moving average (ARIMA), could be used.

 

Packages used in this chapter

 

The packages used in this chapter include:

•  mice

•  Kendall

•  trend

 

The following commands will install these packages if they are not already installed:


if(!require(mice)){install.packages("mice")}
if(!require(Kendall)){install.packages("Kendall")}
if(!require(trend)){install.packages("trend")}


Nonparametric regression examples

 

The data used in this chapter is a times series of stage measurements of the tidal Cohansey River in Greenwich, NJ.  Stage is the height of the river, in this case given in feet, with an arbitrary 0 datum.  The data are from U.S. Geology Survey site 01413038, and are monthly averages.

 

The MannKendall and SeasonalMannKendall functions can handle missing data.  However, the sens.slope, sea.sens.slope, decompose, and stl functions, at the time of writing, cannot handle missing data.

 

Because the data set has a few missing data points, we will impute those values with the mice function in the mice package.

 

The function ts converts data into a time series object.  The user needs to be careful using this function, since it does not check for missing lines in the data.  For example, if there is no observation for February, the function will simply take the March the value for February and continue.  However, one could add an observation for February with a value of NA.


Greenwich = read.table("http://rcompanion.org/documents/Greenwich.csv",
                        header=TRUE, sep=",")


Greenwich[125:136,]


    Agency Station Parameter TS_ID Year Month  Stage
125   USGS 1413038        65 97255 2010    11  0.107
126   USGS 1413038        65 97255 2010    12 -0.297
127   USGS 1413038        65 97255 2011     1 -0.350
128   USGS 1413038        65 97255 2011     2     NA
129   USGS 1413038        65 97255 2011     3     NA
130   USGS 1413038        65 97255 2011     4     NA
131   USGS 1413038        65 97255 2011     5     NA
132   USGS 1413038        65 97255 2011     6     NA
133   USGS 1413038        65 97255 2011     7     NA
134   USGS 1413038        65 97255 2011     8  0.347
135   USGS 1413038        65 97255 2011     9  0.738
136   USGS 1413038        65 97255 2011    10  0.499


library(mice)

Mousey = mice(Greenwich)

Greenwich = complete(Mousey)


Greenwich[125:136,]


    Agency Station Parameter TS_ID Year Month  Stage
125   USGS 1413038        65 97255 2010    11  0.107
126   USGS 1413038        65 97255 2010    12 -0.297
127   USGS 1413038        65 97255 2011     1 -0.350
128   USGS 1413038        65 97255 2011     2 -0.076
129   USGS 1413038        65 97255 2011     3  0.176
130   USGS 1413038        65 97255 2011     4 -0.314
131   USGS 1413038        65 97255 2011     5  0.051
132   USGS 1413038        65 97255 2011     6  0.035
133   USGS 1413038        65 97255 2011     7  0.347
134   USGS 1413038        65 97255 2011     8  0.347
135   USGS 1413038        65 97255 2011     9  0.738
136   USGS 1413038        65 97255 2011    10  0.499


TS = ts(Greenwich$Stage,
        frequency=12,
        start=c(2000,4))

TS


        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2000                       0.297  0.219  0.055  0.260  0.293  0.261  0.074 -0.013 -0.523
2001 -0.227 -0.465 -0.076 -0.008  0.183  0.157  0.206  0.198  0.412  0.037 -0.089 -0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120  0.172  0.135  0.253  0.362  0.448 -0.205 -0.276
2003  0.011  0.120  0.350  0.337  0.391  0.212  0.299 -0.132 -0.313 -0.534 -0.395 -0.310
2004 -0.419 -0.314  0.051  0.086  0.039  0.314  0.432 -0.099 -0.408 -0.267 -0.121 -0.140
2005  0.293  0.364  0.168  0.233  0.164  0.100  0.279 -0.375 -0.603 -0.427 -0.544 -0.532
2006 -0.122  0.164 -0.067 -0.106  0.115  0.446  0.034  0.160 -0.623 -0.692 -0.311 -0.731
2007 -0.199 -0.223  0.100 -0.063  0.113 -0.024  0.105 -0.284 -0.456 -0.591 -0.565 -0.381
2008  0.020  0.277 -0.037 -0.107  0.011  0.241 -0.038 -0.222 -0.806 -0.802 -0.654 -0.247
2009 -0.144 -0.119  0.510  0.274  0.097  0.412  0.418  0.350 -0.286  0.047 -0.323  0.393
2010  0.021 -0.049  0.141  0.035  0.318  0.263  0.023  0.107 -0.297 -0.350  0.051  0.289
2011 -0.063  0.164  0.314  0.037  0.347  0.738  0.499  0.087 -0.119 -0.355 -0.277 -0.025
2012  0.088  0.254  0.581  0.415  0.435  0.277  0.524  0.263  0.153 -0.290 -0.330  0.176
2013  0.054  0.023  0.289  0.341  0.392  0.378  0.536 -0.314 -0.265 -0.132 -0.408 -0.134
2014  0.158  0.300  0.401  0.172  0.456  0.492


plot(TS)





Decomposing time series objects

 

It is helpful to decompose time series data into seasonal and trend components.  The decompose function in the native stats package uses “classical seasonal decomposition by moving averages”, and the stl function in the native stats package uses “seasonal decomposition of time series by loess”.


plot(decompose(TS))





plot(stl(TS,
         s.window="periodic"))






Mann–Kendall trend test

 

The Mann–Kendall trend test is completely nonparametric.  The MannKendall function in the Kendall package can be used with a time series object.  The SeasonalMannKendall function performs the test while taking into account the seasonality of the data.


library(Kendall)

MK = MannKendall(TS)

summary(MK)


Score =  1452 , Var(Score) = 560384
denominator =  14524.5
tau = 0.1, 2-sided pvalue =0.052585

###  tau is a measure of the strength of the trend

###  p-value for trend is also shown


library(Kendall)

SMK = SeasonalMannKendall(TS)

summary(SMK)


Score =  228 , Var(Score) = 4226
denominator =  1132.997
tau = 0.201, 2-sided pvalue =0.00045272

###  tau is a measure of the strength of the trend

###  p-value for trend is also shown


Sen’s slope for time series data

 

The sens.slope function in the trend package is used with a time series object.  The sea.sens.slope function performs the test while taking into account the seasonality of the data.


library(trend)

sens.slope(TS)


      Sen's slope

z = 1.9383, n = 171, p-value = 0.05258

95 percent confidence interval:
 0.000000000 0.002070513

sample estimates:
Sen's slope
0.001029703


The sea.sens.slope function appears to not tolerate incomplete time series.  For these data, we can remove the data for the incomplete years (2000 and 2014).


GW = Greenwich[10:165,]

TS2 = ts(GW$Stage,
        frequency=12,
        start=c(2001,1))

TS2


        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
2001 -0.227 -0.465 -0.076 -0.008  0.183  0.157  0.206  0.198  0.412  0.037 -0.089 -0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120  0.172  0.135  0.253  0.362  0.448 -0.205 -0.276
2003  0.011  0.120  0.350  0.337  0.391  0.212  0.299 -0.132 -0.313 -0.534 -0.395 -0.310
2004 -0.419 -0.314  0.051  0.086  0.039  0.314  0.432 -0.099 -0.408 -0.267 -0.121 -0.140
2005  0.293  0.364  0.168  0.233  0.164  0.100  0.279 -0.375 -0.603 -0.427 -0.544 -0.532
2006 -0.122  0.164 -0.067 -0.106  0.115  0.446  0.034  0.160 -0.623 -0.692 -0.311 -0.731
2007 -0.199 -0.223  0.100 -0.063  0.113 -0.024  0.105 -0.284 -0.456 -0.591 -0.565 -0.381
2008  0.020  0.277 -0.037 -0.107  0.011  0.241 -0.038 -0.222 -0.806 -0.802 -0.654 -0.247
2009 -0.144 -0.119  0.510  0.274  0.097  0.412  0.418  0.350 -0.286  0.047 -0.323  0.393
2010  0.021 -0.049  0.141  0.035  0.318  0.263  0.023  0.107 -0.297 -0.350 -0.565 -0.314
2011 -0.330  0.233  0.164  0.035  0.347  0.738  0.499  0.087 -0.119 -0.355 -0.277 -0.025
2012  0.088  0.254  0.581  0.415  0.435  0.277  0.524  0.263  0.153 -0.290 -0.330  0.176
2013  0.054  0.023  0.289  0.341  0.392  0.378  0.536 -0.314 -0.265 -0.132 -0.408 -0.134


library(trend)

sea.sens.slope(TS2)


[1] 0.02240179


Pettitt’s test for change in value

 

The pettitt.test function in the trend package identifies a point at which the values in the data change.  The results here suggest that the stage values were higher after May 2009 than before.


library(trend)

pettitt.test(TS)


Pettitt's test for single change-point detection

U* = 2568, p-value = 0.0007662

sample estimates:
probable change point at time K
                            107


Greenwich[107,]


    Agency Station Parameter TS_ID Year Month  Stage
107   USGS 1413038        65 97255 2009     5 -0.119


References

 

Dennis R. Helsel,D.R., R.M. Hirsch, K.R. Ryberg, S.A. Archfield, and E.J. Gilroy. 2020. Statistical Methods in Water Resources, 4-A3. pubs.er.usgs.gov/publication/tm4A3.