There are specific nonparametric techniques that are commonly used for time series data. The Mann–Kendall trend test is commonly used to determine if a trend exists and can handle seasonal patterns within the data. The slope of the trend is often determined with Sen’s slope, which can also handle seasonality in the data. Finally, Pettitt’s test is used to determine if there is a point at which the values in the data change.
It is my understanding that the tests in this chapter do not take into consideration any autocorrelation in the data. Autocorrelation would be for example if an observation with a high value would tend to be followed by another observation with a high value. For the stage data in this chapter, since the stage of a tidal river is affected by the phase of the moon and by the direction and strength of the wind, it would be reasonable to assume that the average stage for one day would be correlated to the previous day. Due to this limitation in these tests, there are two conditions under which these they are usually used. 1) When the observations are far enough apart in time that autocorrelation is unlikely to be important. For example, in the stage data in this chapter, monthly averages are used. 2) When the values are likely to be influenced mostly by non-autocorrelated factors. For data where autocorrelation is likely to be important, other models, such as autoregressive integrated moving average (ARIMA), could be used.
Packages used in this chapter
The packages used in this chapter include:
• mice
• Kendall
• trend
The following commands will install these packages if they are not already installed:
if(!require(mice)){install.packages("mice")}
if(!require(Kendall)){install.packages("Kendall")}
if(!require(trend)){install.packages("trend")}
Nonparametric regression examples
The data used in this chapter is a times series of stage measurements of the tidal Cohansey River in Greenwich, NJ. Stage is the height of the river, in this case given in feet, with an arbitrary 0 datum. The data are from U.S. Geology Survey site 01413038, and are monthly averages.
The MannKendall and SeasonalMannKendall functions can handle missing data. However, the sens.slope, sea.sens.slope, decompose, and stl functions, at the time of writing, cannot handle missing data.
Because the data set has a few missing data points, we will impute those values with the mice function in the mice package.
The function ts converts data into a time series object. The user needs to be careful using this function, since it does not check for missing lines in the data. For example, if there is no observation for February, the function will simply take the March the value for February and continue. However, one could add an observation for February with a value of NA.
Greenwich = read.table("http://rcompanion.org/documents/Greenwich.csv",
header=TRUE, sep=",")
Greenwich[125:136,]
Agency Station Parameter TS_ID Year Month Stage
125 USGS 1413038 65 97255 2010 11 0.107
126 USGS 1413038 65 97255 2010 12 -0.297
127 USGS 1413038 65 97255 2011 1 -0.350
128 USGS 1413038 65 97255 2011 2 NA
129 USGS 1413038 65 97255 2011 3 NA
130 USGS 1413038 65 97255 2011 4 NA
131 USGS 1413038 65 97255 2011 5 NA
132 USGS 1413038 65 97255 2011 6 NA
133 USGS 1413038 65 97255 2011 7 NA
134 USGS 1413038 65 97255 2011 8 0.347
135 USGS 1413038 65 97255 2011 9 0.738
136 USGS 1413038 65 97255 2011 10 0.499
library(mice)
Mousey = mice(Greenwich)
Greenwich = complete(Mousey)
Greenwich[125:136,]
Agency Station Parameter TS_ID Year Month Stage
125 USGS 1413038 65 97255 2010 11 0.107
126 USGS 1413038 65 97255 2010 12 -0.297
127 USGS 1413038 65 97255 2011 1 -0.350
128 USGS 1413038 65 97255 2011 2 -0.076
129 USGS 1413038 65 97255 2011 3 0.176
130 USGS 1413038 65 97255 2011 4 -0.314
131 USGS 1413038 65 97255 2011 5 0.051
132 USGS 1413038 65 97255 2011 6 0.035
133 USGS 1413038 65 97255 2011 7 0.347
134 USGS 1413038 65 97255 2011 8 0.347
135 USGS 1413038 65 97255 2011 9 0.738
136 USGS 1413038 65 97255 2011 10 0.499
TS = ts(Greenwich$Stage,
frequency=12,
start=c(2000,4))
TS
Jan Feb
Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 0.297 0.219 0.055 0.260 0.293 0.261 0.074
-0.013 -0.523
2001 -0.227 -0.465 -0.076 -0.008 0.183 0.157 0.206 0.198 0.412 0.037
-0.089 -0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120 0.172 0.135 0.253 0.362 0.448
-0.205 -0.276
2003 0.011 0.120 0.350 0.337 0.391 0.212 0.299 -0.132 -0.313 -0.534
-0.395 -0.310
2004 -0.419 -0.314 0.051 0.086 0.039 0.314 0.432 -0.099 -0.408 -0.267
-0.121 -0.140
2005 0.293 0.364 0.168 0.233 0.164 0.100 0.279 -0.375 -0.603 -0.427
-0.544 -0.532
2006 -0.122 0.164 -0.067 -0.106 0.115 0.446 0.034 0.160 -0.623 -0.692
-0.311 -0.731
2007 -0.199 -0.223 0.100 -0.063 0.113 -0.024 0.105 -0.284 -0.456 -0.591
-0.565 -0.381
2008 0.020 0.277 -0.037 -0.107 0.011 0.241 -0.038 -0.222 -0.806 -0.802 -0.654
-0.247
2009 -0.144 -0.119 0.510 0.274 0.097 0.412 0.418 0.350 -0.286 0.047
-0.323 0.393
2010 0.021 -0.049 0.141 0.035 0.318 0.263 0.023 0.107 -0.297 -0.350
0.051 0.289
2011 -0.063 0.164 0.314 0.037 0.347 0.738 0.499 0.087 -0.119 -0.355
-0.277 -0.025
2012 0.088 0.254 0.581 0.415 0.435 0.277 0.524 0.263 0.153 -0.290
-0.330 0.176
2013 0.054 0.023 0.289 0.341 0.392 0.378 0.536 -0.314 -0.265 -0.132
-0.408 -0.134
2014 0.158 0.300 0.401 0.172 0.456 0.492
plot(TS)
Decomposing time series objects
It is helpful to decompose time series data into seasonal and trend components. The decompose function in the native stats package uses “classical seasonal decomposition by moving averages”, and the stl function in the native stats package uses “seasonal decomposition of time series by loess”.
plot(decompose(TS))
plot(stl(TS,
s.window="periodic"))
Mann–Kendall trend test
The Mann–Kendall trend test is completely nonparametric. The MannKendall function in the Kendall package can be used with a time series object. The SeasonalMannKendall function performs the test while taking into account the seasonality of the data.
library(Kendall)
MK = MannKendall(TS)
summary(MK)
Score = 1452 , Var(Score) = 560384
denominator = 14524.5
tau = 0.1, 2-sided pvalue =0.052585
### tau is a measure of the strength of the trend
### p-value for trend is also shown
library(Kendall)
SMK = SeasonalMannKendall(TS)
summary(SMK)
Score = 228 , Var(Score) = 4226
denominator = 1132.997
tau = 0.201, 2-sided pvalue =0.00045272
### tau is a measure of the strength of the trend
### p-value for trend is also shown
Sen’s slope for time series data
The sens.slope function in the trend package is used with a time series object. The sea.sens.slope function performs the test while taking into account the seasonality of the data.
library(trend)
sens.slope(TS)
Sen's slope
z = 1.9383, n = 171, p-value = 0.05258
95 percent confidence interval:
0.000000000 0.002070513
sample estimates:
Sen's slope
0.001029703
The sea.sens.slope function appears to not tolerate incomplete time series. For these data, we can remove the data for the incomplete years (2000 and 2014).
GW = Greenwich[10:165,]
TS2 = ts(GW$Stage,
frequency=12,
start=c(2001,1))
TS2
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
Nov Dec
2001 -0.227 -0.465 -0.076 -0.008 0.183 0.157 0.206 0.198 0.412 0.037 -0.089
-0.263
2002 -0.513 -0.276 -0.522 -0.235 -0.120 0.172 0.135 0.253 0.362 0.448
-0.205 -0.276
2003 0.011 0.120 0.350 0.337 0.391 0.212 0.299 -0.132 -0.313 -0.534
-0.395 -0.310
2004 -0.419 -0.314 0.051 0.086 0.039 0.314 0.432 -0.099 -0.408 -0.267
-0.121 -0.140
2005 0.293 0.364 0.168 0.233 0.164 0.100 0.279 -0.375 -0.603 -0.427
-0.544 -0.532
2006 -0.122 0.164 -0.067 -0.106 0.115 0.446 0.034 0.160 -0.623 -0.692
-0.311 -0.731
2007 -0.199 -0.223 0.100 -0.063 0.113 -0.024 0.105 -0.284 -0.456 -0.591
-0.565 -0.381
2008 0.020 0.277 -0.037 -0.107 0.011 0.241 -0.038 -0.222 -0.806 -0.802
-0.654 -0.247
2009 -0.144 -0.119 0.510 0.274 0.097 0.412 0.418 0.350 -0.286 0.047
-0.323 0.393
2010 0.021 -0.049 0.141 0.035 0.318 0.263 0.023 0.107 -0.297 -0.350
-0.565 -0.314
2011 -0.330 0.233 0.164 0.035 0.347 0.738 0.499 0.087 -0.119 -0.355
-0.277 -0.025
2012 0.088 0.254 0.581 0.415 0.435 0.277 0.524 0.263 0.153 -0.290
-0.330 0.176
2013 0.054 0.023 0.289 0.341 0.392 0.378 0.536 -0.314 -0.265 -0.132
-0.408 -0.134
library(trend)
sea.sens.slope(TS2)
[1] 0.02240179
Pettitt’s test for change in value
The pettitt.test function in the trend package identifies a point at which the values in the data change. The results here suggest that the stage values were higher after May 2009 than before.
library(trend)
pettitt.test(TS)
Pettitt's test for single change-point detection
U* = 2568, p-value = 0.0007662
sample estimates:
probable change point at time K
107
Greenwich[107,]
Agency Station Parameter TS_ID Year Month Stage
107 USGS 1413038 65 97255 2009 5 -0.119
References
Dennis R. Helsel,D.R., R.M. Hirsch, K.R. Ryberg, S.A. Archfield, and E.J. Gilroy. 2020. Statistical Methods in Water Resources, 4-A3. pubs.er.usgs.gov/publication/tm4A3.