
Summary and Analysis of Extension Program Evaluation in R

Salvatore S. Mangiafico

Kolmogorov–Smirnov and Wald–Wolfowitz Tests

The Kolmogorov–Smirnov, Wald–Wolfowitz, and runs tests are non-parametric tests.

 

Two-sample tests

 

The two-sample versions of both the Kolmogorov–Smirnov (K–S) and Wald–Wolfowitz (W–W) tests compare two data sets to determine if they come from the same distribution.  For example, if one data set came from a uniform distribution and one came from a normal distribution, these tests should be able to detect this difference, even if the central tendencies are similar or the samples show stochastic equality.  This makes these tests different in practice from, for example, the Mann–Whitney test.  However, these tests are also sensitive to differences in central tendency and variance.

 

It's a general rule of thumb that the K–S test is more powerful than the W–W test when the distributions have a different location (central tendency), but that the W–W test is more powerful than the K–S test when the locations of the two distributions are similar.

 

One-sample tests

 

The one-sample K–S test compares the observed sample to a known distribution.  In general, for this test, the parameters of the known distribution shouldn’t be defined based on the observed data.  That is, if one wanted to use the K–S test to determine whether the observations follow a normal distribution, the mean and standard deviation of that distribution shouldn’t be estimated from the observed data.
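As a minimal sketch of this point, with hypothetical data and parameter values, the distribution parameters below are fixed in advance rather than computed from the sample:


# Hypothetical example: test a sample against a normal distribution
# whose mean and sd are specified a priori, not computed from the sample
set.seed(1)
x = rnorm(30, mean=10, sd=2)

ks.test(x, "pnorm", 10, 2)     # valid: parameters fixed in advance

# Plugging in mean(x) and sd(x) here instead would make the test
# anti-conservative; a test designed for estimated parameters,
# such as the Lilliefors test, would be more appropriate.
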

 

The one-sample analogue of the W–W test is a test of runs.  That is, the sample is divided into two groups, those above some value and those below that value — here, designated 0 and 1 for convenience.  If the sample has a random arrangement of 0’s and 1’s, there don’t appear to be any significant runs.  However, if the sample tends to have 0’s at the beginning of the sequence and 1’s at the end, there are likely significant runs.  The same is true if there are long runs of 0’s or 1’s anywhere in the sequence, or if there are fewer runs than would be expected for a random sequence.  The null hypothesis is randomness in the observations, and the order of the observations matters.
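The idea of counting runs can be sketched in R with rle(), which collapses consecutive repeated values.  The 0/1 sequence below is hypothetical:


### Hypothetical dichotomous sequence
x = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 1)

### rle() collapses consecutive repeats; the number of
### collapsed segments is the number of runs
length(rle(x)$lengths)

### 6
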

 

Appropriate data

•  Two-sample data for two-sample tests; one-sample data for one-sample tests

•  Data are continuous or at least interval

•  For the one-sample Wald–Wolfowitz test, dichotomous data can be used

 

Packages used in this chapter

 

The packages used in this chapter include:

•  DescTools

•  FSA

The following commands will install these packages if they are not already installed:


if(!require(DescTools)){install.packages("DescTools")}
if(!require(FSA)){install.packages("FSA")}


Two-sample examples

 

Example from Siegel and Castellan

 

Note in this example that the distributions of the two groups have similar shapes and spreads, but that the central tendencies (locations) of the distributions are decidedly different.  The K–S test is sensitive to this difference and reveals a significant result.  Because the sample size is small, the W–W test does not return a significant result.  But if samples with similar distributions had larger sample sizes, the W–W test would likely return a significant result as well.

 

Eleventh_grade = c(35.2, 39.2, 40.9, 38.1, 34.4, 29.1, 41.8, 24.3, 32.4)
Seventh_grade  = c(39.1, 41.2, 45.2, 46.2, 48.4, 48.7, 55.0, 40.6, 52.1, 47.2)


Plot


hist(Eleventh_grade, xlim=c(20,60), col=rgb(1,0,0,0.5), xlab="Errors",
     ylab="Count", main="Errors by students in two grades")

hist(Seventh_grade, xlim=c(20,60), col=rgb(0,0,1,0.5), add=TRUE)

legend("topright", legend=c("Eleventh grade","Seventh grade"),
     col=c(rgb(1,0,0,0.5), rgb(0,0,1,0.5)), pt.cex=2, pch=15)





Kolmogorov–Smirnov test


library(FSA)

ksTest(Eleventh_grade, Seventh_grade)


### Exact two-sample Kolmogorov-Smirnov test

### D = 0.7, p-value = 0.007036


Wald–Wolfowitz test


library(DescTools)

RunsTest(Eleventh_grade, Seventh_grade)


### Wald-Wolfowitz Runs Test

### runs = 8, m = 9, n = 10, p-value = 0.3444


Hypothetical example

 

In this example, the two groups have distributions with equal means, medians, and ranges, but with different shapes.  X has a uniform distribution, and Y has a bell-shaped distribution.  Here, the K–S test fails to return a significant result, while the W–W test returns a significant result.

 

Note also that because the groups are stochastically equal, the Wilcoxon–Mann–Whitney test returns a p-value of 1.


X = c(5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12)

Y = c(5,6,8,7,9,10,9,9,7,8,10,8,9,11,8,8,9,12,7,10,6,11,7,10)

summary(X)


### Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
### 5.00    6.75    8.50    8.50   10.25   12.00


summary(Y)


### Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
###  5.0     7.0     8.5     8.5    10.0    12.0


Plots


barplot(table(X))




barplot(table(Y))





Kolmogorov–Smirnov test


library(FSA)

ksTest(X, Y)


### Exact two-sample Kolmogorov-Smirnov test

### D = 0.125, p-value = 0.8652


Wald–Wolfowitz test


library(DescTools)

RunsTest(X, Y)


### Wald-Wolfowitz Runs Test

### z = -2.4803, runs = 16, m = 24, n = 24, p-value = 0.01313


Wilcoxon–Mann–Whitney test


wilcox.test(X,Y, correct=FALSE)


### Wilcoxon rank sum test

### W = 288, p-value = 1


One-sample examples

 

One-sample Kolmogorov–Smirnov test

 

For these examples, we’ll use the X and Y data from the previous example, and test each against a uniform distribution and a normal distribution.  Note that none of these tests returns a significant result, suggesting that, for these data, there is no evidence that the data fail to follow the uniform or normal distributions specified in the function calls.

 

X = c(5,5,5,6,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12)

Y = c(5,6,8,7,9,10,9,9,7,8,10,8,9,11,8,8,9,12,7,10,6,11,7,10)


Test of X against a uniform distribution from 5 to 12

 

library(FSA)

ksTest(X, "punif", 5, 12)


### Asymptotic one-sample Kolmogorov-Smirnov test

### D = 0.125, p-value = 0.8475


Test of X against a normal distribution with mean = 8 and sd = 2

 

ksTest(X, "pnorm", 8, 2)


### Asymptotic one-sample Kolmogorov-Smirnov test

### D = 0.21634, p-value = 0.2113


Test of Y against a uniform distribution from 5 to 12

 

library(FSA)

ksTest(Y, "punif", 5, 12)


### Asymptotic one-sample Kolmogorov-Smirnov test

### D = 0.16071, p-value = 0.5649


Test of Y against a normal distribution with mean = 8 and sd = 2

 

ksTest(Y, "pnorm", 8, 2)


### Asymptotic one-sample Kolmogorov-Smirnov test

### D = 0.20833, p-value = 0.2485


One-sample test of runs

 

For the one-sample test of runs, the order of the observations matters, as should be clear from the following examples.

 

Example from Siegel and Castellan 1

 

Note here, observations for UnfairCoin1 are obviously not random.  The first 10 observations are heads and the last 10 observations are tails.  This wouldn’t be expected from a fair coin flip.

 

But also, observations for UnfairCoin2 are obviously not random either.  The fact that heads and tails always alternate without a sustained run of either heads or tails suggests a non-random process.

 

Finally, observations for FairCoin appear to be random.  There are nearly equal numbers of heads and tails, and there are a few short runs of each.


UnfairCoin1 = c("H","H","H","H","H","H","H","H","H","H",
                "T","T","T","T","T","T","T","T","T","T")

library(DescTools)

RunsTest(UnfairCoin1)


### Runs Test for Randomness

### runs = 2, m = 10, n = 10, p-value = 2.165e-05


UnfairCoin2 = c("H","T","H","T","H","T","H","T","H","T",
                "H","T","H","T","H","T","H","T","H","T")

library(DescTools)

RunsTest(UnfairCoin2)


### Runs Test for Randomness

### runs = 20, m = 10, n = 10, p-value = 2.165e-05


FairCoin = c("T","T","T","H","T","T","H","H","H","T",
             "H","H","T","H","H","T","H","T","T","T")

library(DescTools)

RunsTest(FairCoin)


### Runs Test for Randomness

### runs = 11, m = 9, n = 11, p-value = 1


Example from Siegel and Castellan 2

 

The default analysis for the one-sample test of runs on continuous data compares each observation to the median, reducing the data set to a dichotomous set of observations indicating whether each observation is greater than or less than the median.

 

Here, the median is 25, and it appears there is a relatively random arrangement of observations greater than this median and less than this median.


Data = read.table(header=TRUE, text="
Child Score
1     31
2     23
3     36
4     43
5     51
6     44
7     12
8     26
9     43
10    75
11     2
12     3
13    15
14    18
15    78
16    24
17    13
18    27
19    86
20    61
21    13
22     7
23     6
24     8
")

library(DescTools)

RunsTest(Data$Score)


### Runs Test for Randomness

### runs = 10, m = 12, n = 12, p-value = 0.3009

### sample estimates:
###   median(x)
###          25
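The dichotomization described above can be reproduced manually.  The following sketch recodes the scores relative to the median and counts the runs with rle(), matching the runs = 10 reported by RunsTest():


### Scores from the example above
Score = c(31, 23, 36, 43, 51, 44, 12, 26, 43, 75,  2,  3,
          15, 18, 78, 24, 13, 27, 86, 61, 13,  7,  6,  8)

### Recode each score as above (1) or below (0) the median of 25;
### no score equals the median here, so there are no ties to handle
Above = as.numeric(Score > median(Score))

### Count the runs in the 0/1 sequence
length(rle(Above)$lengths)

### 10
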


References

 

Siegel, S. and Castellan, N.J. 1988. Nonparametric Statistics for the Behavioral Sciences, 2nd Edition. McGraw-Hill.

 

Wald, A. and Wolfowitz, J. 1940. On a test whether two samples are from the same population. Annals of Mathematical Statistics 11(2): 147–162.