Wolfram Language & System Documentation Center

KolmogorovSmirnovTest

KolmogorovSmirnovTest[data]

tests whether data is normally distributed using the Kolmogorov–Smirnov test.

KolmogorovSmirnovTest[data,dist]

tests whether data is distributed according to dist using the Kolmogorov–Smirnov test.

KolmogorovSmirnovTest[data,dist,"property"]

returns the value of "property".

Details and Options

KolmogorovSmirnovTest performs the Kolmogorov–Smirnov goodness-of-fit test with null hypothesis that data was drawn from a population with distribution dist and alternative hypothesis that it was not.
By default, a probability value or p-value is returned.
A small p-value suggests that it is unlikely that the data came from dist.
The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
The data can be univariate {x₁,x₂,…} or multivariate {{x₁,y₁,…},{x₂,y₂,…},…}.
The Kolmogorov–Smirnov test assumes that the data came from a continuous distribution.
The Kolmogorov–Smirnov test effectively uses a test statistic based on $\sup_x \left|\hat{F}(x) - F(x)\right|$, where $\hat{F}(x)$ is the empirical CDF of data and $F(x)$ is the CDF of dist.
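Because the empirical CDF is a step function, the supremum of the absolute difference can only occur at a data point, either just before or just after a jump. The following is a minimal sketch of that calculation for a fully specified continuous distribution; the names sorted, cdfValues, dPlus, dMinus and ksStatistic are illustrative, and the built-in statistic may be scaled or computed differently.

```wolfram
(* Sketch: hand-computed one-sample KS statistic for a fully specified dist *)
SeedRandom[1];
dist = NormalDistribution[];
data = RandomVariate[dist, 200];
n = Length[data];
sorted = Sort[data];
cdfValues = CDF[dist, sorted];

(* Largest gap just after each jump of the empirical CDF *)
dPlus = Max[Range[n]/n - cdfValues];
(* Largest gap just before each jump of the empirical CDF *)
dMinus = Max[cdfValues - (Range[n] - 1)/n];

ksStatistic = Max[dPlus, dMinus];

(* Show the hand-computed value next to the built-in property for comparison *)
{ksStatistic, KolmogorovSmirnovTest[data, dist, "TestStatistic"]}
```

Checking both dPlus and dMinus matters: evaluating the difference only at the data points themselves can miss the gap on the left side of a jump.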
For multivariate tests, the sum of the univariate marginal p-values is used and is assumed to follow a UniformSumDistribution under $H_0$.
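As a rough illustration of the multivariate combination, the sketch below tests each marginal separately, sums the resulting p-values, and evaluates the sum against a UniformSumDistribution. It assumes that the combined p-value is the lower-tail probability of the sum; this is an assumption made for illustration, and the built-in implementation may differ in detail. The names marginals, marginalPValues, pSum and combinedPValue are hypothetical.

```wolfram
(* Sketch: combine marginal p-values via UniformSumDistribution.
   Assumption: small sums count as evidence against H0, so the lower tail is used. *)
SeedRandom[2];
data = RandomVariate[BinormalDistribution[0.5], 500];

(* The marginals of BinormalDistribution[0.5] are standard normal *)
marginals = {NormalDistribution[], NormalDistribution[]};

(* One univariate Kolmogorov–Smirnov p-value per coordinate *)
marginalPValues = MapThread[KolmogorovSmirnovTest, {Transpose[data], marginals}];

pSum = Total[marginalPValues];
combinedPValue = CDF[UniformSumDistribution[Length[marginals]], pSum]
```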
KolmogorovSmirnovTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
KolmogorovSmirnovTest[data,dist,"property"] can be used to directly give the value of "property".
Properties related to the reporting of test results include:

	"PValue"	p-value
	"PValueTable"	formatted version of "PValue"
	"ShortTestConclusion"	a short description of the conclusion of a test
	"TestConclusion"	a description of the conclusion of a test
	"TestData"	test statistic and p-value
	"TestDataTable"	formatted version of "TestData"
	"TestStatistic"	test statistic
	"TestStatisticTable"	formatted version of "TestStatistic"

The following properties are independent of which test is being performed.
Properties related to the data distribution include:

	"FittedDistribution"	fitted distribution of data
	"FittedDistributionParameters"	distribution parameters of data

The following options can be given:

	Method	Automatic	the method to use for computing p-values
	SignificanceLevel	0.05	cutoff for diagnostics and reporting

For a test for goodness of fit, a cutoff $\alpha$ is chosen such that $H_0$ is rejected only if $p<\alpha$. The value of $\alpha$ used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, $\alpha$ is set to 0.05.
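The decision rule can be reproduced by hand by comparing the returned p-value with the chosen cutoff. A minimal sketch follows, in which alpha, pValue and rejectQ are illustrative names rather than part of the function's interface:

```wolfram
(* Sketch of the decision rule: reject H0 only if the p-value is below the cutoff *)
SeedRandom[3];
data = RandomVariate[NormalDistribution[], 100];
alpha = 0.05;  (* mirrors the default SignificanceLevel *)

pValue = KolmogorovSmirnovTest[data, NormalDistribution[0, 1.5]];
rejectQ = pValue < alpha;

(* Show the manual decision next to the built-in short conclusion *)
{pValue, rejectQ,
 KolmogorovSmirnovTest[data, NormalDistribution[0, 1.5], "ShortTestConclusion",
  SignificanceLevel -> alpha]}
```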
With the setting Method->"MonteCarlo", datasets $s_i$ of the same length as the input are generated under $H_0$ using the fitted distribution. The EmpiricalDistribution from KolmogorovSmirnovTest[s_i,dist,"TestStatistic"] is then used to estimate the p-value.
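The Monte Carlo procedure can be approximated by hand for a fully specified null distribution. The sketch below is illustrative rather than a description of the internal algorithm: nSamples, simulated and mcPValue are hypothetical names, and when parameters are estimated from the data each simulated dataset would presumably need to be refitted as well.

```wolfram
(* Sketch: Monte Carlo p-value for a fully specified null distribution *)
SeedRandom[4];
dist = NormalDistribution[];
data = RandomVariate[dist, 100];
observed = KolmogorovSmirnovTest[data, dist, "TestStatistic"];

(* Simulate the test statistic under H0 with datasets of the same length *)
nSamples = 500;
simulated = Table[
   KolmogorovSmirnovTest[RandomVariate[dist, Length[data]], dist, "TestStatistic"],
   {nSamples}];

(* Fraction of simulated statistics that exceed the observed one *)
mcPValue = N[SurvivalFunction[EmpiricalDistribution[simulated], observed]];

(* Show the estimate next to the default p-value for comparison *)
{mcPValue, KolmogorovSmirnovTest[data, dist]}
```

The estimate varies with the seed and with nSamples, so it should be read as an approximation to the default p-value rather than a replacement for it.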

Examples


Basic Examples (3)

Perform a Kolmogorov–Smirnov test for normality:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10^4];

Wolfram Language code: KolmogorovSmirnovTest[data]

Wolfram Language code: Show[SmoothHistogram[data, PlotStyle -> Orange], Plot[PDF[NormalDistribution[], x], {x, -4, 4}]]

Test the fit of some data to a particular distribution:

Wolfram Language code: data = RandomVariate[LaplaceDistribution[1, 2], 10^3];

Wolfram Language code: KolmogorovSmirnovTest[data, LaplaceDistribution[1, 2]]

Wolfram Language code:

Show[SmoothHistogram[data, PlotStyle -> Orange, PlotRange -> {0, .25}], Plot[PDF[LaplaceDistribution[1, 2], x], {x, -15, 15}, PlotRange -> All]]

Compare the distributions of two datasets:

Wolfram Language code: data1 = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: data2 = RandomVariate[NormalDistribution[], 150];

There is not sufficient evidence that the datasets were sampled from different distributions:

Wolfram Language code: KolmogorovSmirnovTest[data1, data2]

Wolfram Language code: SmoothHistogram[{data1, data2}]

Scope (9)

Testing (6)

Perform a Kolmogorov–Smirnov test for normality:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^4];
data2 = RandomVariate[StudentTDistribution[3], 10^4];

The p-value for the normal data is large compared to the p-value for the non-normal data:

Wolfram Language code: KolmogorovSmirnovTest[data1]

Wolfram Language code: KolmogorovSmirnovTest[data2]

Test the goodness of fit to a particular distribution:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^3];
data2 = RandomVariate[CauchyDistribution[0, 1], 10^3];

Wolfram Language code: KolmogorovSmirnovTest[data1, CauchyDistribution[0, 1]]

Wolfram Language code: KolmogorovSmirnovTest[data2, CauchyDistribution[0, 1]]

Compare the distributions of two datasets:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^3];
data2 = RandomVariate[NormalDistribution[], 10^3];

Wolfram Language code: KolmogorovSmirnovTest[data1, data2]

The two datasets do not have the same distribution:

Wolfram Language code: data3 = RandomVariate[NormalDistribution[0, 1.25], 10^3];

Wolfram Language code: KolmogorovSmirnovTest[data1, data3]

Test for multivariate normality:

Wolfram Language code:

data1 = RandomVariate[BinormalDistribution[.5], 10^3];
data2 = RandomVariate[LaplaceDistribution[1, 2], {10^3, 2}];

Wolfram Language code: KolmogorovSmirnovTest[data1]

Wolfram Language code: KolmogorovSmirnovTest[data2]

Test for goodness of fit to any multivariate distribution:

Wolfram Language code:

data1 = RandomVariate[BinormalDistribution[.5], 10^3];
data2 = RandomVariate[𝒹 = LaplaceDistribution[1, 2], {10^3, 2}];

Wolfram Language code: 𝒟 = ProductDistribution[𝒹, 𝒹];

Wolfram Language code: KolmogorovSmirnovTest[data1, 𝒟]

Wolfram Language code: KolmogorovSmirnovTest[data2, 𝒟]

Create a HypothesisTestData object for repeated property extraction:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10^5];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, Automatic, "HypothesisTestData"]

The properties available for extraction:

Wolfram Language code: ℋ["Properties"]

Reporting (3)

Tabulate the results of the Kolmogorov–Smirnov test:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, Automatic, "HypothesisTestData"];

The full test table:

Wolfram Language code: ℋ["TestDataTable"]

A p-value table:

Wolfram Language code: ℋ["PValueTable"]

The test statistic:

Wolfram Language code: ℋ["TestStatisticTable"]

Retrieve the entries from a Kolmogorov–Smirnov test table for custom reporting:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 100];
data2 = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: ℋ1 = KolmogorovSmirnovTest[data1, Automatic, "TestData"]

Wolfram Language code: ℋ2 = KolmogorovSmirnovTest[data2, Automatic, "TestData"]

Wolfram Language code:

BarChart[{Labeled[ℋ1, "data1"], Labeled[ℋ2, "data2"]}, ChartLabels -> {Subscript["D", "n"], "p-value"}]

Report test conclusions using "ShortTestConclusion" and "TestConclusion":

Wolfram Language code: data = BlockRandom[SeedRandom[1];RandomVariate[ParetoDistribution[1.05, 2], 100]];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, ParetoDistribution[1, 2], "HypothesisTestData"];

Wolfram Language code: ℋ["ShortTestConclusion"]

Wolfram Language code: ℋ["TestConclusion"]//TraditionalForm

The conclusion may differ at a different significance level:

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, ParetoDistribution[1, 2], "HypothesisTestData", SignificanceLevel -> .001];

Wolfram Language code: ℋ["ShortTestConclusion"]

Wolfram Language code: ℋ["TestConclusion"]//TraditionalForm

Options (4)

Method (3)

Use Monte Carlo-based methods or a computation formula:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[], Method -> "MonteCarlo"]

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[], Method -> Automatic]

Set the number of samples to use for Monte Carlo-based methods:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code:

pts = Table[{i, KolmogorovSmirnovTest[data, NormalDistribution[], Method -> {"MonteCarlo", "MonteCarloSamples" -> i}]}, {i, Range[5, 1000, 100]}];

The Monte Carlo estimate converges to the true p-value with increasing samples:

Wolfram Language code: pval = KolmogorovSmirnovTest[data, NormalDistribution[]];

Wolfram Language code:

Show[ListLinePlot[pts, PlotRange -> {0, 1}, FrameLabel -> {"Samples", "P-Value"}, Frame -> True, AxesOrigin -> {0, 0}], Graphics[{Dashed, Line[{{0, pval}, {1000, pval}}]}]]

Set the random seed used in Monte Carlo-based methods:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code:

pts = Table[{i, KolmogorovSmirnovTest[data, NormalDistribution[], Method -> {"MonteCarlo", "RandomSeed" -> i, "MonteCarloSamples" -> 50}]}, {i, Range[1, 10]}];

The seed affects the state of the generator and has some effect on the resulting p-value:

Wolfram Language code: pval = KolmogorovSmirnovTest[data, NormalDistribution[]];

Wolfram Language code:

Show[ListLinePlot[pts, PlotRange -> {Min[pts[[All, 2]]], Max[pts[[All, 2]]]}, FrameLabel -> {"Seed", "P-Value"}, Frame -> True, AxesOrigin -> {0, 0}], Graphics[{Dashed, Line[{{0, pval}, {100, pval}}]}]]

SignificanceLevel (1)

Set the significance level used for "TestConclusion" and "ShortTestConclusion":

Wolfram Language code: data = BlockRandom[SeedRandom[1];RandomVariate[NormalDistribution[], 100]];

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[0, 1.5], "ShortTestConclusion", SignificanceLevel -> .1]

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[0, 1.5], "ShortTestConclusion", SignificanceLevel -> .01]

By default, $\alpha = 0.05$ is used:

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[0, 1.5], "TestConclusion"]//TraditionalForm

Applications (2)

A power curve for the Kolmogorov–Smirnov test:

Wolfram Language code: data = Table[RandomVariate[UniformDistribution[{-4, 4}], {500, i}], {i, n = {5, 7, 10, 15, 20, 25, 30}}];

Wolfram Language code: ℋ = Table[KolmogorovSmirnovTest[data[[i, j]], NormalDistribution[]], {i, Length[data]}, {j, Length[data[[i]]]}];

Wolfram Language code: pC = Interpolation[Transpose[{n, Table[Probability[x ≤ 0.05, x \[Distributed] i], {i, ℋ}]}], InterpolationOrder -> 1];

Visualize the approximate power curve:

Wolfram Language code: Plot[pC[x], {x, 5, 30}, PlotRange -> {0, 1}, Ticks -> {n, Automatic}, AxesOrigin -> {0, 0}]

Estimate the power of the Kolmogorov–Smirnov test when the underlying distribution is a UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 12:

Wolfram Language code: pC[12.]

A sample of 31 sheets of airplane glass were subjected to a constant stress until breakage. Investigate whether the data is drawn from a NormalDistribution or a GammaDistribution:

Wolfram Language code: ExampleData[{"Statistics", "AirplaneGlass"}, "Description"]

Wolfram Language code: data = ExampleData[{"Statistics", "AirplaneGlass"}];

Wolfram Language code:

ℋ1 = KolmogorovSmirnovTest[data, NormalDistribution[a, b], "HypothesisTestData"];
ℋ2 = KolmogorovSmirnovTest[data, GammaDistribution[a, b], "HypothesisTestData"];

Compare the quantile-quantile plots for the candidate distributions:

Wolfram Language code: Table[QuantilePlot[data, ℋ["FittedDistribution"]], {ℋ, {ℋ1, ℋ2}}]

The data appears to fit a GammaDistribution slightly better than a NormalDistribution:

Wolfram Language code: {ℋ1["TestDataTable"], ℋ2["TestDataTable"]}

Properties & Relations (9)

By default, univariate data is compared to a NormalDistribution:

Wolfram Language code: data = RandomVariate[NormalDistribution[2, 3], 10^4];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, Automatic, "HypothesisTestData"];

Wolfram Language code: ℋ["TestDataTable"]

The parameters have been estimated from the data:

Wolfram Language code: ℋ["FittedDistribution"]

Multivariate data is compared to a MultinormalDistribution by default:

Wolfram Language code: data = RandomVariate[MultinormalDistribution[{1, 2, 3}, IdentityMatrix[3]], 1000];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, Automatic, "HypothesisTestData"];

Wolfram Language code: ℋ["TestDataTable"]

Wolfram Language code: ℋ["FittedDistribution"]//TraditionalForm

The parameters of the test distribution are estimated from the data if not specified:

Wolfram Language code: data = RandomVariate[NormalDistribution[1, 2], 1000];

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[μ, σ], "FittedDistribution"]

Specified parameters are not estimated:

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[μ, 2], "FittedDistribution"]

Wolfram Language code: KolmogorovSmirnovTest[data, NormalDistribution[1, 2], "FittedDistribution"]

Maximum-likelihood estimates are used for unspecified parameters of the test distribution:

Wolfram Language code: data = RandomVariate[ExponentialDistribution[3], 10^3];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, ExponentialDistribution[λ], "FittedDistribution"]

Wolfram Language code: KolmogorovSmirnovTest[data, ExponentialDistribution[λ]]

If the parameters are unknown, KolmogorovSmirnovTest applies a correction when possible:

Wolfram Language code: data = RandomVariate[NormalDistribution[3, 4], 10^4];

Wolfram Language code: est = EstimatedDistribution[data, NormalDistribution[μ, σ]]

The parameters are estimated but no correction is applied:

Wolfram Language code: KolmogorovSmirnovTest[data, est]

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, NormalDistribution[μ, σ], "HypothesisTestData"];

The fitted distribution is the same as before and the p-value is corrected:

Wolfram Language code: ℋ["FittedDistribution"]

Wolfram Language code: ℋ["PValue"]

When parameters are estimated, Lilliefors' correction is used:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10 ^ 4];

Wolfram Language code: ℋ = DistributionFitTest[data, NormalDistribution[μ, σ], {"TestDataTable", "KolmogorovSmirnov"}]

Estimate the parameters prior to testing to perform the classical Kolmogorov–Smirnov test:

Wolfram Language code: 𝒟 = EstimatedDistribution[data, NormalDistribution[μ, σ]];

Wolfram Language code: DistributionFitTest[data, 𝒟, {"TestDataTable", "KolmogorovSmirnov"}]

Conceptually, the Kolmogorov–Smirnov test computes the maximum absolute difference between the empirical and theoretical CDFs:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 5];

Wolfram Language code: ℋ = KolmogorovSmirnovTest[data, NormalDistribution[μ, σ], "HypothesisTestData"];

Wolfram Language code:

𝒟emp = EmpiricalDistribution[data];
𝒟 = ℋ["FittedDistribution"];

Wolfram Language code: AbsD = Abs[CDF[𝒟emp, data] - CDF[𝒟, data]];

Wolfram Language code: MaxD = data[[Ordering[AbsD, -1]]][[1]];

Plot the CDFs, showing the maximum absolute difference:

Wolfram Language code:

Show[Plot[{CDF[𝒟emp, x], CDF[𝒟, x]}, {x, -4, 4}, Exclusions -> None, Axes -> None, Frame -> True], Graphics[{Thick, Purple, Line[{{MaxD, CDF[𝒟emp, MaxD]}, {MaxD, CDF[𝒟, MaxD]}}]}]]

Independent marginal densities are assumed in tests for multivariate goodness of fit:

Wolfram Language code: data = RandomVariate[MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], 100];

Wolfram Language code: KolmogorovSmirnovTest[data, MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], "TestStatistic"]

The test statistic is identical when independence is assumed:

Wolfram Language code: KolmogorovSmirnovTest[data, MultinormalDistribution[{0, 0}, {{0.118, 0}, {0, 0.665}}], "TestStatistic"]

The Kolmogorov–Smirnov test works with the values only when the input is a TimeSeries:

Wolfram Language code:

ts = TemporalData[TimeSeries, {{{1.224578634529677, 0.47929635789978015, 0.6572781300178168, 
    0.21496048742669355, 0.7299608014554928, -0.2495111111278263, -1.3286551762002712, 
    0.552725018274874, 0.19272112205837066, 1.1809144012420882, -1.1671 ... 40938613662046, 1.052394590214582, 0.9345044123980388, 0.38537803109557855, 
    -0.48660931166089394, -0.71203560340161}}, {{0, 100, 1}}, 1, {"Continuous", 1}, 
  {"Discrete", 1}, 1, {ValueDimensions -> 1, ResamplingMethod -> None}}, False, 10.1];

Wolfram Language code: KolmogorovSmirnovTest[ts]

Wolfram Language code: KolmogorovSmirnovTest[ts["Values"]]

Possible Issues (3)

The Kolmogorov–Smirnov test is not intended for discrete distributions:

Wolfram Language code: data = RandomVariate[PoissonDistribution[30], 35];

Wolfram Language code: KolmogorovSmirnovTest[data, PoissonDistribution[30]]

The test tends to be conservative:

Wolfram Language code: sim = RandomVariate[PoissonDistribution[30], {500, 35}];

Wolfram Language code: p = Quiet[KolmogorovSmirnovTest[#, PoissonDistribution[30]]]& /@ sim;

Wolfram Language code:

Show[ListLinePlot[Table[{α, Probability[pv ≤ α, pv \[Distributed] p]}, {α, .01, 1, .01}]], Plot[x, {x, 0, 1}, PlotStyle -> {Green, Dashed}]]

Use Monte Carlo methods or PearsonChiSquareTest in these cases:

Wolfram Language code: KolmogorovSmirnovTest[data, PoissonDistribution[30], Method -> "MonteCarlo"]

Wolfram Language code: PearsonChiSquareTest[data, PoissonDistribution[30]]

The Kolmogorov–Smirnov test is not valid for some distributions when parameters have been estimated from the data:

Wolfram Language code: data = RandomVariate[BetaDistribution[1, 2], 100];

Wolfram Language code: KolmogorovSmirnovTest[data, BetaDistribution[1, b]]

Provide parameter values if they are known:

Wolfram Language code: KolmogorovSmirnovTest[data, BetaDistribution[1, 2]]

Alternatively, use Monte Carlo methods to approximate the p-value:

Wolfram Language code: KolmogorovSmirnovTest[data, BetaDistribution[1, b], Method -> "MonteCarlo"]

Ties in the data are ignored:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 1000];

Wolfram Language code: KolmogorovSmirnovTest[Join[data, {First[data]}]]

Wolfram Language code: PearsonChiSquareTest[Join[data, {First[data]}]]

Differences may be more apparent with larger numbers of ties:

Wolfram Language code: KolmogorovSmirnovTest[Join[data, data]]

Wolfram Language code: PearsonChiSquareTest[Join[data, data]]

Neat Examples (1)

Compute the statistic when the null hypothesis is true:

Wolfram Language code: data = RandomVariate[NormalDistribution[], {2500, 100}];

Wolfram Language code: T1 = KolmogorovSmirnovTest[#, NormalDistribution[], "TestStatistic"]& /@ data;

The test statistic given a particular alternative:

Wolfram Language code: T2 = KolmogorovSmirnovTest[#, NormalDistribution[1, 2], "TestStatistic"]& /@ data;

Compare the distributions of the test statistics:

Wolfram Language code:

SmoothHistogram[{T1, T2}, Filling -> Axis, PlotLegends -> {Row[{Subscript["H", 0], " is True"}], Row[{Subscript["H", 0], " is False"}]}]
