Wolfram Language & System Documentation Center

PearsonChiSquareTest

PearsonChiSquareTest[data]

tests whether data is normally distributed using the Pearson test.

PearsonChiSquareTest[data,dist]

tests whether data is distributed according to dist using the Pearson test.

PearsonChiSquareTest[data,dist,"property"]

returns the value of "property".

Details and Options

PearsonChiSquareTest performs the Pearson χ² goodness-of-fit test with null hypothesis H₀ that data was drawn from a population with distribution dist, and alternative hypothesis Hₐ that it was not.
By default, a probability value or p-value is returned.
A small p-value suggests that it is unlikely that the data came from dist.
The dist can be any symbolic distribution with numeric and symbolic parameters or a dataset.
The data can be univariate {x₁,x₂,…} or multivariate {{x₁,y₁,…},{x₂,y₂,…},…}.
The Pearson test effectively compares a histogram of data to a theoretical histogram based on dist. The bins are chosen to have equal probability in dist.
For univariate data, the test statistic is given by $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, where $o_i$ and $e_i$ are the observed and expected counts for the $i$-th of the $k$ histogram bins, respectively.
For multivariate tests, the sum of the univariate marginal p-values is used and is assumed to follow a UniformSumDistribution under H₀.
PearsonChiSquareTest[data,dist,"HypothesisTestData"] returns a HypothesisTestData object htd that can be used to extract additional test results and properties using the form htd["property"].
PearsonChiSquareTest[data,dist,"property"] can be used to directly give the value of "property".
Properties related to the reporting of test results include:

	"DegreesOfFreedom"	the degrees of freedom used in a test
	"PValue"	p-value
	"PValueTable"	formatted version of "PValue"
	"ShortTestConclusion"	a short description of the conclusion of a test
	"TestConclusion"	a description of the conclusion of a test
	"TestData"	test statistic and p-value
	"TestDataTable"	formatted version of "TestData"
	"TestStatistic"	test statistic
	"TestStatisticTable"	formatted "TestStatistic"

The following properties are independent of which test is being performed.
Properties related to the data distribution include:

	"FittedDistribution"	fitted distribution of data
	"FittedDistributionParameters"	distribution parameters of data

The following options can be given:

	Method	Automatic	the method to use for computing p-values
	SignificanceLevel	0.05	cutoff for diagnostics and reporting
For a goodness-of-fit test, a cutoff α is chosen such that H₀ is rejected only if p < α. The value of α used for the "TestConclusion" and "ShortTestConclusion" properties is controlled by the SignificanceLevel option. By default, α is set to 0.05.
With the setting Method->"MonteCarlo", datasets s_i of the same length as the input are generated under H₀ using the fitted distribution. The EmpiricalDistribution of PearsonChiSquareTest[s_i,dist,"TestStatistic"] is then used to estimate the p-value.
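The Monte Carlo procedure described above can be sketched by hand for a fully specified dist. This is a minimal illustration, not the built-in implementation: the helper mcPValue is hypothetical, it resamples from dist itself (which coincides with the fitted distribution only when no parameters are estimated), and it takes the p-value as the fraction of simulated statistics at least as large as the observed one. The final line applies the rejection rule p < α with α = 0.05.

```wolfram
(* Hypothetical helper: Monte Carlo p-value for a fully specified dist *)
mcPValue[data_List, dist_, m_Integer] := Module[{t0, sims},
  (* observed Pearson statistic *)
  t0 = PearsonChiSquareTest[data, dist, "TestStatistic"];
  (* m simulated datasets of the same length as data, drawn under H0 *)
  sims = Table[
    PearsonChiSquareTest[RandomVariate[dist, Length[data]], dist, "TestStatistic"],
    {m}];
  (* upper-tail probability of t0 under the empirical distribution of sims *)
  N[Count[sims, s_ /; s >= t0]/m]
]

data = RandomVariate[NormalDistribution[], 100];
p = mcPValue[data, NormalDistribution[], 200];
(* reject H0 only if p < alpha, with alpha = 0.05 as in the default SignificanceLevel *)
If[p < 0.05, "reject H0", "do not reject H0"]
```

With 200 simulated datasets the estimate is coarse; it should be broadly comparable to Method -> "MonteCarlo" and tighten as m grows.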

Examples


Basic Examples (4)

Perform the Pearson test for normality:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10^4];

Wolfram Language code: PearsonChiSquareTest[data]

Test the fit of some data to a particular distribution:

Wolfram Language code: data = RandomVariate[LaplaceDistribution[1, 2], 10^3];

Wolfram Language code: PearsonChiSquareTest[data, LaplaceDistribution[1, 2]]

Compare the distributions of two datasets:

Wolfram Language code: data1 = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: data2 = RandomVariate[NormalDistribution[], 150];

Wolfram Language code: PearsonChiSquareTest[data1, data2]

Extract the test statistic from the Pearson test:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10^3];

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[], "TestStatistic"]

Scope (9)

Testing (6)

Perform a Pearson test for normality:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^4];
data2 = RandomVariate[StudentTDistribution[3], 10^4];

The p-value for the normal data is large compared to the p-value for the non-normal data:

Wolfram Language code: PearsonChiSquareTest[data1]

Wolfram Language code: PearsonChiSquareTest[data2]

Test the goodness of fit to a particular distribution:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^3];
data2 = RandomVariate[CauchyDistribution[0, 1], 10^3];

Wolfram Language code: PearsonChiSquareTest[data1, CauchyDistribution[0, 1]]

Wolfram Language code: PearsonChiSquareTest[data2, CauchyDistribution[0, 1]]

Compare the distributions of two datasets:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 10^3];
data2 = RandomVariate[NormalDistribution[], 10^3];

Wolfram Language code: PearsonChiSquareTest[data1, data2]

The two datasets do not have the same distribution:

Wolfram Language code: data3 = RandomVariate[NormalDistribution[0, 1.25], 10^3];

Wolfram Language code: PearsonChiSquareTest[data1, data3]

Test for multivariate normality:

Wolfram Language code:

data1 = RandomVariate[BinormalDistribution[.5], 10^3];
data2 = RandomVariate[LaplaceDistribution[1, 2], {10^3, 2}];

Wolfram Language code: PearsonChiSquareTest[data1]

Wolfram Language code: PearsonChiSquareTest[data2]

Test for goodness of fit to any multivariate distribution:

Wolfram Language code:

data1 = RandomVariate[BinormalDistribution[.5], 10^3];
data2 = RandomVariate[𝒹 = LaplaceDistribution[1, 2], {10^3, 2}];

Wolfram Language code: 𝒟 = ProductDistribution[𝒹, 𝒹];

Wolfram Language code: PearsonChiSquareTest[data1, 𝒟]

Wolfram Language code: PearsonChiSquareTest[data2, 𝒟]

Create a HypothesisTestData object for repeated property extraction:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 10^5];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, Automatic, "HypothesisTestData"]

The properties available for extraction:

Wolfram Language code: ℋ["Properties"]

Reporting (3)

Tabulate the results of the Pearson test:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, Automatic, "HypothesisTestData"];

The full test table:

Wolfram Language code: ℋ["TestDataTable"]

A p-value table:

Wolfram Language code: ℋ["PValueTable"]

The test statistic:

Wolfram Language code: ℋ["TestStatisticTable"]

Retrieve the entries from a Pearson test table for custom reporting:

Wolfram Language code:

data1 = RandomVariate[NormalDistribution[], 100];
data2 = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: ℋ1 = PearsonChiSquareTest[data1, Automatic, "TestStatistic"]

Wolfram Language code: ℋ2 = PearsonChiSquareTest[data2, Automatic, "TestStatistic"]

Wolfram Language code: BarChart[{Labeled[ℋ1, "Set 1"], Labeled[ℋ2, "Set 2"]}]

Report test conclusions using "ShortTestConclusion" and "TestConclusion":

Wolfram Language code: data = BlockRandom[SeedRandom[1];RandomVariate[ParetoDistribution[1.05, 2], 100]];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, ParetoDistribution[1, 2], "HypothesisTestData"];

Wolfram Language code: ℋ["ShortTestConclusion"]

Wolfram Language code: ℋ["TestConclusion"]//TraditionalForm

The conclusion may differ at a different significance level:

Wolfram Language code: ℋ = PearsonChiSquareTest[data, ParetoDistribution[1, 2], "HypothesisTestData", SignificanceLevel -> .001];

Wolfram Language code: ℋ["ShortTestConclusion"]

Wolfram Language code: ℋ["TestConclusion"]//TraditionalForm

Options (3)

Method (3)

Use Monte Carlo-based methods or a computation formula:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[], Method -> "MonteCarlo"]

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[], Method -> Automatic]

Set the number of samples to use for Monte Carlo-based methods:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code:

pts = Table[{i, PearsonChiSquareTest[data, NormalDistribution[], Method -> {"MonteCarlo", "MonteCarloSamples" -> i}]}, {i, Range[5, 100, 5]}];

The Monte Carlo estimate settles near the p-value from the default method (dashed line) as the number of samples increases:

Wolfram Language code: pval = PearsonChiSquareTest[data, NormalDistribution[]];

Wolfram Language code:

Show[ListLinePlot[pts, PlotRange -> {0, 1}, FrameLabel -> {"Samples", "P-Value"}, Frame -> True, AxesOrigin -> {0, 0}], Graphics[{Dashed, Line[{{0, pval}, {100, pval}}]}]]

Set the random seed used in Monte Carlo-based methods:

Wolfram Language code: data = RandomVariate[NormalDistribution[], 100];

Wolfram Language code:

pts = Table[{i, PearsonChiSquareTest[data, NormalDistribution[], Method -> {"MonteCarlo", "RandomSeed" -> i, "MonteCarloSamples" -> 50}]}, {i, Range[1, 10]}];

The seed affects the state of the generator and has some effect on the resulting p-value:

Wolfram Language code: pval = PearsonChiSquareTest[data, NormalDistribution[]];

Wolfram Language code:

Show[ListLinePlot[pts, PlotRange -> {Min[pts[[All, 2]]], Max[pts[[All, 2]]]}, FrameLabel -> {"Seed", "P-Value"}, Frame -> True, AxesOrigin -> {0, 0}], Graphics[{Dashed, Line[{{0, pval}, {10, pval}}]}]]

Applications (2)

A power curve for the Pearson test:

Wolfram Language code: data = Table[RandomVariate[UniformDistribution[{-4, 4}], {500, i}], {i, n = {5, 7, 10, 15, 20, 25, 30}}];

Wolfram Language code: ℋ = Table[PearsonChiSquareTest[data[[i, j]], NormalDistribution[]], {i, Length[data]}, {j, Length[data[[i]]]}];

Wolfram Language code: pC = Interpolation[Transpose[{n, Table[Probability[x ≤ 0.05, x \[Distributed] i], {i, ℋ}]}], InterpolationOrder -> 1];

Visualize the approximate power curve:

Wolfram Language code: Plot[pC[x], {x, 5, 30}, PlotRange -> {0, 1}, Ticks -> {n, Automatic}, AxesOrigin -> {0, 0}]

Estimate the power of the Pearson test when the underlying distribution is UniformDistribution[{-4,4}], the test size is 0.05, and the sample size is 12:

Wolfram Language code: pC[12.]
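The interpolated value can be sanity-checked by estimating the power directly: simulate many samples of size 12 from UniformDistribution[{-4,4}] and record how often the p-value falls at or below 0.05. This is a plain Monte Carlo sketch; the 500 replications are an arbitrary choice, so the estimate has a standard error of up to about 0.022.

```wolfram
(* p-values of the Pearson normality test on 500 uniform samples of size 12 *)
pvals = Table[
   PearsonChiSquareTest[RandomVariate[UniformDistribution[{-4, 4}], 12], NormalDistribution[]],
   {500}];
(* estimated power: fraction of samples rejected at the 0.05 level *)
power = N[Mean[Boole[# <= 0.05] & /@ pvals]]
```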

The number of auto accidents was recorded for a city over the course of 30 days. The city council is planning on lowering speed limits in the city and wants a model of the accident rate as a baseline for later comparison:

Wolfram Language code:

auto30 = {98, 90, 111, 91, 107, 103, 109, 122, 95, 112, 114, 97, 101, 118, 96, 102, 101, 107, 116, 97, 89, 108, 96, 105, 108, 114, 91, 98, 87, 87};

Wolfram Language code: μ1 = Mean[auto30]//N

Count data is often modeled well by PoissonDistribution:

Wolfram Language code: PearsonChiSquareTest[auto30, PoissonDistribution[μ1], "TestDataTable"]

Suppose the city collected data over another 30-day period after reducing the speed limit. Compare the distributions before and after the reduction:

Wolfram Language code:

newData = {83, 84, 85, 80, 76, 91, 96, 101, 93, 84, 75, 89, 94, 91, 96, 84, 74, 102, 86, 97, 84, 80, 89, 92, 84, 98, 84, 93, 92, 81};

Wolfram Language code: μ2 = Mean[newData]//N

Wolfram Language code: Histogram[{auto30, newData}, ChartStyle -> {Red, Blue}]

The distributions are significantly different:

Wolfram Language code: PearsonChiSquareTest[auto30, newData, "TestDataTable"]

Properties & Relations (10)

By default, univariate data is compared to NormalDistribution:

Wolfram Language code: data = RandomVariate[NormalDistribution[2, 3], 10^4];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, Automatic, "HypothesisTestData"];

Wolfram Language code: ℋ["TestDataTable"]

The parameters have been estimated from the data:

Wolfram Language code: ℋ["FittedDistribution"]

Multivariate data is compared to MultinormalDistribution by default:

Wolfram Language code: data = RandomVariate[MultinormalDistribution[{1, 2, 3}, IdentityMatrix[3]], 1000];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, Automatic, "HypothesisTestData"];

Wolfram Language code: ℋ["TestDataTable"]

Wolfram Language code: ℋ["FittedDistribution"]//TraditionalForm

The parameters of the test distribution are estimated from the data if not specified:

Wolfram Language code: data = RandomVariate[NormalDistribution[1, 2], 1000];

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[μ, σ], "FittedDistribution"]

Specified parameters are not estimated:

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[μ, 2], "FittedDistribution"]

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[1, 2], "FittedDistribution"]

Maximum likelihood estimates are used for unspecified parameters of the test distribution:

Wolfram Language code: data = RandomVariate[ExponentialDistribution[3], 10^3];

Wolfram Language code: ℋ = PearsonChiSquareTest[data, ExponentialDistribution[λ], "FittedDistribution"]

Wolfram Language code: PearsonChiSquareTest[data, ExponentialDistribution[λ]]
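For ExponentialDistribution the maximum likelihood estimate of the rate has the closed form λ̂ = 1/x̄, so the fitted distribution can be cross-checked against EstimatedDistribution and the sample mean. This comparison is a sketch; tiny numerical differences are possible if an estimate is computed iteratively.

```wolfram
data = RandomVariate[ExponentialDistribution[3], 10^3];
(* closed-form maximum likelihood estimate of the exponential rate *)
λhat = 1/Mean[data]
(* maximum likelihood fit via EstimatedDistribution *)
EstimatedDistribution[data, ExponentialDistribution[λ]]
(* fitted distribution reported by the Pearson test *)
PearsonChiSquareTest[data, ExponentialDistribution[λ], "FittedDistribution"]
```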

PearsonChiSquareTest effectively compares the observed and expected histograms:

Wolfram Language code: n = 10^4;

Wolfram Language code: data = RandomVariate[NormalDistribution[3, 4], n];

The data is binned into approximately 2n^(2/5) bins that are equiprobable under H₀:

Wolfram Language code: nbins = 2 n^(2/5)//Ceiling

Wolfram Language code: bDelim = Quantile[NormalDistribution[3, 4], Range[0., 1, 1 / nbins]];

Under H₀, each bin is expected to contain an equal number of points:

Wolfram Language code: Histogram[Quantile[NormalDistribution[3, 4], Range[0., 1, 1 / 10^4]], {Most@Rest@bDelim}, PlotLabel -> "Expected"]

Observed histograms when H₀ is true and when it is false, respectively:

Wolfram Language code:

{Histogram[data, {Most@Rest@bDelim}, PlotLabel -> "H₀ true"], Histogram[RandomVariate[CauchyDistribution[0, 1], 10^4], {Most@Rest@bDelim}, PlotLabel -> "H₀ false"]}
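The statistic from Details can be reproduced by hand along these lines. This is a minimal sketch, not the built-in implementation: the helper pearsonStatistic is hypothetical, it assumes a continuous, fully specified dist, and it assumes the bin count k = Ceiling[2 n^(2/5)]. Each point is assigned to one of k bins that are equiprobable under dist via its CDF value, and the observed counts are compared with the common expected count n/k. Because internal binning details may differ, the result need not match the built-in statistic exactly.

```wolfram
(* Hypothetical helper: Pearson statistic with k bins equiprobable under dist *)
pearsonStatistic[data_List, dist_, k_Integer] := Module[{idx, observed, expected},
  (* bin index 1..k from the CDF value of each point *)
  idx = Clip[Floor[k (CDF[dist, #] & /@ data)] + 1, {1, k}];
  (* observed counts o_i for bins 1..k *)
  observed = BinCounts[idx, {1, k + 1, 1}];
  (* expected count e_i = n/k is the same for every bin *)
  expected = Length[data]/k;
  Total[(observed - expected)^2/expected]
]

n = 10^4;
data = RandomVariate[NormalDistribution[3, 4], n];
k = Ceiling[2 n^(2/5)];
stat = N[pearsonStatistic[data, NormalDistribution[3, 4], k]]
(* asymptotic p-value from ChiSquareDistribution with k - 1 degrees of freedom *)
SurvivalFunction[ChiSquareDistribution[k - 1], stat]
(* built-in statistic for comparison *)
PearsonChiSquareTest[data, NormalDistribution[3, 4], "TestStatistic"]
```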

The degrees of freedom are equal to the number of non-empty bins minus one:

Wolfram Language code: n = 10^4;

Wolfram Language code: data = RandomVariate[NormalDistribution[3, 4], n];

Wolfram Language code: nbins = 2 n^(2/5)//Ceiling

Wolfram Language code: df = nbins - 1

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[3, 4], "DegreesOfFreedom"]

One degree of freedom is removed for each parameter that is estimated from the data:

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[a, 4], "DegreesOfFreedom"]

Wolfram Language code: PearsonChiSquareTest[data, NormalDistribution[a, b], "DegreesOfFreedom"]

If the parameters are unknown, PearsonChiSquareTest corrects the degrees of freedom:

Wolfram Language code: data = RandomVariate[NormalDistribution[3, 4], 10^4];

Wolfram Language code: est = EstimatedDistribution[data, NormalDistribution[μ, σ]]

No correction is applied when the parameters are specified:

Wolfram Language code: PearsonChiSquareTest[data, est, {"PValue", "DegreesOfFreedom"}]

Wolfram Language code: ℋ = PearsonChiSquareTest[data, NormalDistribution[μ, σ], "HypothesisTestData"];

The fitted distribution is equivalent, but the degrees of freedom and p-value are corrected:

Wolfram Language code: ℋ["FittedDistribution"]

Wolfram Language code: ℋ[{"PValue", "DegreesOfFreedom"}]
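Assuming the default method evaluates the p-value from the upper tail of ChiSquareDistribution with the reported (corrected) degrees of freedom, the p-value can be recomputed from the statistic and degrees of freedom. This is a consistency check under that assumption, not a description of the internal code.

```wolfram
data = RandomVariate[NormalDistribution[3, 4], 10^4];
(* statistic, corrected degrees of freedom and p-value with both parameters estimated *)
{stat, df, p} = PearsonChiSquareTest[data, NormalDistribution[μ, σ],
   {"TestStatistic", "DegreesOfFreedom", "PValue"}];
(* upper-tail probability of the statistic under ChiSquareDistribution[df] *)
{p, SurvivalFunction[ChiSquareDistribution[df], stat]}
```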

The Pearson statistic asymptotically follows ChiSquareDistribution under H₀:

Wolfram Language code: data = RandomVariate[NormalDistribution[], {250, 1000}];

Wolfram Language code: t = Table[PearsonChiSquareTest[i, NormalDistribution[], "TestStatistic"], {i, data}];

Wolfram Language code: edist = EstimatedDistribution[t, ChiSquareDistribution[ν]];

Wolfram Language code: Show[Histogram[t, Automatic, "ProbabilityDensity"], Plot[PDF[edist, x], {x, 0, 60}, PlotRange -> All]]

Wolfram Language code: DistributionFitTest[t, ChiSquareDistribution[ν], "TestDataTable"]

Wolfram Language code: DistributionFitTest[t, ChiSquareDistribution[ν], "ShortTestConclusion"]
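The same claim can be checked through the moments: ChiSquareDistribution[ν] has mean ν and variance 2ν, so with a fully specified dist the simulated statistics should have mean and variance close to df and 2 df, where df is the value reported by the test. The 250 replications are arbitrary, so some sampling noise is expected, especially in the variance.

```wolfram
data = RandomVariate[NormalDistribution[], {250, 1000}];
t = Table[PearsonChiSquareTest[i, NormalDistribution[], "TestStatistic"], {i, data}];
(* degrees of freedom used by the test for samples of size 1000 *)
df = PearsonChiSquareTest[First[data], NormalDistribution[], "DegreesOfFreedom"];
(* compare sample mean and variance with df and 2 df *)
{{Mean[t], df}, {Variance[t], 2 df}}
```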

Independent marginal densities are assumed in tests for multivariate goodness of fit:

Wolfram Language code: data = RandomVariate[MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], 100];

Wolfram Language code: PearsonChiSquareTest[data, MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], "TestStatistic"]

The test statistic is identical when independence is assumed:

Wolfram Language code: PearsonChiSquareTest[data, MultinormalDistribution[{0, 0}, {{0.118, 0}, {0, 0.665}}], "TestStatistic"]
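The combination rule from Details can be sketched for this example: test each coordinate against its marginal normal distribution, sum the univariate p-values, and refer the sum to UniformSumDistribution. Two assumptions apply: the marginals are normal with standard deviations given by the square roots of the diagonal entries of the covariance matrix, and the lower tail is used because a small sum of p-values signals a departure from H₀. The built-in test may differ in details such as binning, so treat this as an illustration.

```wolfram
data = RandomVariate[MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], 100];
(* marginal distributions implied by the diagonal of the covariance matrix *)
margs = {NormalDistribution[0, Sqrt[0.118]], NormalDistribution[0, Sqrt[0.665]]};
(* univariate Pearson p-value for each coordinate against its marginal *)
pvals = MapThread[PearsonChiSquareTest[#1, #2] &, {Transpose[data], margs}];
(* under H0 the sum of m independent uniform p-values follows UniformSumDistribution[m];
   a small sum is evidence against H0, so the lower tail is used *)
combined = CDF[UniformSumDistribution[Length[pvals]], Total[pvals]]
(* built-in multivariate p-value for comparison *)
PearsonChiSquareTest[data, MultinormalDistribution[{0, 0}, {{0.118, 0.252}, {0.252, 0.665}}], "PValue"]
```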

The Pearson test works with the values only when the input is a TimeSeries:

Wolfram Language code:

ts = TemporalData[TimeSeries, {{{1.224578634529677, 0.47929635789978015, 0.6572781300178168, 
    0.21496048742669355, 0.7299608014554928, -0.2495111111278263, -1.3286551762002712, 
    0.552725018274874, 0.19272112205837066, 1.1809144012420882, -1.1671 ... 40938613662046, 1.052394590214582, 0.9345044123980388, 0.38537803109557855, 
    -0.48660931166089394, -0.71203560340161}}, {{0, 100, 1}}, 1, {"Continuous", 1}, 
  {"Discrete", 1}, 1, {ValueDimensions -> 1, ResamplingMethod -> None}}, False, 10.1];

Wolfram Language code: PearsonChiSquareTest[ts]

Wolfram Language code: PearsonChiSquareTest[ts["Values"]]

Neat Examples (1)

Compute the statistic when the null hypothesis is true:

Wolfram Language code: data = RandomVariate[NormalDistribution[], {2500, 100}];

Wolfram Language code: T1 = PearsonChiSquareTest[#, NormalDistribution[], "TestStatistic"]& /@ data;

The test statistic given a particular alternative:

Wolfram Language code: T2 = PearsonChiSquareTest[#, NormalDistribution[1, 2], "TestStatistic"]& /@ data;

Compare the distributions of the test statistics:

Wolfram Language code:

SmoothHistogram[{T1, T2}, Filling -> Axis, PlotLegends -> {"H₀ is true", "H₀ is false"}, PlotStyle -> Thick]
