The data must be a list of possible outcomes from a univariate distribution.
FindDistribution[data,n,All] creates a Dataset object with all possible properties.
Properties supported include:

	"BIC"	Bayesian information criterion
	"AIC"	Akaike information criterion
	"HQIC"	Hannan–Quinn information criterion
	"Score"	internal score
	"Complexity"	complexity of the distribution
	"LogLikelihood"	LogLikelihood value
	"PearsonChiSquare"	PearsonChiSquareTest p-value
	"CramerVonMises"	CramerVonMisesTest p-value
	All	all the previous properties
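
For reference, the three information criteria listed above have the following textbook definitions, where $\hat{L}$ is the maximized likelihood, $k$ the number of free parameters, and $n$ the number of data points. This is only a sketch of the standard forms; the sign and normalization that FindDistribution actually reports are an assumption here and may differ.

```latex
\begin{align*}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L} \\
\mathrm{HQIC} &= 2k\ln(\ln n) - 2\ln\hat{L}
\end{align*}
```

All three trade goodness of fit against the parameter count; they differ only in how strongly extra parameters are penalized.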

The following options can be given:

MaxItems	Infinity	maximum number of distributions in a mixture distribution
PerformanceGoal	Automatic	aspect of performance to optimize
RandomSeeding	Automatic	what seeding of pseudorandom generators should be done internally
TargetFunctions	Automatic	types of distributions to consider

Possible settings for PerformanceGoal include:

	"Speed"	minimize the time spent to find distributions
	"Quality"	try to find better distributions

Possible settings for TargetFunctions include:

	Automatic	automatically chosen distributions
	All	all built-in distributions
	"Continuous"	all continuous distributions
	"Discrete"	all discrete distributions
	{dist₁,dist₂,…}	distributions dist_i
	{{w₁,w₂,…}->{dist₁,dist₂,…}}	distributions dist_i using weights w_i

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed

Possible continuous distributions for TargetFunctions are: BetaDistribution, CauchyDistribution, ChiDistribution, ChiSquareDistribution, ExponentialDistribution, ExtremeValueDistribution, FrechetDistribution, GammaDistribution, GumbelDistribution, HalfNormalDistribution, InverseGaussianDistribution, LaplaceDistribution, LevyDistribution, LogisticDistribution, LogNormalDistribution, MaxwellDistribution, NormalDistribution, ParetoDistribution, RayleighDistribution, StudentTDistribution, UniformDistribution, WeibullDistribution, HistogramDistribution.
Possible discrete distributions for TargetFunctions are: BenfordDistribution, BinomialDistribution, BorelTannerDistribution, DiscreteUniformDistribution, GeometricDistribution, LogSeriesDistribution, NegativeBinomialDistribution, PascalDistribution, PoissonDistribution, WaringYuleDistribution, ZipfDistribution, HistogramDistribution, EmpiricalDistribution.
The internal information criterion uses a Bayesian information criterion together with priors over TargetFunctions.

Examples

Basic Examples (2)

Create a list of uniformly distributed random integers:

Wolfram Language code: RandomInteger[10, 100]

Find the underlying distribution from the data:

Wolfram Language code: FindDistribution[%]

Generate data sampled from an exponential distribution:

Wolfram Language code:

𝒟 = ExponentialDistribution[1];
data = RandomVariate[𝒟, 1000];

Find the best distribution from the data:

Wolfram Language code: estimated𝒟 = FindDistribution[data]

Compare the PDFs for the original and estimated distributions:

Wolfram Language code: Plot[{PDF[𝒟, x], PDF[estimated𝒟, x]}, {x, 0, 10}, PlotLegends -> {"𝒟", "e𝒟"}]

Return the best three distributions:

Wolfram Language code: FindDistribution[data, 3]

Compare their Bayesian information criterion and Akaike information criterion values:

Wolfram Language code: FindDistribution[data, 3, {"BIC", "AIC"}]
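
As a rough cross-check, an information criterion can also be computed by hand for individual candidates. The sketch below assumes the textbook form k Log[n] - 2 logL for the Bayesian information criterion, which may differ in sign or scaling from the value FindDistribution reports. The helper name bic and the parameter counts passed to it are illustrative assumptions, not part of FindDistribution.

```wolfram
(* Sample data from a known distribution *)
data = RandomVariate[ExponentialDistribution[1], 1000];
n = Length[data];

(* Hypothetical helper: textbook BIC for a fitted distribution with k free parameters *)
bic[dist_, k_] := k Log[n] - 2 LogLikelihood[dist, data];

(* Fit two candidate families by maximum likelihood *)
expFit = EstimatedDistribution[data, ExponentialDistribution[λ]];
gammaFit = EstimatedDistribution[data, GammaDistribution[α, β]];

(* Under this convention, the smaller value is preferred *)
{bic[expFit, 1], bic[gammaFit, 2]}
```

The second argument makes the complexity penalty explicit: the two-parameter gamma fit has to improve the log likelihood by enough to pay for its extra parameter.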

Scope (3)

Generate data sampled from a mixture distribution:

Wolfram Language code: 𝒟 = MixtureDistribution[{1, 1}, {ExponentialDistribution[1], NormalDistribution[5, 0.8]}];

Wolfram Language code: data = RandomVariate[𝒟, 1000];

Estimate the best distribution from this data:

Wolfram Language code: e𝒟 = FindDistribution[data]

Compare the PDFs for the original and estimated distributions:

Wolfram Language code: Plot[{PDF[𝒟, x], PDF[e𝒟, x]}, {x, -8, 10}, PlotLegends -> {"𝒟", "e𝒟"}, PlotRange -> All]

Estimate parameters for a particular distribution:

Wolfram Language code: 𝒟 = WeibullDistribution[1, 2];

Wolfram Language code: data = RandomVariate[𝒟, 1000];

By default, FindDistribution returns a simpler distribution:

Wolfram Language code: e𝒟 = FindDistribution[data]

Specify the type of distribution to look for:

Wolfram Language code: e𝒟 = FindDistribution[data, TargetFunctions -> {WeibullDistribution}]

Generate data sampled from an exponential distribution:

Wolfram Language code:

𝒟 = ExponentialDistribution[1];
data = RandomVariate[𝒟, 1000];

Generate a Dataset object containing all properties for the top two distributions:

Wolfram Language code: report = FindDistribution[data, 2, All]

Options (5)

TargetFunctions (3)

Generate data samples from a mixture distribution:

Wolfram Language code: 𝒟 = MixtureDistribution[{1, 1}, {ExponentialDistribution[1], NormalDistribution[5, 0.8]}];

Wolfram Language code: data = RandomVariate[𝒟, 1000];

Estimate parameters for specific distributions:

Wolfram Language code: e𝒟 = FindDistribution[data, TargetFunctions -> {NormalDistribution, GammaDistribution}]

Compare the PDFs for the original and estimated distributions:

Wolfram Language code: Plot[{PDF[𝒟, x], PDF[e𝒟, x]}, {x, -8, 10}, PlotLegends -> {"𝒟", "e𝒟"}, PlotRange -> All]

Time between geyser eruptions:

Wolfram Language code: waiting = ExampleData[{"Statistics", "OldFaithful"}][[All, 2]];

Estimate the distribution of the data:

Wolfram Language code: e𝒟1 = FindDistribution[waiting]

Estimate the distribution of the data when treated as continuous:

Wolfram Language code: e𝒟2 = FindDistribution[waiting, TargetFunctions -> "Continuous"]

Estimate the distribution of the data when treated as continuous using GammaDistribution:

Wolfram Language code: e𝒟3 = FindDistribution[waiting, TargetFunctions -> {GammaDistribution}]

Compare the histogram of the data to the PDF of the estimated distributions:

Wolfram Language code: legend = SwatchLegend[{Red, ColorData[97, 1], ColorData[97, 2]}, {"e𝒟1", "e𝒟2", "e𝒟3"}];

Wolfram Language code:

Show[Histogram[waiting, 20, "ProbabilityDensity"], 
	DiscretePlot[{PDF[e𝒟1, x]}, {x, 0, 100}, PlotStyle -> {PointSize[.02], Red}], 
	Plot[{PDF[e𝒟2, x], PDF[e𝒟3, x]}, {x, 0, 100}, PlotLegends -> legend]]

Estimate parameters for specific distributions, assuming priors over them:

Wolfram Language code: magnitudes = Select[ExampleData[{"Statistics", "USEarthquakes"}], #[[1]] ≥ 1935&][[All, 7]];

The magnitudes of earthquakes in the United States in the years 1935–1989 have two modes:

Wolfram Language code: h = Histogram[magnitudes, 20, "ProbabilityDensity"]

Estimate the best fit without using TargetFunctions:

Wolfram Language code: Subscript[e𝒟, 1] = FindDistribution[magnitudes]

Estimate the best fit using priors over distributions:

Wolfram Language code:

Subscript[e𝒟, 2] = FindDistribution[magnitudes, TargetFunctions -> {{10, 2} -> {CauchyDistribution, GammaDistribution}}]

Compare the histogram to the PDFs of the estimated distributions:

Wolfram Language code:

Show[h, Plot[{PDF[Subscript[e𝒟, 1], x], PDF[Subscript[e𝒟, 2], x]}, {x, 0, 10}, PlotStyle -> Thick, PlotRange -> All]]

PerformanceGoal (1)

Generate data samples from a mixture distribution:

Wolfram Language code: 𝒟 = MixtureDistribution[{1, 2}, {ChiDistribution[0.6], GammaDistribution[20, 1]}]

Wolfram Language code: data = RandomVariate[𝒟, 10000];

Estimate the best fit for a large dataset and compare the AbsoluteTiming for different settings of PerformanceGoal:

Wolfram Language code: AbsoluteTiming[e𝒟1 = FindDistribution[data, PerformanceGoal -> "Speed"]]

Wolfram Language code: AbsoluteTiming[e𝒟2 = FindDistribution[data, PerformanceGoal -> "Quality"]]

Compare the LogLikelihood of the solutions:

Wolfram Language code: LogLikelihood[#, data]& /@ {e𝒟1, e𝒟2}

RandomSeeding (1)

Generate data samples from a mixture distribution:

Wolfram Language code: 𝒟 = MixtureDistribution[{1, 2}, {NormalDistribution[-6, 1], GammaDistribution[20, 1]}]

Wolfram Language code: data = RandomVariate[𝒟, 1000];

Run FindDistribution several times on the same data and notice how the results differ:

Wolfram Language code: Table[FindDistribution[data], 3]

Use the option RandomSeeding with a fixed seed to get the same result on every run:

Wolfram Language code: Table[FindDistribution[data, RandomSeeding -> 1], 3]

Applications (5)

Lengths of Words Beginning with a Particular Letter (1)

Lengths of all English words in a dictionary that begin with different vowels:

Wolfram Language code: letters = {"a", "e", "i", "o", "u", "y"};

Wolfram Language code: worddata = Table[StringLength /@ DictionaryLookup[l ~~ ___], {l, letters}];

Estimate the distribution for different vowels:

Wolfram Language code: e𝒟 = Table[FindDistribution[i, MaxItems -> 1], {i, worddata}]

Compare the histograms of the original data to the PDFs of the estimated distributions:

Wolfram Language code:

Partition[Table[Show[Histogram[worddata[[i]], {Range[25] - 1 / 2}, "ProbabilityDensity", PlotLabel -> letters[[i]]], DiscretePlot[PDF[e𝒟[[i]], x], {x, 0, 25}, PlotRange -> All, PlotStyle -> PointSize[.025]], ImageSize -> Small], {i, Length[letters]}], 3]//Grid

Text Frequency (1)

Count the number of occurrences of words in the Declaration of Independence:

Wolfram Language code: text = ExampleData[{"Text", "DeclarationOfIndependence"}, "Words"];

Wolfram Language code: wordCount = Tally[text][[All, 2]];

Estimate the distribution of the word count:

Wolfram Language code: e𝒟 = FindDistribution[wordCount, MaxItems -> 1]

Compare the histogram of the original data to the PDF of the estimated distribution:

Wolfram Language code:

Show[Histogram[wordCount, {0.5, 9.5, 1}, "ProbabilityDensity"], DiscretePlot[PDF[e𝒟, x], {x, 1, 10}, PlotStyle -> PointSize[Medium], PlotRange -> All]]

Melanoma in Denmark (1)

Age of patients affected by melanoma:

Wolfram Language code: melanomaAge = ExampleData[{"Statistics", "DenmarkMelanoma"}][[All, 4]];

Estimate the distribution of the data:

Wolfram Language code: e𝒟 = FindDistribution[melanomaAge]

Compare the histogram of the data to the PDF of the estimated distribution:

Wolfram Language code:

Show[Histogram[melanomaAge, {4, 95, 4}, "ProbabilityDensity"], Plot[PDF[e𝒟, x], {x, 4, 95}, PlotStyle -> Thick, PlotRange -> All]]

Infection Time for AIDS (1)

Infection time for AIDS in years:

Wolfram Language code: aids = ExampleData[{"Statistics", "TimeToAIDS"}][[All, 1]];

Estimate the distribution of the data:

Wolfram Language code: e𝒟 = FindDistribution[aids]

Compare the histogram of the data to the PDF of the estimated distribution:

Wolfram Language code:

Show[Histogram[aids, {0, 8, 0.65}, "ProbabilityDensity"], Plot[PDF[e𝒟, x], {x, 0, 8}, PlotStyle -> Thick, PlotRange -> All]]

Time to Kidney Infection after Catheter Replacement (1)

Time to kidney infection in months:

Wolfram Language code: KidneyInfection = ExampleData[{"Statistics", "KidneyInfection"}][[All, 1]];

Estimate the distribution of the data:

Wolfram Language code: e𝒟 = FindDistribution[KidneyInfection]

Compare the histogram of the data to the PDF of the estimated distribution:

Wolfram Language code:

Show[Histogram[KidneyInfection, {0, 28, 2}, "ProbabilityDensity"], Plot[PDF[e𝒟, x], {x, 0, 28}, PlotStyle -> Thick, PlotRange -> All]]

Properties & Relations (1)

By default, FindDistributionParameters uses maximum likelihood to estimate distribution parameters for a fixed distribution. FindDistribution uses a full Bayesian approach by combining the Bayesian information criterion with priors over distributions to select both the best distribution and the best parameters for it.

Generate data sampled from a StudentTDistribution:

Wolfram Language code:

SeedRandom[5]
data = RandomVariate[StudentTDistribution[1, 1, 4], 800];
Histogram[data, {-10, 10, Automatic}, "ProbabilityDensity"]

Use FindDistribution to estimate the best distribution that fits the data:

Wolfram Language code: dist1 = FindDistribution[data, RandomSeeding -> 1]

Use FindDistributionParameters to estimate the best parameters, assuming a StudentTDistribution:

Wolfram Language code:

distribution = StudentTDistribution[α, β, ν];
estimatedParameters = FindDistributionParameters[data, distribution];
dist2 = distribution /. estimatedParameters

Even though the StudentTDistribution fit maximizes the log likelihood, the LogisticDistribution has a larger prior and smaller complexity compared to it.

Compare the corresponding LogLikelihood:

Wolfram Language code: LogLikelihood[#, data]& /@ {dist1, dist2}
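
The fits can also be compared with a goodness-of-fit test, in the spirit of the "CramerVonMises" property. This is a minimal sketch, assuming that fitting both families directly with EstimatedDistribution is an acceptable stand-in for dist1 and dist2. The name candidates is illustrative, and the p-values need not match what FindDistribution reports.

```wolfram
(* Regenerate the sample used above *)
SeedRandom[5];
data = RandomVariate[StudentTDistribution[1, 1, 4], 800];

(* Fit each candidate family by maximum likelihood *)
candidates = {
   EstimatedDistribution[data, LogisticDistribution[μ, β]],
   EstimatedDistribution[data, StudentTDistribution[μ, σ, ν]]};

(* Cramér-von Mises p-value for each fit; a small value is evidence against the fit *)
CramerVonMisesTest[data, #] & /@ candidates
```

Because the parameters were estimated from the same data being tested, these p-values tend to be optimistic and are best read as a relative comparison.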

The option TargetFunctions can be used if you want to find roughly the same parameters as FindDistributionParameters:

Wolfram Language code: dist3 = FindDistribution[data, TargetFunctions -> {StudentTDistribution}]
