"DecisionTree" (Machine Learning Method)
- Method for Predict, Classify and LearnDistribution.
- Use a decision tree to model class probabilities, value predictions or probability densities.
Details & Suboptions
- A decision tree is a flow chart–like structure in which each internal node represents a "test" on a feature, each branch represents the outcome of the test, and each leaf represents a class distribution, value distribution or probability density.
- For Classify and Predict, the tree is constructed using the CART algorithm.
- For LearnDistribution, the splits are determined using an information criterion trading off the likelihood and the complexity of the model.
- The following options can be given:
"DistributionSmoothing"    1    regularization parameter
"FeatureFraction"          1    the fraction of features to be randomly selected for training (only in Classify and Predict)
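The CART construction mentioned above chooses, at each internal node, the test that best separates the training examples. The following is a minimal sketch of that idea for a single numeric feature, assuming Gini impurity as the split criterion. The helper names gini, splitImpurity and bestSplit are hypothetical and are not part of the built-in method, whose internal criterion and implementation are not described here. The sketch also assumes the feature has at least two distinct values.

```wolfram
(* Gini impurity of a list of class labels *)
gini[labels_List] := 1 - Total[(Values[Counts[labels]]/Length[labels])^2]

(* Weighted impurity of the two children after testing x <= t *)
splitImpurity[xs_List, ys_List, t_] := Module[{left, right, n = Length[ys]},
  left = Pick[ys, Thread[xs <= t]];
  right = Pick[ys, Thread[xs > t]];
  (Length[left] gini[left] + Length[right] gini[right])/n]

(* Candidate thresholds are midpoints between consecutive distinct values; *)
(* the threshold with the lowest weighted impurity is returned *)
bestSplit[xs_List, ys_List] := Module[{cands},
  cands = MovingAverage[Union[xs], 2];
  First[MinimalBy[cands, splitImpurity[xs, ys, #] &]]]

bestSplit[{1, 2, 3, 4, 5, 6}, {1, 1, 3, 3, 1, 3}]
```

A full tree would apply the same search recursively to each child node and across all features until a stopping rule is met.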
Examples
Basic Examples (3)
Train a predictor function on labeled examples:
p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, 9}, Method -> "DecisionTree"]
Look at the information about the predictor:
Information[p]
Extract option information that can be used for retraining:
Information[p, "MethodOption"]
Predict a new example:
p[1.3]
Generate some data and visualize it:
data = Table[x -> Sin[x] + RandomVariate[NormalDistribution[0, .2]], {x, RandomReal[{-10, 10}, 400]}];
ListPlot[List@@@data]
Train a predictor function on it:
p = Predict[data, Method -> "DecisionTree"]
Compare the data with the predicted values and look at the standard deviation:
Show[
 Plot[
  {p[x],
   p[x] + StandardDeviation[p[x, "Distribution"]],
   p[x] - StandardDeviation[p[x, "Distribution"]]},
  {x, -2, 6},
  PlotStyle -> {Blue, Gray, Gray},
  Filling -> {2 -> {3}},
  Exclusions -> False,
  PerformanceGoal -> "Speed",
  PlotLegends -> {"Prediction", "Confidence Interval"}],
 ListPlot[List@@@data, PlotStyle -> Red, PlotLegends -> {"Data"}]]
Learn a distribution using the method "DecisionTree":
data = RandomVariate[NormalDistribution[], 1000];
ld = LearnDistribution[data, Method -> "DecisionTree"]
Plot[PDF[ld, x], {x, -5, 5}, Filling -> Bottom]
Obtain information about the distribution:
Information[ld]
Options (4)
"DistributionSmoothing" (2)
Use the "DistributionSmoothing" option to train a classifier:
c = Classify[{1, 2, 3, 4, 5, 6} -> {1, 1, 3, 3, 1, 3}, Method -> {"DecisionTree", "DistributionSmoothing" -> .3}]
Use the mushrooms training set to train a classifier with the default value of "DistributionSmoothing":
data = ExampleData[{"MachineLearning", "Mushroom"}, "TrainingData"];
classifier = Classify[data, Method -> "DecisionTree"];
Train a second classifier using a large "DistributionSmoothing":
smoothed = Classify[data, Method -> {"DecisionTree", "DistributionSmoothing" -> 100}]
Compare the probabilities for examples from a test set:
testdata = ExampleData[{"MachineLearning", "Mushroom"}, "TestData"];
sample = RandomSample[testdata, 4];
Dataset@<|
  "AutomaticClassifier" -> classifier[sample[[All, 1]], "Probabilities"],
  "SmoothedClassifier" -> smoothed[sample[[All, 1]], "Probabilities"]|>
"FeatureFraction" (2)
Use the "FeatureFraction" option to train a classifier:
c = Classify[{{1, 2.3, 4, 5.3}, {2, 2.3, 2.4, 5}, {2, 2.3, 2.4, 5}, {1, 3, 4, -5.2}, {2, -5, -3.2, 5}, {2, 1.3, -8.1, 3.3}} -> {1, 1, 3, 3, 1, 3}, Method -> {"DecisionTree", "FeatureFraction" -> .5}]
Use the mushrooms training set to train two classifiers with different values of "FeatureFraction":
data = ExampleData[{"MachineLearning", "Mushroom"}, "TrainingData"];
c1 = Classify[data, Method -> {"DecisionTree", "FeatureFraction" -> 1}]
c2 = Classify[data, Method -> {"DecisionTree", "FeatureFraction" -> .1}]
Look at the accuracy of these classifiers on a test set:
testdata = ExampleData[{"MachineLearning", "Mushroom"}, "TestData"];
ClassifierMeasurements[c1, testdata, "Accuracy"]
ClassifierMeasurements[c2, testdata, "Accuracy"]
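To make the idea behind "FeatureFraction" concrete, the sketch below keeps a random fraction of the feature columns of a numeric feature matrix. It is an illustration only: selectFeatures is a hypothetical helper, and both the rounding rule (Ceiling, with at least one feature kept) and the point at which the built-in method draws its random subset are assumptions, not documented behavior.

```wolfram
(* Hypothetical helper: keep a random fraction f of the feature columns *)
selectFeatures[features_?MatrixQ, f_] := Module[{n, k, cols},
  n = Last[Dimensions[features]];          (* number of features *)
  k = Max[1, Ceiling[f n]];                (* assumed rounding rule *)
  cols = Sort[RandomSample[Range[n], k]];  (* random subset of column indices *)
  features[[All, cols]]]

selectFeatures[{{1, 2.3, 4, 5.3}, {2, 2.3, 2.4, 5}, {1, 3, 4, -5.2}}, .5]
```

With four features and a fraction of .5, two columns are kept, so a tree trained on the result can only test those two features.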