"GradientBoostedTrees" (Machine Learning Method)
Details & Suboptions
- Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of trees. Trees are trained sequentially, each one aiming to compensate for the weaknesses of the previous trees. The current implementation uses the LightGBM framework in the back end.
- The following options can be given:
    MaxTrainingRounds     50           number of boosting rounds
    "BoostingMethod"      "Gradient"   the method to use
    "L1Regularization"    0            L1 regularization parameter
    "L2Regularization"    0            L2 regularization parameter
    "LeafSize"            Automatic    minimum number of data samples in one leaf
    "LearningRate"        Automatic    learning rate used in gradient descent
    "LeavesNumber"        Automatic    minimum number of leaves in one tree
    "MaxDepth"            6            maximum depth of each tree
- Possible settings for "BoostingMethod" include "Gradient", "GradientOneSideSampling", and "DART" (i.e. Dropouts meet Multiple Additive Regression Trees).
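The idea of training trees sequentially so that each one compensates for the errors of the previous ones can be illustrated with a toy loop that repeatedly fits a single decision tree to the current residuals. This is only a sketch of the general principle under stated assumptions; it is not the LightGBM algorithm used by "GradientBoostedTrees". The names xs, ys, rate, pred, errors, residuals and tree, the shrinkage factor 0.5 and the five rounds are all hypothetical choices made for illustration.

```wolfram
(* Toy illustration only: boosting by repeatedly fitting residuals. *)
(* This is NOT the LightGBM algorithm used by "GradientBoostedTrees". *)
SeedRandom[1];
xs = RandomReal[{-10, 10}, 200];
ys = xs^2 + RandomVariate[NormalDistribution[0, 5], 200];

rate = 0.5;                                  (* hypothetical shrinkage factor *)
pred = ConstantArray[Mean[ys], Length[xs]];  (* round 0: constant prediction *)
errors = {Sqrt[Mean[(ys - pred)^2]]};        (* training RMS error per round *)

Do[
  residuals = ys - pred;
  (* fit one decision tree to what the ensemble still gets wrong *)
  tree = Predict[xs -> residuals, Method -> "DecisionTree"];
  (* add a damped version of the new tree's predictions to the ensemble *)
  pred = pred + rate*tree[xs];
  AppendTo[errors, Sqrt[Mean[(ys - pred)^2]]],
  {5}
];

errors
```

The list errors holds the training RMS error after each round. It would typically shrink as rounds are added, which is the role MaxTrainingRounds and "LearningRate" play in the real method.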
Examples
Basic Examples (2)
Train a predictor function on labeled examples:
p = Predict[{1, 2, 3, 4} -> {.3, .4, .6, 9}, Method -> "GradientBoostedTrees"]
Obtain information about the predictor:
Information[p]
Predict a new example:
p[1.3]
Generate some data and visualize it:
data = Table[x -> x ^ 2 + RandomVariate[NormalDistribution[0, 5]], {x, RandomReal[{-10, 10}, 200]}];
ListPlot[List@@@data]
Train a predictor function on it:
p = Predict[data, Method -> "GradientBoostedTrees"]
Compare the data with the predicted values and look at the standard deviation:
Show[
 Plot[
  {p[x],
   p[x] + StandardDeviation[p[x, "Distribution"]],
   p[x] - StandardDeviation[p[x, "Distribution"]]},
  {x, -2, 6},
  PlotStyle -> {Blue, Gray, Gray},
  Filling -> {2 -> {3}},
  Exclusions -> False,
  PerformanceGoal -> "Speed",
  PlotLegends -> {"Prediction", "Confidence Interval"}],
 ListPlot[List@@@data, PlotStyle -> Red, PlotLegends -> {"Data"}]]
Options (8)
"BoostingMethod" (1)
Train two classifiers on the "WineQuality" training set using a different "BoostingMethod" for each, and compare the training time:
trainingdata = ExampleData[{"MachineLearning", "WineQuality"}, "TrainingData"];
{{t1, c1}, {t2, c2}} = AbsoluteTiming[Classify[trainingdata, Method -> {"GradientBoostedTrees", "BoostingMethod" -> #}]]& /@ {"Gradient", "DART"}
Compare the accuracy on a test set:
testdata = ExampleData[{"MachineLearning", "WineQuality"}, "TestData"];
ClassifierMeasurements[#, testdata, "Accuracy"]& /@ {c1, c2}
"LeafSize" (2)
Train a predictor function using the "LeafSize" option:
Predict[{1, 2, 3, -2, -4, -5} -> {.1, .2, .6, 1.3, 6.7, 8}, Method -> {"GradientBoostedTrees", "LeafSize" -> 1}]
Train two classifiers on the "Titanic" dataset by changing the value of "LeafSize":
trainingset = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
c3 = Classify[trainingset, Method -> {"GradientBoostedTrees", "LeafSize" -> 3}]
c400 = Classify[trainingset, Method -> {"GradientBoostedTrees", "LeafSize" -> 400}]
Look at how the accuracy on a test set changes with "LeafSize":
testset = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
ClassifierMeasurements[#, testset, "Accuracy"]& /@ {c3, c400}
"LeavesNumber" (1)
Generate a labeled training set:
trainingset = Table[x -> Cos[x] + RandomVariate[NormalDistribution[0, .5]], {x, RandomReal[{-20, 20}, 300]}];
ListPlot[List@@@trainingset]
Train two predictors using a different "LeavesNumber" for each:
p5 = Predict[trainingset, Method -> {"GradientBoostedTrees", "LeavesNumber" -> 5}]
p80 = Predict[trainingset, Method -> {"GradientBoostedTrees", "LeavesNumber" -> 80}]
Plot both predictors against the data:
Show[ListPlot[List@@@trainingset], Plot[{p5[x], p80[x]}, {x, -20, 20}]]
"MaxDepth" (2)
Use the "MaxDepth" option to train a classifier:
c = Classify[{1, 2, 3, 4, 5, 6} -> {1, 1, 3, 3, 1, 3}, Method -> {"GradientBoostedTrees", "MaxDepth" -> 3}]
Use the "BostonHomes" training set to train two predictors with a different "MaxDepth" for each:
trainingdata = ExampleData[{"MachineLearning", "BostonHomes"}, "TrainingData"];
p2 = Predict[trainingdata, Method -> {"GradientBoostedTrees", "MaxDepth" -> 2}]
p20 = Predict[trainingdata, Method -> {"GradientBoostedTrees", "MaxDepth" -> 20}]
Compare the "ComparisonPlot" of each predictor on a test set:
testdata = ExampleData[{"MachineLearning", "BostonHomes"}, "TestData"];
Grid[{PredictorMeasurements[#, testdata, "ComparisonPlot"]& /@ {p2, p20}}, Frame -> All]
MaxTrainingRounds (2)
Use the MaxTrainingRounds option to train a classifier:
c = Classify[{1, 2, 3, 4, 5, 6} -> {1, 1, 3, 3, 1, 3}, Method -> {"GradientBoostedTrees", MaxTrainingRounds -> 5}]
Train two classifiers on the "Mushroom" dataset by changing the value of MaxTrainingRounds:
trainingset = ExampleData[{"MachineLearning", "Mushroom"}, "TrainingData"];
c3 = Classify[trainingset, Method -> {"GradientBoostedTrees", MaxTrainingRounds -> 3}]
c10 = Classify[trainingset, Method -> {"GradientBoostedTrees", MaxTrainingRounds -> 10}]
Look at how the performance increases with more boosting rounds:
testset = ExampleData[{"MachineLearning", "Mushroom"}, "TestData"];
ClassifierMeasurements[c3, testset, "Accuracy"]
ClassifierMeasurements[c10, testset, "Accuracy"]
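The remaining suboptions from the table ("LearningRate", "L1Regularization" and "L2Regularization") are passed in the same Method list as the ones shown above. The following is a minimal sketch of combining them on the "BostonHomes" data. The option values are arbitrary illustrative assumptions rather than recommended settings, and pReg and pDefault are hypothetical names.

```wolfram
trainingdata = ExampleData[{"MachineLearning", "BostonHomes"}, "TrainingData"];
testdata = ExampleData[{"MachineLearning", "BostonHomes"}, "TestData"];

(* Option values below are illustrative assumptions, not tuned settings *)
pReg = Predict[trainingdata,
   Method -> {"GradientBoostedTrees",
     "LearningRate" -> 0.1,
     "L1Regularization" -> 0.5,
     "L2Regularization" -> 1}];

(* Baseline with every suboption left at its default *)
pDefault = Predict[trainingdata, Method -> "GradientBoostedTrees"];

(* Root mean square of the residuals on the test set for each predictor *)
PredictorMeasurements[#, testdata, "StandardDeviation"] & /@ {pDefault, pReg}
```

Whether the regularized predictor does better depends on the data, so the comparison is best read as a template for experimenting with these suboptions.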