"NearestNeighbors" (Machine Learning Method)
Details & Suboptions
- Nearest neighbors is a type of instance-based learning. In its simplest form, it picks the most common class (for classification) or averages the values (for regression) among the k nearest neighbors.
- The following options can be given:
- "NeighborsNumber"        Automatic   the number of neighbors to consider (k)
  "DistributionSmoothing"  0.5         regularization parameter
  "NearestMethod"          Automatic   the method to use for computing the k-nearest examples
- Possible settings for "NearestMethod" include:
- "KDtree"   uses a k‐d tree data structure for storing the data
  "Octree"   uses an octree data structure for storing the data
  "Scan"     exhaustive search on the entire dataset
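The simplest form of the rule described above can be sketched by hand with the built-in functions Nearest, Commonest and Mean. The helper names knnClassify and knnPredict are hypothetical and are used only for illustration. This is a minimal sketch of plain k-nearest-neighbor voting and averaging; it is not the implementation used by Classify or Predict, which also apply settings such as "DistributionSmoothing".

```wolfram
(* Hypothetical helper: majority vote among the labels of the k nearest training inputs *)
knnClassify[inputs_List, labels_List, k_Integer, x_] :=
  First[Commonest[Nearest[inputs -> labels, x, k]]]

(* Hypothetical helper: average of the values of the k nearest training inputs *)
knnPredict[inputs_List, values_List, k_Integer, x_] :=
  Mean[Nearest[inputs -> values, x, k]]

(* The 3 nearest inputs to 1.3 are 1, 2 and 3, with labels {1, 1, 2} *)
knnClassify[{1, 2, 3, 4}, {1, 1, 2, 2}, 3, 1.3]

(* The 2 nearest inputs to 1.3 are 1 and 2, with values {10., 20.} *)
knnPredict[{1, 2, 3, 4}, {10., 20., 30., 40.}, 2, 1.3]
```

With these inputs the first call gives 1 and the second gives 15. When several classes are tied, Commonest returns all of them and First keeps only the first, which is an arbitrary tie-breaking choice made for this sketch.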
Examples
Basic Examples (2)
Train a classifier function on labeled examples:
c = Classify[{1, 2, 3, 4} -> {1, 1, 2, 2}, Method -> "NearestNeighbors"]
Obtain information about the classifier:
Information[c]
Classify a new example:
c[1.3]
Generate some data and visualize it:
data = Table[x -> x + RandomVariate[NormalDistribution[0, 2]], {x, RandomReal[{-10, 10}, 40]}];
ListPlot[List@@@data]
Train a predictor function on it:
p = Predict[data, Method -> "NearestNeighbors"]
Compare the data with the predicted values and look at the standard deviation:
Show[Plot[{p[x],
p[x] + StandardDeviation[p[x, "Distribution"]], p[x] - StandardDeviation[p[x, "Distribution"]]},
{x, -2, 6},
PlotStyle -> {Blue, Gray, Gray},
Filling -> {2 -> {3}},
Exclusions -> False,
PerformanceGoal -> "Speed", PlotLegends -> {"Prediction", "Confidence Interval"}], ListPlot[List@@@data, PlotStyle -> Red, PlotLegends -> {"Data"}]]
Options (6)
"DistributionSmoothing" (2)
Train a classifier using the "DistributionSmoothing" suboption:
Classify[{1.98, 3.83, 1.69, 0.04, 2.48, 1.66} -> {"a", "a", "b", "b", "a", "b"}, Method -> {"NearestNeighbors", "DistributionSmoothing" -> 2}]
Train two classifiers on an imbalanced dataset by varying the value of "DistributionSmoothing":
data = {1 -> True, 2 -> True, 3 -> True, 4 -> True, 5 -> False, 6 -> True};
c1 = Classify[data, Method -> {"NearestNeighbors", "DistributionSmoothing" -> .1}]
c2 = Classify[data, Method -> {"NearestNeighbors", "DistributionSmoothing" -> 10}]
Look at the probabilities for the two classifiers:
c1[5, "Probabilities"]
c2[5, "Probabilities"]
"NearestMethod" (2)
Train a classifier using a specific "NearestMethod":
Classify[{1.98, 3.83, 1.69, 0.04, 2.48, 1.66} -> {"a", "a", "b", "b", "a", "b"}, Method -> {"NearestNeighbors", "NearestMethod" -> "Scan"}]
Generate a large dataset and visualize it:
gaussian[μ_, σ_, n_] := RandomVariate[MultinormalDistribution[μ, {{σ, 0}, {0, σ}}], n];
positions = {{4, 2}, {-2, 2}, {0, -3}, {3, 0}};
sizes = {2, 1, 5, 0.5};
colors = {RGBColor[1, 0, 0], RGBColor[0, 0, 1], RGBColor[0, 1, 0], RGBColor[1., 0.77, 0.]};
nums = {10000, 10000, 50000, 20000};
clusters = MapThread[gaussian, {positions, sizes, nums}];
trainingset = AssociationThread[colors, clusters];
plot = ListPlot[clusters, PlotStyle -> Darker[colors, 0.1], ImageSize -> 200, PlotRange -> {{-5, 5}, {-5, 5}}, Frame -> True, AspectRatio -> 1, PlotLabel -> "data"]
Train several classifiers using the different methods:
classifiers = Classify[trainingset, Method -> {"NearestNeighbors", "NearestMethod" -> #}]& /@ {"Octree", "KDtree", "Scan"};
Compare the corresponding training times:
Information[#, "TrainingTime"]& /@ classifiers
"NeighborsNumber" (2)
Train a predictor function using a specific "NeighborsNumber":
Predict[{1.98, 3.83, 1.69, 0.04, 2.48, 1.66} -> {-1.41, -0.71, -0.701, -0.4, -1.91, -1.6}, Method -> {"NearestNeighbors", "NeighborsNumber" -> 2}]
Generate a labeled training set and visualize it:
trainingset = Table[x -> Sin[4x] + RandomReal[.4], {x, RandomReal[{0, 6}, 30]}];
ListPlot[List@@@trainingset]
Train a predictor using a small "NeighborsNumber":
p2 = Predict[trainingset, Method -> {"NearestNeighbors", "NeighborsNumber" -> 2}];
Train a predictor using a large "NeighborsNumber":
p10 = Predict[trainingset, Method -> {"NearestNeighbors", "NeighborsNumber" -> 10}];
Compare the two predictors:
Plot[{p2[x], p10[x]}, {x, 0, 6}]
See Also
Classify Predict ClassifierFunction PredictorFunction ClassifierMeasurements PredictorMeasurements SequencePredict ClusterClassify
Methods: DecisionTree LinearRegression LogisticRegression GaussianProcess GradientBoostedTrees Markov NaiveBayes NeuralNetwork RandomForest SupportVectorMachine