-
See Also
- TreeGraph
- Graph
- CreateDataStructure
- DataStructure
- Nearest
-
- Data Structures
- RedBlackTree
- AVLTree
- Related Guides
"KDTree" (Data Structure)
"KDTree"
represents k-d tree binary spatial subdivision for a set of real number coordinates.
Details
- A k-d tree is useful for subdividing a set of n coordinates in dimension
so that searches, such as nearest neighbor searches, can be done quickly. Each node has two data elements, indexed by 0 and 1. The data element can be either another k-d tree node or a list {i1,i2,…} of point indexes with 0<ik<=n. Each data element has a unique machine integer ID. -
CreateDataStructure["KDTree",coordinates] create a new "KDTree" from coordinates that can all be represented by real numbers. CreateDataStructure["KDTree",coordinates, n] create a new "KDTree" from coordinates with at most n entries in each subdivistion. Typed[x,"KDTree"] give x the type "KDTree". - For a data structure ds of type "KDTree", the following operations can be used:
-
ds["DataBounds"] return the overall bounds for the coordinates time: O(1) ds["DataCoordinates"] return the coordinates time: O(1) ds["Graphics"] show the spatial subdivision for coordinates in 2D time: O(n) ds["ID"] return the ID for ds time: O(1) ds["ID", b] return the ID for the b (0 or 1) data element of ds time: O(1) ds["Indexes",b] return the indexes for the coordinates enclosed in the box represented by the b (0 or 1) leaf data element of ds time: O(1) ds["LeafQ", b] return True if the b (0 or 1) data element of ds is a leaf time: O(1) ds["Node",b] return the "KDTree" for the the b (0 or 1) node data element of ds time: O(1) ds["SplitDimension"] return the dimension for the spatial split of ds time: O(1) ds["SplitPosition"] return the position for the spatial split of ds time: O(1) ds["SubtreeIndexes",sd] return all the indexes for the coordinates in the subtree represented by ds time: O(n) ds["SubtreeLength"] find the number of coordinates bound by the subtree represented by ds time: O(n) ds["TreeSize"] return the number of data elements time: O(1) ds["Visualization"] return a visualization of ds time: O(n)
Examples
open all close allBasic Examples (1)
A new "KDTree" object can be created with CreateDataStructure:
coordinates = BlockRandom[RandomVariate[NormalDistribution[], {47, 2}], RandomSeeding -> 1234];ds = CreateDataStructure["KDTree", coordinates]Show graphics for the 2D data:
dsg = ds["Graphics", Axes -> True]Test if either of the 0 and 1 data elements are leaves:
Table[ds["LeafQ", b], {b, 0, 1}]Get the points included in the 1-data element leaf and highlight them with red in the graphics:
Show[dsg, Graphics[{Red, Point[ds["DataCoordinates"][[ds["Indexes", 1]]]]}]]Get the k-d tree represented by the 0-data element:
left = ds["Node", 0]The graphics for this colors pink the part of the coordinates not included in the subtree:
left["Graphics"]Visualize the tree structure of the k-d tree:
ds["Visualization"]Get the split dimension and position for the k-d tree:
{sdim = ds["SplitDimension"], spos = ds["SplitPosition"]}Get the coordinate indexes for the 1-element data of the k-d tree.
{ds["LeafQ", 1], ind = ds["Indexes", 1]}Test that the coordinates are all above the split position in the split dimension:
AllTrue[coordinates[[ind, sdim]], (# ≥ spos)&]Get all coordinate indexes for the 1-element subtree k-d tree and check that they are all below the split position:
lind = left["SubtreeIndexes"];
AllTrue[coordinates[[lind, sdim]], (# ≤ spos)&]Scope (3)
Information (1)
A new "KDTree" can be created with CreateDataStructure:
coordinates = RandomReal[1, {1663, 4}];ds = CreateDataStructure["KDTree", coordinates]Information about the data structure ds:
Information[ds]Leaf Size (1)
The subdivision continues until there are at most a given number of coordinates in each box that can be specified as an additional argument in CreateDataStructure.
Limit the number of coordinates to at most 5:
coordinates = BlockRandom[RandomVariate[NormalDistribution[], {47, 2}], RandomSeeding -> 1234];ds = CreateDataStructure["KDTree", coordinates, 5]ds["Graphics"]The default is chosen to give a good balance between construction and traversal cost and operation cost on the leaves:
CreateDataStructure["KDTree", coordinates]["Graphics"]Arbitrary Precision (1)
If you give the coordinates with more than MachinePrecision digits, then the computations will be done in the appropriate precision:
coordinates = BlockRandom[RandomReal[{-1, 1}, {47, 2}, WorkingPrecision -> 24], RandomSeeding -> 1234];
ds = CreateDataStructure["KDTree", coordinates]ds["SplitPosition"]Applications (2)
Nearest Neighbor Search (1)
Set up a program to use a "KDTree" data structure to find the element in a set of coordinates nearest to a given point:
FindNearestNeighbor[kdTree_, x_] :=
Module[{searchData},
searchData = <|"Point" -> x, "Coordinates" -> kdTree["DataCoordinates"], "Nearest" -> CreateDataStructure["Value", {0, Infinity}]|>;
nodeNearest[kdTree, searchData];
searchData["Nearest"]["Get"]
];At each node, process the side that the point is in or closest to first:
nodeNearest[node_, searchData_] := Module[{spos, xd, sdist, first, second, ndist},
spos = node["SplitPosition"];
xd = searchData["Point"][[node["SplitDimension"]]];
If[xd ≤ spos,
sdist = spos - xd;
{first, second} = {0, 1},
sdist = xd - spos;
{first, second} = {1, 0}
];
sideNearest[first, node, searchData];
ndist = searchData["Nearest"]["Get"][[2]];
If[ndist ≥ sdist,
sideNearest[second, node, searchData]
];
];For a leaf data element, test all the points, and for a node data element, recurse:
sideNearest[side_, node_, searchData_] :=
If[node["LeafQ", side],
(* Compute min distance to included points *)
Module[{nind, ndist, dist},
{nind, ndist} = searchData["Nearest"]["Get"];
Do[
dist = Norm[searchData["Coordinates"][[ind]] - searchData["Point"]];
If[dist < ndist,
{nind, ndist} = {ind, dist};
],
{ind, node["Indexes", side]}
];
searchData["Nearest"]["Set", {nind, ndist}]
],
nodeNearest[node["Node", side], searchData]
];Test the function on a set of one million coordinates in three dimensions:
coordinates = BlockRandom[RandomReal[{-1, 1}, {10 ^ 6, 4}], RandomSeeding -> 1234];ds = CreateDataStructure["KDTree", coordinates]Developer`PackedArrayQ[ds["DataCoordinates"]]FindNearestNeighbor[ds, {0., 0., 0., 0.}]Compare to the result returned by Nearest:
Nearest[coordinates -> {"Index", "Distance"}, {0., 0., 0., 0.}]Compare the timing to just computing the Norm of all of the points:
RepeatedTiming[Min[Map[Norm, coordinates]]]RepeatedTiming[FindNearestNeighbor[ds, {0., 0., 0., 0.}]]The construction of the k-d tree takes more time than a single scan, but is worth the extra time if several searches are desired:
First[RepeatedTiming[CreateDataStructure["KDTree", coordinates]]]RepeatedTiming[ds["DataCoordinates"];]Range Search (1)
Set up a program to use a "KDTree" object to find all elements in a set of coordinates within given ranges:
RangeSearch[kdTree_, ranges : {{_, _}..}] := Module[{searchData},
searchData = <|"LowerLimits" -> ranges[[All, 1]], "UpperLimits" -> ranges[[All, 2]], "Coordinates" -> kdTree["DataCoordinates"], "InRange" -> CreateDataStructure["DynamicArray"]|>;
nodeRangeSearch[kdTree, searchData];
Normal[searchData["InRange"]]
];At each node, process each side that fits in the range for the node split dimension:
nodeRangeSearch[node_, searchData_] :=
Module[{spos, sdim, ll, ul},
sdim = node["SplitDimension"];
spos = node["SplitPosition"];
ll = searchData["LowerLimits"][[sdim]];
ul = searchData["UpperLimits"][[sdim]];
If[ll ≤ spos,
sideRangeSearch[0, node, searchData]
];
If[ul ≥ spos,
sideRangeSearch[1, node, searchData]
];
];For a leaf data element, test all the points for inclusion, and for a node data element, recurse:
sideRangeSearch[side_, node_, searchData_] :=
Module[{x, ll, ul},
If[node["LeafQ", side],
(* Determine index of points in range *)
ll = searchData["LowerLimits"];
ul = searchData["UpperLimits"];
x = searchData["Coordinates"];
Do[
If[AllTrue[MapThread[LessEqual, {ll, x[[ind]], ul}], Identity],
searchData["InRange"]["Append", ind]
],
{ind, node["Indexes", side]}
],
nodeRangeSearch[node["Node", side], searchData]
]
];Here is a dataset of selected properties for all countries for which those properties are known:
data = Cases[Table[CountryData[#, prop], {prop, {"Name", "Population", "Area", "GDP"}}]& /@ CountryData[], datum_ /; FreeQ[datum, _Missing, Infinity]];Get corresponding numerical data and create a "KDTree" data structure from it:
numericalData = QuantityArray[data[[All, 2 ;; -1]]]["Magnitudes"];
ds = CreateDataStructure["KDTree", numericalData];Find the countries with a population under a million people and GDP greater than 10 billion $:
index = RangeSearch[ds, {{-Infinity, 10 ^ 6}, {-Infinity, Infinity}, {10 ^ 10, Infinity}}];
data[[index, 1]]Find the countries with population over 10 million people and area under 100000 km2:
index = RangeSearch[ds, {{10 ^ 7, Infinity}, {-Infinity, 10 ^ 5}, {-Infinity, Infinity}}];
data[[index, 1]]See Also
TreeGraph Graph CreateDataStructure DataStructure Nearest
Data Structures: RedBlackTree AVLTree
Related Guides
History
Introduced in 2021 (12.3)