Wolfram Language & System Documentation Center

"DBSCAN" (Machine Learning Method)

See Also
- FindClusters
- ClusterClassify
- ClusteringComponents
- ClusteringTree
- Dendrogram
- DimensionReduction
- Methods
- Agglomerate
- GaussianMixture
- JarvisPatrick
- KMeans
- KMedoids
- MeanShift
- NeighborhoodContraction
- SpanningTree
- Spectral
Tech Notes
- Partitioning Data into Clusters

Method for FindClusters, ClusterClassify and ClusteringComponents.
Partitions data into clusters of similar elements using density-based spatial clustering of applications with noise (DBSCAN).

Details & Suboptions

"DBSCAN" (density-based spatial clustering of applications with noise) is a density-based clustering method where the density is estimated using a neighbor-based approach. "DBSCAN" works for arbitrary cluster shapes and sizes but requires clusters to have similar densities.
The following plots show the results of the "DBSCAN" method applied to toy datasets (black points indicate outliers):

"DBSCAN" defines "core points" as data points that have more than k neighbors within a ball of radius ϵ (i.e. data points in high-density regions). Core points that are less than ϵ apart belong to the same cluster. Furthermore, any point that lies less than ϵ from a core point belongs to the cluster of that core point. Any point that is not within ϵ of a core point is considered noise.
This results in each cluster containing one or more core points at its core and some non-core points at its "edge". Overall, "DBSCAN" defines clusters as connected high-density regions. In the following figure, core points are red, edge points are yellow and noise points are blue:
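
The core, edge and noise roles described above can be illustrated with a minimal sketch. The helper dbscanRoles and its arguments are hypothetical names, not part of the built-in "DBSCAN" method. The sketch assumes Euclidean distance and that a point is not counted as its own neighbor; the built-in implementation may differ on both.

```wolfram
(* Hypothetical helper: label each point as "Core", "Edge" or "Noise". *)
(* Assumptions: Euclidean distance; a point is not its own neighbor. *)
dbscanRoles[pts_List, eps_?Positive, k_Integer] :=
 Module[{dist, counts, coreQ, nearCoreQ},
  (* pairwise distances between all points *)
  dist = Outer[EuclideanDistance, pts, pts, 1];
  (* neighbors strictly within eps, excluding the point itself *)
  counts = (Count[#, d_ /; d < eps] - 1) & /@ dist;
  (* core points have more than k neighbors *)
  coreQ = (# > k) & /@ counts;
  (* points lying within eps of at least one core point *)
  nearCoreQ = Table[
    AnyTrue[Range[Length[pts]], coreQ[[#]] && dist[[i, #]] < eps &],
    {i, Length[pts]}];
  MapThread[
   Which[#1, "Core", #2, "Edge", True, "Noise"] &,
   {coreQ, nearCoreQ}]]

(* Toy data: a tight square, one point on its fringe, one far away *)
pts = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {2.5, 1}, {10, 10}};
roles = dbscanRoles[pts, 1.6, 2]
```

With these toy values, the four corners of the square come out as core points, the point at {2.5, 1} as an edge point (it is within ϵ of the core point {1, 1} but has too few neighbors of its own) and {10, 10} as noise.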

In ClusteringComponents and ClusterClassify, noise points are labeled Missing["Anomalous"].
In FindClusters, noise points are returned as a cluster.
The option DistanceFunction can be used to define which distance to use.
The following suboptions can be given:

"NeighborhoodRadius"	Automatic	radius ϵ
"NeighborsNumber"	Automatic	number of neighbors k
"DropAnomalousValues"	False	whether to drop outliers

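The numeric suboptions can be given together in the same Method list. A minimal sketch, in which the radius 2 and the neighbor count 1 are illustrative assumptions rather than recommended settings:

```wolfram
(* Toy data; the suboption values are arbitrary illustrative choices *)
values = {1, 2, 10, 12, 3, 1, 13, 25};
clusters = FindClusters[values,
  Method -> {"DBSCAN", "NeighborhoodRadius" -> 2, "NeighborsNumber" -> 1}]
```

Suboptions left out of the list keep their Automatic or default values.
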
Examples

Basic Examples (3)

Find clusters of nearby values using the "DBSCAN" method:

Wolfram Language code: FindClusters[{1, 2, 10, 12, 3, 1, 13, 25}, Method -> "DBSCAN"]

Train the ClassifierFunction on a list of colors using the "DBSCAN" method:

Wolfram Language code:

colors = RandomColor[70];
c = ClusterClassify[colors, Method -> "DBSCAN"]

Gather the elements by their class number:

Wolfram Language code: GatherBy[colors, c]

Create random 2D vectors:

Wolfram Language code:

SeedRandom[123]
data = Join[RandomReal[1, {100, 2}], RandomReal[{-3, -.1}, {100, 2}]];
ListPlot[data]

Plot clusters in data found using the "DBSCAN" method:

Wolfram Language code: ListPlot[FindClusters[data, Method -> "DBSCAN"]]

Scope (2)

Obtain a random list of times:

Wolfram Language code: data = TimeObject /@ RandomReal[AbsoluteTime[]//Round, 20]

Train the ClassifierFunction using the "DBSCAN" method:

Wolfram Language code: c = ClusterClassify[data, Method -> "DBSCAN"]

Obtain the cluster assignment and cluster the data:

Wolfram Language code:

assignment = c[data]
GatherBy[data, c]

Train the ClassifierFunction using the "DBSCAN" method:

Wolfram Language code:

data = {1000, 1, 2, 3, 1.1, 2, 9, 8, 7, 6, 4, 1, .5, 152, 153, 150, 145, -1000};
cc = ClusterClassify[data, Method -> "DBSCAN"]

Noise points are labeled as Missing["Anomalous"]:

Wolfram Language code: cc[data]
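
The Missing["Anomalous"] labels can be used to separate noise points from clustered points. A minimal sketch, assuming the same data as above; the names labels, outliers and clustered are introduced here for illustration only:

```wolfram
data = {1000, 1, 2, 3, 1.1, 2, 9, 8, 7, 6, 4, 1, .5, 152, 153, 150, 145, -1000};
cc = ClusterClassify[data, Method -> "DBSCAN"];
labels = cc[data];

(* pair each element with its label, then split on the label *)
pairs = Transpose[{data, labels}];
outliers = Cases[pairs, {x_, _Missing} :> x]
clustered = Cases[pairs, {x_, Except[_Missing]} :> x]
```

Which elements end up in outliers depends on the automatically chosen radius and neighbor count, so no particular output is shown here.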

Options (7)

DistanceFunction (1)

Cluster string data using edit distance:

Wolfram Language code: FindClusters[{"abc", "xyz", "bca", "wxyz"}, Method -> "DBSCAN", DistanceFunction -> EditDistance]

Cluster data using Manhattan distance:

Wolfram Language code: FindClusters[{{2, 3}, {5, 10}, {4, 5}, {2, 2}}, Method -> "DBSCAN", DistanceFunction -> ManhattanDistance]

"NeighborhoodRadius" (2)

Find clusters by specifying the "NeighborhoodRadius" suboption:

Wolfram Language code: FindClusters[{1, 2, 10, 12, 3, 1, 13, 25}, Method -> {"DBSCAN", "NeighborhoodRadius" -> 1}]

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

Wolfram Language code:

dn1 = MultinormalDistribution[{4, 4}, {{2, 1 / 3}, {1 / 3, 2 / 3}}];
dn2 = MultinormalDistribution[{-3, -3}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
dn3 = MultinormalDistribution[{4, -4}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
dn4 = MultinormalDistribution[{-3, 4}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
data = RandomVariate[MixtureDistribution[{1, 1, 1, 1}, {dn1, dn2, dn3, dn4}], 400];
ListPlot[data]

Plot clusters in data found using the "DBSCAN" method:

Wolfram Language code: ListPlot[FindClusters[data, Method -> "DBSCAN"]]

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborhoodRadius":

Wolfram Language code:

table = Table[ListPlot[FindClusters[data, Method -> {"DBSCAN", "NeighborhoodRadius" -> p}]], {p, {0.1, 0.2, 0.3}}];
Grid[{table}, Frame -> All]

"NeighborsNumber" (3)

Find clusters by specifying the "NeighborsNumber" suboption:

Wolfram Language code: FindClusters[{1, 2, 10, 12, 3, 1, 13, 25, 30, 32}, Method -> {"DBSCAN", "NeighborsNumber" -> 3}]

Create random 2D vectors:

Wolfram Language code:

data = Join[RandomReal[1, {100, 2}], RandomReal[{-3, -.1}, {100, 2}]];
ListPlot[data]

Plot clusters in data found using the "DBSCAN" method:

Wolfram Language code: ListPlot[FindClusters[data, Method -> "DBSCAN"]]

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

Wolfram Language code:

table = Table[ListPlot[FindClusters[data, Method -> {"DBSCAN", "NeighborsNumber" -> p}]], {p, {3, 5, 10}}];
Grid[{table}, Frame -> All]

Define a set of two-dimensional data points, characterized by four somewhat nebulous clusters:

Wolfram Language code:

dn1 = MultinormalDistribution[{4, 4}, {{2, 1 / 3}, {1 / 3, 2 / 3}}];
dn2 = MultinormalDistribution[{-3, -3}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
dn3 = MultinormalDistribution[{4, -4}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
dn4 = MultinormalDistribution[{-3, 4}, {{2, -1 / 3}, {-1 / 3, 2 / 3}}];
data = RandomVariate[MixtureDistribution[{1, 1, 1, 1}, {dn1, dn2, dn3, dn4}], 400];
ListPlot[data]

Plot clusters in data found using the "DBSCAN" method:

Wolfram Language code: ListPlot[FindClusters[data, Method -> "DBSCAN"]]

Plot different clusterings of data using the "DBSCAN" method by varying the "NeighborsNumber":

Wolfram Language code:

table = Table[ListPlot[FindClusters[data, Method -> {"DBSCAN", "NeighborsNumber" -> p}]], {p, {5, 30, 50}}];
Grid[{table}, Frame -> All]

"DropAnomalousValues" (1)

Train the ClassifierFunction, which labels outliers as Missing["Anomalous"]:

Wolfram Language code:

ndata = {1000, 1, 2, 3, 1.1, 2, 9, 8, 7, 6, 4, 1, .5, 152, 153, 150, 145, -1000};
cc = ClusterClassify[ndata, Method -> "DBSCAN"]

Use the trained ClassifierFunction to identify the outliers:

Wolfram Language code: cc[ndata]

Train the ClassifierFunction by dropping outliers and finding new cluster assignments:

Wolfram Language code:

cc = ClusterClassify[ndata, Method -> {"DBSCAN", "DropAnomalousValues" -> True}]
cc[ndata]

Similarly, find clusters of nearby values with outliers:

Wolfram Language code: FindClusters[ndata, Method -> "DBSCAN"]

Remove outliers using the "DropAnomalousValues" suboption:

Wolfram Language code: FindClusters[ndata, Method -> {"DBSCAN", "DropAnomalousValues" -> True}]
