EUCLIDEAN DBSCAN REVISITED: FROM STATIC TO DYNAMIC

Junhao Gan (1)

(1) School of Information technology and Electrical Engineering, The University of Queensland, j.gan@uq.edu.au

 

DBSCAN is a highly successful density-based clustering method for multi-dimensional points. Its seminal paper won the Test-of-Time Award at KDD 2014, and it has over 9000 citations at Google Scholar. Although DBSCAN has received extensive applications, its computational hardness was unsolved until the recent work at SIGMOD 2015.

This talk focuses on the problem of computing DBSCAN clusters on a set of n points in d-dimensional space from scratch (assuming no existing indexes) under the Euclidean distance. More specifically, we first show the DBSCAN problem is “hard” in three or higher dimensional space. Motivated by this, we propose a relaxed version of the problem called ρ-approximate DBSCAN, which returns the same clusters as DBSCAN, unless the clusters are “unstable”. The ρ-approximate problem is “easy” regardless of the constant dimensionality.

This talk further investigate the algorithmic principles for dynamic clustering by DBSCAN. Surprisingly, we prove that the ρ-approximate version suffers from the very same hardness when the dataset is fully dynamic. We also show that this issue goes away as soon as tiny further relaxation is applied, yet still ensuring the same quality of ρ-approximate DBSCAN.