Contributor
A. S. Aditya Sarma

DBSCAN Clustering in Mahout


Mentors
Trevor Grant
Organization
Apache Software Foundation

Clustering is an important data mining technique with wide applications in medicine, biology, social network analysis, and image segmentation, to name a few. Density-based clustering is an intuitive and efficient way to group similar objects together, and DBSCAN is a state-of-the-art density-based clustering algorithm. In its naive form, without a spatial index, DBSCAN has quadratic time complexity, which makes it unsuitable for Big Data applications. I propose to implement a distributed, R-Tree based DBSCAN algorithm in Mahout with a complexity of O(n log n). After due discussion with the mentors, I will then implement an optimized version of the distributed DBSCAN algorithm.
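To make the complexity argument concrete, here is a minimal single-machine sketch of textbook DBSCAN in Python. It is not Mahout code and not the proposed distributed implementation; the names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative assumptions. The brute-force `region_query` scans every point on each call, which is where the quadratic cost comes from. An R-Tree index would replace only that function, answering each neighbourhood query in roughly logarithmic time.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i).

    Brute force: O(n) per call, so O(n^2) over the whole run.
    A spatial index such as an R-Tree would replace this scan.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
        cluster += 1
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

With the sample data, the two tight groups of three points form separate clusters and the isolated point at (50, 50) is labelled as noise.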