Contributor: mentekid

Approximate Nearest Neighbor Search: Implementation of Multiprobe LSH and LSH Tuning

Mentors: Ryan Curtin
Organization: mlpack: a scalable C++ machine learning library

Approximate Nearest Neighbor Search solutions become appealing in problems where dimensionality increases. Locality Sensitive Hashing is a popular algorithm for this task, but has important drawbacks: it requires a large amount of memory, and the number of parameters makes it less appealing to people unfamiliar with the inner workings of the algorithm.

To reduce memory inefficiency, Multi-probe LSH (MPLSH) better utilises the memory used by LSH. MPLSH was shown to perform better with regards to both memory and execution time. Also, a model of LSH's performance has been developed, that allows users to easily explore the parameter space by using a small sample of the dataset.

As part of my GSoC project, I am proposing the implementation of these two algorithms. The core LSH algorithm is already in mlpack, and I believe my proposed additions will

make it easier for mlpack users to use LSH by alleviating the need to repeat the tedious tuning process every time a new dataset is introduced.
make the current LSH implementation more efficient both time- and space-wise, by using a smarter algorithm and exploiting the benefits of parallelization.