The AcousticBrainz database contains detailed high- and low-level information for millions of audio recordings, making it an essential resource for creatives, researchers, and music fanatics alike. Our understanding of audio can be greatly improved through features that focus on similarities between the content of recordings in such a large database. As such, the development of a similarity index between recordings is central to improving the AcousticBrainz platform and to advancing music recommendation engines in related projects like ListenBrainz.

Previous investigations into similarity systems, particularly those related to AcousticBrainz, have supported the success of content-based engines (those using high- and low-level data) for determining track similarity. However, these implementations have fallen short because their architecture prevents scalability, so they lack the speed required for use in AcousticBrainz.

Drawing on the lessons of previous pitfalls in recording similarity research, and on the importance of improved efficiency for a long-term implementation, my 2019 GSoC project aims to lay the foundation for an AcousticBrainz similarity engine.


Aidan Lawford-Wickham


  • Alastair Porter