Until recently, biodiversity data was scattered across different formats in natural history collections, survey reports, and the literature. Over the last fifteen years, considerable effort has gone into establishing standards for biodiversity database structure and centralizing the data for better accessibility. However, the entities that gather this data do not enforce strong data quality standards, so the data is often prone to flaws. Data retrieved from centralized sources therefore needs to pass through a well-formed quality-control process before it can be used in research.

bdclean was created for that purpose. So far we have created numerous quality checks, workflows, analyses, and visualization functions covering the taxonomic, spatial, and temporal aspects of the data. However, these remain standalone components without much synchronization or connectivity. We propose to refine the overall data cleaning pipeline of bdclean, bring synergy to the components already developed, and develop important new functionality. At the end of this project, users will be able to go through the quality control process in a structured, intuitive, and effective way.


Thiloshon Nagarajah


  • Yohay Carmel
  • Vijay Barve
  • Tomer Gueta