Creating a Chemically Intelligent MongoDB Integration for RDKit
- Mentors: Marco Stenta, Peter Gedeck, Greg Landrum
- Organization: Open Chemistry
Chemical databases are highly heterogeneous: they are often indexed inconsistently and represent chemical structures with custom data structures. This makes them well suited to more flexible schemes for storing and querying data. One such scheme is MongoDB, a document-oriented NoSQL database. Researchers have already demonstrated MongoDB’s potential for high query performance and rich information storage in genomics and materials science.
This proposal describes an integration between RDKit, a chemoinformatics library, and MongoDB. MongoDB's aggregation pipeline and its distributed computing ("sharding") features could enable high-performance similarity and substructure searches over large, complex datasets, while its document-oriented nature lends itself well to storing many types of chemical information.
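To make the similarity-search idea concrete, the following is a minimal sketch of how a Tanimoto search might be phrased as an aggregation pipeline. It is an illustration under stated assumptions, not the design this proposal commits to: the field names `bits`, `count`, and `smiles` and the function `similarity_pipeline` are hypothetical, and each document is assumed to store the on-bit indices of a fingerprint (for example, one computed with RDKit) together with their count. The pipeline is built as plain Python dictionaries, so the sketch runs without a database; the stages use standard MongoDB aggregation operators.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient of two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_pipeline(query_bits, threshold):
    """Build an aggregation pipeline for a Tanimoto similarity search.

    Assumes (hypothetically) that each document has:
      bits  - array of fingerprint on-bit indices
      count - number of entries in bits
      smiles - the structure as a SMILES string
    """
    query_bits = sorted(set(query_bits))
    qn = len(query_bits)
    if qn == 0 or not 0 < threshold <= 1:
        raise ValueError("need a non-empty query and 0 < threshold <= 1")

    # A document can only reach the threshold if its bit count lies in
    # [qn * threshold, qn / threshold]; the epsilon guards against
    # floating-point rounding at the boundaries.
    eps = 1e-9
    min_count = math.ceil(qn * threshold - eps)
    max_count = math.floor(qn / threshold + eps)

    common = {"$size": {"$setIntersection": ["$bits", query_bits]}}
    return [
        # Cheap pre-filter on the stored bit count.
        {"$match": {"count": {"$gte": min_count, "$lte": max_count}}},
        # tanimoto = common / (qn + count - common)
        {"$project": {
            "smiles": 1,
            "tanimoto": {"$let": {
                "vars": {"common": common},
                "in": {"$divide": [
                    "$$common",
                    {"$subtract": [{"$add": [qn, "$count"]}, "$$common"]},
                ]},
            }},
        }},
        # Keep only sufficiently similar molecules, best matches first.
        {"$match": {"tanimoto": {"$gte": threshold}}},
        {"$sort": {"tanimoto": -1}},
    ]


if __name__ == "__main__":
    for stage in similarity_pipeline([1, 2, 3, 4], 0.5):
        print(stage)
```

With a driver such as PyMongo, the returned list could be passed to a collection's `aggregate` method. The `tanimoto` helper computes the same quantity client-side and can be used to check results on a small sample.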
This integration could allow businesses and scientists to screen very large chemical collections, such as the Enamine REAL database, with high speed and specificity, identifying candidate compounds for drug development, research, agriculture, and more.