DeepChem has enabled powerful and practical applications for machine learning in many disciplines of the natural sciences. While its name recognizes its historical origins as a tool for analyzing chemical molecules, its reach has more recently broadened to encompass neighboring fields such as materials science and biology. The goal of this proposal is to expand DeepChem’s burgeoning infrastructure to better support protein modeling.
This proposal will add a simple codon featurizer to DeepChem, which will be the first DeepChem featurizer capable of directly processing RNA and protein sequences.
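As a rough illustration of what codon featurization involves, the sketch below splits an RNA sequence into three-base codons, maps each codon to an integer index, and translates it with the standard genetic code. It is only a minimal standalone example, not the featurizer this proposal will deliver and not part of DeepChem's API: the names `split_codons`, `featurize_codons`, and `translate` are hypothetical, and the choice to accept DNA-style input by converting T to U is an assumption made for convenience.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (first base varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_INDEX = {codon: i for i, codon in enumerate(CODONS)}
CODON_TO_AMINO_ACID = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(sequence):
    """Split an RNA (or DNA) sequence into a list of three-base codons."""
    # Assumption: DNA input is accepted by rewriting T as U.
    seq = sequence.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    unknown = [c for c in codons if c not in CODON_TO_INDEX]
    if unknown:
        raise ValueError(f"unrecognized codons: {unknown}")
    return codons


def featurize_codons(sequence):
    """Hypothetical featurizer: one integer in [0, 63] per codon."""
    return [CODON_TO_INDEX[c] for c in split_codons(sequence)]


def translate(sequence):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TO_AMINO_ACID[c] for c in split_codons(sequence))


if __name__ == "__main__":
    print(featurize_codons("AUGGCCUAA"))  # [35, 53, 10]
    print(translate("AUGGCCUAA"))         # MA*
```

Integer codon indices are only one possible encoding; a one-hot or learned embedding could be layered on top of the same codon split.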
It will also add new model classes to DeepChem’s models directory. These classes will act as wrappers for models in Facebook’s Evolutionary Scale Modeling (ESM) repository. They will enable DeepChem users to use ESM’s Multiple Sequence Alignment transformer and Protein BERT models to perform efficient, scalable transformer learning.
Together with carefully planned additions to DeepChem’s documentation, unit tests, tutorials, and the MoleculeNet benchmarking suite, these contributions will empower DeepChem users to perform large-scale protein modeling without leaving the DeepChem ecosystem.