Contributor: Dhuvi Karthikeyan

Target Conditioned Antibody Sequence Generation Using Protein Language Models

Mentors: Aaron Rock Menezes
Organization: DeepChem
Technologies: Python, PyTorch, Hugging Face
Topics: machine learning, biology, large language models, immunology

Monoclonal antibodies (mAbs) are potent therapeutics because they can precisely target molecular surfaces at sub-protein resolution. They are used in cancer checkpoint blockade, in treatments for viral infections, and as antivenom for snakebites. Current methods of antibody discovery involve inoculating various chimeric animal constructs with specific antigens, then isolating and purifying the antigen-specific molecules. Because this process is laborious, there has been significant interest in designing antibodies in silico. A growing number of large language models have been employed to capture the sequence distributions of these proteins, with varying degrees of success. However, the vast majority of these language models are encoder-only transformers whose target-conditioned sequence generation capabilities are limited, and their code is prototype-grade rather than production-ready. This project aims to integrate and fine-tune a target-conditioned LLM for epitope-specific antibody design. Extending DeepChem's codebase to support this class of therapeutics would not only help democratize the technology for resource-constrained groups, but also bring those groups onto a platform with a lively user base and a healthy collaborative environment.