cBioPortal provides web resources for cancer genomics research. A web service that provides protein alignments between Ensembl and PDB sequences will help map mutations onto protein 3D structures, and an automatically updating pipeline will keep cBioPortal current with newly released protein structures, ultimately advancing cancer research. The project starts by downloading all isoforms of human proteins from Ensembl and running BLAST on them against all PDB sequences. The alignments will be parsed and stored in a carefully designed database. A JSON-based API will be built on top of the database using predefined query methods, and cBioPortal will obtain the alignments through the exposed API. A pipeline will also be constructed to update the alignments weekly: once new structures are released by PDB, the pipeline will start automatically and update the alignments for the updated structures. The updated alignments will be stored in the database, and a regression test will be applied to check their correctness against a validation set. Performance, robustness, reliability and development feasibility are all considered in the proposal.
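
As a rough illustration of the parse-and-serve step described above, the following sketch turns BLAST hits into JSON records. It assumes BLAST was run with the standard 12-column tabular output (`-outfmt 6`) and that PDB subject identifiers look like `1ABC_A` (entry ID, underscore, chain); the proposal specifies neither. The `Alignment` class, the function names and the example identifiers are hypothetical and are not part of the proposed design.

```python
import json
from dataclasses import dataclass, asdict

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


@dataclass
class Alignment:
    """One Ensembl-to-PDB alignment (hypothetical record layout)."""
    ensembl_id: str
    pdb_id: str
    chain: str
    identity: float
    query_start: int
    query_end: int
    pdb_start: int
    pdb_end: int
    evalue: float
    bitscore: float


def parse_blast_line(line: str) -> Alignment:
    """Parse one tab-separated BLAST hit into an Alignment."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BLAST_COLUMNS):
        raise ValueError(f"expected {len(BLAST_COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(BLAST_COLUMNS, fields))
    # Assumption: the subject ID is "<PDB entry>_<chain>", e.g. "1ABC_A".
    pdb_id, _, chain = row["sseqid"].partition("_")
    return Alignment(
        ensembl_id=row["qseqid"],
        pdb_id=pdb_id,
        chain=chain,
        identity=float(row["pident"]),
        query_start=int(row["qstart"]),
        query_end=int(row["qend"]),
        pdb_start=int(row["sstart"]),
        pdb_end=int(row["send"]),
        evalue=float(row["evalue"]),
        bitscore=float(row["bitscore"]),
    )


def alignments_to_json(lines) -> str:
    """Serialize BLAST hits to JSON, skipping blank and comment lines."""
    records = [
        asdict(parse_blast_line(line))
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
    return json.dumps(records, indent=2)
```

In a real deployment the parsed records would be written to the database first and the API would query them from there; the direct conversion here only shows the shape of the data that could flow from BLAST to the client.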




  • Pieter
  • Onur
  • Sheridan