Automating Quality Control of Datahub Submissions
- Mentors
- Charles Haynes, Walle, Avery Wang
- Organization
- cBioPortal for Cancer Genomics
- Technologies
- docker, bash, unix, shell scripting, GitHub Actions
- Topics
- devops, Workflow automation, Engineering Productivity
cBioPortal accepts submissions of new data sets to enrich its open-access cancer genomics database. Currently, new data sets are uploaded to a separate repository for review. For full quality assurance, however, each new data set must be manually imported into an instance of cBioPortal and visually inspected to confirm that the data displays correctly. This manual step is tedious and slows down the review of new submissions.
To remove this bottleneck, the project will create a workflow that triggers on each new submission and automatically deploys a live staging instance of cBioPortal with the new data set already imported. Time permitting, a second workflow could further streamline the process by generating resources such as screenshots to make visual inspection easier. These improvements would let cBioPortal maintainers validate new submissions more easily, leading to faster approvals of new data and a higher-quality database for cancer researchers.
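As a rough illustration of the first step such a workflow might need, the sketch below works out which data sets a submission touches so that only those are imported into the staging instance. It is not the project's implementation. The function name `changed_studies` and the assumed layout, with each data set in its own directory under a `public/` root, are hypothetical and used only for illustration.

```python
from pathlib import PurePosixPath

# Assumption: each data set lives in its own directory under this root.
STUDY_ROOT = "public"


def changed_studies(changed_paths, study_root=STUDY_ROOT):
    """Return the sorted names of data set directories touched by a submission.

    changed_paths is a list of repository-relative file paths using
    forward slashes, one per changed file.
    """
    studies = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Only count files shaped like <study_root>/<data_set>/<file...>.
        if len(parts) >= 3 and parts[0] == study_root:
            studies.add(parts[1])
    return sorted(studies)


if __name__ == "__main__":
    example = [
        "public/study_a/meta_study.txt",
        "public/study_a/data_clinical_patient.txt",
        "README.md",
    ]
    print(changed_studies(example))
```

In a workflow, the list of paths could come from `git diff --name-only` between the base and head commits of the submission. Each returned name would then be passed to whatever step imports the data set into the staging instance.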