Contributor
Favour James

Generate Example Dataset for PyTorch Geometric Based on Pathway Commons and Prototype


Mentors
AugustinL, Yoshitaka Inoue
Organization
National Resource for Network Biology (NRNB)
Technologies
python, pytorch, gpu, PyTorch Geometric
Topics
machine learning, bioinformatics, deep learning, data integration, graph neural networks, Data Preprocessing
The main goal of this project is to create a PyTorch Geometric (PyG) dataset by integrating the cBioPortal and Pathway Commons datasets, and to provide clear example code for this integration. The project will require working with different datasets from cBioPortal and Pathway Commons. The proposed solution is a three-stage process: retrieving and preprocessing data from the datasets, integrating the data, and developing and training Graph Neural Network (GNN) models on the integrated dataset.

The main deliverables for this project will be:
1. A preprocessed and integrated example dataset for PyTorch Geometric, built from the cBioPortal and Pathway Commons datasets.
2. Example code for combining cBioPortal and Pathway Commons datasets in PyTorch Geometric.
3. Developed and optimized GNN models for downstream tasks using the integrated dataset.

Additionally, the project will produce a well-documented and well-structured codebase for data retrieval, preprocessing, integration, and GNN model development. The final deliverable will be the PyTorch Geometric dataset itself, contributed to the library for potential use by other researchers.
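
The integration stage described above can be illustrated with a minimal sketch. It is not the project's actual code. It assumes the Pathway Commons interactions are available as tab-separated SIF lines (participant A, interaction type, participant B) and that cBioPortal-derived features have already been reduced to one numeric vector per gene symbol. The function names `build_graph` and `to_pyg`, the toy gene list, and the feature values are all hypothetical and chosen only for illustration.

```python
def build_graph(sif_lines, features):
    """Align a SIF-style edge list with per-gene feature vectors.

    sif_lines: iterable of "GENE_A<TAB>INTERACTION_TYPE<TAB>GENE_B" strings
               (assumed layout of a Pathway Commons SIF export).
    features:  dict mapping gene symbol -> list of floats
               (assumed to be derived from cBioPortal data beforehand).
    """
    # Nodes are the genes that have features; sorting gives a stable index.
    genes = sorted(features)
    index = {gene: i for i, gene in enumerate(genes)}

    sources, targets = [], []
    for line in sif_lines:
        gene_a, _interaction_type, gene_b = line.rstrip("\n").split("\t")
        # Skip interactions whose endpoints have no feature vector.
        if gene_a in index and gene_b in index:
            # Store both directions so the graph is treated as undirected.
            sources += [index[gene_a], index[gene_b]]
            targets += [index[gene_b], index[gene_a]]

    x = [list(features[gene]) for gene in genes]  # node feature matrix
    edge_index = [sources, targets]               # COO layout, shape [2, num_edges]
    return genes, x, edge_index


def to_pyg(x, edge_index):
    """Wrap the plain lists in a PyG Data object (requires torch and torch_geometric)."""
    import torch
    from torch_geometric.data import Data

    return Data(
        x=torch.tensor(x, dtype=torch.float),
        edge_index=torch.tensor(edge_index, dtype=torch.long),
    )


# Toy inputs, invented for illustration only.
toy_sif = [
    "TP53\tinteracts-with\tMDM2",
    "MDM2\tinteracts-with\tMDM4",
    "TP53\tcontrols-state-change-of\tGENE_WITHOUT_FEATURES",
]
toy_features = {
    "TP53": [0.5, 1.2],
    "MDM2": [2.0, 0.1],
    "MDM4": [0.0, 0.3],
}

genes, x, edge_index = build_graph(toy_sif, toy_features)
```

With PyTorch and PyTorch Geometric installed, `to_pyg(x, edge_index)` would turn the result into a `Data` object that a GNN model can consume. Keeping the alignment step free of tensor code is one possible design choice: it lets the gene-to-index mapping be checked on its own before any model training.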