Contributor
Reeba Qureshi

Building Apache Beam Notebooks for Real-World ML Use Cases


Mentors
Danny McCormick, Pablo E
Organization
Apache Software Foundation
Technologies
Python, TensorFlow, Google Cloud Platform, Keras, scikit-learn, Apache Beam
Topics
machine learning, cloud
In this project, I want to create Jupyter notebooks for a real-world machine learning use case, especially image processing with publicly accessible datasets. The goal is a reference guide that others can use to build ML pipelines for image processing or computer vision problems. I already have experience with image processing for detecting breast cancer cells in histopathological images, and I want to leverage it to build a similar pipeline with Apache Beam. The same approach can be extended to other computer vision problems such as object detection, facial recognition, optical character recognition, and hand gesture recognition for disabled people.

The following are some ideas for notebooks that I can build and contribute:

1. Image Data Preprocessing: Apache Beam can be used to build a pipeline for image preprocessing tasks such as resizing, cropping, normalizing, and filtering across image file formats like TIFF and PNG, and for converting the results into a more standardized format like JSON.
2. Model Inference: We can showcase how trained models can be used to make real-time inferences in a Beam pipeline.
3. Model Evaluation: We can evaluate the model using Apache Beam with metrics such as accuracy, precision, recall, and F1 score.
4. Stretch Goal: If time permits, we can follow a similar process for video datasets and show how to preprocess videos using Beam.

Deliverables
Apache Beam notebook with working code to deploy a pipeline for an image processing or video processing use case, along with supporting documentation.
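As an illustration of the Model Evaluation idea, here is a minimal sketch of how accuracy, precision, recall, and F1 score might be computed in a Beam pipeline for a binary classifier. It is not the project's actual notebook code: the helper names (`confusion_key`, `metrics_from_counts`, `run_evaluation`) and the sample (label, prediction) pairs are hypothetical, and the sketch assumes labels are encoded as 0 and 1.

```python
"""Sketch: binary-classification metrics computed in a Beam pipeline."""


def confusion_key(label, prediction):
    """Map one (label, prediction) pair to its confusion-matrix cell."""
    if prediction == 1:
        return "tp" if label == 1 else "fp"
    return "fn" if label == 1 else "tn"


def metrics_from_counts(counts):
    """Derive accuracy, precision, recall and F1 from confusion counts."""
    tp, fp = counts.get("tp", 0), counts.get("fp", 0)
    fn, tn = counts.get("fn", 0), counts.get("tn", 0)
    total = tp + fp + fn + tn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def run_evaluation(pairs):
    """Count confusion cells with Beam, then print the derived metrics."""
    # Imported here so the pure helpers above can be used without Beam.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreatePairs" >> beam.Create(pairs)
            | "ToCell" >> beam.Map(lambda p: (confusion_key(p[0], p[1]), 1))
            | "CountCells" >> beam.CombinePerKey(sum)
            | "ToDict" >> beam.combiners.ToDict()
            | "Metrics" >> beam.Map(metrics_from_counts)
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    # Hypothetical (label, prediction) pairs standing in for model output.
    run_evaluation([(1, 1), (0, 1), (1, 0), (0, 0), (1, 1)])
```

Counting confusion-matrix cells with `CombinePerKey` keeps the aggregation parallelizable, and the final metrics are derived from just four numbers. In a real notebook, the pairs would come from an upstream inference step rather than `beam.Create`.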