Build out Beam YAML features
- Mentors: Danny McCormick
- Organization: Apache Software Foundation
- Technologies: Python, machine learning, YAML, Apache Beam, Beam YAML API, Spanner
- Topics: machine learning, data processing, YAML, Apache Beam, Spanner
The aim of this project is to enhance the Beam YAML API by introducing ML and IO transforms, giving users more functionality. The proposed transforms are:
1. RunInference
2. ReadFromSpanner
3. WriteToSpanner
4. Enrichment Transforms
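To make the role of RunInference concrete, here is a minimal, stdlib-only Python sketch of its core contract: group input elements into batches, call the model once per batch, and emit one result per input that pairs the example with its inference (Beam's Python RunInference emits `PredictionResult` objects with this pairing). This is a conceptual illustration, not Beam's implementation or API; `run_inference`, `Prediction`, and `word_count_model` are hypothetical names, and the "model" is a toy word counter.

```python
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple, Sequence


class Prediction(NamedTuple):
    """Hypothetical stand-in for Beam's PredictionResult: input plus model output."""
    example: Any
    inference: Any


def _predict(batch: List[Any], model: Callable[[Sequence[Any]], Sequence[Any]]) -> Iterator[Prediction]:
    # One model call per batch; the model must return one output per input.
    outputs = model(batch)
    if len(outputs) != len(batch):
        raise ValueError("model must return one output per input")
    for example, inference in zip(batch, outputs):
        yield Prediction(example, inference)


def run_inference(
    examples: Iterable[Any],
    model: Callable[[Sequence[Any]], Sequence[Any]],
    batch_size: int = 2,
) -> Iterator[Prediction]:
    """Hypothetical sketch: batch inputs, run the model, keep input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[Any] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield from _predict(batch, model)
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield from _predict(batch, model)


def word_count_model(texts: Sequence[str]) -> List[int]:
    # Toy "model" for illustration: scores each text by its word count.
    return [len(t.split()) for t in texts]


results = list(run_inference(["beam yaml", "run inference on text", "hi"], word_count_model))
```

Batching matters because real models are usually far cheaper to call on a batch than element by element; a Beam YAML RunInference transform would expose this behavior declaratively rather than through Python code.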
I also plan to add three use cases for the YAML API: end-to-end pipelines that demonstrate the newly implemented transforms. The proposed use cases are:
1. Text processing with MLTransform and RunInference
2. Processing tabular data from Spanner
3. Enriching tabular customer data with the Enrichment transform
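The third use case can be pictured as a keyed lookup and merge: for each customer row, fetch the matching record from another source and join its fields into the row, which is conceptually what Beam's Enrichment transform does against an external data source. The stdlib-only Python sketch below illustrates that idea with an in-memory dictionary standing in for the external source. It is not Beam's API; `enrich`, the field names, and the sample data are hypothetical, and the left-join behavior (rows without a match pass through unchanged) and "original field wins" collision rule are assumptions of this sketch.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

Row = Dict[str, Any]


def enrich(rows: Iterable[Row], lookup: Mapping[Any, Row], key_field: str) -> Iterator[Row]:
    """Hypothetical sketch: join each row with the lookup record sharing its key.

    Rows with no matching key pass through unchanged (left-join semantics).
    On field-name collisions, the original row's value wins.
    """
    for row in rows:
        extra = lookup.get(row.get(key_field), {})
        yield {**extra, **row}


# Hypothetical sample data: transactions keyed by customer_id,
# enriched with profile fields from an in-memory "external" table.
transactions = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 12.5},
    {"customer_id": 9, "amount": 5.0},  # no matching profile
]
profiles = {
    1: {"customer_id": 1, "name": "Ada", "tier": "gold"},
    2: {"customer_id": 2, "name": "Lin", "tier": "silver"},
}

enriched = list(enrich(transactions, profiles, key_field="customer_id"))
```

In a real pipeline the dictionary would be replaced by a lookup against a database or key-value store, and the YAML use case would configure that source instead of writing the join by hand.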
By expanding the capabilities of the YAML API, the project aims to streamline building and managing Apache Beam pipelines. Users will be able to handle a wider range of data processing tasks, such as model inference, Spanner reads and writes, and data enrichment, directly in a YAML pipeline, which could attract a broader audience to Apache Beam for their data processing needs.