Creation and usage of disposable Spark on Kubernetes cluster from SWAN notebook
- Mentors
- Diogo Castro, Prasanth Kothuri, Piotr Mrówczyński, Enric Tejedor
- Organization
- CERN-HSF
This project aims to develop a Jupyter notebook plugin that deploys the services Spark requires to a Kubernetes cluster on the OpenStack cloud at CERN.
Kubernetes lets the computation scale as load increases: a Spark driver pod is launched in the cluster, and it in turn creates multiple Spark executor pods that execute the application code.
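As a rough illustration of how a Spark application is pointed at a Kubernetes cluster, the sketch below collects the relevant Spark properties in a plain dictionary. The property names are standard Spark settings; the helper `k8s_spark_conf`, the API server URL, the image name, and the namespace are hypothetical placeholders, not values from this project.

```python
def k8s_spark_conf(api_server, image, namespace="default", executors=2):
    """Return Spark properties for running against a Kubernetes master.

    All argument values are supplied by the caller; nothing here is
    specific to the CERN deployment.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return {
        # The k8s:// prefix tells Spark to schedule pods through this API server.
        "spark.master": "k8s://" + api_server,
        # Container image used for the Spark pods.
        "spark.kubernetes.container.image": image,
        # Namespace in which the pods are created.
        "spark.kubernetes.namespace": namespace,
        # Number of executor pods to request.
        "spark.executor.instances": str(executors),
    }


# Hypothetical values, for illustration only.
conf = k8s_spark_conf(
    "https://k8s.example.org:6443",
    "example/spark-py:latest",
    namespace="analysis",
    executors=4,
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With PySpark installed, a dictionary like this could be passed on with `SparkConf().setAll(list(conf.items()))` before the session is created.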
The services that will be attached to the Kubernetes cluster are CERN CVMFS, Spark shuffle service, and Spark history server. These services are needed for running Spark on Kubernetes. Physicists can then use Spark running in the background to perform scalable interactive data analysis and visualization.
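On the application side, these services are typically switched on through Spark properties. The sketch below is a minimal example under stated assumptions: the property names are standard Spark settings, but the helper `service_conf` and the log directory are hypothetical, and whether the shuffle service settings take effect depends on the Spark version and on how the shuffle service is deployed in the cluster.

```python
def service_conf(event_log_dir, dynamic_allocation=True):
    """Return Spark properties that make an application use the attached services."""
    conf = {
        # Event logs written here can later be read by a Spark history server.
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": event_log_dir,
    }
    if dynamic_allocation:
        # Assumption: a compatible shuffle service is already running in the cluster.
        conf["spark.shuffle.service.enabled"] = "true"
        conf["spark.dynamicAllocation.enabled"] = "true"
    return conf


# Hypothetical log location, for illustration only.
services = service_conf("hdfs:///example/spark-events")
for key, value in sorted(services.items()):
    print(f"{key}={value}")
```

The history server itself would need its `spark.history.fs.logDirectory` setting to point at the same directory in order to display these logs.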
A user interface will also be provided inside the Jupyter notebook so that a user can attach various services to the cluster. The plugin will then be integrated with SWAN, the notebook service that CERN provides.