Contributor
Muhammad Aditya Hilmy

Integration of Rucio in JupyterLab for SWAN


Mentors
Riccardo Di Maria, Mario Lassnig, Enric Tejedor Saavedra, Diogo Castro, Aris Fkiaras, Enrico Bocchi, Martin Barisits
Organization
CERN-HSF

CERN and the high-energy physics community store their data across a variety of storage systems, in different locations and on different media (from object storage to magnetic tape). To move this large volume of data around, CERN uses a service known as Rucio. For scientists to analyse Rucio-managed data using SWAN, the data must be made accessible from within the notebook. This can be done in several ways, one of which is telling Rucio to replicate the requested data to a specific storage location. To do this, scientists must use the Rucio CLI or the web-based UI, which can be overwhelming for some users. In addition, they must know which storage location the data should be replicated to, and the file path at which to access the data once it has been replicated. The path can also change after some time, forcing them to go through the process again.

To address these issues, this project aims to develop a JupyterLab extension that integrates with Rucio. The extension takes care of creating a replication rule and provides an easy way to get the file path. If the data becomes unavailable, the extension can make it available again automatically.