Umbrella organization for Particle (High-Energy) Physics-related projects

CERN-HSF (High-Energy Physics Software Foundation) is the umbrella organization for high-energy physics-related projects in GSoC. The HEP Software Foundation facilitates the coordination of common international efforts in high-energy physics software and computing.

CERN (European Organization for Nuclear Research) has participated in GSoC since 2011 as the CERN-SFT group, which provides common software for CERN's experiments. In 2017, the program expanded to include many software projects from the whole field of high-energy physics. The vast majority of our GSoC projects do not require any physics knowledge.

The experiments at CERN, such as those at the Large Hadron Collider (LHC), the world’s largest and most powerful particle accelerator, try to answer fundamental questions about the Universe. For example, what is the nature of mass? What are the elementary building blocks of the Universe? What was the early Universe like? What is the nature of dark matter and dark energy? Why is there an asymmetry between matter and antimatter? In 2012, LHC experiments announced the discovery of a new particle, the Higgs boson, which helps explain how particles obtain mass. Today, particle physicists are analyzing the data from the experiments to study the properties of the newly discovered particle and to search for new physics, such as dark matter or extra dimensions. This requires a lot of sophisticated software. CERN is also the birthplace of the World Wide Web.

The open-source projects to which students can contribute during GSoC span many areas of high-energy physics software: data analysis, detector and accelerator simulation, event reconstruction, data management, and many others. We look forward to your contributions!


  • c/c++
  • python
  • data analysis
  • parallelization
  • machine learning



CERN-HSF 2019 Projects

  • Abhishek Chauhan
    Alert Redistribution System for Fink: an Apache Spark-based Broker for Astronomy
    A huge volume of data is generated every night by large astronomical telescopes around the world. A robust and scalable software infrastructure is...
  • Aman Singh Thakur
    Building a Python-based Analysis tool for AWAKE experiment
    Building a library that reads a large number of HDF files and builds a database. Add support for searching and loading multiple datasets,...
  • Charles Escott
    CERN Awkward Array Project
    At CERN, the data from LHC collisions requires complex data types and functions to be processed. As a solution, the awkward-array library makes...
  • Mohit Tyagi
    CERNBox: Bring Your Own Application
    CERNBox provides cloud data storage to all CERN users to store, share and synchronize their data across all devices. It is integrated with a variety of...
  • Inzamam Iqbal
    Create a user interface for Ganga that allows for the execution of tasks inside user-specified virtual machines.
    Ganga is used to execute a user-defined computational task on a distributed back-end. Through this project we let the users define the environment in...
  • Sahil Jajodia
    Creation and usage of disposable Spark on Kubernetes cluster from SWAN notebook
    This project aims to develop a Jupyter notebook plugin which deploys the services Spark requires to a Kubernetes cluster on the OpenStack cloud at CERN....
  • Surya S Dwivedi
    Development of LSTM and GRU layers in TMVA
    This project is about the development of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers in TMVA, both of which belong to a general...
  • Amarnath Karthi
    Distributed Computing Resources: aggregation, usage, monitoring
    DIRAC is an open-source interware platform whose roles range from the submission of jobs and the management of the data produced to the orchestration of the...
  • Emilio Cortina Labra
    Experiment independent display framework and data format
    Developing a new data format to represent event data, unifying the needs of the different experiments that will make use of it. Improvement on the...
  • Jack Qiu
    Generating Hessians and Jacobians via CLAD
    Clad is a C++ Clang compiler plugin that employs automatic differentiation to compute derivatives of user-defined functions, performing source code transformations...
  • Ashish Kshirsagar
    Generative Adversarial Networks for Particle Physics Applications
    The project aims to implement GANs in TMVA, the machine learning toolkit of the ROOT framework, which would be immensely useful because of the advent,...
  • Arpitha Raghunandan
    Implement a GlobalModuleIndex in ROOT and Cling
    ROOT has several features which interact with libraries and require implicit header inclusion. These headers are often immutable, and reparsing is...
  • Mohamed Moanis Ali
    Implement Event based Seeding and Multi-Threading
    Pursuing the goal of running Allpix-Squared simulation events, which are independent by nature, in parallel has led to the identification of performance...
  • Sharad Chitlangia
    Implementation and Optimisation in ACTS of algorithms exposed in TrackML challenge
    Porting and analysis of top solution algorithms from the TrackML challenge to the ACTS framework. The algorithms include the combinatorial Mikado...
  • Brooks Karlik
    Kalman Filter in Rust
    The Kalman Filter is a method of iteratively predicting the future state of a system based on previous information. Not only is a Kalman Filter more...
  • Himanshu Sahu
    Molr - Operational
    In view of LHC Run 3, we want to extend the functionality of Molr so that it will be ready to use in production to control various operational...
  • Pujan Mehta
    Monitoring DIRAC Components
    DIRAC is highly scalable software used for accessing distributed resources from various distributed systems. DIRAC’s main contributor is LHCb and...
  • Alfonso
    Novel Applications of Zstandard (ZSTD) compression algorithm to ROOT
    This project aims to investigate application of ZSTD within the ROOT framework; benchmark it in comparison to the other algorithms; test it against...
  • Ishan Rai
    Optimisation of the Ganga toolkit in terms of memory consumption and persistent storage.
    GANGA (Gaudi/Athena and Grid Alliance) is an interface used by scientists to access the huge amount of computing power and storage available to...
  • Akash Ravi
    Package manager for Jupyter Notebook / SWAN
    This proposal is to develop a Jupyter notebook extension that will allow users to specify Python modules (and their respective versions)...
  • Hasan Öztürk
    Proposal for Atlas Experiment - Hasan Öztürk
    The Athena framework is being upgraded to run in a multithreaded environment, and the aim of this project is to create a new ATLAS performance monitoring...
  • Shrey Aryan
    Proposal for the Implementation of an HDF5 IO Layer for PODIO
    PODIO is a C++ library that allows the creation of event data models and efficient I/O code for HEP experiments. It does so by avoiding deep-object...
  • Sneha Sinha
    Python Components for the SIMPLE Grid Framework
    The SIMPLE Grid project is an extension of the SIMPLE Framework that combines popular configuration management technologies such as Puppet/Ansible...
  • danieldo
    Real-time conditions data distribution for the Online data processing of the ALICE experiment
    ALICE (A Large Ion Collider Experiment) is a heavy-ion detector on the Large Hadron Collider (LHC) ring. It is designed to study the physics of...
  • Ruturaj Gujar
    Rucio - Exascale Data Management
    Rucio is a data management system that provides the functionality to organize, manage and access a large amount of scientific data (in the order of...
  • Pradeep Kumar S
    SIMT to SPMD Translation
    High Level Trigger 1 (HLT1) is the first and critical stage in the software reconstruction of collisions at the LHCb experiment in the Large Hadron...
  • Divya Rani
    Testing framework for Jupyter notebooks
    SWAN (Service for Web-based ANalysis) is a cloud data analysis service developed and powered by CERN that provides Jupyter notebooks on demand. It is...
  • Jonathan Guiang
    Tools for Understanding CMS Data Access
    Over the course of Run 2, from 2016 to 2018, the CMS detector produced an unparalleled amount of data, resulting in an intricate optimization problem...
  • A C++ adapter API for integrating vectorized components in a scalar workflow
    Many FLOP-intensive algorithms may profit from the vector pipelines of...