Over the course of Run 2, from 2016 to 2018, the CMS detector produced an unparalleled amount of data, giving rise to an intricate optimization problem spanning data access, storage infrastructure, and distributed computing that is one of the fundamental challenges of running an experiment at the LHC. Dedicated physicists and engineers have constructed a system that has served the collaboration well, but the approaching HL-LHC upgrade, which will produce roughly an exabyte of data per year, demands a more economical solution. Fortunately, the HEP Software Foundation (HSF) has been collecting data describing global and local access patterns that can be used to model the response of alternative, novel infrastructures that may better serve High Energy Physics for decades to come. Furthermore, with the advent and ever-growing popularity of "Big Data" in industry, the optimization philosophy behind the CMS data infrastructure, as well as the predictive power of the project itself, will have relevance far beyond experimental physics.



Jonathan Guiang
Igor Sfiligoi