Content delivery network solutions for the CMS experiment: The evolution towards HL-LHC

Journal of Parallel and Distributed Computing · IF 3.4 · CAS Tier 3 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods) · Pub Date: 2024-11-22 · DOI: 10.1016/j.jpdc.2024.105014
Carlos Perez Dengra , Josep Flix , Anna Sikora , the CMS Collaboration
{"title":"Content delivery network solutions for the CMS experiment: The evolution towards HL-LHC","authors":"Carlos Perez Dengra ,&nbsp;Josep Flix ,&nbsp;Anna Sikora ,&nbsp;the CMS Collaboration","doi":"10.1016/j.jpdc.2024.105014","DOIUrl":null,"url":null,"abstract":"<div><div>The Large Hadron Collider at CERN in Geneva is poised for a transformative upgrade, preparing to enhance both its accelerator and particle detectors. This strategic initiative is driven by the tenfold increase in proton-proton collisions anticipated for the forthcoming high-luminosity phase scheduled to start by 2029. The vital role played by the underlying computational infrastructure, the World-Wide LHC Computing Grid, in processing the data generated during these collisions underlines the need for its expansion and adaptation to meet the demands of the new accelerator phase. The provision of these computational resources by the worldwide community remains essential, all within a constant budgetary framework. While technological advancements offer some relief for the expected increase, numerous research and development projects are underway. Their aim is to bring future resources to manageable levels and provide cost-effective solutions to effectively handle the expanding volume of generated data. In the quest for optimized data access and resource utilization, the LHC community is actively investigating Content Delivery Network (CDN) techniques. These techniques serve as a mechanism for the cost-effective deployment of lightweight storage systems that support both, traditional and opportunistic compute resources. Furthermore, they aim to enhance the performance of executing tasks by facilitating the efficient reading of input data via caching content near the end user. A comprehensive study is presented to assess the benefits of implementing data cache solutions for the Compact Muon Solenoid (CMS) experiment. This in-depth examination serves as a use-case study specifically conducted for the Spanish compute facilities, playing a crucial role in supporting CMS activities. Data access patterns and popularity studies suggest that user analysis tasks benefit the most from CDN techniques. Consequently, a data cache has been introduced in the region to acquire a deeper understanding of these effects. In this paper, the details of the implementation of a data cache system in the PIC Tier-1 compute facility are presented. It includes insights into the developed monitoring tools and discusses the positive impact on CPU usage for analysis tasks executed in the region. The study is augmented by simulations of data caches, with the objective of discerning the most optimal requirements in both size and network connectivity for a data cache serving the Spanish region. Additionally, the study delves into the cost benefits associated with deploying such a solution in a production environment. 
Furthermore, it investigates the potential impact of incorporating this solution into other regions of the CMS computing infrastructure.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"197 ","pages":"Article 105014"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Parallel and Distributed Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0743731524001783","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

The Large Hadron Collider at CERN in Geneva is poised for a transformative upgrade, preparing to enhance both its accelerator and particle detectors. This strategic initiative is driven by the tenfold increase in proton-proton collisions anticipated for the forthcoming high-luminosity phase, scheduled to start by 2029. The vital role played by the underlying computational infrastructure, the Worldwide LHC Computing Grid, in processing the data generated during these collisions underlines the need for its expansion and adaptation to meet the demands of the new accelerator phase. The provision of these computational resources by the worldwide community remains essential, all within a constant budgetary framework. While technological advancements offer some relief for the expected increase, numerous research and development projects are underway. Their aim is to bring future resource needs to manageable levels and to provide cost-effective solutions for handling the expanding volume of generated data. In the quest for optimized data access and resource utilization, the LHC community is actively investigating Content Delivery Network (CDN) techniques. These techniques serve as a mechanism for the cost-effective deployment of lightweight storage systems that support both traditional and opportunistic compute resources. Furthermore, they aim to enhance the performance of executing tasks by facilitating the efficient reading of input data via caching content near the end user. A comprehensive study is presented to assess the benefits of implementing data cache solutions for the Compact Muon Solenoid (CMS) experiment. This in-depth examination serves as a use-case study specifically conducted for the Spanish compute facilities, which play a crucial role in supporting CMS activities. Data access patterns and popularity studies suggest that user analysis tasks benefit the most from CDN techniques. Consequently, a data cache has been introduced in the region to acquire a deeper understanding of these effects. In this paper, the details of the implementation of a data cache system in the PIC Tier-1 compute facility are presented. It includes insights into the developed monitoring tools and discusses the positive impact on CPU usage for analysis tasks executed in the region. The study is augmented by simulations of data caches, with the objective of discerning the optimal requirements in both size and network connectivity for a data cache serving the Spanish region. Additionally, the study delves into the cost benefits associated with deploying such a solution in a production environment. Furthermore, it investigates the potential impact of incorporating this solution into other regions of the CMS computing infrastructure.
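As context for the cache simulations mentioned in the abstract, the sketch below illustrates, in Python, one simple way to estimate how much of the read volume a regional cache of a given size could serve: replay a file-access trace through a least-recently-used (LRU) cache and measure the byte hit rate. This is an illustrative assumption only, not the simulator used in the paper; the trace, file sizes, and cache sizes are invented for the example.

```python
from collections import OrderedDict

def simulate_lru_cache(trace, cache_size_bytes):
    """Replay an access trace of (filename, size_bytes) pairs through an LRU
    cache and return the byte hit rate: the fraction of read volume that
    would be served from the cache instead of remote storage."""
    cache = OrderedDict()   # filename -> size, ordered from least to most recently used
    used = 0                # bytes currently held in the cache
    hit_bytes = total_bytes = 0

    for name, size in trace:
        total_bytes += size
        if name in cache:
            hit_bytes += size
            cache.move_to_end(name)      # mark as most recently used
            continue
        if size > cache_size_bytes:
            continue                     # file larger than the whole cache: bypass it
        # Evict least recently used files until the new file fits.
        while used + size > cache_size_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[name] = size
        used += size

    return hit_bytes / total_bytes if total_bytes else 0.0

# Hypothetical trace: 1000 reads cycling over 50 analysis input files of 2 GiB each.
trace = [("file_%d.root" % (i % 50), 2 * 1024**3) for i in range(1000)]
for size_tb in (10, 50, 100):
    rate = simulate_lru_cache(trace, size_tb * 1024**4)
    print(f"cache {size_tb} TB -> byte hit rate {rate:.2%}")
```

Sweeping the cache size over realistic access traces in this way is one straightforward approach to exploring the size versus network-connectivity trade-off that the paper investigates for the Spanish region.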
Source Journal
Journal of Parallel and Distributed Computing
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 10.30
Self-citation rate: 2.60%
Articles per year: 172
Review time: 12 months
Journal description: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of our targeted systems.