Buffer Provisioning for Large-Scale Data-Acquisition Systems

Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems. Pub Date: 2018-06-25. DOI: 10.1145/3210284.3210288

Alejandro Santos, W. Vandelli, P. García, H. Fröning


Abstract

The data acquisition system of the ATLAS experiment, a major experiment of the Large Hadron Collider (LHC) at CERN, will go through a major upgrade in the next decade. The upgrade is driven by experimental physics requirements, which call for increased data rates on the order of 6 TB/s. By contrast, the data rate of the existing system is 160 GB/s. Among the changes in the upgraded system will be a very large buffer with a projected size on the order of 70 PB. The buffer's role will be to decouple data production from on-line data processing, storing data for periods of up to 24 hours until it can be analyzed by the event processing system. The larger buffer will allow a new data recording strategy, providing additional margins to handle variable data rates. At the same time, it will allow sensible trade-offs between buffering space and on-line processing capabilities. This compromise between the two resources will be possible because the data production cycle includes time periods in which the experiment will not produce data. In this paper we analyze the consequences of such trade-offs and introduce a tool that allows a detailed exploration of different strategies for resource provisioning. The tool is based on a model of the upgraded data acquisition system, implemented in a simulation framework. From this model it is possible to obtain insight into the dynamics of the running system. Given predefined resource constraints, we provide bounds for the provisioning of buffering space and on-line processing requirements.
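
The trade-off between buffering space and on-line processing capability can be illustrated with a very simple fluid model. The sketch below is not the authors' simulation tool; it is a minimal illustration that assumes a constant input rate while the experiment produces data, zero input otherwise, and a constant processing rate that drains the buffer at all times. The 6 TB/s input rate and 70 PB capacity are the orders of magnitude quoted in the abstract. The 10-hour production and 14-hour idle periods, the function name `simulate_buffer`, and the candidate processing rates are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

TB = 1e12  # bytes
PB = 1e15  # bytes


@dataclass
class BufferResult:
    peak_bytes: float    # highest occupancy reached during the run
    final_bytes: float   # occupancy at the end of the run
    overflowed: bool     # True if the buffer would have exceeded capacity


def simulate_buffer(input_rate, processing_rate, capacity,
                    on_seconds, off_seconds, cycles=1, step=60.0):
    """Fluid model of one buffer fed by an on/off source.

    Rates are in bytes per second and capacity is in bytes. All parameters
    are illustrative assumptions, not values from the paper's model.
    """
    period = on_seconds + off_seconds
    occupancy = 0.0
    peak = 0.0
    overflowed = False
    for i in range(int(round(cycles * period / step))):
        t = (i * step) % period
        # The source produces data only during the "on" part of the cycle.
        inflow = input_rate if t < on_seconds else 0.0
        occupancy += (inflow - processing_rate) * step
        if occupancy < 0.0:
            occupancy = 0.0          # an empty buffer cannot drain further
        if occupancy > capacity:
            overflowed = True        # data would be lost at this point
            occupancy = capacity
        peak = max(peak, occupancy)
    return BufferResult(peak, occupancy, overflowed)


if __name__ == "__main__":
    # Hypothetical duty cycle: 10 h of data taking followed by 14 h idle.
    for rate_tb in (3.0, 4.5, 6.0):
        r = simulate_buffer(6 * TB, rate_tb * TB, 70 * PB,
                            on_seconds=10 * 3600, off_seconds=14 * 3600)
        print(f"processing {rate_tb} TB/s: peak {r.peak_bytes / PB:.1f} PB, "
              f"overflow={r.overflowed}")
```

Under these assumed numbers, a processing rate of 4.5 TB/s accumulates (6 - 4.5) TB/s over 36,000 s, about 54 PB, which fits in a 70 PB buffer and drains during the idle period, whereas 3 TB/s would need about 108 PB and overflows. Processing at the full 6 TB/s needs no buffer at all, which is the other end of the trade-off.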



