{"title":"Modeling Performance of Data Collection Systems for High-Energy Physics","authors":"Wilkie Olin-Ammentorp, Xingfu Wu, Andrew A. Chien","doi":"arxiv-2407.00123","DOIUrl":null,"url":null,"abstract":"Exponential increases in scientific experimental data are outstripping the\nrate of progress in silicon technology. As a result, heterogeneous combinations\nof architectures and process or device technologies are increasingly important\nto meet the computing demands of future scientific experiments. However, the\ncomplexity of heterogeneous computing systems requires systematic modeling to\nunderstand performance. We present a model which addresses this need by framing key aspects of data\ncollection pipelines and constraints, and combines them with the important\nvectors of technology that shape alternatives, computing metrics that allow\ncomplex alternatives to be compared. For instance, a data collection pipeline\nmay be characterized by parameters such as sensor sampling rates, amount of\ndata collected, and the overall relevancy of retrieved samples. Alternatives to\nthis pipeline are enabled by hardware development vectors including advancing\nCMOS, GPUs, neuromorphic computing, and edge computing. By calculating metrics\nfor each alternative such as overall F1 score, power, hardware cost, and energy\nexpended per relevant sample, this model allows alternate data collection\nsystems to be rigorously compared. To demonstrate this model's capability, we apply it to the CMS experiment\n(and planned HL-LHC upgrade) to evaluate and compare the application of novel\ntechnologies in the data acquisition system (DAQ). We demonstrate that\nimprovements to early stages in the DAQ are highly beneficial, greatly reducing\nthe resources required at later stages of processing (such as a 60% power\nreduction) and increasing the amount of relevant data retrieved from the\nexperiment per unit power (improving from 0.065 to 0.31 samples/kJ) However, we\npredict further advances will be required in order to meet overall power and\ncost constraints for the DAQ.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Data Analysis, Statistics and Probability","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.00123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Exponential increases in scientific experimental data are outstripping the rate of progress in silicon technology. As a result, heterogeneous combinations of architectures and process or device technologies are increasingly important to meet the computing demands of future scientific experiments. However, the complexity of heterogeneous computing systems requires systematic modeling to understand performance.

We present a model that addresses this need by framing key aspects of data collection pipelines and their constraints, combining them with the important technology vectors that shape alternatives, and computing metrics that allow complex alternatives to be compared. For instance, a data collection pipeline may be characterized by parameters such as sensor sampling rates, the amount of data collected, and the overall relevancy of retrieved samples. Alternatives to this pipeline are enabled by hardware development vectors including advancing CMOS, GPUs, neuromorphic computing, and edge computing. By calculating metrics for each alternative, such as overall F1 score, power, hardware cost, and energy expended per relevant sample, the model allows alternative data collection systems to be rigorously compared.
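The following Python sketch illustrates the kind of comparison such a model enables: a minimal, hypothetical example that computes an F1 score, total power, and energy per relevant sample for a two-stage filtering pipeline. The class, parameters, and numerical values are placeholders chosen for illustration, not the paper's actual formulation or measured figures.

```python
# Illustrative sketch only: all names, parameters, and numbers are hypothetical
# placeholders, not the paper's actual model or values.
from dataclasses import dataclass

@dataclass
class PipelineStage:
    """One filtering stage in a hypothetical data collection pipeline."""
    power_kw: float          # electrical power drawn by this stage
    signal_pass_rate: float  # fraction of relevant (signal) samples retained
    noise_pass_rate: float   # fraction of irrelevant (background) samples retained

def evaluate(stages, sample_rate_hz, signal_fraction):
    """Compute comparison metrics for one pipeline alternative."""
    signal_in = sample_rate_hz * signal_fraction
    noise_in = sample_rate_hz * (1.0 - signal_fraction)
    signal_out, noise_out = signal_in, noise_in
    for s in stages:
        signal_out *= s.signal_pass_rate
        noise_out *= s.noise_pass_rate

    total_power_kw = sum(s.power_kw for s in stages)
    # Treat retained signal as true positives, retained noise as false positives,
    # and rejected signal as false negatives to form an overall F1 score.
    precision = signal_out / (signal_out + noise_out)
    recall = signal_out / signal_in
    f1 = 2 * precision * recall / (precision + recall)
    # kW divided by (relevant samples per second) gives kJ per relevant sample.
    energy_per_sample_kj = total_power_kw / signal_out
    return {"f1": f1,
            "power_kw": total_power_kw,
            "kJ_per_relevant_sample": energy_per_sample_kj,
            "relevant_samples_per_kJ": 1.0 / energy_per_sample_kj}

# Compare a baseline pipeline against one with an improved early stage.
baseline = [PipelineStage(50.0, 0.90, 0.05), PipelineStage(200.0, 0.8, 0.01)]
upgraded = [PipelineStage(60.0, 0.95, 0.01), PipelineStage(120.0, 0.8, 0.01)]
for name, stages in [("baseline", baseline), ("upgraded", upgraded)]:
    print(name, evaluate(stages, sample_rate_hz=40e6, signal_fraction=1e-4))
```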
To demonstrate the model's capability, we apply it to the CMS experiment (and the planned HL-LHC upgrade) to evaluate and compare the application of novel technologies in the data acquisition system (DAQ). We demonstrate that improvements to early stages of the DAQ are highly beneficial, greatly reducing the resources required at later stages of processing (such as a 60% power reduction) and increasing the amount of relevant data retrieved from the experiment per unit energy (improving from 0.065 to 0.31 samples/kJ). However, we predict that further advances will be required to meet overall power and cost constraints for the DAQ.
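For readers interpreting the efficiency figures above: the samples-per-kilojoule metric can be read as the number of relevant samples retrieved divided by the energy the DAQ expends to obtain them. A schematic form of this metric (our notation, not necessarily the paper's exact formulation) is

$$
\eta \;=\; \frac{N_{\text{relevant}}}{E_{\text{DAQ}}} \;=\; \frac{N_{\text{relevant}}}{P_{\text{DAQ}}\, T},
$$

so the reported improvement from $\eta \approx 0.065$ to $\eta \approx 0.31$ samples/kJ corresponds to roughly 4.8 times more relevant data per unit of energy expended.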