ALICE TPC online tracker on GPUs for heavy-ion events

2012 13th International Workshop on Cellular Nanoscale Networks and their Applications. Pub Date: 2012-10-18. DOI: 10.1109/CNNA.2012.6331460

D. Rohr


Citations: 12

Abstract

The online event reconstruction for the ALICE experiment at CERN must process central Pb-Pb collisions at a rate of more than 200 Hz, corresponding to an input data rate of about 25 GB/s. The reconstruction of particle trajectories in the Time Projection Chamber (TPC) is the most compute-intensive step. The TPC online tracker combines the principles of the cellular automaton and the Kalman filter, and it has been accelerated with graphics cards (GPUs). Pipelined processing lets the tracking on the GPU, the data transfer, and the preprocessing on the CPU run in parallel. To exploit data locality, the tracking is split into multiple phases. First, track segments are searched for in local sectors of the detector, independently and in parallel. These segments are then merged at a global level. A shortcoming of this approach is that if a track has only a very short segment in one particular sector, the local search may not find that segment. The fast GPU processing made it possible to add a further step: all found tracks are extrapolated to neighboring sectors, and the unassigned clusters that constitute the missing track segment are collected. For running quality assurance (QA), it is important that the outputs of the CPU and the GPU tracker are as consistent as possible. One major challenge was to implement the tracker such that its output is not affected by concurrency, while maintaining peak performance and efficiency. For instance, a naive implementation depended on the order of the tracks, which is nondeterministic when they are created in parallel. Even so, because floating-point arithmetic is non-associative, a direct binary comparison of the CPU and the GPU tracker output is impossible. The approach chosen for evaluating the GPU tracker efficiency is therefore to compare the cluster-to-track assignment of the CPU and the GPU tracker cluster by cluster. With this comparison scheme, the outputs of the CPU and the GPU tracker differ by 0.00024. Compared to the offline tracker, the High-Level Trigger (HLT) tracker is orders of magnitude faster while delivering good results. The GPU version outperforms its CPU analog by another factor of three. Recently, the ALICE HLT cluster was upgraded with new GPUs and can now process central heavy-ion events at a rate of approximately 200 Hz.
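The two consistency points in the abstract can be illustrated with a minimal Python sketch. It is not the ALICE code: the function name `assignment_mismatch_fraction`, the `UNASSIGNED` label, and the toy label lists are hypothetical, and the sketch assumes that track labels from both trackers have already been made comparable (for example by a deterministic track ordering). The first part shows why summation order changes floating-point results; the second computes a cluster-by-cluster mismatch fraction between two assignments.

```python
from typing import Sequence

UNASSIGNED = -1  # hypothetical label for clusters not attached to any track


def assignment_mismatch_fraction(cpu: Sequence[int], gpu: Sequence[int]) -> float:
    """Fraction of clusters whose track label differs between two outputs.

    Assumes index i refers to the same cluster in both sequences and that
    track labels are directly comparable between the two trackers.
    """
    if len(cpu) != len(gpu):
        raise ValueError("both outputs must cover the same clusters")
    if not cpu:
        return 0.0
    differing = sum(1 for a, b in zip(cpu, gpu) if a != b)
    return differing / len(cpu)


# Floating-point addition is not associative, so the order in which
# parallel threads accumulate values can change the last bits of a result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Toy example: the two trackers disagree on one of eight clusters.
cpu_labels = [0, 0, 1, 1, UNASSIGNED, 2, 2, 2]
gpu_labels = [0, 0, 1, 1, 1, 2, 2, 2]
print(assignment_mismatch_fraction(cpu_labels, gpu_labels))  # 0.125
```

Comparing assignments rather than raw bytes tolerates last-bit differences in fitted track parameters while still flagging any cluster that ends up on a different track.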
