ALICE TPC online tracker on GPUs for heavy-ion events
D. Rohr
2012 13th International Workshop on Cellular Nanoscale Networks and their Applications
Published: 2012-10-18
DOI: 10.1109/CNNA.2012.6331460 (https://doi.org/10.1109/CNNA.2012.6331460)
Citations: 12
Abstract
The online event reconstruction for the ALICE experiment at CERN requires the capability to process central Pb-Pb collisions at a rate of more than 200 Hz, corresponding to an input data rate of about 25 GB/s. The reconstruction of particle trajectories in the Time Projection Chamber (TPC) is the most compute-intensive step. The TPC online tracker implementation combines the principles of the cellular automaton and the Kalman filter, and it has been accelerated by the use of graphics cards (GPUs). Pipelined processing allows the tracking on the GPU, the data transfer, and the preprocessing on the CPU to run in parallel. To exploit data locality, the tracking is split into multiple phases. First, track segments are searched for in local sectors of the detector, independently and in parallel. These segments are then merged at a global level. A shortcoming of this approach is that if a track contains only a very short segment in one particular sector, the local search may not find this short part. The fast GPU processing made it possible to add an additional step: all found tracks are extrapolated to neighboring sectors, and the unassigned clusters which constitute the missing track segments are collected. For running QA, it is important that the output of the CPU and the GPU tracker be as consistent as possible. One major challenge was to implement the tracker such that the output is not affected by concurrency, while maintaining peak performance and efficiency. For instance, a naive implementation depended on the order of the tracks, which is nondeterministic when they are created in parallel. Still, because floating-point arithmetic is non-associative, a direct binary comparison of the CPU and the GPU tracker output is impossible. Thus, the approach chosen for evaluating the GPU tracker is to compare the cluster-to-track assignments of the CPU and the GPU tracker cluster by cluster.
With the above comparison scheme, the outputs of the CPU and the GPU tracker differ by only 0.00024%. Compared to the offline tracker, the HLT tracker is orders of magnitude faster while delivering good results. The GPU version outperforms its CPU analog by another factor of three. Recently, the ALICE HLT cluster was upgraded with new GPUs and is now able to process central heavy-ion events at a rate of approximately 200 Hz.