Chenbo Yuan, Peng Xu, Gang Chen. Journal article, 2023. DOI: 10.1587/elex.20.20230445 (https://doi.org/10.1587/elex.20.20230445)
High-accuracy Low-latency Non-Maximum Suppression Processor for Traffic Object Detection
As autonomous driving technology advances, the demands on object detection keep rising. The non-maximum suppression (NMS) algorithm, a key component of traffic object detection, is an independent post-processing step in the object detection framework. Because real-world road scenes are complex and urban traffic contains a high density of detected entities, the neural network generates a large number of candidate bounding boxes. Low-precision processors may therefore output a significant number of redundant target bounding boxes, which not only adds workload to subsequent processing but can also lead to non-optimal decision-making. We propose a high-performance NMS processor that quickly processes a large number of candidate boxes without sorting their scores. It also features computing units with low precision loss and highly parallel computing arrays. Combined with the algorithm design, it effectively reduces the computational complexity and the end-to-end inference time of the NMS algorithm. As a result, the speed of our NMS processor is comparable to that of the state-of-the-art (SOTA) architecture, and the average accuracy loss is only 0.4%.
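For readers unfamiliar with NMS, the following is a minimal software sketch of the generic greedy algorithm, not the processor design proposed in the paper. It assumes boxes in `(x1, y1, x2, y2)` form and an IoU threshold of 0.5; the function names `iou` and `nms` are illustrative. Instead of sorting all scores up front, this version selects the highest-scoring remaining box on each pass, which is one common way to avoid a full sort. Whether the paper's hardware uses the same strategy is not stated in the abstract.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes in selection order."""
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Pick the best remaining box by a linear scan (no global sort).
        best = max(remaining, key=lambda i: scores[i])
        keep.append(best)
        # Drop the chosen box and every box that overlaps it too much.
        remaining = [
            i for i in remaining
            if i != best and iou(boxes[best], boxes[i]) <= iou_threshold
        ]
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed as redundant, while the third box does not overlap and is kept.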