{"title":"移动系统中 SLAM 的硬件加速","authors":"Zhe Fan, Yi-Fan Hao, Tian Zhi, Qi Guo, Zi-Dong Du","doi":"10.1007/s11390-021-1523-5","DOIUrl":null,"url":null,"abstract":"<p>The emerging mobile robot industry has spurred a flurry of interest in solving the simultaneous localization and mapping (SLAM) problem. However, existing SLAM platforms have difficulty in meeting the real-time and low-power requirements imposed by mobile systems. Though specialized hardware is promising with regard to achieving high performance and lowering the power, designing an efficient accelerator for SLAM is severely hindered by a wide variety of SLAM algorithms. Based on our detailed analysis of representative SLAM algorithms, we observe that SLAM algorithms advance two challenges for designing efficient hardware accelerators: the large number of computational primitives and irregular control flows. To address these two challenges, we propose a hardware accelerator that features composable computation units classified as the matrix, vector, scalar, and control units. In addition, we design a hierarchical instruction set for coping with a broad range of SLAM algorithms with irregular control flows. Experimental results show that, compared against an Intel x86 processor, on average, our accelerator with the area of 7.41 mm<sup>2</sup> achieves 10.52x and 112.62x better performance and energy savings, respectively, across different datasets. Compared against a more energy-efficient ARM Cortex processor, our accelerator still achieves 33.03x and 62.64x better performance and energy savings, respectively.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"7 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hardware Acceleration for SLAM in Mobile Systems\",\"authors\":\"Zhe Fan, Yi-Fan Hao, Tian Zhi, Qi Guo, Zi-Dong Du\",\"doi\":\"10.1007/s11390-021-1523-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The emerging mobile robot industry has spurred a flurry of interest in solving the simultaneous localization and mapping (SLAM) problem. However, existing SLAM platforms have difficulty in meeting the real-time and low-power requirements imposed by mobile systems. Though specialized hardware is promising with regard to achieving high performance and lowering the power, designing an efficient accelerator for SLAM is severely hindered by a wide variety of SLAM algorithms. Based on our detailed analysis of representative SLAM algorithms, we observe that SLAM algorithms advance two challenges for designing efficient hardware accelerators: the large number of computational primitives and irregular control flows. To address these two challenges, we propose a hardware accelerator that features composable computation units classified as the matrix, vector, scalar, and control units. In addition, we design a hierarchical instruction set for coping with a broad range of SLAM algorithms with irregular control flows. Experimental results show that, compared against an Intel x86 processor, on average, our accelerator with the area of 7.41 mm<sup>2</sup> achieves 10.52x and 112.62x better performance and energy savings, respectively, across different datasets. Compared against a more energy-efficient ARM Cortex processor, our accelerator still achieves 33.03x and 62.64x better performance and energy savings, respectively.</p>\",\"PeriodicalId\":50222,\"journal\":{\"name\":\"Journal of Computer Science and Technology\",\"volume\":\"7 1\",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-11-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computer Science and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11390-021-1523-5\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11390-021-1523-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
摘要
新兴的移动机器人产业激发了人们对解决同步定位和绘图(SLAM)问题的浓厚兴趣。然而,现有的 SLAM 平台难以满足移动系统对实时性和低功耗的要求。虽然专用硬件在实现高性能和低功耗方面大有可为,但由于 SLAM 算法种类繁多,为 SLAM 设计高效加速器的工作受到严重阻碍。基于对代表性 SLAM 算法的详细分析,我们发现 SLAM 算法给高效硬件加速器的设计带来了两大挑战:大量的计算基元和不规则的控制流。为了应对这两个挑战,我们提出了一种硬件加速器,它具有可组合的计算单元,分为矩阵单元、矢量单元、标量单元和控制单元。此外,我们还设计了一个分层指令集,以应对各种具有不规则控制流的 SLAM 算法。实验结果表明,与英特尔 x86 处理器相比,我们的加速器(面积为 7.41 平方毫米)在不同数据集上的性能和节能效果分别平均提高了 10.52 倍和 112.62 倍。与能效更高的 ARM Cortex 处理器相比,我们的加速器在性能和节能方面仍分别提高了 33.03 倍和 62.64 倍。
The emerging mobile robot industry has spurred a flurry of interest in solving the simultaneous localization and mapping (SLAM) problem. However, existing SLAM platforms have difficulty in meeting the real-time and low-power requirements imposed by mobile systems. Though specialized hardware is promising with regard to achieving high performance and lowering the power, designing an efficient accelerator for SLAM is severely hindered by a wide variety of SLAM algorithms. Based on our detailed analysis of representative SLAM algorithms, we observe that SLAM algorithms advance two challenges for designing efficient hardware accelerators: the large number of computational primitives and irregular control flows. To address these two challenges, we propose a hardware accelerator that features composable computation units classified as the matrix, vector, scalar, and control units. In addition, we design a hierarchical instruction set for coping with a broad range of SLAM algorithms with irregular control flows. Experimental results show that, compared against an Intel x86 processor, on average, our accelerator with the area of 7.41 mm2 achieves 10.52x and 112.62x better performance and energy savings, respectively, across different datasets. Compared against a more energy-efficient ARM Cortex processor, our accelerator still achieves 33.03x and 62.64x better performance and energy savings, respectively.
期刊介绍:
Journal of Computer Science and Technology (JCST), the first English language journal in the computer field published in China, is an international forum for scientists and engineers involved in all aspects of computer science and technology to publish high quality and refereed papers. Papers reporting original research and innovative applications from all parts of the world are welcome. Papers for publication in the journal are selected through rigorous peer review, to ensure originality, timeliness, relevance, and readability. While the journal emphasizes the publication of previously unpublished materials, selected conference papers with exceptional merit that require wider exposure are, at the discretion of the editors, also published, provided they meet the journal''s peer review standards. The journal also seeks clearly written survey and review articles from experts in the field, to promote insightful understanding of the state-of-the-art and technology trends.
Topics covered by Journal of Computer Science and Technology include but are not limited to:
-Computer Architecture and Systems
-Artificial Intelligence and Pattern Recognition
-Computer Networks and Distributed Computing
-Computer Graphics and Multimedia
-Software Systems
-Data Management and Data Mining
-Theory and Algorithms
-Emerging Areas