Laura Falaschetti;Lorenzo Manoni;Claudio Turchetti
Journal: IEEE Open Journal of Circuits and Systems (IF 2.4, JCR Q2, Engineering, Electrical & Electronic)
Published: 2022-03-12
DOI: 10.1109/OJCAS.2022.3174632
URL: https://ieeexplore.ieee.org/document/9773325/
Citations: 3
Abstract
A Low-Rank CNN Architecture for Real-Time Semantic Segmentation in Visual SLAM Applications
Real-time semantic segmentation on embedded devices has recently gained considerable popularity, driven by the increasing interest in smart vehicles and smart robots. In particular, with the emergence of autonomous driving, the combination of low-latency requirements and computation-intensive operations poses new challenges for vehicles and robots, such as excessive demands on computing power and energy consumption. The aim of this paper is to address semantic segmentation, one of the most critical tasks for perceiving the environment, and its implementation on a low-power core while preserving the required accuracy and keeping complexity low. To reach this goal, a low-rank convolutional neural network (CNN) architecture for real-time semantic segmentation is proposed. The main contributions of this paper are:
i) a tensor decomposition technique is applied to the kernel of a generic convolutional layer;
ii) three versions of an optimized architecture that combines the UNet and ResNet models are derived to explore the trade-off between model complexity and accuracy;
iii) the low-rank CNN architectures are implemented on the Raspberry Pi 4 and NVIDIA Jetson Nano 2 GB embedded platforms, which serve as severe benchmarks for the low-power, low-cost requirements, and on the high-cost NVIDIA Tesla P100 PCIe 16 GB GPU to obtain the best performance.
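To illustrate the idea behind contribution i), the following is a minimal sketch of a low-rank factorization of a convolutional kernel. The abstract does not specify which tensor decomposition the authors use, so this example substitutes a plain truncated SVD of the kernel reshaped into a matrix; the function names (`low_rank_factorize`, `reconstruct`, `param_count`) and the layer sizes are hypothetical and chosen only for illustration. It shows how one convolution can be replaced by a smaller convolution followed by a 1x1 convolution, and how the parameter count shrinks with the chosen rank.

```python
import numpy as np


def low_rank_factorize(kernel, rank):
    """Split a conv kernel of shape (C_out, C_in, kH, kW) into two factors.

    Assumption: truncated SVD of the (C_out, C_in*kH*kW) matrix, not
    necessarily the decomposition used in the paper.
    Returns:
      first  -- (rank, C_in, kH, kW): a conv with `rank` output channels
      second -- (C_out, rank, 1, 1):  a 1x1 conv restoring C_out channels
    """
    c_out, c_in, kh, kw = kernel.shape
    mat = kernel.reshape(c_out, c_in * kh * kw)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    rank = min(rank, s.size)  # rank cannot exceed the number of singular values
    first = (s[:rank, None] * vt[:rank]).reshape(rank, c_in, kh, kw)
    second = u[:, :rank].reshape(c_out, rank, 1, 1)
    return first, second


def reconstruct(first, second):
    """Multiply the two factors back into a full (C_out, C_in, kH, kW) kernel."""
    rank = first.shape[0]
    c_out = second.shape[0]
    full = second.reshape(c_out, rank) @ first.reshape(rank, -1)
    return full.reshape(c_out, *first.shape[1:])


def param_count(*tensors):
    """Total number of weights held by the given tensors."""
    return sum(t.size for t in tensors)


# Hypothetical layer: 32 input channels, 64 output channels, 3x3 kernel.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 32, 3, 3))
first, second = low_rank_factorize(kernel, rank=8)

print(param_count(kernel))         # 18432 weights in the original layer
print(param_count(first, second))  # 2816 weights in the rank-8 factorization
```

A smaller rank reduces parameters and multiply-accumulate operations at the cost of a larger approximation error, which is the complexity versus accuracy trade-off the abstract refers to. In practice a factorized network is usually fine-tuned or trained directly in factored form to recover accuracy.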