{"title":"DBCNet:面向智能车辆RGB-T城市场景理解的动态双边交叉融合网络","authors":"Wujie Zhou;Tingting Gong;Jingsheng Lei;Lu Yu","doi":"10.1109/TSMC.2023.3298921","DOIUrl":null,"url":null,"abstract":"Understanding urban scenes is a fundamental capability required of intelligent vehicles. Depth cues provide useful geometric information for semantic segmentation, thus complementing RGB (color) data. Although single-modal RGB images are improved by depth information, semantic segmentation may be degraded in poor-visibility conditions. Thermal imaging can address some limitations of depth data. Therefore, we leverage the multimodal information in RGB-and-thermal (RGB-T) images by introducing a dynamic bilateral cross-fusion network (DBCNet) for RGB-T urban scene understanding. First, RGB-T features extracted by a given backbone are regrouped as high- or low-level features. Second, multimodal high-level features are sent to a dynamic bilateral cross-fusion module for further refinement. Third, a bounded high-level semantic-feature integration module is added to provide feature guidance, and a multitask supervision mechanism is used for fine-tuning. 
Extensive experiments on two RGB-T urban scene-understanding datasets indicate that DBCNet aggregates multilevel deep features effectively and outperforms state-of-the-art deep-learning scene-understanding methods.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":null,"pages":null},"PeriodicalIF":8.6000,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"DBCNet: Dynamic Bilateral Cross-Fusion Network for RGB-T Urban Scene Understanding in Intelligent Vehicles\",\"authors\":\"Wujie Zhou;Tingting Gong;Jingsheng Lei;Lu Yu\",\"doi\":\"10.1109/TSMC.2023.3298921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding urban scenes is a fundamental capability required of intelligent vehicles. Depth cues provide useful geometric information for semantic segmentation, thus complementing RGB (color) data. Although single-modal RGB images are improved by depth information, semantic segmentation may be degraded in poor-visibility conditions. Thermal imaging can address some limitations of depth data. Therefore, we leverage the multimodal information in RGB-and-thermal (RGB-T) images by introducing a dynamic bilateral cross-fusion network (DBCNet) for RGB-T urban scene understanding. First, RGB-T features extracted by a given backbone are regrouped as high- or low-level features. Second, multimodal high-level features are sent to a dynamic bilateral cross-fusion module for further refinement. Third, a bounded high-level semantic-feature integration module is added to provide feature guidance, and a multitask supervision mechanism is used for fine-tuning. 
Extensive experiments on two RGB-T urban scene-understanding datasets indicate that DBCNet aggregates multilevel deep features effectively and outperforms state-of-the-art deep-learning scene-understanding methods.\",\"PeriodicalId\":48915,\"journal\":{\"name\":\"IEEE Transactions on Systems Man Cybernetics-Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":8.6000,\"publicationDate\":\"2023-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Systems Man Cybernetics-Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10217340/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man Cybernetics-Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10217340/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
DBCNet: Dynamic Bilateral Cross-Fusion Network for RGB-T Urban Scene Understanding in Intelligent Vehicles
Understanding urban scenes is a fundamental capability required of intelligent vehicles. Depth cues provide useful geometric information for semantic segmentation, thus complementing RGB (color) data. Although single-modal RGB images are improved by depth information, semantic segmentation may be degraded in poor-visibility conditions. Thermal imaging can address some limitations of depth data. Therefore, we leverage the multimodal information in RGB-and-thermal (RGB-T) images by introducing a dynamic bilateral cross-fusion network (DBCNet) for RGB-T urban scene understanding. First, RGB-T features extracted by a given backbone are regrouped as high- or low-level features. Second, multimodal high-level features are sent to a dynamic bilateral cross-fusion module for further refinement. Third, a bounded high-level semantic-feature integration module is added to provide feature guidance, and a multitask supervision mechanism is used for fine-tuning. Extensive experiments on two RGB-T urban scene-understanding datasets indicate that DBCNet aggregates multilevel deep features effectively and outperforms state-of-the-art deep-learning scene-understanding methods.
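The abstract describes cross-fusing high-level RGB and thermal features with dynamic refinement. The paper's actual module design is not given here, so the following is only a minimal, hypothetical sketch of one common cross-fusion pattern: each modality's features are re-weighted by a gate computed from the other modality's global statistics, then summed. The function name `dynamic_cross_fuse` and the gating scheme are illustrative assumptions, not the published DBCNet architecture.

```python
import numpy as np

def dynamic_cross_fuse(rgb_feat, thermal_feat):
    """Hypothetical bilateral cross-fusion sketch (NOT the paper's exact
    module): thermal statistics gate the RGB features and vice versa,
    so each modality modulates the other before aggregation."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Global average pooling over spatial dims -> per-channel descriptors.
    g_rgb = rgb_feat.mean(axis=(1, 2), keepdims=True)      # (C, 1, 1)
    g_thermal = thermal_feat.mean(axis=(1, 2), keepdims=True)

    # Cross gates: each modality is scaled by a gate derived from the other.
    fused = sigmoid(g_thermal) * rgb_feat + sigmoid(g_rgb) * thermal_feat
    return fused

# Toy high-level feature maps in (channels, height, width) layout.
rgb = np.random.rand(64, 8, 8)
thermal = np.random.rand(64, 8, 8)
out = dynamic_cross_fuse(rgb, thermal)
print(out.shape)  # (64, 8, 8)
```

The gating keeps the fused map at the same resolution as the inputs, so it can slot into a decoder the way the abstract's "further refinement" stage suggests; a real implementation would learn the gates with convolutional layers rather than fixed pooling.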
Journal introduction:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.