MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

IF 1.9 | Zone 3 (Physics and Astrophysics) | JCR Q2 (PHYSICS, MULTIDISCIPLINARY) | Frontiers in Physics | Pub Date: 2024-05-31 | DOI: 10.3389/fphy.2024.1411559
Shuai Zhang, Minghong Xie
Citations: 0

Abstract

The semantic segmentation of RGB-D images involves understanding object appearances and spatial relationships within a scene, which requires careful consideration of multiple factors. In indoor scenes, diverse and cluttered objects, together with illumination variations and the influence of adjacent objects, can easily cause pixels to be misclassified, degrading the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network is designed to exploit the interaction between the RGB and depth modalities, making better use of their complementary information to improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module fuses RGB and depth information so that each modality can enhance and correct the other. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
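The abstract describes pooling-based attention applied to encoder features and a fusion of RGB and depth information, without giving the exact layer definitions. As a rough illustration only, the sketch below shows one generic way such a mechanism can work: each channel is summarized by global average pooling, a sigmoid gate computed from that summary rescales the channel, and the two gated feature maps are fused by element-wise addition. This is not the authors' PAM or MIM; the function names, the per-channel scalar `weights` and `biases` (stand-ins for learned parameters), and the additive fusion are all assumptions made for the example.

```python
import math


def global_avg_pool(channel):
    """Return the mean of one H x W channel given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pooling_attention(features, weights, biases):
    """Rescale each channel by a gate computed from its pooled mean.

    features: list of C channels, each an H x W list of rows.
    weights, biases: hypothetical per-channel scalars standing in for
    learned parameters (an assumption, not the paper's design).
    """
    gated = []
    for channel, w, b in zip(features, weights, biases):
        gate = sigmoid(w * global_avg_pool(channel) + b)
        gated.append([[gate * v for v in row] for row in channel])
    return gated


def fuse(rgb, depth):
    """Element-wise sum of two feature maps with identical shape."""
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(rgb, depth)
    ]


if __name__ == "__main__":
    # Two-channel 2 x 2 toy feature maps for each modality.
    rgb = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    depth = [[[4.0, 4.0], [4.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    zeros = [0.0, 0.0]
    fused = fuse(
        pooling_attention(rgb, zeros, zeros),
        pooling_attention(depth, zeros, zeros),
    )
    print(fused)
```

With zero weights and biases every gate equals 0.5, so the toy run simply halves both inputs before summing them; in a trained network the gates would differ per channel and emphasize the more informative ones.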
Source journal: Frontiers in Physics (Mathematics-Mathematical Physics)
CiteScore: 4.50
Self-citation rate: 6.50%
Articles published: 1215
Review time: 12 weeks
Journal description: Frontiers in Physics publishes rigorously peer-reviewed research across the entire field, from experimental to computational and theoretical physics. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, engineers and the public worldwide.