Visible and thermal image fusion network with diffusion models for high-level visual tasks

IF 3.4 | CAS Tier 2 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Applied Intelligence | Pub Date: 2025-01-09 | DOI: 10.1007/s10489-024-06210-6
Jin Meng, Jiahui Zou, Zhuoheng Xiang, Cui Wang, Shifeng Wang, Yan Li, Jonghyuk Kim
{"title":"基于扩散模型的高阶视觉任务可见光和热图像融合网络","authors":"Jin Meng,&nbsp;Jiahui Zou,&nbsp;Zhuoheng Xiang,&nbsp;Cui Wang,&nbsp;Shifeng Wang,&nbsp;Yan Li,&nbsp;Jonghyuk Kim","doi":"10.1007/s10489-024-06210-6","DOIUrl":null,"url":null,"abstract":"<div><p>Fusion technology enhances the performance of applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving human observation and visual tasks. However, the training of most semantics-driven fusion algorithms combines segmentation and fusion tasks, thereby increasing the computational cost and underutilizing semantic information. Designing a cleaner fusion architecture to mine rich deep semantic features is the key to addressing this issue. A two-stage RGB-T image fusion network with diffusion models is proposed in this paper. In the first stage, the diffusion model is employed to extract multiscale features. This provided rich semantic features and texture edges for the fusion network. In the next stage, semantic feature enhancement module (SFEM) and detail feature enhancement module (DFEM) are proposed to improve the network’s ability to describe small details. An adaptive global-local attention mechanism (AGAM) is used to enhance the weights of key features related to visual tasks. Specifically, we benchmarked the proposed algorithm by creating a new tri-modal sensor driving scene dataset (TSDS), which includes 15234 sets of labeled images (visible, thermal, and polarization degree images). The semantic segmentation model trained on our fusion images achieved 78.41% accuracy, and the object detection model achieved 87.21% MAP. The experimental results indicate that our algorithm outperforms the state-of-the-art image fusion algorithms.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 4","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Visible and thermal image fusion network with diffusion models for high-level visual tasks\",\"authors\":\"Jin Meng,&nbsp;Jiahui Zou,&nbsp;Zhuoheng Xiang,&nbsp;Cui Wang,&nbsp;Shifeng Wang,&nbsp;Yan Li,&nbsp;Jonghyuk Kim\",\"doi\":\"10.1007/s10489-024-06210-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Fusion technology enhances the performance of applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving human observation and visual tasks. However, the training of most semantics-driven fusion algorithms combines segmentation and fusion tasks, thereby increasing the computational cost and underutilizing semantic information. Designing a cleaner fusion architecture to mine rich deep semantic features is the key to addressing this issue. A two-stage RGB-T image fusion network with diffusion models is proposed in this paper. In the first stage, the diffusion model is employed to extract multiscale features. This provided rich semantic features and texture edges for the fusion network. In the next stage, semantic feature enhancement module (SFEM) and detail feature enhancement module (DFEM) are proposed to improve the network’s ability to describe small details. 
An adaptive global-local attention mechanism (AGAM) is used to enhance the weights of key features related to visual tasks. Specifically, we benchmarked the proposed algorithm by creating a new tri-modal sensor driving scene dataset (TSDS), which includes 15234 sets of labeled images (visible, thermal, and polarization degree images). The semantic segmentation model trained on our fusion images achieved 78.41% accuracy, and the object detection model achieved 87.21% MAP. The experimental results indicate that our algorithm outperforms the state-of-the-art image fusion algorithms.</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"55 4\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-024-06210-6\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-06210-6","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Fusion technology enhances applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving human observation and downstream visual tasks. However, most semantics-driven fusion algorithms are trained jointly on segmentation and fusion, which increases the computational cost while underutilizing semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is the key to addressing this issue. This paper proposes a two-stage RGB-T image fusion network built on diffusion models. In the first stage, a diffusion model extracts multiscale features, providing rich semantic features and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) improve the network's ability to describe fine details, and an adaptive global-local attention mechanism (AGAM) increases the weights of key features relevant to visual tasks. To benchmark the proposed algorithm, we created a new tri-modal sensor driving scene dataset (TSDS), which includes 15,234 sets of labeled images (visible, thermal, and degree-of-polarization images). A semantic segmentation model trained on our fused images achieved 78.41% accuracy, and an object detection model achieved 87.21% mAP. The experimental results indicate that our algorithm outperforms state-of-the-art image fusion algorithms.
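
As a rough illustration of the second-stage design described in the abstract, the sketch below implements a plausible global-local attention block and fusion head in PyTorch. Only the module names (AGAM, SFEM, DFEM) and the overall two-stage structure come from the abstract; every layer choice, signature, and the `FusionHead` wrapper are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of the second stage described in the abstract.
# The name AGAM comes from the paper; all layer choices and signatures
# below are guesses for illustration, not the authors' code.
import torch
import torch.nn as nn


class AGAM(nn.Module):
    """Hypothetical adaptive global-local attention: a global channel
    branch and a local spatial branch, mixed by a learned gate."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Global branch: squeeze-and-excitation style channel attention.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Local branch: spatial attention from per-pixel mean/max statistics.
        self.local_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learned scalar that adaptively balances the two branches.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = x * self.global_att(x)  # channel-reweighted features
        stats = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        l = x * self.local_att(stats)  # spatially reweighted features
        a = torch.sigmoid(self.gate)  # adaptive global/local mix in (0, 1)
        return a * g + (1 - a) * l


class FusionHead(nn.Module):
    """Hypothetical fusion head: mixes RGB and thermal feature maps (e.g.
    multiscale features from a frozen diffusion model) into one image."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.agam = AGAM(channels)
        self.out = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_th: torch.Tensor) -> torch.Tensor:
        fused = torch.relu(self.mix(torch.cat([f_rgb, f_th], dim=1)))
        return torch.sigmoid(self.out(self.agam(fused)))


# Shape check with dummy stage-one features.
head = FusionHead(channels=64)
y = head(torch.randn(1, 64, 128, 160), torch.randn(1, 64, 128, 160))
print(y.shape)  # torch.Size([1, 1, 128, 160])
```

In the paper's pipeline, the fused image produced by a head like this would then feed the segmentation and detection models used for evaluation.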

Source journal
Applied Intelligence (Engineering and Technology - Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.
Latest articles from this journal
Multi-view learning based on product and process metrics for software defect prediction
MVSRF: Point cloud semantic segmentation and optimization method for granular construction objects
WLKA-RVS: a retinal vessel segmentation method using weighted large kernel attention
DAAR: Dual attention cooperative adaptive pruning rate by data-driven for filter pruning
Chaotic opposition-based plant propagation algorithm for engineering problem