
Image and Vision Computing: Latest Publications

MUNet: A lightweight Mamba-based Under-Display Camera restoration network
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-06 DOI: 10.1016/j.imavis.2025.105486
Wenxin Wang , Boyun Li , Wanli Liu , Xi Peng , Yuanbiao Gou
Under-Display Camera (UDC) restoration aims to recover the underlying clean images from the degraded images captured by UDC. Although promising results have been achieved, most existing UDC restoration methods still suffer from two major obstacles in practice: (i) existing UDC restoration models are parameter-intensive, and (ii) most of them struggle to capture long-range dependencies within high-resolution images. To overcome these drawbacks, we study a challenging problem in UDC restoration, namely, how to design a lightweight UDC restoration model that can capture long-range image dependencies. To this end, we propose a novel lightweight Mamba-based UDC restoration network (MUNet) consisting of two modules, named Separate Multi-scale Mamba (SMM) and Separate Convolutional Feature Extractor (SCFE). Specifically, SMM exploits our proposed alternate scanning strategy to efficiently capture long-range dependencies across multi-scale image features. SCFE preserves local dependencies through convolutions with various receptive fields. Thanks to SMM and SCFE, MUNet achieves state-of-the-art lightweight UDC restoration performance with significantly fewer parameters, making it well-suited for deployment on mobile devices. Our code will be made available after acceptance.
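The listing includes no code; as a rough illustration of the kind of multi-receptive-field convolutional extractor the SCFE description suggests, here is a minimal PyTorch sketch. The module name, channel counts, and kernel sizes are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an SCFE-style block: parallel convolutions with
# different receptive fields whose outputs are fused by a 1x1 convolution.
# Names and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class MultiReceptiveFieldBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Three parallel branches with increasing receptive fields.
        self.branch3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.branch7 = nn.Conv2d(channels, channels, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.branch3(x), self.branch5(x), self.branch7(x)], dim=1)
        # Residual connection preserves the input's local details.
        return x + self.act(self.fuse(feats))

if __name__ == "__main__":
    block = MultiReceptiveFieldBlock(channels=32)
    out = block(torch.randn(1, 32, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```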
{"title":"MUNet: A lightweight Mamba-based Under-Display Camera restoration network","authors":"Wenxin Wang ,&nbsp;Boyun Li ,&nbsp;Wanli Liu ,&nbsp;Xi Peng ,&nbsp;Yuanbiao Gou","doi":"10.1016/j.imavis.2025.105486","DOIUrl":"10.1016/j.imavis.2025.105486","url":null,"abstract":"<div><div>Under-Display Camera (UDC) restoration aims to recover the underlying clean images from the degraded images captured by UDC. Although promising results have been achieved, most existing UDC restoration methods still suffer from two vital obstacles in practice: (i) existing UDC restoration models are parameter-intensive, and (ii) most of them struggle to capture long-range dependencies within high-resolution images. To overcome above drawbacks, we study a challenging problem in UDC restoration, namely, how to design a lightweight UDC restoration model that could capture long-range image dependencies. To this end, we propose a novel lightweight Mamba-based UDC restoration network (MUNet) consisting of two modules, named Separate Multi-scale Mamba (SMM) and Separate Convolutional Feature Extractor (SCFE). Specifically, SMM exploits our proposed alternate scanning strategy to efficiently capture long-range dependencies across multi-scale image features. SCFE preserves local dependencies through convolutions with various receptive fields. Thanks to SMM and SCFE, MUNet achieves state-of-the-art lightweight UDC restoration performance with significantly fewer parameters, making it well-suited for deployment on mobile devices. Our codes will be available after acceptance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105486"},"PeriodicalIF":4.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143577294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep learning for brain tumor segmentation in multimodal MRI images: A review of methods and advances
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-04 DOI: 10.1016/j.imavis.2025.105463
Bin Jiang , Maoyu Liao , Yun Zhao , Gen Li , Siyu Cheng , Xiangkai Wang , Qingling Xia

Background and Objectives:

Image segmentation is crucial in applications like image understanding, feature extraction, and analysis. The rapid development of deep learning techniques in recent years has significantly enhanced the field of medical image processing, with brain tumor segmentation from MRI images emerging as a particularly active research area in the medical community. Existing reviews predominantly focus on traditional CNNs and Transformer models, but lack systematic analysis and experimental validation of the emerging Mamba architecture in multimodal brain tumor segmentation, the handling of missing modalities, the potential of multimodal fusion strategies, and the heterogeneity of datasets.

Methods:

This paper provides a comprehensive literature review of recent deep learning-based methods for brain tumor segmentation using multimodal MRI images, including performance and quantitative analysis of state-of-the-art approaches. It focuses on the handling of multimodal fusion, adaptation techniques, and missing modalities, while also delving into the performance, advantages, and disadvantages of deep learning models such as U-Net, Transformer, hybrid deep learning, and Mamba-based methods in segmentation tasks.

Results:

Across the reviewed literature, it is found that most researchers prefer Transformer-based U-Net and Mamba-based U-Net models, particularly hybrid combinations of U-Net and Mamba, for image segmentation.
{"title":"Deep learning for brain tumor segmentation in multimodal MRI images: A review of methods and advances","authors":"Bin Jiang ,&nbsp;Maoyu Liao ,&nbsp;Yun Zhao ,&nbsp;Gen Li ,&nbsp;Siyu Cheng ,&nbsp;Xiangkai Wang ,&nbsp;Qingling Xia","doi":"10.1016/j.imavis.2025.105463","DOIUrl":"10.1016/j.imavis.2025.105463","url":null,"abstract":"<div><h3>Background and Objectives:</h3><div>Image segmentation is crucial in applications like image understanding, feature extraction, and analysis. The rapid development of deep learning techniques in recent years has significantly enhanced the field of medical image processing, with the process of segmenting tumor from MRI images of the brain emerging as a particularly active area of interest within the medical science community. Existing reviews predominantly focus on traditional CNNs and Transformer models but lack systematic analysis and experimental validation on the application of the emerging Mamba architecture in multimodal brain tumor segmentation, the handling of missing modalities, the potential of multimodal fusion strategies, and the heterogeneity of datasets.</div></div><div><h3>Methods:</h3><div>This paper provides a comprehensive literature review of recent deep learning-based methods for multimodal brain tumor segmentation using multimodal MRI images, including performance and quantitative analysis of state-of-the-art approaches. It focuses on the handling of multimodal fusion, adaptation techniques, and missing modality, while also delving into the performance, advantages, and disadvantages of deep learning models such as U-Net, Transformer, hybrid deep learning, and Mamba-based methods in segmentation tasks.</div></div><div><h3>Results:</h3><div>Through the entire review process, It is found that most researchers preferred to use the Transformer-based U-Net model and mamba-based U-Net, especially the fusion model combination of U-Net and mamba, for image segmentation.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105463"},"PeriodicalIF":4.2,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dense small target detection algorithm for UAV aerial imagery
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-04 DOI: 10.1016/j.imavis.2025.105485
Sheng Lu , Yangming Guo , Jiang Long , Zun Liu , Zhuqing Wang , Ying Li
Dense small target detection in unmanned aerial vehicle (UAV) aerial images is challenging due to complex backgrounds, small object sizes within a wide field of view, low resolution, and dense target distributions. Many aerial target detection networks and attention-based methods have been proposed to enhance dense small target detection, but problems remain, such as insufficient extraction of effective information, missed detections, and false detections of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core components of the proposed DSTDA are the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, the multi-axis attention units address the limited global information perception of conventional detectors, so the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a target-guided sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These innovations give the DSTDA enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results show that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly reduces false alarms and missed detections in drone-based target detection while maintaining high accuracy and real-time performance, proving proficient at detecting dense small targets in drone scenarios.
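The target-guided sample allocation is described only as coarse positional screening followed by fine screening by prediction scores; the sketch below illustrates that two-stage idea for a single ground-truth box. The radius rule, stride, top-k value, and function name are assumptions, not the paper's exact strategy.

```python
# Illustrative coarse-to-fine sample allocation: coarse screening keeps anchor
# points near the ground-truth center, fine screening keeps the top-k of those
# by predicted score. Thresholds are illustrative.
import torch

def assign_samples(anchor_centers, gt_box, pred_scores, radius=2.5, stride=8, topk=9):
    """anchor_centers: (N, 2) xy tensor; gt_box: (4,) tensor x1,y1,x2,y2; pred_scores: (N,) tensor."""
    center = torch.stack(((gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2))
    # Coarse screening: positional prior, keep anchors close to the box center.
    dist = torch.linalg.norm(anchor_centers - center, dim=1)
    candidate_idx = torch.nonzero(dist < radius * stride, as_tuple=False).squeeze(1)
    if candidate_idx.numel() == 0:
        return candidate_idx
    # Fine screening: among coarse candidates, keep the highest-scoring ones.
    k = min(topk, candidate_idx.numel())
    top = torch.topk(pred_scores[candidate_idx], k).indices
    return candidate_idx[top]
```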
{"title":"Dense small target detection algorithm for UAV aerial imagery","authors":"Sheng Lu ,&nbsp;Yangming Guo ,&nbsp;Jiang Long ,&nbsp;Zun Liu ,&nbsp;Zhuqing Wang ,&nbsp;Ying Li","doi":"10.1016/j.imavis.2025.105485","DOIUrl":"10.1016/j.imavis.2025.105485","url":null,"abstract":"<div><div>Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to the complex background, small object size in the wide field of view, low resolution, and dense target distribution. Many aerial target detection networks and attention-based methods have been proposed to enhance the capability of dense small target detection, but there are still problems, such as insufficient effective information extraction, missed detection, and false detection of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core component of the proposed DSTDA consists of the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation of DSTDA on global information perception can be addressed. Thus, the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted by our proposed algorithm. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, which enables the DSTDA to focus more on densely populated target areas. Lastly, a goal-oriented sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic sample allocation from coarse to fine, the detection performance of small and dense targets in complex backgrounds is further improved. These above innovative improvements empower the DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results showed that the proposed DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance. The algorithm significantly improves the issues of false alarms and missed detection in drone-based target detection, showcasing remarkable accuracy and real-time performance. It proves to be proficient in the task of detecting dense small targets in drone scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105485"},"PeriodicalIF":4.2,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DAN: Distortion-aware Network for fisheye image rectification using graph reasoning
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-03 DOI: 10.1016/j.imavis.2025.105423
Yongjia Yan , Hongzhe Liu , Cheng Zhang , Cheng Xu , Bingxin Xu , Weiguo Pan , Songyin Dai , Yiqing Song
Despite the wide field of view of fisheye images, their application is still hindered by severe distortions. Existing learning-based methods still suffer from artifacts and loss of detail, especially at the image edges. To address this, we introduce the Distortion-aware Network (DAN), a novel deep network architecture for fisheye image rectification that leverages graph reasoning. Specifically, we employ the superior relational understanding capability of graph technology to associate distortion patterns in different regions, generating an accurate and globally consistent unwarping flow. Meanwhile, during the image reconstruction process, we utilize deformable convolution to construct same-resolution feature blocks and employ skip connections to supplement detailed information. Additionally, we introduce a weight decay-based multi-scale loss function, enabling the model to focus more on accuracy at high-resolution layers while enhancing the model's generalization ability. To address the lack of quantitative evaluation standards for real fisheye images, we propose a new metric called the "Line Preservation Metric." Through qualitative and quantitative experiments on PLACE365, COCO2017, and real fisheye images, the proposed method proves to outperform existing methods in terms of performance and generalization.
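The weight decay-based multi-scale loss is described only at a high level; below is a minimal sketch of one way such a loss could look, with per-level weights that decay toward coarser resolutions so the highest-resolution prediction dominates. The L1 term, decay factor, and function name are assumptions, not the paper's formulation.

```python
# Multi-scale supervision of predicted unwarping flows with decaying weights:
# the ground-truth flow is resized to each prediction's resolution, and
# lower-resolution levels contribute progressively less to the total loss.
import torch
import torch.nn.functional as F

def multi_scale_flow_loss(pred_flows, gt_flow, decay=0.5):
    """pred_flows: list of (B, 2, Hi, Wi) flows ordered fine to coarse; gt_flow: (B, 2, H, W)."""
    total = 0.0
    weight = 1.0
    for flow in pred_flows:
        gt = F.interpolate(gt_flow, size=flow.shape[-2:], mode="bilinear", align_corners=False)
        total = total + weight * F.l1_loss(flow, gt)
        weight *= decay  # coarser layers are down-weighted
    return total
```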
{"title":"DAN: Distortion-aware Network for fisheye image rectification using graph reasoning","authors":"Yongjia Yan ,&nbsp;Hongzhe Liu ,&nbsp;Cheng Zhang ,&nbsp;Cheng Xu ,&nbsp;Bingxin Xu ,&nbsp;Weiguo Pan ,&nbsp;Songyin Dai ,&nbsp;Yiqing Song","doi":"10.1016/j.imavis.2025.105423","DOIUrl":"10.1016/j.imavis.2025.105423","url":null,"abstract":"<div><div>Despite the wide-field view of fisheye images, their application is still hindered by the presentation of distortions. Existing learning-based methods still suffer from artifacts and loss of details, especially at the image edges. To address this, we introduce the Distortion-aware Network (DAN), a novel deep network architecture for fisheye image rectification that leverages graph reasoning. Specifically, we employ the superior relational understanding capability of graph technology to associate distortion patterns in different regions, generating an accurate and globally consistent unwarping flow. Meanwhile, during the image reconstruction process, we utilize deformable convolution to construct same-resolution feature blocks and employ skip connections to supplement the detailed information. Additionally, we introduce a weight decay-based multi-scale loss function, enabling the model to focus more on accuracy at high-resolution layers while enhancing the model’s generalization ability. To address the lack of quantitative evaluation standards for real fisheye images, we propose a new metric called the “Line Preservation Metric.” Through qualitative and quantitative experiments on PLACE365, COCO2017 and real fisheye images, the proposed method proves to outperform existing methods in terms of performance and generalization.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105423"},"PeriodicalIF":4.2,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143577293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Spatial cascaded clustering and weighted memory for unsupervised person re-identification
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-03 DOI: 10.1016/j.imavis.2025.105478
Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang
Recent advancements in unsupervised person re-identification (re-ID) methods have demonstrated high performance by leveraging fine-grained local context, often referred to as part-based methods. However, many existing part-based methods rely on horizontal division to obtain local contexts, leading to misalignment issues caused by various human poses. Moreover, misalignment of semantic information within part features hampers the effectiveness of metric learning, thereby limiting the potential of part-based methods. These challenges result in under-utilization of part features in existing approaches. To address these issues, we introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the issues of foreground omissions and spatial confusions in previous methods. We then propose foreground and space corrections to enhance the completeness and reasonableness of human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, enabling better utilization of both global and part features. Extensive experiments conducted on Market-1501, DukeMTMC-reID and MSMT17 datasets validate the effectiveness of the proposed method over numerous state-of-the-art methods.
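The weighted memory is described only conceptually; the following sketch shows one common form of a similarity-weighted momentum update for cluster centroids, in which harder samples (less similar to their centroid) take larger update steps. The weighting rule and class name are assumptions, not the paper's exact strategies.

```python
# Cluster memory with a weighted momentum update, balancing hard-example
# mining (low-similarity features move the centroid more) against noise
# suppression (updates remain bounded by the momentum coefficient).
import torch
import torch.nn.functional as F

class WeightedClusterMemory:
    def __init__(self, num_clusters: int, dim: int, momentum: float = 0.2):
        self.centroids = F.normalize(torch.randn(num_clusters, dim), dim=1)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, features: torch.Tensor, labels: torch.Tensor):
        """features: (B, dim); labels: (B,) cluster indices."""
        features = F.normalize(features, dim=1)
        for feat, label in zip(features, labels):
            sim = torch.dot(feat, self.centroids[label])
            w = self.momentum * (1.0 - sim)          # harder sample -> larger step
            new = (1.0 - w) * self.centroids[label] + w * feat
            self.centroids[label] = F.normalize(new, dim=0)
```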
{"title":"Spatial cascaded clustering and weighted memory for unsupervised person re-identification","authors":"Jiahao Hong,&nbsp;Jialong Zuo,&nbsp;Chuchu Han,&nbsp;Ruochen Zheng,&nbsp;Ming Tian,&nbsp;Changxin Gao,&nbsp;Nong Sang","doi":"10.1016/j.imavis.2025.105478","DOIUrl":"10.1016/j.imavis.2025.105478","url":null,"abstract":"<div><div>Recent advancements in unsupervised person re-identification (re-ID) methods have demonstrated high performance by leveraging fine-grained local context, often referred to as part-based methods. However, many existing part-based methods rely on horizontal division to obtain local contexts, leading to misalignment issues caused by various human poses. Moreover, misalignment of semantic information within part features hampers the effectiveness of metric learning, thereby limiting the potential of part-based methods. These challenges result in under-utilization of part features in existing approaches. To address these issues, we introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the issues of foreground omissions and spatial confusions in previous methods. We then propose foreground and space corrections to enhance the completeness and reasonableness of human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, enabling better utilization of both global and part features. Extensive experiments conducted on Market-1501, DukeMTMC-reID and MSMT17 datasets validate the effectiveness of the proposed method over numerous state-of-the-art methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105478"},"PeriodicalIF":4.2,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143577301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Real-time localization and navigation method for autonomous vehicles based on multi-modal data fusion by integrating memory transformer and DDQN
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-02 DOI: 10.1016/j.imavis.2025.105484
Li Zha , Chen Gong , Kunfeng Lv
In the field of autonomous driving, real-time localization and navigation are the core technologies that ensure vehicle safety and precise operation. With advancements in sensor technology and computing power, multi-modal data fusion has become a key method for enhancing the environmental perception capabilities of autonomous vehicles. This study aims to explore a novel visual-language navigation technology to achieve precise navigation of autonomous cars in complex environments. By integrating information from radar, sonar, 5G networks, Wi-Fi, Bluetooth, and a 360-degree visual information collection device mounted on the vehicle's roof, the model fully exploits rich multi-source data. The model uses the Memory Transformer for efficient data encoding and a data fusion strategy with a self-attention network, ensuring a balance between feature integrity and algorithm real-time performance. Furthermore, the encoded data is input into a DDQN vehicle navigation algorithm based on an automatically growing environmental target knowledge graph and large-scale scene maps, enabling continuous learning and optimization in real-world environments. Comparative experiments show that the proposed model outperforms existing SOTA models, particularly in terms of macro-spatial reference from large-scale scene maps, background knowledge support from the automatically growing knowledge graph, and the experience-optimized navigation strategies of the DDQN algorithm. In the comparative experiments with the SOTA models, the proposed model achieved scores of 3.99, 0.65, 0.67, 0.65, 0.63, and 0.63 on the six metrics NE, SR, OSR, SPL, CLS, and DTW, respectively. All of these results significantly enhance the intelligent positioning and navigation capabilities of autonomous driving vehicles.
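The abstract builds its navigation policy on DDQN; as a reminder of the core update that distinguishes Double DQN from vanilla DQN (the online network selects the next action, the target network evaluates it), here is a minimal PyTorch sketch. The function signature, batch layout, and network names are illustrative, not the paper's code.

```python
# Double DQN loss: decouples action selection (online net) from action
# evaluation (target net) to reduce Q-value overestimation.
import torch
import torch.nn.functional as F

def ddqn_loss(online_net, target_net, batch, gamma=0.99):
    """batch = (states, actions, rewards, next_states, dones); actions is a LongTensor of shape (B,)."""
    states, actions, rewards, next_states, dones = batch
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q_values, targets)
```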
{"title":"Real-time localization and navigation method for autonomous vehicles based on multi-modal data fusion by integrating memory transformer and DDQN","authors":"Li Zha ,&nbsp;Chen Gong ,&nbsp;Kunfeng Lv","doi":"10.1016/j.imavis.2025.105484","DOIUrl":"10.1016/j.imavis.2025.105484","url":null,"abstract":"<div><div>In the field of autonomous driving, real-time localization and navigation are the core technologies that ensure vehicle safety and precise operation. With advancements in sensor technology and computing power, multi-modal data fusion has become a key method for enhancing the environmental perception capabilities of autonomous vehicles. This study aims to explore a novel visual-language navigation technology to achieve precise navigation of autonomous cars in complex environments. By integrating information from radar, sonar, 5G networks, Wi-Fi, Bluetooth, and a 360-degree visual information collection device mounted on the vehicle's roof, the model fully exploits rich multi-source data. The model uses the Memory Transformer for efficient data encoding and a data fusion strategy with a self-attention network, ensuring a balance between feature integrity and algorithm real-time performance. Furthermore, the encoded data is input into a DDQN vehicle navigation algorithm based on an automatically growing environmental target knowledge graph and large-scale scene maps, enabling continuous learning and optimization in real-world environments. Comparative experiments show that the proposed model outperforms existing SOTA models, particularly in terms of macro-spatial reference from large-scale scene maps, background knowledge support from the automatically growing knowledge graph, and the experience-optimized navigation strategies of the DDQN algorithm. In the comparative experiments with the SOTA models, the proposed model achieved scores of 3.99, 0.65, 0.67, 0.65, 0.63, and 0.63 on the six metrics NE, SR, OSR, SPL, CLS, and DTW, respectively. All of these results significantly enhance the intelligent positioning and navigation capabilities of autonomous driving vehicles.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105484"},"PeriodicalIF":4.2,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143577422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-information guided camouflaged object detection
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-01 DOI: 10.1016/j.imavis.2025.105470
Caijuan Shi , Lin Zhao , Rui Wang , Kun Zhang , Fanyue Kong , Changyu Duan
Camouflaged Object Detection (COD) aims to identify objects hidden in the background environment. Though more and more COD methods have been proposed in recent years, existing methods still perform poorly when detecting small objects, obscured objects, boundary-rich objects, and multiple objects, mainly because they fail to effectively utilize context, texture, and boundary information simultaneously. Therefore, in this paper, we propose a Multi-information Guided Camouflaged Object Detection Network (MIGNet) to fully utilize multiple sources of information, namely context, texture, and boundary information, to boost the performance of camouflaged object detection. Specifically, we first design texture and boundary labels and the Texture and Boundary Enhanced Module (TBEM) to obtain differentiated texture and boundary information. Next, the Neighbor Context Information Exploration Module (NCIEM) is designed to obtain rich multi-scale context information. Then, the Parallel Group Bootstrap Module (PGBM) is designed to maximize the effective aggregation of context, texture, and boundary information. Finally, the Information Enhanced Decoder (IED) is designed to effectively enhance the interaction of neighboring-layer features and suppress background noise for good detection results. Extensive quantitative and qualitative experiments are conducted on four widely used datasets. The experimental results indicate that our proposed MIGNet outperforms 22 other COD models in camouflaged object detection.
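Boundary labels of the kind TBEM consumes are often derived from the object mask itself; the sketch below shows one common construction (mask minus its erosion), offered as an illustration rather than the paper's exact procedure. The boundary width parameter is an assumption.

```python
# Derive a thin boundary label from a binary object mask: erosion is computed
# as a negated max-pool, and the boundary is the mask minus its erosion.
import torch
import torch.nn.functional as F

def boundary_label(mask: torch.Tensor, width: int = 3) -> torch.Tensor:
    """mask: (B, 1, H, W) binary tensor; returns a binary boundary map of the same shape."""
    pad = width // 2
    eroded = -F.max_pool2d(-mask, kernel_size=width, stride=1, padding=pad)
    return (mask - eroded).clamp(min=0.0)
```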
{"title":"Multi-information guided camouflaged object detection","authors":"Caijuan Shi ,&nbsp;Lin Zhao ,&nbsp;Rui Wang ,&nbsp;Kun Zhang ,&nbsp;Fanyue Kong ,&nbsp;Changyu Duan","doi":"10.1016/j.imavis.2025.105470","DOIUrl":"10.1016/j.imavis.2025.105470","url":null,"abstract":"<div><div>Camouflaged Object Detection (COD) aims to identify the objects hidden in the background environment. Though more and more COD methods have been proposed in recent years, existing methods still perform poorly for detecting small objects, obscured objects, boundary-rich objects, and multi-objects, mainly because they fail to effectively utilize context information, texture information, and boundary information simultaneously. Therefore, in this paper, we propose a Multi-information Guided Camouflaged Object Detection Network (MIGNet) to fully utilize multi-information containing context information, texture information, and boundary information to boost the performance of camouflaged object detection. Specifically, firstly, we design the texture and boundary label and the Texture and Boundary Enhanced Module (TBEM) to obtain differentiated texture information and boundary information. Next, the Neighbor Context Information Exploration Module (NCIEM) is designed to obtain rich multi-scale context information. Then, the Parallel Group Bootstrap Module (PGBM) is designed to maximize the effective aggregation of context information, texture information and boundary information. Finally, Information Enhanced Decoder (IED) is designed to effectively enhance the interaction of neighboring layer features and suppress the background noise for good detection results. Extensive quantitative and qualitative experiments are conducted on four widely used datasets. The experimental results indicate that our proposed MIGNet with good performance of camouflaged object detection outperforms the other 22 COD models.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105470"},"PeriodicalIF":4.2,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143551280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CMS-net: Edge-aware multimodal MRI feature fusion for brain tumor segmentation
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-28 DOI: 10.1016/j.imavis.2025.105481
Chunjie Lv , Biyuan Li , Xiuwei Wang , Pengfei Cai , Bo Yang , Xuefeng Jia , Jun Yan
With the growing application of artificial intelligence in medical image processing, multimodal MRI brain tumor segmentation has become crucial for clinical diagnosis and treatment. Accurate segmentation relies heavily on the effective utilization of multimodal information. However, most existing methods primarily focus on global and local deep semantic features, often overlooking critical aspects such as edge information and cross-channel correlations. To address these limitations while retaining the strengths of existing methods, we propose a novel brain tumor segmentation approach: an edge-aware feature fusion model based on a dual-encoder architecture. CMS-Net is a novel brain tumor segmentation model that integrates edge-aware fusion, cross-channel interaction, and spatial state feature extraction to fully leverage multimodal information for improved segmentation accuracy. The architecture comprises two main components: an encoder and a decoder. The encoder utilizes both convolutional downsampling and Smart Swin Transformer downsampling, with the latter employing Shifted Spatial Multi-Head Self-Attention (SSW-MSA) to capture global features and enhance long-range dependencies. The decoder reconstructs the image via the CMS-Block, which consists of three key modules: the Multi-Scale Deep Convolutional Cross-Channel Attention module (MDTA), the Spatial State Module (SSM), and the Boundary-Aware Feature Fusion module (SWA). CMS-Net's dual-encoder architecture allows for deep extraction of both local and global features, enhancing segmentation performance. MDTA generates attention maps through cross-channel covariance, while SSM models spatial context to improve the understanding of complex structures. The SWA module, combining SSW-MSA with pooling, subtraction, and convolution, facilitates feature fusion and edge extraction. Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. Experimental results on the BraTS2018, BraTS2019, and BraTS2020 datasets demonstrate that CMS-Net effectively integrates spatial state, cross-channel, and boundary information, significantly improving multimodal brain tumor segmentation accuracy.
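The abstract states that Dice and Focal losses were introduced to optimize feature extraction; a compact sketch of one standard way to combine them for a binary tumor mask is given below. The term weighting and hyperparameters are assumptions, not the authors' settings.

```python
# Combined Dice + Focal objective for binary segmentation: the Dice term is
# overlap-based and robust to class imbalance; the Focal term down-weights
# easy pixels via the (1 - p_t)^gamma factor.
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, targets, alpha=0.25, gamma=2.0, smooth=1.0, dice_weight=0.5):
    """logits, targets: tensors of the same shape; targets are 0/1 floats."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    dice = 1.0 - (2.0 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    focal = (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
    return dice_weight * dice + (1.0 - dice_weight) * focal
```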
{"title":"CMS-net: Edge-aware multimodal MRI feature fusion for brain tumor segmentation","authors":"Chunjie Lv ,&nbsp;Biyuan Li ,&nbsp;Xiuwei Wang ,&nbsp;Pengfei Cai ,&nbsp;Bo Yang ,&nbsp;Xuefeng Jia ,&nbsp;Jun Yan","doi":"10.1016/j.imavis.2025.105481","DOIUrl":"10.1016/j.imavis.2025.105481","url":null,"abstract":"<div><div>With the growing application of artificial intelligence in medical image processing, multimodal MRI brain tumor segmentation has become crucial for clinical diagnosis and treatment. Accurate segmentation relies heavily on the effective utilization of multimodal information. However, most existing methods primarily focus on global and local deep semantic features, often overlooking critical aspects such as edge information and cross-channel correlations. To address these limitations while retaining the strengths of existing methods, we propose a novel brain tumor segmentation approach: an edge-aware feature fusion model based on a dual-encoder architecture. CMS-Net is a novel brain tumor segmentation model that integrates edge-aware fusion, cross-channel interaction, and spatial state feature extraction to fully leverage multimodal information for improved segmentation accuracy. The architecture comprises two main components: an encoder and a decoder. The encoder utilizes both convolutional downsampling and Smart Swin Transformer downsampling, with the latter employing Shifted Spatial Multi-Head Self-Attention (SSW-MSA) to capture global features and enhance long-range dependencies. The decoder reconstructs the image via the CMS-Block, which consists of three key modules: the Multi-Scale Deep Convolutional Cross-Channel Attention module (MDTA), the Spatial State Module (SSM), and the Boundary-Aware Feature Fusion module (SWA). CMS-Net's dual-encoder architecture allows for deep extraction of both local and global features, enhancing segmentation performance. MDTA generates attention maps through cross-channel covariance, while SSM models spatial context to improve the understanding of complex structures. The SWA module, combining SSW-MSA with pooling, subtraction, and convolution, facilitates feature fusion and edge extraction. Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. Experimental results on the BraTS2018, BraTS2019, and BraTS2020 datasets demonstrate that CMS-Net effectively integrates spatial state, cross-channel, and boundary information, significantly improving multimodal brain tumor segmentation accuracy.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105481"},"PeriodicalIF":4.2,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Joint Transformer and Mamba fusion for multispectral object detection
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-27 DOI: 10.1016/j.imavis.2025.105468
Chao Li, Xiaoming Peng
Multispectral object detection is generally considered better than single-modality-based object detection, due to the complementary properties of multispectral image pairs. However, how to integrate features from images of different modalities for object detection is still an open problem. In this paper, we propose a new multispectral object detection framework based on the Transformer and Mamba architectures, called the joint Transformer and Mamba detection (JTMDet). Specifically, we divide the feature fusion process into two stages, the intra-scale fusion stage and the inter-scale fusion stage, to comprehensively utilize the multi-modal features at different scales. To this end, we designed the so-called cross-modal fusion (CMF) and cross-level fusion (CLF) modules, both of which contain JTMBlock modules. A JTMBlock module interweaves the Transformer and Mamba layers to robustly capture the useful information in multispectral image pairs while maintaining high inference speed. Extensive experiments on three publicly available datasets conclusively show that the proposed JTMDet framework achieves state-of-the-art multispectral object detection performance, and is competitive with current leading methods. Code and pre-trained models are publicly available at https://github.com/LiC2023/JTMDet.
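The JTMBlock is described as interweaving Transformer and Mamba layers; the sketch below shows only that interleaving pattern over a flattened token sequence, with the Mamba layer replaced by a gated depthwise-convolution stub because a faithful version would require an SSM/Mamba library. All module names and sizes here are illustrative, not the authors' design.

```python
# Interleaved attention + (stubbed) state-space mixing over tokens of shape (B, N, D).
import torch
import torch.nn as nn

class GatedConvStub(nn.Module):
    """Stand-in for a Mamba/SSM layer: depthwise 1D convolution with a gating branch."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        mixed = self.conv(tokens.transpose(1, 2)).transpose(1, 2)
        return tokens + mixed * torch.sigmoid(self.gate(tokens))

class InterleavedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.ssm = GatedConvStub(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Attention layer first, then the sequence-mixing stub.
        return self.ssm(self.attn(tokens))
```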
{"title":"Joint Transformer and Mamba fusion for multispectral object detection","authors":"Chao Li,&nbsp;Xiaoming Peng","doi":"10.1016/j.imavis.2025.105468","DOIUrl":"10.1016/j.imavis.2025.105468","url":null,"abstract":"<div><div>Multispectral object detection is generally considered better than single-modality-based object detection, due to the complementary properties of multispectral image pairs. However, how to integrate features from images of different modalities for object detection is still an open problem. In this paper, we propose a new multispectral object detection framework based on the Transformer and Mamba architectures, called the joint Transformer and Mamba detection (JTMDet). Specifically, we divide the feature fusion process into two stages, the intra-scale fusion stage and the inter-scale fusion stage, to comprehensively utilize the multi-modal features at different scales. To this end, we designed the so-called cross-modal fusion (CMF) and cross-level fusion (CLF) modules, both of which contain JTMBlock modules. A JTMBlock module interweaves the Transformer and Mamba layers to robustly capture the useful information in multispectral image pairs while maintaining high inference speed. Extensive experiments on three publicly available datasets conclusively show that the proposed JTMDet framework achieves state-of-the-art multispectral object detection performance, and is competitive with current leading methods. Code and pre-trained models are publicly available at <span><span>https://github.com/LiC2023/JTMDet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105468"},"PeriodicalIF":4.2,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A spatial-frequency domain multi-branch decoder method for real-time semantic segmentation
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-02-26 DOI: 10.1016/j.imavis.2025.105483
Liwei Deng , Boda Wu , Songyu Chen , Dongxue Li , Yanze Fang
Semantic segmentation is crucial for the functionality of autonomous driving systems. However, most of the existing real-time semantic segmentation models focus on encoder design and underutilize spatial and frequency domain information in the decoder, limiting the segmentation accuracy of the model. To solve this problem, this paper proposes a multi-branch decoder network combining spatial domain and frequency domain to meet the real-time and accuracy requirements of the semantic segmentation task of road scenes for autonomous driving systems. Firstly, the network introduces a novel multi-scale dilated fusion block that gradually enlarges the receptive field through three consecutive dilated convolutions, and integrates features from different levels using skip connections. At the same time, a strategy of gradually reducing the number of channels is adopted to effectively remove redundant features. Secondly, we design three branches for the decoder. The global branch utilizes a lightweight Transformer architecture to extract global features and employs horizontal and vertical convolutions to achieve interaction among global features. The multi-scale branch combines dilated convolution and adaptive pooling to perform multi-scale feature extraction through fusion and post-processing. The wavelet transform feature converter maps spatial domain features into low-frequency and high-frequency components, which are then fused with global and multi-scale features to enhance the model representation. Finally, we conduct experiments on multiple datasets. The experimental results show that the proposed method best balances segmentation accuracy and inference speed.
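The decoder's wavelet transform feature converter maps spatial-domain features into low- and high-frequency components; the following is a minimal single-level Haar decomposition illustrating that kind of split. It is a generic sketch, not the paper's module, and sign conventions for the detail bands vary across implementations.

```python
# Single-level 2D Haar decomposition of a feature map into one low-frequency
# band (LL) and three high-frequency bands (LH, HL, HH) via strided sampling.
import torch

def haar_dwt2d(x: torch.Tensor):
    """x: (B, C, H, W) with even H and W; returns (LL, LH, HL, HH), each (B, C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh
```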
{"title":"A spatial-frequency domain multi-branch decoder method for real-time semantic segmentation","authors":"Liwei Deng ,&nbsp;Boda Wu ,&nbsp;Songyu Chen ,&nbsp;Dongxue Li ,&nbsp;Yanze Fang","doi":"10.1016/j.imavis.2025.105483","DOIUrl":"10.1016/j.imavis.2025.105483","url":null,"abstract":"<div><div>Semantic segmentation is crucial for the functionality of autonomous driving systems. However, most of the existing real-time semantic segmentation models focus on encoder design and underutilize spatial and frequency domain information in the decoder, limiting the segmentation accuracy of the model. To solve this problem, this paper proposes a multi-branch decoder network combining spatial domain and frequency domain to meet the real-time and accuracy requirements of the semantic segmentation task of road scenes for autonomous driving systems. Firstly, the network introduces a novel multi-scale dilated fusion block that gradually enlarges the receptive field through three consecutive dilated convolutions, and integrates features from different levels using skip connections. At the same time, a strategy of gradually reducing the number of channels is adopted to effectively remove redundant features. Secondly, we design three branches for the decoder. The global branch utilizes a lightweight Transformer architecture to extract global features and employs horizontal and vertical convolutions to achieve interaction among global features. The multi-scale branch combines dilated convolution and adaptive pooling to perform multi-scale feature extraction through fusion and post-processing. The wavelet transform feature converter maps spatial domain features into low-frequency and high-frequency components, which are then fused with global and multi-scale features to enhance the model representation. Finally, we conduct experiments on multiple datasets. The experimental results show that the proposed method best balances segmentation accuracy and inference speed.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105483"},"PeriodicalIF":4.2,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143527561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0