
Latest Articles in Image and Vision Computing

W-shaped network combined with dual transformers and edge protection for multi-focus image fusion
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-13. DOI: 10.1016/j.imavis.2024.105210
Hao Zhai, Yun Chen, Yao Wang, Yuncan Ouyang, Zhi Zeng

In this paper, a W-shaped network combined with dual transformers and edge protection is proposed for multi-focus image fusion. Unlike traditional Convolutional Neural Network (CNN) fusion methods, a heterogeneous encoder network framework is designed for feature extraction and a decoder is used for feature reconstruction, with the aim of preserving the local details and edge information of the source images to the maximum extent possible. Specifically, the first encoder uses adaptive average pooling to downsample the source image and extract important features from it. The second encoder takes as input the source image pair after edge detection with the Gaussian Modified Laplace Operator (GMLO) and employs adaptive maximum pooling for downsampling. In addition, the encoder part of the network combines CNN and Transformer components to extract both local and global features. The final fused image is obtained by reconstructing the extracted feature information. To evaluate the performance of this method, we compared it with 16 recent multi-focus image fusion methods and conducted qualitative and quantitative analyses. Experimental results on the public datasets Lytro, MFFW, and MFI-WHU and on the real-scene dataset HBU-CVMDSP demonstrate that our method can accurately identify the focused and defocused regions of source images, and it preserves the edge details of the source images while extracting the focused regions.
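The two-branch encoder described above lends itself to a compact sketch. The PyTorch snippet below is illustrative only, not the authors' implementation: the GMLO is approximated by Gaussian smoothing followed by a standard Laplacian kernel, and the channel widths, pooled size, and module names (GaussianModifiedLaplacian, DualEncoder) are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianModifiedLaplacian(nn.Module):
    """Illustrative GMLO-style edge map: Gaussian smoothing followed by a
    Laplacian kernel (the paper's exact operator may differ)."""
    def __init__(self):
        super().__init__()
        gauss = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("gauss", gauss.view(1, 1, 3, 3))
        self.register_buffer("lap", lap.view(1, 1, 3, 3))

    def forward(self, x):                              # x: (B, 1, H, W) grayscale
        x = F.conv2d(x, self.gauss, padding=1)         # smooth to suppress noise
        return F.conv2d(x, self.lap, padding=1).abs()  # edge response

class DualEncoder(nn.Module):
    """Two-branch encoder sketch: branch 1 sees the raw image and uses adaptive
    average pooling; branch 2 sees the GMLO edge map and uses adaptive maximum
    pooling, mirroring the description in the abstract."""
    def __init__(self, channels=32, pooled=64):
        super().__init__()
        self.edge = GaussianModifiedLaplacian()
        self.enc1 = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.pool_avg = nn.AdaptiveAvgPool2d(pooled)
        self.pool_max = nn.AdaptiveMaxPool2d(pooled)

    def forward(self, img):
        f1 = self.pool_avg(self.enc1(img))             # intensity/texture branch
        f2 = self.pool_max(self.enc2(self.edge(img)))  # edge-protection branch
        return torch.cat([f1, f2], dim=1)              # stacked features for a decoder

if __name__ == "__main__":
    x = torch.rand(2, 1, 256, 256)
    print(DualEncoder()(x).shape)  # torch.Size([2, 64, 64, 64])
```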

Citations: 0
OCUCFormer: An Over-Complete Under-Complete Transformer Network for accelerated MRI reconstruction
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-13. DOI: 10.1016/j.imavis.2024.105228
Mohammad Al Fahim, Sriprabha Ramanarayanan, G.S. Rahul, Matcha Naga Gayathri, Arunima Sarkar, Keerthi Ram, Mohanasankar Sivaprakasam

Many deep learning-based architectures have been proposed for accelerated Magnetic Resonance Imaging (MRI) reconstruction. However, existing popular encoder-decoder-based networks have a few shortcomings: (1) they focus on the anatomical structure at the expense of fine details, hindering their performance in generating faithful reconstructions; (2) the lack of long-range dependencies yields sub-optimal recovery of fine structural details. In this work, we propose an Over-Complete Under-Complete Transformer network (OCUCFormer) which focuses on better capturing fine edges and details in the image and can extract the long-range relations between these features for improved single-coil (SC) and multi-coil (MC) MRI reconstruction. Our model computes long-range relations at the highest resolutions using Restormer modules for improved acquisition and restoration of fine anatomical details. Towards learning in the absence of fully sampled ground truth for supervision, we show that our model, trained with under-sampled data in a self-supervised fashion, recovers fine structures better than other works. We have extensively evaluated our network for SC and MC MRI reconstruction on brain, cardiac, and knee anatomies for 4× and 5× acceleration factors. We report significant improvements over popular deep learning-based methods when trained in supervised and self-supervised modes. We have also performed experiments demonstrating the strengths of extracting fine details and the anatomical structure and computing long-range relations within over-complete representations. Code for our proposed method is available at: https://github.com/alfahimmohammad/OCUCFormer-main.
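To make the self-supervised, under-sampled setting concrete, the sketch below shows the standard k-space undersampling and data-consistency step used in accelerated MRI pipelines. It is generic background rather than OCUCFormer itself; the mask pattern, image size, and helper names (undersample, data_consistency) are assumptions for the example.

```python
import torch

def undersample(kspace, mask):
    """Apply a binary sampling mask in k-space (hypothetical helper)."""
    return kspace * mask

def data_consistency(recon, measured_k, mask):
    """Replace the reconstruction's k-space values with the acquired samples
    wherever the mask is 1, a standard step in accelerated MRI pipelines,
    shown here only to illustrate the setting the abstract describes."""
    k = torch.fft.fft2(recon, norm="ortho")
    k = (1 - mask) * k + mask * measured_k
    return torch.fft.ifft2(k, norm="ortho")

# toy 4x acceleration: keep every 4th phase-encode line
img = torch.randn(1, 1, 128, 128, dtype=torch.complex64)
mask = torch.zeros(1, 1, 128, 128)
mask[..., ::4] = 1.0
measured = undersample(torch.fft.fft2(img, norm="ortho"), mask)
zero_filled = torch.fft.ifft2(measured, norm="ortho")    # typical network input
refined = data_consistency(zero_filled, measured, mask)  # applied after the model
print(refined.shape)
```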

Citations: 0
Coplane-constrained sparse depth sampling and local depth propagation for depth estimation
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-12. DOI: 10.1016/j.imavis.2024.105227
Jiehua Zhang, Zhiwen Yang, Chuqiao Chen, Hongkui Wang, Tingyu Wang, Chenggang Yan, Yihong Gong

Depth estimation with sparse reference has emerged recently; it predicts a depth map from a monocular image and a set of depth reference samples. Previous works randomly select reference samples by sensors, leading to severe depth bias because this sampling is independent of image semantics and neglects the imbalance of depth distribution across regions. This paper proposes Coplane-Constrained sparse Depth (CCD) sampling to explore representative reference samples and designs a Local Depth Propagation (LDP) network to complete the sparse point cloud map. This captures diverse depth information and diffuses the valid points to their neighbors with a geometry prior. Specifically, we first construct the surface normal map and detect coplane pixels by superpixel segmentation for sampling references, whose depth can be represented by that of the superpixel centroid. Then, we introduce local depth propagation to obtain a coarse-level depth map with geometric information, which dynamically diffuses depth from the reference points to their neighbors based on a local planar assumption. Further, we generate the fine-level depth map by devising a pixel-wise focal loss, which imposes semantic and geometric calibration on pixels with low confidence in the coarse-level prediction. Extensive experiments on public datasets demonstrate that our model outperforms SOTA depth estimation and completion methods.
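As a rough illustration of sampling one depth reference per near-planar region, the sketch below picks the approximate centroid of each SLIC superpixel and reads the depth there. It is a simplified stand-in under stated assumptions: the paper's coplane detection also uses surface normals, and the helper name centroid_depth_samples and the toy data are invented for the example.

```python
import numpy as np
from skimage.segmentation import slic

def centroid_depth_samples(image, depth, n_segments=200):
    """Sample one sparse depth reference per superpixel at (roughly) its
    centroid, following the abstract's idea that a near-planar superpixel's
    depth can be represented by its centroid. Illustrative only; the paper's
    coplane detection additionally relies on surface normal estimation."""
    labels = slic(image, n_segments=n_segments, start_label=0)
    samples = []
    for lab in np.unique(labels):
        ys, xs = np.nonzero(labels == lab)
        cy, cx = int(ys.mean()), int(xs.mean())        # approximate centroid
        samples.append((cy, cx, float(depth[cy, cx])))
    return labels, samples

# toy usage with random data in place of a real RGB-D pair
rgb = np.random.rand(120, 160, 3)
gt_depth = np.random.rand(120, 160) * 10.0
_, refs = centroid_depth_samples(rgb, gt_depth)
print(len(refs), refs[0])
```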

Citations: 0
SAM-RSP: A new few-shot segmentation method based on segment anything model and rough segmentation prompts
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-12. DOI: 10.1016/j.imavis.2024.105214
Jiaguang Li, Ying Wei, Wei Zhang, Zhenrui Shi

Few-shot segmentation (FSS) aims to segment novel classes with a few labeled images. The backbones used in existing methods are pre-trained through classification tasks on the ImageNet dataset. Although these backbones can effectively perceive the semantic categories of images, they cannot accurately perceive the regional boundaries within one image, which limits the model performance. Recently, the Segment Anything Model (SAM) has achieved precise image segmentation based on point or box prompts, thanks to its excellent perception of region boundaries within one image. However, it cannot effectively provide semantic information about images. This paper proposes a new few-shot segmentation method that can effectively perceive both semantic categories and regional boundaries. This method first utilizes the SAM encoder to perceive regions and obtain the query embedding. Then the support and query images are input into a backbone pre-trained on ImageNet to perceive semantics and generate a rough segmentation prompt (RSP). The query embedding is combined with the prompt to generate a pixel-level query prototype, which can better match the query embedding. Finally, the query embedding, prompt, and prototype are combined and input into the designed multi-layer prompt transformer decoder, which is more efficient and lightweight and provides a more accurate segmentation result. In addition, other methods can easily be combined with our framework to improve their performance. Extensive experiments on PASCAL-5i and COCO-20i under 1-shot and 5-shot settings prove the effectiveness of our method. Our method also achieves a new state of the art. Codes are available at https://github.com/Jiaguang-NEU/SAM-RSP.
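A common building block behind prototype-based FSS pipelines of this kind is masked average pooling followed by cosine matching, sketched below. This is generic background, not the paper's exact pixel-level prototype or prompt transformer decoder; the function names and tensor sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_average_prototype(feat, mask):
    """Masked average pooling: collapse support features inside the (rough)
    mask into a single class prototype, a standard FSS building block."""
    # feat: (B, C, H, W), mask: (B, 1, H', W') with values in [0, 1]
    mask = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    proto = (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)
    return proto                                         # (B, C)

def cosine_prior(query_feat, proto):
    """Cosine similarity between query features and the prototype yields a
    rough foreground prior that can serve as a segmentation prompt."""
    q = F.normalize(query_feat, dim=1)                   # (B, C, H, W)
    p = F.normalize(proto, dim=1)[..., None, None]       # (B, C, 1, 1)
    return (q * p).sum(dim=1, keepdim=True)              # (B, 1, H, W)

feat_s = torch.randn(1, 256, 32, 32)
feat_q = torch.randn(1, 256, 32, 32)
support_mask = (torch.rand(1, 1, 128, 128) > 0.5).float()
prior = cosine_prior(feat_q, masked_average_prototype(feat_s, support_mask))
print(prior.shape)  # torch.Size([1, 1, 32, 32])
```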

Citations: 0
Synthetic lidar point cloud generation using deep generative models for improved driving scene object recognition
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-11. DOI: 10.1016/j.imavis.2024.105207
Zhengkang Xiang, Zexian Huang, Kourosh Khoshelham

The imbalanced distribution of different object categories poses a challenge for training accurate object recognition models in driving scenes. Supervised machine learning models trained on imbalanced data are biased and easily overfit the majority classes, such as vehicles and pedestrians, which appear more frequently in driving scenes. We propose a novel data augmentation approach for object recognition in lidar point clouds of driving scenes, which leverages probabilistic generative models to produce synthetic point clouds for the minority classes and complement the original imbalanced dataset. We evaluate five generative models based on different statistical principles: the Gaussian mixture model, variational autoencoder, generative adversarial network, adversarial autoencoder, and diffusion model. Experiments with a real-world autonomous driving dataset show that the synthetic point clouds generated for the minority classes by the Latent Generative Adversarial Network yield a significant improvement in object recognition performance for both minority and majority classes. The code is available at https://github.com/AAAALEX-XIANG/Synthetic-Lidar-Generation.
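As a minimal illustration of rebalancing a class with a generative model, the sketch below fits a Gaussian mixture (the simplest of the five models compared) to the points of one minority-class object and samples synthetic points from it. The paper's best results come from the Latent Generative Adversarial Network, not a GMM; the helper name and the toy data here are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def oversample_minority(points, n_new, n_components=8, seed=0):
    """Fit a Gaussian mixture to the (x, y, z) points of a minority-class
    object and draw synthetic points from it. Illustrative only: the paper
    compares several generative models and finds a latent GAN works best."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(points)                      # points: (N, 3) array of lidar returns
    synthetic, _ = gmm.sample(n_new)
    return synthetic                     # (n_new, 3)

# toy minority-class object: 300 lidar returns clustered like a small target
rng = np.random.default_rng(0)
cyclist = rng.normal(loc=[5.0, 2.0, 0.8], scale=[0.4, 0.2, 0.5], size=(300, 3))
augmented = np.vstack([cyclist, oversample_minority(cyclist, 300)])
print(augmented.shape)  # (600, 3)
```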

Citations: 0
Dual-branch teacher-student with noise-tolerant learning for domain adaptive nighttime segmentation
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-10. DOI: 10.1016/j.imavis.2024.105211
Ruiying Chen, Yunan Liu, Yuming Bo, Mingyu Lu

While significant progress has been achieved in the field of image semantic segmentation, the majority of research has been primarily concentrated on daytime scenes. Semantic segmentation of nighttime images is equally critical for autonomous driving; however, this task presents greater challenges due to inadequate lighting and difficulties associated with obtaining accurate manual annotations. In this paper, we introduce a novel method called the Dual-Branch Teacher-Student (DBTS) framework for unsupervised nighttime semantic segmentation. Our approach combines domain alignment and knowledge distillation in a mutually reinforcing manner. Firstly, we employ a photometric alignment module to dynamically generate target-like latent images, bridging the appearance gap between the source domain (daytime) and the target domain (nighttime). Secondly, we establish a dual-branch framework, where each branch enhances collaboration between the teacher and student networks. The student network utilizes adversarial learning to align the target domain with another domain (i.e., source or latent domain), while the teacher network generates reliable pseudo-labels by distilling knowledge from the latent domain. Furthermore, recognizing the potential noise present in pseudo-labels, we propose a noise-tolerant learning method to mitigate the risks associated with overreliance on pseudo-labels during domain adaptation. When evaluated on benchmark datasets, the proposed DBTS achieves state-of-the-art performance. Specifically, DBTS, using different backbones, outperforms established baseline models by approximately 25% in mIoU on the Zurich dataset and by over 26% in mIoU on the ACDC dataset, demonstrating the effectiveness of our method in addressing the challenges of domain-adaptive nighttime segmentation.
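A widely used ingredient in teacher-student pipelines of this kind is a confidence-filtered pseudo-label loss, sketched below. It illustrates the general idea of tolerating pseudo-label noise by ignoring low-confidence pixels; it is not the paper's specific noise-tolerant scheme, and the threshold, class count, and function name are assumptions.

```python
import torch
import torch.nn.functional as F

def confidence_filtered_pseudo_loss(student_logits, teacher_logits, tau=0.9):
    """Cross-entropy against teacher pseudo-labels, ignoring pixels whose
    teacher confidence falls below a threshold. This is a generic
    noise-tolerant recipe used only to illustrate the idea; the paper's
    noise-tolerant learning scheme is its own design."""
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)          # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                # (B, H, W)
        pseudo[conf < tau] = 255                       # mark unreliable pixels
    return F.cross_entropy(student_logits, pseudo, ignore_index=255)

student = torch.randn(2, 19, 64, 64, requires_grad=True)  # e.g. 19 classes
teacher = torch.randn(2, 19, 64, 64) * 10                  # sharpened fake teacher logits
loss = confidence_filtered_pseudo_loss(student, teacher)
loss.backward()
print(float(loss))
```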

Citations: 0
An improved attentive residue multi-dilated network for thermal noise removal in magnetic resonance images
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-10. DOI: 10.1016/j.imavis.2024.105213
Bowen Jiang, Tao Yue, Xuemei Hu

Magnetic resonance imaging (MRI) technology is crucial in the medical field, but the thermal noise in reconstructed MR images may interfere with clinical diagnosis. Removing thermal noise from MR images poses two main challenges. First, thermal noise in an MR image obeys a Rician distribution, whose statistical properties are not consistent across different regions of the image; conventional denoising methods such as spatial convolutional filtering are therefore not well suited to it. Second, details and edge information in the image may be damaged while the noise is smoothed. This paper proposes a novel deep-learning model to denoise MR images. First, the model learns a binary mask to separate the background and signal regions of the noisy image, so that the noise remaining in the signal region obeys a unified statistical distribution. Second, the model is designed as an attentive residual multi-dilated network (ARM-Net), composed of a multi-branch structure and supplemented with a frequency-domain-optimizable discrete cosine transform module. In this way, the deep-learning model is more effective at removing the noise while maintaining the details of the original image. Furthermore, we have also made improvements to the original ARM-Net baseline to establish a new model, ARM-Net v2, which is more efficient and effective. Experimental results show that on the BraTS 2018 dataset our method achieves PSNRs of 39.7087 and 32.6005 at noise levels of 5% and 20%, respectively, which is state-of-the-art performance among existing MR image denoising methods.
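The Rician noise model that motivates the paper can be simulated in a few lines: add independent Gaussian noise to the real and imaginary components of a magnitude image and take the magnitude. The sketch below is only a data-preparation illustration under that assumption; the noise-level convention (a fraction of the peak intensity) mirrors the 5% and 20% settings quoted above.

```python
import numpy as np

def add_rician_noise(image, sigma):
    """Simulate Rician thermal noise on a magnitude MR image: add i.i.d.
    Gaussian noise to the real and imaginary channels, then take the
    magnitude. sigma is expressed as a fraction of the peak intensity."""
    s = sigma * image.max()
    real = image + np.random.normal(0.0, s, image.shape)
    imag = np.random.normal(0.0, s, image.shape)
    return np.sqrt(real ** 2 + imag ** 2)

clean = np.random.rand(240, 240)           # stand-in for a BraTS slice
noisy_5 = add_rician_noise(clean, 0.05)    # 5% noise level
noisy_20 = add_rician_noise(clean, 0.20)   # 20% noise level
print(noisy_5.shape, float(np.abs(noisy_20 - clean).mean()))
```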

Citations: 0
Multi-scale feature correspondence and pseudo label retraining strategy for weakly supervised semantic segmentation
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-10. DOI: 10.1016/j.imavis.2024.105215
Weizheng Wang, Lei Zhou, Haonan Wang

Recently, the performance of semantic segmentation using weakly supervised learning has significantly improved. Weakly supervised semantic segmentation (WSSS) that uses only image-level labels has received widespread attention; it employs Class Activation Maps (CAM) to generate pseudo labels. Compared to the traditional use of pixel-level labels, this technique greatly reduces annotation costs by utilizing simpler and more readily available image-level annotations. In addition, due to the local perceptual ability of Convolutional Neural Networks (CNN), the generated CAM cannot activate the entire object area. Researchers have found that this CNN limitation can be compensated for by using the Vision Transformer (ViT). However, ViT also introduces an over-smoothing problem. Recent research has made good progress in solving this issue, but when discussing CAM and its related segmentation predictions, it is easy to overlook their intrinsic information and the interrelationships between them. In this paper, we propose a Multi-Scale Feature Correspondence (MSFC) method. Our MSFC obtains the feature correspondence between CAM and segmentation predictions at different scales and re-extracts useful semantic information from them, enhancing the network's learning of feature information and improving the quality of CAM. Moreover, to further improve the segmentation precision, we design a Pseudo Label Retraining Strategy (PLRS). This strategy refines the accuracy in local regions, elevates the quality of pseudo labels, and enhances segmentation precision. Experimental results on the PASCAL VOC 2012 and MS COCO 2014 datasets show that our method achieves impressive performance among end-to-end WSSS methods.
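For readers unfamiliar with CAM, the sketch below computes a classic class activation map from final-stage convolutional features and classifier weights, the starting point that WSSS methods such as this one refine into pseudo labels. It is generic background, not MSFC or PLRS; the feature shape, upsampling factor, and class count are assumptions.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    """Classic CAM: weight the final convolutional feature maps by the
    classifier weights of one class, rectify, upsample, and normalize."""
    # features: (B, C, h, w); fc_weight: (num_classes, C)
    w = fc_weight[class_idx].view(1, -1, 1, 1)
    cam = (features * w).sum(dim=1, keepdim=True)        # (B, 1, h, w)
    cam = F.relu(cam)
    cam = F.interpolate(cam, scale_factor=16, mode="bilinear", align_corners=False)
    cam = cam - cam.amin(dim=(2, 3), keepdim=True)
    return cam / cam.amax(dim=(2, 3), keepdim=True).clamp(min=1e-6)

feats = torch.randn(1, 2048, 14, 14)   # e.g. ResNet-50 stage-5 output
fc_w = torch.randn(20, 2048)           # 20 PASCAL VOC foreground classes
cam = class_activation_map(feats, fc_w, class_idx=3)
print(cam.shape)  # torch.Size([1, 1, 224, 224])
```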

Citations: 0
Underwater image restoration based on light attenuation prior and color-contrast adaptive correction
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-10. DOI: 10.1016/j.imavis.2024.105217
Jianru Li, Xu Zhu, Yuchao Zheng, Huimin Lu, Yujie Li

Underwater imaging suffers from color distortion and diminished contrast because light is attenuated by absorption and scattering as it travels through water. Unlike traditional underwater image restoration techniques, our method accommodates attenuation coefficients for diverse water conditions. We recover the clean image by approximating decay rates, focusing particularly on the blue-red and blue-green color channels. Because the water type is generally uncertain, we assess attenuation coefficient ratios for a set of predefined water categories. Each category yields a restored image, and an automated selection algorithm determines the best output based on its color distribution. In addition, we propose a color-contrast adaptive correction technique designed to remedy color anomalies in underwater images while enhancing contrast and detail fidelity. Extensive experiments on benchmark datasets show that our method outperforms six other well-known strategies, and it is particularly robust and adaptable in scenes dominated by green background imagery.
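A minimal example of correcting channel-wise attenuation differences is the common red-channel compensation step shown below, which boosts the red channel in proportion to the gap between the green and red channel means. It only illustrates the idea of unequal per-channel attenuation and is not the paper's attenuation-prior restoration; the coefficient alpha and the toy image are assumptions.

```python
import numpy as np

def compensate_red_channel(img, alpha=1.0):
    """Simple red-channel compensation driven by the gap between the green
    and red channel means, a common pre-processing step for underwater
    color casts. Shown only to illustrate per-channel attenuation differences."""
    img = img.astype(np.float64)
    r, g = img[..., 0], img[..., 1]                    # RGB channel order assumed
    r_mean, g_mean = r.mean(), g.mean()
    r_comp = r + alpha * (g_mean - r_mean) * (1.0 - r) * g   # boost the weak red channel
    out = img.copy()
    out[..., 0] = np.clip(r_comp, 0.0, 1.0)
    return out

underwater = np.random.rand(240, 320, 3) * np.array([0.2, 0.7, 0.8])  # greenish cast
restored = compensate_red_channel(underwater)
print(underwater[..., 0].mean(), restored[..., 0].mean())
```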

Citations: 0
DynaSeg: A deep dynamic fusion method for unsupervised image segmentation incorporating feature similarity and spatial continuity
IF 4.2, CAS Tier 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-08-10. DOI: 10.1016/j.imavis.2024.105206
Boujemaa Guermazi, Riadh Ksantini, Naimul Khan

Our work tackles the fundamental challenge of image segmentation in computer vision, which is crucial for diverse applications. While supervised methods demonstrate proficiency, their reliance on extensive pixel-level annotations limits scalability. We introduce DynaSeg, an innovative unsupervised image segmentation approach that overcomes the challenge of balancing feature similarity and spatial continuity without relying on extensive hyperparameter tuning. Unlike traditional methods, DynaSeg employs a dynamic weighting scheme that automates parameter tuning, adapts flexibly to image characteristics, and facilitates easy integration with other segmentation networks. By incorporating a Silhouette Score Phase, DynaSeg prevents undersegmentation failures where the number of predicted clusters might converge to one. DynaSeg uses CNN-based and pre-trained ResNet feature extraction, making it computationally efficient and more straightforward than other complex models. Experimental results showcase state-of-the-art performance, achieving a 12.2% and 14.12% mIOU improvement over current unsupervised segmentation approaches on COCO-All and COCO-Stuff datasets, respectively. We provide qualitative and quantitative results on five benchmark datasets, demonstrating the efficacy of the proposed approach. Code is available at https://github.com/RyersonMultimediaLab/DynaSeg.
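The balance DynaSeg automates can be written as a two-term loss: a feature-similarity term (cross-entropy against the per-pixel argmax, as in CNN-based unsupervised segmentation) and a spatial-continuity term, combined through a weight. The sketch below shows that loss with a placeholder schedule; the dynamic weighting and Silhouette Score Phase that are the paper's actual contribution are not reproduced here, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def unsup_seg_loss(response, mu):
    """Feature-similarity term (cross-entropy against the per-pixel argmax)
    plus a spatial-continuity term (L1 differences between neighboring
    responses), balanced by a weight mu. DynaSeg's contribution is making
    this balance dynamic; the schedule used below is only a placeholder."""
    # response: (B, K, H, W) per-pixel cluster scores
    target = response.argmax(dim=1)                        # pseudo labels
    sim = F.cross_entropy(response, target)                # feature similarity
    cont = (response[:, :, 1:, :] - response[:, :, :-1, :]).abs().mean() + \
           (response[:, :, :, 1:] - response[:, :, :, :-1]).abs().mean()
    return sim + mu * cont

resp = torch.randn(1, 27, 64, 64, requires_grad=True)      # e.g. 27 candidate clusters
for step in range(3):
    mu = 5.0 / (step + 1)                                  # placeholder "dynamic" schedule
    loss = unsup_seg_loss(resp, mu)
    loss.backward()
    print(step, float(loss))
```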

Citations: 0