
Latest publications in Machine Vision and Applications

FESAR: SAR ship detection model based on local spatial relationship capture and fused convolutional enhancement
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-03-08 | DOI: 10.1007/s00138-024-01516-4
Chongchong Liu, Chunman Yan

Synthetic aperture radar (SAR) is instrumental in ship monitoring owing to its all-weather capability and high resolution. In SAR images, ship targets frequently display blurred or mixed boundaries with the background, and occlusion or partial occlusion may occur. Additionally, multi-scale transformations and small-target ships pose challenges for ship detection. To tackle these challenges, we propose a novel SAR ship detection model, FESAR. First, to address multi-scale transformations in ship detection, we propose the Fused Convolution Enhancement Module (FCEM), which incorporates distinct convolutional branches designed to capture local and global features that are subsequently fused and enhanced. Second, a Spatial Relationship Analysis Module (SRAM) with a spatial-mixing layer is designed to analyze the local spatial relationship between the ship target and the background, effectively combining local information to discern feature distinctions between them. Finally, a new backbone network, SPD-YOLO, is designed to perform deep downsampling for comprehensive extraction of ship-related semantic information. To validate the model's performance, an extensive series of experiments was conducted on the public datasets HRSID, LS-SSDD-v1.0, and SSDD. The results demonstrate the outstanding performance of the proposed FESAR model compared with numerous state-of-the-art (SOTA) models. Relative to the baseline model, FESAR improves mAP by 2.6% on HRSID, 5.5% on LS-SSDD-v1.0, and 0.2% on SSDD. In comparison with numerous SAR ship detection models, FESAR demonstrates superior comprehensive performance.
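The deep-downsampling idea behind an SPD-style backbone can be illustrated with a space-to-depth rearrangement, which halves spatial resolution without discarding any pixels, unlike strided convolution or pooling. This is a minimal NumPy sketch under the assumption that "SPD" refers to such a rearrangement; the function name, block size, and channel layout are illustrative, not the paper's actual implementation:

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange spatial blocks into channels: (H, W, C) -> (H/b, W/b, C*b*b).

    Every input pixel survives in the output, so fine small-target detail
    is preserved through the downsampling step.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/b, W/b, b, b, C)
    return x.reshape(h // block, w // block, block * block * c)

# Tiny demo: a 4x4 single-channel map becomes 2x2 with 4 channels.
x = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
y = space_to_depth(x)
```

In a real backbone a convolution would follow to mix the stacked channels; the rearrangement itself is lossless.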

Citations: 0
An adaptive interpolation and 3D reconstruction algorithm for underwater images
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-03-07 | DOI: 10.1007/s00138-024-01518-2
Zhijie Tang, Congqi Xu, Siyu Yan

3D reconstruction technology is gradually being applied to underwater scenes and has become a crucial research direction for human ocean exploration and exploitation. However, due to the complexity of the underwater environment, the number of high-quality underwater images acquired by underwater robots is limited and cannot meet the requirements of 3D reconstruction. Therefore, this paper proposes an adaptive 3D reconstruction algorithm for underwater targets. We apply frame interpolation to underwater 3D reconstruction, an unprecedented technical attempt. We design a single-stage, large-angle-span underwater image interpolation model that enhances degraded underwater 2D images more effectively than other methods. Current methods struggle to balance feature-information acquisition against underwater image-quality improvement. To solve this, we propose an optimized cascaded feature pyramid scheme and an adaptive bidirectional optical-flow estimation algorithm based on underwater NRIQA metrics, and apply both to the proposed model. The intermediate images output by the model improve image quality while retaining detail. Experiments show that the proposed method outperforms other methods on several typical degradation types of underwater images. In underwater 3D reconstruction, using the model's intermediate images as input instead of the degraded images yields a denser 3D point cloud and better visualization. Our method is instructive for acquiring high-quality underwater target images and for underwater 3D reconstruction.
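As a rough illustration of bidirectional optical-flow frame interpolation, the sketch below warps two frames toward an intermediate time with linearly scaled flows and blends them. The nearest-neighbour warp, the linear-in-time flow scaling, and all names are simplifying assumptions of this sketch; the paper's model is learned and far more sophisticated:

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by a (H, W, 2) pixel-offset flow (dy, dx order),
    using nearest-neighbour sampling for simplicity."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(yy + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xx + flow[..., 1]).astype(int), 0, w - 1)
    return img[src_y, src_x]

def interpolate(f0, f1, flow_0to1, t=0.5):
    """Synthesize the frame at time t by blending both frames warped toward t.
    Crude assumption: flow is linear in t, so scaled copies of one flow field
    stand in for the true bidirectional flows."""
    mid_from_0 = warp(f0, flow_0to1 * t)
    mid_from_1 = warp(f1, -flow_0to1 * (1 - t))
    return (1 - t) * mid_from_0 + t * mid_from_1

# Degenerate demo: zero flow reduces to a plain average of the two frames.
f0 = np.zeros((4, 4))
f1 = np.full((4, 4), 2.0)
mid = interpolate(f0, f1, np.zeros((4, 4, 2)))
```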

Citations: 0
Self-supervised Siamese keypoint inference network for human pose estimation and tracking
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-03-05 | DOI: 10.1007/s00138-024-01515-5

Abstract

Human pose estimation and tracking are important tasks for understanding human behavior. Currently, they face the challenges of missed detections due to sparse annotation of video datasets, and of associating partially occluded and unoccluded instances of the same person. To address these challenges, we propose a self-supervised learning-based method that infers correspondences between keypoints to associate persons across video frames. Specifically, we propose a bounding box recovery module to recover missed detections and a Siamese keypoint inference network to solve the error-matching problem caused by occlusions. A local–global attention module, designed within the Siamese keypoint inference network, learns the varying dependence information of human keypoints between frames. To simulate occlusions, we mask random pixels in the image before pre-training, using knowledge distillation to associate differing occlusions of the same person. Our method achieves better results than state-of-the-art methods for human pose estimation and tracking on the PoseTrack 2018 and PoseTrack 2021 datasets. Code is available at: https://github.com/yhtian2023/SKITrack.
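The occlusion-simulation step, masking random pixels before pre-training, can be sketched as a simple augmentation (a minimal NumPy version; the masking ratio and zero-fill value are assumptions, not the authors' exact recipe):

```python
import numpy as np

def mask_random_pixels(img, ratio=0.3, rng=None):
    """Zero out a random fraction of pixels to simulate partial occlusion.

    Returns the masked image and the boolean keep-mask so a training loop
    could, e.g., pair masked and unmasked views of the same person.
    """
    rng = rng or np.random.default_rng(0)  # seeded here for reproducibility
    keep = rng.random(img.shape[:2]) >= ratio  # True = pixel survives
    return img * keep[..., None], keep

# Demo on a dummy all-ones "image".
img = np.ones((32, 32, 3))
masked, keep = mask_random_pixels(img)
```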

Citations: 0
That’s BAD: blind anomaly detection by implicit local feature clustering
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-03-02 | DOI: 10.1007/s00138-024-01511-9
Jie Zhang, Masanori Suganuma, Takayuki Okatani

Recent studies on visual anomaly detection (AD) of industrial objects and textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, which assumes a set of normal (i.e., anomaly-free) images is available for training. In this paper, we consider a more challenging unsupervised AD scenario: detecting anomalies in a given set of images that may contain both normal and anomalous samples. The setting does not assume the availability of known normal data and is thus completely free of human annotation, which differs from the standard AD considered in recent studies. For clarity, we call this setting blind anomaly detection (BAD). We show that BAD can be converted into a local outlier detection problem and propose a novel method named PatchCluster that can accurately detect image- and pixel-level anomalies. Experimental results show that PatchCluster achieves promising performance without knowledge of normal data, comparable even to SOTA methods applied in the one-class setting that require it.
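The conversion of BAD into a local outlier detection problem can be illustrated with a simple k-nearest-neighbour distance score over patch features: anomalous patches are rare, so they lie far from the dense clusters that normal patches form. This is a hedged stand-in for the idea, not the actual PatchCluster method:

```python
import numpy as np

def knn_outlier_scores(feats, k=5):
    """Score each feature vector by its mean distance to its k nearest
    neighbours; large scores indicate locally isolated (anomalous) patches."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k]      # k smallest distances per row
    return knn.mean(axis=1)

# Demo: 20 tightly clustered "normal" features plus one far-away outlier.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (20, 2)), [[10.0, 10.0]]])
scores = knn_outlier_scores(feats)
```

The O(n^2) distance matrix is fine for a demo; a real pipeline would use an approximate nearest-neighbour index over deep patch embeddings.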

Citations: 0
A pixel and channel enhanced up-sampling module for biomedical image segmentation
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-02-22 | DOI: 10.1007/s00138-024-01513-7
Xuan Zhang, Guoping Xu, Xinglong Wu, Wentao Liao, Xuesong Leng, Xiaxia Wang, Xinwei He, Chang Li

Up-sampling operations are frequently used to recover the spatial resolution of feature maps in neural networks for segmentation tasks. However, current up-sampling methods, such as bilinear interpolation or deconvolution, do not fully consider the relationships among feature maps, which has a negative impact on learning discriminative features for semantic segmentation. In this paper, we propose a pixel and channel enhanced up-sampling (PCE) module for low-resolution feature maps, aiming to use the relationships of adjacent pixels and channels to learn discriminative high-resolution feature maps. Specifically, the proposed up-sampling module comprises two main operations: (1) increasing the spatial resolution of feature maps with pixel shuffle and (2) recalibrating the channel-wise high-resolution feature response. Our up-sampling module can be integrated into CNN and Transformer segmentation architectures. Extensive experiments on three biomedical imaging modalities, computed tomography (CT), magnetic resonance imaging (MRI), and micro-optical sectioning tomography (MOST), demonstrate that the proposed method effectively improves the performance of representative segmentation models.
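The two operations of the PCE module, pixel-shuffle up-sampling and channel-wise recalibration, can be sketched in NumPy as below. The channel layout and the identity-weight squeeze-and-excite gating are illustrative assumptions of this sketch; the real module uses learned weights:

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Depth-to-space: (H, W, C*r*r) -> (H*r, W*r, C).

    Spatial resolution grows by r in each dimension by redistributing
    channels, instead of interpolating new values."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c)

def channel_recalibrate(x):
    """SE-style gating sketch: scale each channel by a squashed
    global-average response (identity excitation weights assumed)."""
    s = x.mean(axis=(0, 1))            # squeeze: per-channel statistic
    g = 1.0 / (1.0 + np.exp(-s))       # excitation: sigmoid gate in (0, 1)
    return x * g

# Demo: one spatial position with 8 channels -> a 2x2 map with 2 channels.
x = np.arange(8, dtype=float).reshape(1, 1, 8)
y = pixel_shuffle(x, r=2)
z = channel_recalibrate(y)
```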

Citations: 0
A gradient fusion-based image data augmentation method for reflective workpieces detection under small size datasets
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-02-21 | DOI: 10.1007/s00138-024-01512-8
Baori Zhang, Haolang Cai, Lingxiang Wen

Various convolutional neural network-based object detection models have been widely used in industry. However, these models struggle to achieve high detection accuracy on industrial sorting lines, owing to the small datasets dictated by production costs and the changing appearance of reflective workpieces. To increase detection accuracy, this paper presents a gradient fusion-based image data augmentation method. It consists of a high-dynamic-range (HDR) exposure algorithm and an image reconstruction algorithm, and augments the image data for training and prediction by increasing the feature richness within the reflective and shadowed regions of the image. Tests compared the method against other exposure and image fusion methods, and its universality was analyzed on various kinds of workpieces and different models, including YOLOv8 and SSD. Finally, Gradient-weighted Class Activation Mapping (Grad-CAM) and mean Average Precision (mAP) were used to analyze the improvement in model performance. The results show that the proposed data augmentation method improves the feature richness of the images and the object detection accuracy for reflective workpieces under small datasets.
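The gradient-fusion idea, weighting each exposure by local gradient strength so that detail in reflective and shadowed regions survives, can be sketched for grayscale images as follows (a hypothetical minimal version, not the paper's algorithm):

```python
import numpy as np

def grad_weight(img, eps=1e-6):
    """Per-pixel gradient magnitude; well-exposed regions carry stronger
    gradients than blown-out highlights or crushed shadows."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.sqrt(gx**2 + gy**2) + eps  # eps avoids all-zero weights

def gradient_fusion(exposures):
    """Fuse differently exposed grayscale shots of the same scene,
    weighting each pixel of each shot by its local gradient strength."""
    ws = np.stack([grad_weight(e) for e in exposures])
    ws = ws / ws.sum(axis=0, keepdims=True)  # normalize weights per pixel
    return (ws * np.stack(exposures)).sum(axis=0)

# Degenerate demo: two flat exposures have equal weights, so fusion averages.
exposures = [np.zeros((4, 4)), np.full((4, 4), 2.0)]
fused = gradient_fusion(exposures)
```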

Citations: 0
Target–distractor memory joint tracking algorithm via Credit Allocation Network
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-02-09 | DOI: 10.1007/s00138-024-01508-4
Huanlong Zhang, Panyun Wang, Zhiwu Chen, Jie Zhang, Linwei Li

The tracking framework based on the memory network has gained significant attention due to its enhanced adaptability to variations in target appearance. However, its performance is limited by the negative effects of distractors in the background. Hence, this paper proposes a tracking method that uses a Credit Allocation Network to jointly exploit target and distractor memory. Specifically, we design a Credit Allocation Network (CAN) that is updated online via Guided Focus Loss. The CAN produces credit scores for tracking results by learning features of the target object, ensuring that only reliable samples are stored in the memory pool. Furthermore, we construct a multi-domain memory model that simultaneously captures target and background information from multiple historical intervals, building a more compatible object appearance model while increasing the diversity of the memory samples. Moreover, a novel target–distractor joint localization strategy is presented, which reads target and distractor information from memory frames based on cross-attention, so that wrong responses in the target response map can be cancelled out using the distractor response map. Experimental results on the OTB-2015, GOT-10k, UAV123, LaSOT, and VOT-2018 datasets show the competitiveness and effectiveness of the proposed method compared to other trackers.
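Reading target and distractor information from memory with cross-attention can be sketched as scaled dot-product attention, where current-frame features act as queries over stored memory entries. Using the memory entries as both keys and values is a simplifying assumption of this sketch, as are all names:

```python
import numpy as np

def cross_attention(queries, memory):
    """Scaled dot-product cross-attention read.

    queries: (Nq, D) current-frame features.
    memory:  (Nm, D) stored target/distractor entries, used here as both
             keys and values (an assumption of this sketch).
    Returns (Nq, D): a memory summary weighted by query-key similarity.
    """
    d = queries.shape[-1]
    logits = queries @ memory.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ memory

# Degenerate demo: with a single memory entry, every query reads that entry.
memory = np.array([[1.0, 2.0, 3.0]])
queries = np.zeros((2, 3))
out = cross_attention(queries, memory)
```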

Citations: 0
End-to-end optimized image compression with the frequency-oriented transform
IF 3.3 | CAS Tier 4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-02-07 | DOI: 10.1007/s00138-023-01507-x
Yuefeng Zhang, Kai Lin

Image compression constitutes a significant challenge in the era of information explosion. Recent studies employing deep learning have demonstrated the superior performance of learning-based image compression over traditional codecs. However, an inherent challenge of these methods is their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose an end-to-end optimized image compression model built around a frequency-oriented transform. The model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with human-interpretable concepts. Leveraging the non-overlapping hypothesis, the model enables scalable coding through selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
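A transform that separates an image into non-overlapping frequency bands can be illustrated with radial FFT masks: because the masks partition the spectrum, the bands sum exactly back to the original image, which is what makes dropping or selectively transmitting bands a clean scalable-coding mechanism. The band edges below are arbitrary assumptions, and the paper's actual transform is learned:

```python
import numpy as np

def split_frequency_bands(img, cuts=(0.25, 0.5)):
    """Split a grayscale image into non-overlapping radial frequency bands.

    Each FFT coefficient is assigned to exactly one band by its radial
    frequency, so the returned bands reconstruct the input losslessly.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fy**2 + fx**2)           # radial frequency per coefficient
    F = np.fft.fft2(img)
    edges = (0.0,) + tuple(cuts) + (1.0,)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi)      # half-open bins partition the plane
        bands.append(np.real(np.fft.ifft2(F * mask)))
    return bands

# Demo: three bands of a random image sum back to the original.
rng = np.random.default_rng(1)
img = rng.standard_normal((8, 8))
bands = split_frequency_bands(img)
```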

{"title":"End-to-end optimized image compression with the frequency-oriented transform","authors":"Yuefeng Zhang, Kai Lin","doi":"10.1007/s00138-023-01507-x","DOIUrl":"https://doi.org/10.1007/s00138-023-01507-x","url":null,"abstract":"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. 
Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Interaction semantic segmentation network via progressive supervised learning
IF 3.3 · CAS Tier 4 · Q2 Computer Science · Pub Date: 2024-02-05 · DOI: 10.1007/s00138-023-01500-4

Abstract

Semantic segmentation requires both low-level details and high-level semantics, without losing too much detail and while ensuring inference speed. Most existing segmentation approaches leverage low- and high-level features from pre-trained models. We propose an interaction semantic segmentation network via Progressive Supervised Learning (ISSNet). Unlike a simple fusion of two sets of features, we introduce an information interaction module that embeds semantics into image details; the two feature streams jointly guide the feature response in an interactive way. We develop a simple yet effective boundary refinement module that provides refined boundary features for matching the corresponding semantics. We introduce a progressive supervised learning strategy at the training level, not the architecture level, to significantly promote network performance. Our proposed ISSNet shows optimal inference time. We perform extensive experiments on four datasets: Cityscapes, HazeCityscapes, RainCityscapes, and CamVid. Besides performing better in fine weather, the proposed ISSNet also performs well on rainy and foggy days. We also conduct an ablation study to demonstrate the role of each proposed component. Code is available at: https://github.com/Ruini94/ISSNet
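The interaction idea — each stream guiding the other's response, rather than plain concatenation — can be sketched as mutual gating. The sigmoid-gate form below is an illustrative assumption, not ISSNet's published layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interact(detail, semantic):
    """Toy information-interaction step on (C, H, W) feature maps.

    Each stream is modulated by a gate computed from the *other* stream,
    so semantics are embedded into the detail response and vice versa,
    instead of simply concatenating the two feature sets.
    """
    detail_out = detail * sigmoid(semantic)    # semantics guide details
    semantic_out = semantic * sigmoid(detail)  # details guide semantics
    return detail_out + semantic_out           # fused, jointly guided response

rng = np.random.default_rng(0)
low = rng.standard_normal((16, 32, 32))   # low-level detail features
high = rng.standard_normal((16, 32, 32))  # upsampled high-level semantics
fused = interact(low, high)
```

Contrast this with `np.concatenate([low, high])`: concatenation leaves the two feature sets side by side for a later layer to mix, whereas gating forces an explicit, position-wise interaction between them.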

Citations: 0
A review of adaptable conventional image processing pipelines and deep learning on limited datasets
IF 3.3 · CAS Tier 4 · Q2 Computer Science · Pub Date: 2024-01-31 · DOI: 10.1007/s00138-023-01501-3
Friedrich Rieken Münke, Jan Schützke, Felix Berens, Markus Reischl

The objective of this paper is to study the impact of limited datasets on deep learning techniques and conventional methods in semantic image segmentation, and to conduct a comparative analysis in order to determine the optimal scenario for utilizing each approach. We introduce a synthetic data generator, which enables us to evaluate the impact of the number of training samples as well as the difficulty and diversity of the dataset. We show that deep learning methods excel when large datasets are available, while conventional image processing approaches perform well when the datasets are small and diverse. Since transfer learning is a common approach to work around small datasets, we specifically assess its impact and find it to be only marginal. Furthermore, we implement the conventional image processing pipeline so that it can be applied quickly and easily to new problems, making it easy to apply and test conventional methods alongside deep learning with minimal overhead.
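A synthetic generator of the kind described — paired images and segmentation masks with a tunable sample count and difficulty — can be sketched in a few lines. The shape vocabulary and noise model here are illustrative assumptions, not the paper's actual generator:

```python
import numpy as np

def make_sample(rng, size=64, noise=0.1):
    """Return one (image, mask) pair: a random bright square on a dark
    background plus Gaussian noise; `noise` controls dataset difficulty."""
    img = np.zeros((size, size))
    mask = np.zeros((size, size), dtype=np.int64)
    side = int(rng.integers(8, size // 2))
    y, x = rng.integers(0, size - side, size=2)
    img[y:y + side, x:x + side] = 1.0
    mask[y:y + side, x:x + side] = 1
    img += rng.normal(scale=noise, size=img.shape)
    return img, mask

def make_dataset(n, seed=0, **kwargs):
    """Generate n (image, mask) pairs; n is the knob for studying
    limited-data regimes, kwargs control difficulty/diversity."""
    rng = np.random.default_rng(seed)
    pairs = [make_sample(rng, **kwargs) for _ in range(n)]
    imgs, masks = zip(*pairs)
    return np.stack(imgs), np.stack(masks)

images, masks = make_dataset(10)
```

Sweeping `n` and `noise` in such a generator is what lets a study plot segmentation accuracy against training-set size and difficulty with ground truth known exactly.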

Citations: 0