Machine Vision and Applications: Latest Publications

PTDS CenterTrack: pedestrian tracking in dense scenes with re-identification and feature enhancement
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-15 | DOI: 10.1007/s00138-024-01520-8
Jiazheng Wen, Huanyu Liu, Junbao Li

Multi-object tracking in dense scenes has always been a major difficulty in this field. Although some existing algorithms achieve excellent results in multi-object tracking, they fail to generalize well when the application setting shifts to more challenging dense scenarios. In this work, we propose PTDS (Pedestrian Tracking in Dense Scenes) CenterTrack, built on CenterTrack's object center-point detection and tracking. It uses dense inter-frame similarity to compare object appearance features and predict inter-frame position changes, extending CenterTrack, which relies on motion features alone. We propose a feature enhancement method based on a hybrid attention mechanism, which adds temporal information between frames to the features required for object detection and connects the two tasks of detection and tracking. On the MOT20 benchmark, PTDS CenterTrack achieves 55.6% MOTA, 55.1% IDF1, and 45.1% HOTA, improvements of 10.1, 4.0, and 4.8 percentage points, respectively, over CenterTrack.
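
The abstract describes enriching detection features with cross-frame temporal information via hybrid (channel plus spatial) attention. The PyTorch sketch below illustrates that general idea only; the module name, the squeeze-and-excitation-style channel branch, and the fusion layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridAttentionFusion(nn.Module):
    """Gate concatenated current/previous-frame features with channel and
    spatial attention, then project back to the detector's feature width."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # channel branch (squeeze-and-excitation style)
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, max(2 * channels // reduction, 1), 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(2 * channels // reduction, 1), 2 * channels, 1),
            nn.Sigmoid(),
        )
        # spatial branch over mean/max-pooled descriptors
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feat_t, feat_prev], dim=1)          # temporal concatenation
        x = x * self.channel_gate(x)                       # channel reweighting
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        x = x * self.spatial_gate(pooled)                  # spatial reweighting
        return self.project(x)

if __name__ == "__main__":
    fuse = HybridAttentionFusion(channels=64)
    cur, prev = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    print(fuse(cur, prev).shape)  # torch.Size([1, 64, 32, 32])
```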

Citations: 0
BoostTrack: boosting the similarity measure and detection confidence for improved multiple object tracking
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-12 | DOI: 10.1007/s00138-024-01531-5
Vukasin D. Stanojevic, Branimir T. Todorovic

Handling unreliable detections and avoiding identity switches are crucial for the success of multiple object tracking (MOT). Ideally, an MOT algorithm should use only true positive detections, run in real time, and produce no identity switches. To approach this ideal, we present BoostTrack, a simple yet effective tracking-by-detection MOT method that uses several lightweight plug-and-play additions to improve MOT performance. We design a detection-tracklet confidence score and use it to scale the similarity measure, implicitly favouring pairs with high detection confidence and high tracklet confidence in one-stage association. To reduce the ambiguity arising from using intersection over union (IoU), we propose novel Mahalanobis-distance and shape-similarity terms that boost the overall similarity measure. To utilize low-score bounding boxes in one-stage association, we propose boosting the confidence scores of two groups of detections: those we assume correspond to an existing tracked object, and those we assume correspond to a previously undetected object. The proposed additions are orthogonal to existing approaches, and we combine them with interpolation and camera motion compensation to achieve results comparable to the standard benchmark solutions while retaining real-time execution speed. When combined with appearance similarity, our method outperforms all standard benchmark solutions on the MOT17 and MOT20 datasets. It ranks first among online methods in the HOTA metric in the MOT Challenge on the MOT17 and MOT20 test sets. We make our code available at https://github.com/vukasin-stanojevic/BoostTrack.
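
A minimal NumPy sketch of the confidence-scaled similarity idea follows: IoU is weighted by the product of tracklet and detection confidence and augmented with a shape term. The weighting, the shape-similarity formula, and the omission of the Mahalanobis term (which needs the Kalman-filter covariance) are simplifications for illustration, not the paper's exact formulas.

```python
import numpy as np

def pairwise_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU for (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(br - tl, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def shape_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Similarity in [0, 1] that decays with relative width/height mismatch."""
    wa, ha = a[:, 2] - a[:, 0], a[:, 3] - a[:, 1]
    wb, hb = b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]
    dw = np.abs(wa[:, None] - wb[None, :]) / np.maximum(wa[:, None], wb[None, :])
    dh = np.abs(ha[:, None] - hb[None, :]) / np.maximum(ha[:, None], hb[None, :])
    return np.exp(-(dw + dh))

def boosted_similarity(trk_boxes, det_boxes, trk_conf, det_conf, w_shape=0.25):
    """IoU scaled by the detection-tracklet confidence product plus a shape
    term, so high-confidence pairs are implicitly favoured in association."""
    conf = trk_conf[:, None] * det_conf[None, :]
    return (conf * pairwise_iou(trk_boxes, det_boxes)
            + w_shape * shape_similarity(trk_boxes, det_boxes))
```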

Citations: 0
AP-TransNet: a polarized transformer based aerial human action recognition framework
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-10 | DOI: 10.1007/s00138-024-01535-1
Chhavi Dhiman, Anunay Varshney, Ved Vyapak

Drones are widespread and actively employed in a variety of applications due to their low cost and quick mobility, enabling new forms of action surveillance. However, aerial footage suffers from a limited number of aerial-view samples, camera motion, illumination changes, small actor size, occlusion, complex backgrounds, and varying view angles, making human action recognition in aerial videos even more challenging. To address this, we propose the Aerial Polarized-Transformer Network (AP-TransNet) to recognize human actions in aerial views using both spatial and temporal details of the video feed. In this paper, we present the Polarized Encoding Block, which performs (i) selection with rejection, keeping the most significant features and rejecting the least informative ones, similar to light photometry phenomena, and (ii) a boosting operation, which increases the dynamic range of encodings using non-linear softmax normalization at the bottleneck tensors in both the channel and spatial sequential branches. The performance of the proposed AP-TransNet is evaluated through extensive experiments on three publicly available benchmark datasets: the Drone Action dataset, the UCF-ARG dataset, and the Multi-View Outdoor Dataset (MOD20), supported by an ablation study. The proposed work outperforms the state of the art.
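
As an illustration of softmax "boosting" at a bottleneck tensor, here is a channel-branch sketch in the spirit of polarized self-attention; the layer sizes and the sigmoid re-expansion are assumptions, and the spatial branch (not shown) would mirror this with the roles of the channel and spatial axes swapped. This is not the paper's exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolarizedChannelBranch(nn.Module):
    """Channel branch: a softmax over the flattened spatial query 'boosts' the
    dynamic range of the bottleneck tensor before channels are reweighted."""

    def __init__(self, channels: int):
        super().__init__()
        mid = max(channels // 2, 1)
        self.query = nn.Conv2d(channels, 1, kernel_size=1)    # spatial query
        self.value = nn.Conv2d(channels, mid, kernel_size=1)  # bottleneck values
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = F.softmax(self.query(x).view(b, 1, h * w), dim=-1)   # softmax boosting
        v = self.value(x).view(b, -1, h * w)                     # (b, mid, h*w)
        ctx = torch.bmm(v, q.transpose(1, 2)).view(b, -1, 1, 1)  # pooled context
        return x * torch.sigmoid(self.expand(ctx))               # channel weights
```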

Citations: 0
Lunar ground segmentation using a modified U-net neural network
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-09 | DOI: 10.1007/s00138-024-01533-3
Georgios Petrakis, Panagiotis Partsinevelos

Semantic segmentation plays a significant role in unstructured and planetary scene understanding, offering a robotic system or a planetary rover valuable knowledge about its surroundings. Several studies investigate rover-based scene recognition in planetary-like environments, but there is a lack of a semantic segmentation architecture focused on computing systems with low resources and tested on the lunar surface. In this study, a lightweight encoder-decoder neural network (NN) architecture is proposed for rover-based ground segmentation on the lunar surface. The proposed architecture is composed of a modified MobileNetV2 as the encoder and a lightweight U-net decoder, with training and evaluation conducted on a publicly available synthetic dataset of lunar landscape images. The proposed model provides robust segmentation results, enabling lunar scene understanding focused on rocks and boulders. It achieves accuracy similar to that of the original U-net and other U-net-based architectures, which are 110-140 times larger than the proposed architecture. This study aims to contribute to lunar landscape segmentation using deep learning techniques and shows great potential for autonomous lunar navigation, ensuring safer and smoother navigation on the Moon. To the best of our knowledge, this is the first study to propose a lightweight semantic segmentation architecture for the lunar surface aimed at reinforcing autonomous rover navigation.
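
A sketch of the general encoder-decoder shape described here: a torchvision MobileNetV2 backbone feeding a small U-net-style decoder. The stage slicing and skip channels follow torchvision's MobileNetV2 layout, and the decoder widths are illustrative assumptions, not the paper's modified architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class DecoderBlock(nn.Module):
    """Upsample, concatenate the encoder skip, and fuse with a 3x3 convolution."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        return self.fuse(torch.cat([self.up(x), skip], dim=1))

class LunarSegNet(nn.Module):
    """MobileNetV2 encoder stages feeding a small U-net-style decoder."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        feats = mobilenet_v2(weights=None).features
        self.s1, self.s2 = feats[:2], feats[2:4]      # 16 ch @1/2, 24 ch @1/4
        self.s3, self.s4 = feats[4:7], feats[7:14]    # 32 ch @1/8, 96 ch @1/16
        self.s5 = feats[14:18]                        # 320 ch @1/32
        self.d4 = DecoderBlock(320, 96, 96)
        self.d3 = DecoderBlock(96, 32, 32)
        self.d2 = DecoderBlock(32, 24, 24)
        self.d1 = DecoderBlock(24, 16, 16)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(16, n_classes, 1),
        )

    def forward(self, x):
        e1 = self.s1(x); e2 = self.s2(e1); e3 = self.s3(e2)
        e4 = self.s4(e3); e5 = self.s5(e4)
        d = self.d4(e5, e4); d = self.d3(d, e3)
        d = self.d2(d, e2); d = self.d1(d, e1)
        return self.head(d)  # logits at input resolution (input divisible by 32)
```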

Citations: 0
DisRot: boosting the generalization capability of few-shot learning via knowledge distillation and self-supervised learning
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-09 | DOI: 10.1007/s00138-024-01529-z
Chenyu Ma, Jinfang Jia, Jianqiang Huang, Li Wu, Xiaoying Wang

Few-shot learning (FSL) aims to adapt quickly to new categories from limited samples. Despite significant progress in using meta-learning to solve FSL tasks, challenges such as overfitting and poor generalization persist. Building on the demonstrated importance of powerful feature representation, this work proposes DisRot, a novel two-strategy training mechanism that combines knowledge distillation and a rotation prediction task for the pre-training phase of transfer learning. Knowledge distillation enables shallow networks to learn the relational knowledge contained in deep networks, while the self-supervised rotation prediction task provides class-irrelevant, transferable knowledge for the supervised task. Optimizing these two tasks simultaneously allows the model to learn generalizable and transferable feature embeddings. Extensive experiments on the miniImageNet and FC100 datasets demonstrate that DisRot effectively improves the generalization ability of the model and is comparable to leading FSL methods.
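
A minimal sketch of such a joint pre-training objective: supervised cross-entropy plus Hinton-style distillation at temperature T plus a four-way rotation prediction loss. The weights alpha and beta, the temperature, and the helper for building rotation batches are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def make_rotation_batch(images: torch.Tensor):
    """Stack the four rotations (0/90/180/270 degrees) of an NCHW batch and
    return them together with their 0..3 rotation labels."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

def disrot_loss(student_logits, teacher_logits, rot_logits, rot_labels,
                class_labels, T: float = 4.0, alpha: float = 0.5, beta: float = 1.0):
    """Supervised cross-entropy + temperature-T distillation + rotation loss."""
    ce = F.cross_entropy(student_logits, class_labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    rot = F.cross_entropy(rot_logits, rot_labels)
    return ce + alpha * kd + beta * rot
```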

Citations: 0
The improvement of ground truth annotation in public datasets for human detection
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-08 | DOI: 10.1007/s00138-024-01527-1
Sotheany Nou, Joong-Sun Lee, Nagaaki Ohyama, Takashi Obi

The quality of annotations in datasets is crucial for supervised machine learning, as it significantly affects model performance. While many public datasets are widely used, they often suffer from annotation errors, including missing annotations and incorrect bounding-box sizes and positions, which lower the accuracy of machine learning models. However, most researchers have traditionally focused on improving model performance by enhancing algorithms while overlooking data quality. This so-called model-centric AI approach has been predominant. In contrast, a data-centric AI approach, advocated by Andrew Ng at the DATA and AI Summit 2022, emphasizes enhancing data quality while keeping the model fixed, which proves more efficient at improving performance. Building on this data-centric approach, we propose a method to enhance the quality of public datasets such as MS-COCO and the Open Image Dataset. Our approach automatically retrieves missing annotations and corrects the size and position of existing bounding boxes in these datasets. Specifically, our study addresses human object detection, one of the prominent applications of artificial intelligence. Experimental results demonstrate improved performance with models such as Faster-RCNN, EfficientDet, and RetinaNet. After applying both proposed methods to the dataset, with grouped instances transformed into individual instances, we achieve up to a 32% improvement in mAP over the original datasets. In summary, our methods significantly enhance model performance by improving annotation quality at lower cost and in less time than the manual improvement employed in other studies.
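
A sketch of one way to retrieve candidate missing annotations: flag confident detections that overlap no existing ground-truth box. The thresholds and the IoU-based matching rule are illustrative assumptions, not the paper's procedure; matched high-IoU pairs could analogously drive correction of box size and position.

```python
import numpy as np

def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU for (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(br - tl, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def propose_missing_annotations(pred_boxes, pred_scores, gt_boxes,
                                score_thr: float = 0.8, iou_thr: float = 0.5):
    """Return confident detections that overlap no ground-truth box; these are
    candidates for missing annotations (to be verified before insertion)."""
    confident = pred_scores >= score_thr
    if len(gt_boxes) == 0:
        return pred_boxes[confident]
    unmatched = box_iou(pred_boxes, gt_boxes).max(axis=1) < iou_thr
    return pred_boxes[confident & unmatched]
```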

Citations: 0
Tensor-guided learning for image denoising using anisotropic PDEs
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-08 | DOI: 10.1007/s00138-024-01532-4
Fakhr-eddine Limami, Aissam Hadri, Lekbir Afraites, Amine Laghrib

In this article, we introduce an advanced approach to image denoising using an improved space-variant anisotropic partial differential equation (PDE) framework. Leveraging Weickert-type operators, the method relies on two critical parameters, λ and θ, which define the local image geometry and the smoothing strength. We propose an automatic parameter estimation technique rooted in PDE-constrained optimization, incorporating supplementary information from the original clean image. By combining these components, our approach achieves superior image denoising, pushing the boundaries of image enhancement methods. We employ a modified Alternating Direction Method of Multipliers (ADMM) procedure for numerical optimization, demonstrating its efficacy through thorough assessments and affirming its superior performance compared to alternative denoising methods.
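
The paper's scheme is a space-variant, tensor-driven PDE with λ and θ estimated by PDE-constrained optimization; as a much simpler stand-in that shows how two such parameters steer diffusion, here is an explicit scalar (Perona-Malik-style) iteration where λ acts as the edge threshold of the diffusivity and θ as the step size. This is an assumption-laden illustration, not the authors' model.

```python
import numpy as np

def anisotropic_diffusion(u: np.ndarray, n_iter: int = 50,
                          lam: float = 0.15, theta: float = 0.1) -> np.ndarray:
    """Explicit Perona-Malik-style iteration on a grayscale image in [0, 1]:
    lam is the edge threshold of the diffusivity g = exp(-(|grad|/lam)^2),
    theta is the time step controlling smoothing strength (stable for <= 0.25)."""
    u = u.astype(np.float64).copy()
    for _ in range(n_iter):
        # forward differences to the four neighbours
        diffs = [np.roll(u, s, axis) - u for axis in (0, 1) for s in (-1, 1)]
        # small diffusivity across strong edges preserves them while flat
        # regions are smoothed
        flux = sum(np.exp(-(d / lam) ** 2) * d for d in diffs)
        u += theta * flux
    return u
```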

Citations: 0
Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-07 | DOI: 10.1007/s00138-024-01526-2
Wugen Zhou, Xiaodong Peng, Yun Li, Mingrui Fan, Bo Liu

The robustness of dense visual SLAM in dynamic environments remains a challenging problem. In this paper, we propose a novel keyframe-based dense visual SLAM that handles highly dynamic environments using an RGB-D camera. The proposed method uses cluster-based residual models and semantic cues to detect dynamic objects, yielding motion segmentation that outperforms traditional methods. It also employs motion-segmentation-based keyframe selection strategies and a frame-to-keyframe matching scheme that reduce the influence of dynamic objects, thus minimizing trajectory errors. We further filter out the influence of dynamic objects based on motion segmentation and then use true matches from keyframes near the current keyframe to facilitate loop closure. Finally, a pose graph is established and optimized using the g2o framework. Our experimental results demonstrate the success of our approach on highly dynamic sequences, as evidenced by more robust motion segmentation results and significantly lower trajectory drift compared to several state-of-the-art dense visual odometry and SLAM methods on challenging public benchmark datasets.
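
One plausible reading of a cluster-based residual test is sketched below: a cluster of points is labeled dynamic when its mean residual (e.g., photometric or reprojection error under the estimated camera motion) is a robust outlier among all clusters in the frame. The median/MAD statistics and the threshold k are hypothetical choices, not the paper's criterion.

```python
import numpy as np

def label_dynamic_clusters(residuals: np.ndarray, cluster_ids: np.ndarray,
                           k: float = 2.5) -> set:
    """Mark a cluster as dynamic when its mean residual has a robust z-score
    (median/MAD) above k relative to all clusters in the frame."""
    ids = np.unique(cluster_ids)
    means = np.array([residuals[cluster_ids == i].mean() for i in ids])
    med = np.median(means)
    mad = np.median(np.abs(means - med)) + 1e-9
    robust_z = (means - med) / (1.4826 * mad)   # 1.4826 scales MAD to sigma
    return set(ids[robust_z > k].tolist())
```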

Citations: 0
Multi-person 3D pose estimation from unlabelled data
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-06 | DOI: 10.1007/s00138-024-01530-6
Daniel Rodriguez-Criado, Pilar Bachiller-Burgos, George Vogiatzis, Luis J. Manso

Its numerous applications make multi-human 3D pose estimation a remarkably impactful area of research. Nevertheless, it presents several challenges, especially when approached using multiple views and regular RGB cameras as the only input. First, each person must be uniquely identified across the different views. Second, the method must be robust to noise, partial occlusions, and views in which a person may not be detected. Third, many pose estimation approaches rely on environment-specific annotated datasets that are frequently prohibitively expensive and/or require specialised hardware. Specifically, this is the first multi-camera, multi-person data-driven approach that does not require an annotated dataset. In this work, we address these three challenges with the help of self-supervised learning. In particular, we present a three-stage pipeline and a rigorous evaluation providing evidence that our approach performs faster than other state-of-the-art algorithms, with comparable accuracy, and, most importantly, does not require annotated datasets. The pipeline is composed of a 2D skeleton detection step, followed by a Graph Neural Network that estimates cross-view correspondences of the people in the scene, and a Multi-Layer Perceptron that transforms the 2D information into 3D pose estimations. Our proposal comprises the last two steps and is compatible with any 2D skeleton detector as input. These two models are trained in a self-supervised manner, avoiding the need for datasets annotated with 3D ground-truth poses.
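
As a sketch of the final lifting stage, the small MLP below maps the matched multi-view 2D joints of one person to a 3D skeleton. The view/joint counts, the hidden width, and the flat concatenation of views are illustrative assumptions; the paper's MLP and its handling of undetected views are not specified here.

```python
import torch
import torch.nn as nn

class Lift2Dto3D(nn.Module):
    """MLP lifting the matched multi-view 2D joints of one person to 3D."""

    def __init__(self, n_views: int = 4, n_joints: int = 17, hidden: int = 1024):
        super().__init__()
        self.n_joints = n_joints
        self.net = nn.Sequential(
            nn.Linear(n_views * n_joints * 2, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, n_joints * 3),
        )

    def forward(self, joints_2d: torch.Tensor) -> torch.Tensor:
        # joints_2d: (batch, views, joints, 2) -> (batch, joints, 3)
        out = self.net(joints_2d.flatten(start_dim=1))
        return out.view(-1, self.n_joints, 3)
```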

Citations: 0
USIR-Net: sand-dust image restoration based on unsupervised learning
IF 3.3 | CAS Q4 (Computer Science) | Q2 Computer Science | Pub Date: 2024-04-01 | DOI: 10.1007/s00138-024-01528-0
Yuan Ding, Kaijun Wu

In sand-dust weather, the influence of sand-dust particles on imaging equipment often results in images with color deviation, blurring, and low contrast, among other issues. These problems make many traditional image restoration methods unable to accurately estimate the semantic information of the images, resulting in poor restoration of clear images. Most current deep-learning image restoration methods are based on supervised learning, which requires pairing and labeling a large amount of data and is prone to manual annotation errors. In light of this, we propose an unsupervised sand-dust image restoration network. The overall model adopts an improved CycleGAN to fit unpaired sand-dust images. First, multiscale skip connections in the multiscale cascaded attention module enhance feature fusion after downsampling. Second, multi-head convolutional attention with multiple input concatenations is employed, with each head using a different kernel size to improve the restoration of detail information. Finally, an adaptive decoder-encoder module achieves adaptive fitting of the model and outputs the restored image. In experiments on the dataset, the qualitative and quantitative indicators of USIR-Net are superior to those of the selected comparison algorithms. Furthermore, in additional experiments on haze removal and underwater image enhancement, we demonstrate the wide applicability of our model.
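
A sketch of multi-head convolutional attention with per-head kernel sizes: each head gates the input with a sigmoid map produced by a depthwise convolution of a different size, and the gated branches are fused by a 1x1 convolution. The depthwise choice, the head count, and the fusion layer are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MultiKernelConvAttention(nn.Module):
    """Parallel attention heads with different kernel sizes recover detail at
    several scales; gated branches are concatenated and fused by a 1x1 conv."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
                nn.Sigmoid(),
            )
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(len(kernel_sizes) * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gated = [x * head(x) for head in self.heads]   # per-scale gating
        return self.fuse(torch.cat(gated, dim=1))
```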

Citations: 0