Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by the ROI information to generate more accurate image captions. The Difference of Gaussians (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI and the text features of related images, constructed with a Bag of Words (BoW) model, are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. Based on the guiding information and the features of the entire image, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on semantically important regions of the image and generate the most significant keywords in the caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI and that the image captioning model with the guidance information outperforms state-of-the-art methods.
{"title":"A novel key point based ROI segmentation and image captioning using guidance information","authors":"Jothi Lakshmi Selvakani, Bhuvaneshwari Ranganathan, Geetha Palanisamy","doi":"10.1007/s00138-024-01597-1","DOIUrl":"https://doi.org/10.1007/s00138-024-01597-1","url":null,"abstract":"<p>Recently, image captioning has become an intriguing task that has attracted many researchers. This paper proposes a novel keypoint-based segmentation algorithm for extracting regions of interest (ROI) and an image captioning model guided by this information to generate more accurate image captions. The Difference of Gaussian (DoG) is used to identify keypoints. A novel ROI segmentation algorithm then utilizes these keypoints to extract the ROI. Features of the ROI are extracted, and the text features of related images are merged into a common semantic space using canonical correlation analysis (CCA) to produce the guiding information. The text features are constructed using a Bag of Words (BoW) model. Based on the guiding information and the entire image features, an LSTM generates a caption for the image. The guiding information helps the LSTM focus on important semantic regions in the image to generate the most significant keywords in the image caption. Experiments on the Flickr8k dataset show that the proposed ROI segmentation algorithm accurately identifies the ROI, and the image captioning model with the guidance information outperforms state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"2011 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-10  DOI: 10.1007/s00138-024-01603-6
Hirotaka Hachiya, Yuto Yoshimura
To apply robot teaching in a factory with many mirror-polished parts, it is necessary to detect specular surfaces accurately. Deep models for mirror detection have been studied by designing mirror-specific features, e.g., contextual contrast and similarity. However, mirror-polished parts such as plastic molds tend to have complex shapes and ambiguous boundaries, so existing mirror-specific deep features do not work well. To overcome this problem, we propose attention maps based on the concepts of static specular flow (SSF), i.e., condensed reflections of the surrounding scene, and specular highlight (SH), i.e., bright light spots, both of which frequently appear even on complex-shaped specular surfaces, and apply them to deep model-based multi-level features. We then adaptively integrate the approximated mirror maps generated by multi-level SSF, SH, and existing mirror detectors to detect complex specular surfaces. Experiments on our original datasets with spherical mirrors and real-world plastic molds show the effectiveness of the proposed method.
{"title":"Specular Surface Detection with Deep Static Specular Flow and Highlight","authors":"Hirotaka Hachiya, Yuto Yoshimura","doi":"10.1007/s00138-024-01603-6","DOIUrl":"https://doi.org/10.1007/s00138-024-01603-6","url":null,"abstract":"<p>To apply robot teaching to a factory with many mirror-polished parts, it is necessary to detect the specular surface accurately. Deep models for mirror detection have been studied by designing mirror-specific features, e.g., contextual contrast and similarity. However, mirror-polished parts such as plastic molds, tend to have complex shapes and ambiguous boundaries, and thus, existing mirror-specific deep features could not work well. To overcome the problem, we propose introducing attention maps based on the concept of static specular flow (SSF), condensed reflections of the surrounding scene, and specular highlight (SH), bright light spots, frequently appearing even in complex-shaped specular surfaces and applying them to deep model-based multi-level features. Then, we adaptively integrate approximated mirror maps generated by multi-level SSF, SH, and existing mirror detectors to detect complex specular surfaces. Through experiments with our original data sets with spherical mirrors and real-world plastic molds, we show the effectiveness of the proposed method.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"61 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-09  DOI: 10.1007/s00138-024-01607-2
Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham
The study and prediction of space weather entails the analysis of solar images showing structures of the Sun's atmosphere. When imaged from the ground, these images may be polluted by terrestrial clouds, which hinder the detection of solar structures. We propose a new method to remove cloud shadows, based on a U-Net architecture, and compare classical supervision with a conditional GAN. We evaluate our method on two different imaging modalities, using both real images and a new dataset of synthetic clouds. Quantitative assessments are obtained through image quality indices (RMSE, PSNR, SSIM, and FID). We demonstrate improved results relative to a traditional cloud removal technique and a sparse coding baseline, on different cloud types and textures.
{"title":"Removing cloud shadows from ground-based solar imagery","authors":"Amal Chaoui, Jay Paul Morgan, Adeline Paiement, Jean Aboudarham","doi":"10.1007/s00138-024-01607-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01607-2","url":null,"abstract":"<p>The study and prediction of space weather entails the analysis of solar images showing structures of the Sun’s atmosphere. When imaged from the Earth’s ground, images may be polluted by terrestrial clouds which hinder the detection of solar structures. We propose a new method to remove cloud shadows, based on a U-Net architecture, and compare classical supervision with conditional GAN. We evaluate our method on two different imaging modalities, using both real images and a new dataset of synthetic clouds. Quantitative assessments are obtained through image quality indices (RMSE, PSNR, SSIM, and FID). We demonstrate improved results with regards to the traditional cloud removal technique and a sparse coding baseline, on different cloud types and textures.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"13 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02  DOI: 10.1007/s00138-024-01606-3
Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang
Underwater object detection and classification technology is one of the most important ways for humans to explore the oceans. However, existing methods are still insufficient in terms of accuracy and speed and perform poorly on small objects such as fish. In this paper, we propose a multi-scale aggregation enhanced feature pyramid network (MAE-FPN) for object detection, comprising a multi-scale convolutional calibration module (MCCM) and a feature calibration distribution module (FCDM). First, we design the MCCM, which can adaptively extract feature information from objects at different scales. Then, we build the FCDM structure to make multi-scale information fusion more appropriate and to alleviate the problem of missing features from small objects. Finally, we construct the Fish Segmentation and Detection (FSD) dataset by fusing multiple data augmentation methods, which enriches the data resources for underwater object detection and mitigates the problem of limited training data for deep learning. Experiments on the FSD and public datasets show that the proposed MAE-FPN significantly improves the detection of underwater objects, especially small ones.
{"title":"Underwater image object detection based on multi-scale feature fusion","authors":"Chao Yang, Ce Zhang, Longyu Jiang, Xinwen Zhang","doi":"10.1007/s00138-024-01606-3","DOIUrl":"https://doi.org/10.1007/s00138-024-01606-3","url":null,"abstract":"<p>Underwater object detection and classification technology is one of the most important ways for humans to explore the oceans. However, existing methods are still insufficient in terms of accuracy and speed, and have poor detection performance for small objects such as fish. In this paper, we propose a multi-scale aggregation enhanced (MAE-FPN) object detection method based on the feature pyramid network, including the multi-scale convolutional calibration module (MCCM) and the feature calibration distribution module (FCDM). First, we design the MCCM module, which can adaptively extract feature information from objects at different scales. Then, we built the FCDM structure to make the multi-scale information fusion more appropriate and to alleviate the problem of missing features from small objects. Finally, we construct the Fish Segmentation and Detection (FSD) dataset by fusing multiple data augmentation methods, which enriches the data resources for underwater object detection and solves the problem of limited training resources for deep learning. We conduct experiments on FSD and public datasets, and the results show that the proposed MAE-FPN network significantly improves the detection performance of underwater objects, especially small objects.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"113 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-29  DOI: 10.1007/s00138-024-01604-5
Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao
Active learning has achieved great success in image classification by selecting the most informative samples for data labeling and model training. However, its potential has been far from realised in object detection because of the unique challenge of utilizing localization information. A popular compromise is simply to apply active classification learning to the detected object candidates. Current efforts to incorporate the localization information of object detection usually follow a model-dependent fashion, which either works on specific detection frameworks or relies on additionally designed modules. In this paper, we propose model-agnostic Object Recognition Consistency in Regression (ORCR), which holistically measures the classification and localization uncertainty of each candidate produced by an object detector. The idea behind ORCR is to obtain the detection uncertainty by calculating the classification consistency through localization regression at two successive detection scales. Building on the proposed ORCR, we devise an active learning framework that can be deployed effortlessly on any object detection architecture. Experimental results on the PASCAL VOC and MS-COCO benchmarks show that our method achieves better performance while simplifying the active detection process.
{"title":"Object Recognition Consistency in Regression for Active Detection","authors":"Ming Jing, Zhilong Ou, Hongxing Wang, Jiaxin Li, Ziyi Zhao","doi":"10.1007/s00138-024-01604-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01604-5","url":null,"abstract":"<p>Active learning has achieved great success in image classification because of selecting the most informative samples for data labeling and model training. However, the potential of active learning has been far from being realised in object detection due to its unique challenge in utilizing localization information. A popular compromise is to simply take active classification learning over detected object candidates. To consider the localization information of object detection, current effort usually falls into the model-dependent fashion, which either works on specific detection frameworks or relies on additionally designed modules. In this paper, we propose model-agnostic Object Recognition Consistency in Regression (ORCR), which can holistically measure the uncertainty information of classification and localization of each detected candidate from object detection. The philosophy behind ORCR is to obtain the detection uncertainty by calculating the classification consistency through localization regression at two successive detection scales. In the light of the proposed ORCR, we devise an active learning framework that enables an effortless deployment to any object detection architecture. Experimental results on the PASCAL VOC and MS-COCO benchmarks show that our method achieves better performance while simplifying the active detection process.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"73 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-29  DOI: 10.1007/s00138-024-01601-8
Hongyi Qin, Alexander G. Belyaev
This paper presents a deep learning method for image dehazing and clarification. The main advantages of the method are its high computational speed and its use of unpaired image data for training. The method adapts the Zero-DCE approach (Li et al. in IEEE Trans Pattern Anal Mach Intell 44(8):4225–4238, 2021) to the image dehazing problem and uses high-order curves to adjust the dynamic range of images and achieve dehazing. Training the proposed dehazing neural network does not require paired hazy and clear datasets; instead, it relies on a set of loss functions that assess the quality of dehazed images to drive the training process. Experiments on a large number of real-world hazy images demonstrate that the proposed network effectively removes haze while preserving details and enhancing brightness. Furthermore, on an affordable GPU-equipped laptop, the processing speed can reach 1000 FPS for images with 2K resolution, making the method highly suitable for real-time dehazing applications.
{"title":"Fast no-reference deep image dehazing","authors":"Hongyi Qin, Alexander G. Belyaev","doi":"10.1007/s00138-024-01601-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01601-8","url":null,"abstract":"<p>This paper presents a deep learning method for image dehazing and clarification. The main advantages of the method are high computational speed and using unpaired image data for training. The method adapts the Zero-DCE approach (Li et al. in IEEE Trans Pattern Anal Mach Intell 44(8):4225–4238, 2021) for the image dehazing problem and uses high-order curves to adjust the dynamic range of images and achieve dehazing. Training the proposed dehazing neural network does not require paired hazy and clear datasets but instead utilizes a set of loss functions, assessing the quality of dehazed images to drive the training process. Experiments on a large number of real-world hazy images demonstrate that our proposed network effectively removes haze while preserving details and enhancing brightness. Furthermore, on an affordable GPU-equipped laptop, the processing speed can reach 1000 FPS for images with 2K resolution, making it highly suitable for real-time dehazing applications.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"18 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open set recognition (OSR) aims to accept and classify known classes while rejecting unknown classes, which is a key technology for deploying pattern recognition algorithms widely in practice. The challenge in OSR is to reduce both the empirical classification risk of known classes and the open space risk of potential unknown classes. However, existing OSR methods rarely optimize the open space risk, and much dark information in the unknown space is not taken into account, so many unknown classes are misidentified as known classes. Therefore, we present a self-supervised learning-based OSR method with synergetic proto-pull and reciprocal points, which markedly reduces both the empirical classification risk and the open space risk. In particular, we propose a new concept, the proto-pull point, which can be synergistically combined with reciprocal points to shrink the feature spaces of known and unknown classes and increase the feature distance between different classes, so as to form a good feature distribution. In addition, a self-supervised learning task of identifying the rotation of images is introduced into OSR model training, which helps the OSR model capture more discriminative features and decreases both the empirical classification and open space risks. Experimental results on benchmark datasets show that our proposed approach outperforms most existing OSR methods.
{"title":"Synergetic proto-pull and reciprocal points for open set recognition","authors":"Xin Deng, Luyao Yang, Ao Zhang, Jingwen Wang, Hexu Wang, Tianzhang Xing, Pengfei Xu","doi":"10.1007/s00138-024-01596-2","DOIUrl":"https://doi.org/10.1007/s00138-024-01596-2","url":null,"abstract":"<p>Open set recognition (OSR) aims to accept and classify known classes while rejecting unknown classes, which is the key technology for pattern recognition algorithms to be widely applied in practice. The challenges to OSR is to reduce the empirical classification risk of known classes and the open space risk of potential unknown classes. However, the existing OSR methods less consider to optimize the open space risk, and much dark information in unknown space is not taken into account, which results in that many unknown classes are misidentified as known classes. Therefore, we present a self-supervised learningbased OSR method with synergetic proto-pull and reciprocal points, which can remarkably reduce the risks of empirical classification and open space. Especially, we propose a new concept of proto-pull point, which can be synergistically combined with reciprocal points to shrink the feature spaces of known and unknown classes, and increase the feature distance between different classes, so as to form a good feature distribution. In addition, a self-supervised learning task of identifying the directions of rotated images is introduced in OSR model training, which is benefit for the OSR mdoel to capture more distinguishing features, and decreases both empirical classification and open space risks. The final experimental results on benchmark datasets show that our propsoed approach outperforms most existing OSR methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"36 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-22  DOI: 10.1007/s00138-024-01602-7
Xiangyang Wang, Tao Pei, Rui Wang
Multi-person pose estimation and tracking are crucial research directions in the field of artificial intelligence, with widespread applications in virtual reality, action recognition, and human-computer interaction. While existing pose tracking algorithms predominantly follow the top-down paradigm, they face challenges such as pose occlusion and motion blur in complex scenes, leading to tracking inaccuracies. To address these challenges, we leverage enhanced keypoint information and pose-weighted re-identification (re-ID) features to improve multi-person pose estimation and tracking. Specifically, the proposed Decouple Heatmap Network decouples heatmaps into keypoint confidence and position, and the refined keypoint information is utilized to reconstruct occluded poses. For the pose tracking task, we introduce a more efficient pipeline founded on pose-weighted re-ID features. This pipeline integrates a Pose Embedding Network to allocate weights to re-ID features and achieves the final pose tracking through a novel tracking matching algorithm. Extensive experiments indicate that our approach performs well in both multi-person pose estimation and tracking and achieves state-of-the-art results on the PoseTrack 2017 and 2018 datasets. Our source code is available at: https://github.com/TaoTaoPei/posetracking.
{"title":"Enhanced keypoint information and pose-weighted re-ID features for multi-person pose estimation and tracking","authors":"Xiangyang Wang, Tao Pei, Rui Wang","doi":"10.1007/s00138-024-01602-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01602-7","url":null,"abstract":"<p>Multi-person pose estimation and tracking are crucial research directions in the field of artificial intelligence, with widespread applications in virtual reality, action recognition, and human-computer interaction. While existing pose tracking algorithms predominantly follow the top-down paradigm, they face challenges, such as pose occlusion and motion blur in complex scenes, leading to tracking inaccuracies. To address these challenges, we leverage enhanced keypoint information and pose-weighted re-identification (re-ID) features to improve the performance of multi-person pose estimation and tracking. Specifically, our proposed Decouple Heatmap Network decouples heatmaps into keypoint confidence and position. The refined keypoint information are utilized to reconstruct occluded poses. For the pose tracking task, we introduce a more efficient pipeline founded on pose-weighted re-ID features. This pipeline integrates a Pose Embedding Network to allocate weights to re-ID features and achieves the final pose tracking through a novel tracking matching algorithm. Extensive experiments indicate that our approach performs well in both multi-person pose estimation and tracking and achieves state-of-the-art results on the PoseTrack 2017 and 2018 datasets. Our source code is available at: https://github.com/TaoTaoPei/posetracking.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"8 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Search and rescue (SaR) is challenging because the environmental situation after a disaster is unknown. Robotics has become indispensable for precisely mapping the environment and locating victims. Combining flying and ground robots serves this purpose more effectively, thanks to their complementary viewpoints and maneuvering capabilities. To this end, a novel, cost-effective framework for mapping unknown environments is introduced that leverages You Only Look Once (YOLO) and the video streams transmitted by a ground robot and a flying robot. The integrated mapping approach performs three crucial SaR tasks: localizing the victims, i.e., determining their position in the environment and their body pose; tracking moving victims; and providing a map of the ground elevation that assists both the ground robot and the SaR crew in navigating the SaR environment. In real-life experiments at the CyberZoo of the Delft University of Technology, the framework proved very effective and precise for all these tasks, particularly in occluded and complex environments.
{"title":"Camera-based mapping in search-and-rescue via flying and ground robot teams","authors":"Bernardo Esteves Henriques, Mirko Baglioni, Anahita Jamshidnejad","doi":"10.1007/s00138-024-01594-4","DOIUrl":"https://doi.org/10.1007/s00138-024-01594-4","url":null,"abstract":"<p>Search and rescue (SaR) is challenging, due to the unknown environmental situation after disasters occur. Robotics has become indispensable for precise mapping of the environment and for locating the victims. Combining flying and ground robots more effectively serves this purpose, due to their complementary features in terms of viewpoint and maneuvering. To this end, a novel, cost-effective framework for mapping unknown environments is introduced that leverages You Only Look Once and video streams transmitted by a ground and a flying robot. The integrated mapping approach is for performing three crucial SaR tasks: localizing the victims, i.e., determining their position in the environment and their body pose, tracking the moving victims, and providing a map of the ground elevation that assists both the ground robot and the SaR crew in navigating the SaR environment. In real-life experiments at the CyberZoo of the Delft University of Technology, the framework proved very effective and precise for all these tasks, particularly in occluded and complex environments.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"11 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142189977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-20  DOI: 10.1007/s00138-024-01599-z
Doanh C. Bui, Tam V. Nguyen, Khang Nguyen
Image captioning is an exciting yet challenging problem in both computer vision and natural language processing research. In recent years, it has been addressed by Transformer-based models optimized with cross-entropy loss, with performance further boosted via Self-Critical Sequence Training. Two types of representations are embedded into captioning models, grid features and region features, and there have been attempts to include 2D geometry information in the self-attention computation. However, the 3D order of object appearances is not considered, which confuses the model in complex scenes with overlapping objects. In addition, recent studies that use only the feature maps from the last layer or block of a pretrained CNN-based model may lack spatial information. In this paper, we present a Transformer-based captioning model dubbed TMDNet. Our model includes one module that aggregates multi-level grid features (MGFA) to enrich the representation ability using prior knowledge, and another module that effectively embeds the image's depth-grid aggregation (DGA) into the model space for better performance. The proposed model demonstrates its effectiveness via evaluation on the MS-COCO "Karpathy" test split across five standard metrics.
{"title":"Transformer with multi-level grid features and depth pooling for image captioning","authors":"Doanh C. Bui, Tam V. Nguyen, Khang Nguyen","doi":"10.1007/s00138-024-01599-z","DOIUrl":"https://doi.org/10.1007/s00138-024-01599-z","url":null,"abstract":"<p>Image captioning is an exciting yet challenging problem in both computer vision and natural language processing research. In recent years, this problem has been addressed by Transformer-based models optimized with Cross-Entropy loss and boosted performance via Self-Critical Sequence Training. Two types of representations are embedded into captioning models: grid features and region features, and there have been attempts to include 2D geometry information in the self-attention computation. However, the 3D order of object appearances is not considered, leading to confusion for the model in cases of complex scenes with overlapped objects. In addition, recent studies using only feature maps from the last layer or block of a pretrained CNN-based model may lack spatial information. In this paper, we present the Transformer-based captioning model dubbed TMDNet. Our model includes one module to aggregate multi-level grid features (MGFA) to enrich the representation ability using prior knowledge, and another module to effectively embed the image’s depth-grid aggregation (DGA) into the model space for better performance. The proposed model demonstrates its effectiveness via evaluation on the MS-COCO “Karpathy” test split across five standard metrics.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"9 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}