
Latest Publications in Pattern Recognition Letters

Positional diffusion: Graph-based diffusion models for set ordering
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.10.010
Francesco Giuliari, Gianluca Scarpellini, Stefano Fiorini, Stuart James, Pietro Morerio, Yiming Wang, Alessio Del Bue
Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements' positions and map them to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle-solving, sentence ordering, and room arrangement. Our method outperforms long-standing research on puzzle solving by up to +17% over the second-best deep learning method, and performs on par with the state-of-the-art methods on sentence ordering and room rearrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at https://github.com/IIT-PAVIS/Positional_Diffusion.
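A minimal NumPy sketch of the forward (noising) step described above: Gaussian noise is added to the set elements' positions under a diffusion schedule. The schedule values and shapes are assumptions; the authors' actual implementation is in the linked repository.

```python
# Illustrative forward (noising) step of a diffusion process applied to
# 1-D element positions. Schedule values and shapes are assumptions.
import numpy as np

def forward_noise(x0, t, alpha_bar):
    """Map clean positions x0 to noisy positions x_t.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
    """
    eps = np.random.randn(*x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Linear beta schedule -> cumulative product of (1 - beta), a common choice.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

positions = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)  # 9 ordered set elements
x_t, eps = forward_noise(positions, t=500, alpha_bar=alpha_bar)
```

The reverse model (the attention-based GNN mentioned in the abstract) would be trained to predict `eps` from `x_t` and the element contents, then run backwards from random positions at inference.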
Citations: 0
DeepMarkerNet: Leveraging supervision from the Duchenne Marker for spontaneous smile recognition
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.09.015
Mohammad Junayed Hasan, Kazi Rafat, Fuad Rahman, Nabeel Mohammed, Shafin Rahman
Distinguishing between spontaneous and posed smiles in videos poses a significant challenge in the pattern classification literature. Researchers have developed feature-based and deep learning-based solutions for this problem; of the two, deep learning outperforms feature-based methods. However, certain aspects of feature-based methods could improve deep learning methods. For example, previous research has shown that Duchenne Marker (or D-Marker) features from the face play a vital role in spontaneous smiles, which can be useful for improving deep learning performance. In this study, we propose a deep learning solution that leverages D-Marker features to improve performance further. Our multi-task learning framework, named DeepMarkerNet, integrates a transformer network with facial D-Markers for accurate smile classification. Unlike past methods, our approach simultaneously predicts the class of the smile and the associated facial D-Markers using two different feed-forward neural networks, thus creating a symbiotic relationship that enriches the learning process. The novelty of our approach lies in incorporating supervisory signals from the pre-calculated D-Markers (instead of using them as input, as in previous works), harmonizing the loss functions through a weighted average. In this way, our training utilizes the benefits of D-Markers, but inference does not require computing them. We validate our model's effectiveness on four well-known smile datasets: UvA-NEMO, BBC, MMI facial expression, and SPOS, achieving state-of-the-art results.
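The weighted-average loss harmonization described in the abstract can be sketched as follows; the head dimensions, loss choices, and weight value are assumptions rather than values from the paper.

```python
# Sketch of a two-head multi-task setup: one head classifies the smile,
# a second regresses D-Marker targets; losses are combined by a weighted
# average so D-Markers supervise training but are not needed at inference.
import torch
import torch.nn as nn

class TwoHeadModel(nn.Module):
    def __init__(self, feat_dim=256, n_markers=4):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, 2)             # spontaneous vs. posed
        self.marker_head = nn.Linear(feat_dim, n_markers)  # D-Marker targets

    def forward(self, feats):
        return self.cls_head(feats), self.marker_head(feats)

def multitask_loss(cls_logits, cls_target, marker_pred, marker_target, w=0.5):
    # Weighted average of classification and D-Marker regression losses.
    l_cls = nn.functional.cross_entropy(cls_logits, cls_target)
    l_marker = nn.functional.mse_loss(marker_pred, marker_target)
    return w * l_cls + (1.0 - w) * l_marker

model = TwoHeadModel()
feats = torch.randn(8, 256)  # backbone (e.g. transformer) features, assumed shape
logits, markers = model(feats)
loss = multitask_loss(logits, torch.randint(0, 2, (8,)),
                      markers, torch.randn(8, 4))
```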
Citations: 0
Visual-guided hierarchical iterative fusion for multi-modal video action recognition
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.10.003
Bingbing Zhang, Ying Zhang, Jianxin Zhang, Qiule Sun, Rong Wang, Qiang Zhang
Vision-Language Models (VLMs) have shown promising improvements on various visual tasks. Most existing VLMs employ two separate transformer-based encoders, each dedicated to modeling visual and language features independently. Because the visual features and language features are unaligned in the feature space, it is challenging for the multi-modal encoder to learn vision-language interactions. In this paper, we propose a Visual-guided Hierarchical Iterative Fusion (VgHIF) method for VLMs in video action recognition, which acquires more discriminative vision and language representations. VgHIF leverages visual features from different levels of the visual encoder to interact with the language representation. The interaction is processed by an attention mechanism that calculates the correlation between the visual features and the language representation. VgHIF learns grounded video-text representations and supports many different pre-trained VLMs in a flexible and efficient manner at a tiny computational cost. We conducted experiments on Kinetics-400, Mini-Kinetics-200, HMDB51, and UCF101 using the VLMs CLIP, X-CLIP, and ViFi-CLIP. The experiments were conducted under fully supervised and few-shot settings. Compared with the baseline multi-modal model without VgHIF, the Top-1 accuracy of the proposed method improved to varying degrees, and several groups of results are comparable with state-of-the-art performance, which strongly verifies the effectiveness of the proposed method.
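A minimal sketch of the kind of attention interaction the abstract describes, with language tokens attending to visual features from a single encoder level; VgHIF applies this hierarchically and iteratively across levels, and all shapes here are assumptions.

```python
# One cross-attention step: text tokens (queries) attend over visual
# features (keys/values) from one level of the visual encoder.
import torch

def visual_guided_attention(text_tokens, visual_feats, d_k=64):
    """text_tokens: (B, T, d_k) queries; visual_feats: (B, N, d_k) keys/values."""
    attn = torch.softmax(
        text_tokens @ visual_feats.transpose(1, 2) / d_k ** 0.5, dim=-1)
    return attn @ visual_feats  # language tokens enriched with visual context

text = torch.randn(2, 16, 64)  # text tokens, e.g. from a CLIP text encoder (assumed)
vis = torch.randn(2, 49, 64)   # one level's 7x7 grid of visual features (assumed)
fused = visual_guided_attention(text, vis)
```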
Citations: 0
Robust affine point matching via quadratic assignment on Grassmannians
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.09.016
Alexander Kolpakov, Michael Werman
Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.
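For readers unfamiliar with the objective, the Frobenius distance between two points on a Grassmannian can be computed from their projection matrices, as in this brief sketch; the QAP relaxation that RoAM uses to align the point sets is not reproduced here.

```python
# A subspace (point on the Grassmannian Gr(k, n)) is represented by an
# orthonormal basis U; the distance below compares projection matrices.
import numpy as np

def grassmann_frobenius(U, V):
    """U, V: (n, k) orthonormal bases. Returns ||UU^T - VV^T||_F."""
    P, Q = U @ U.T, V @ V.T
    return np.linalg.norm(P - Q, ord="fro")

rng = np.random.default_rng(0)
# Orthonormal bases via QR of random matrices.
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))
print(grassmann_frobenius(U, V))
```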
Citations: 0
On the effects of obfuscating speaker attributes in privacy-aware depression detection
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.10.016
Nujud Aloshban, Anna Esposito, Alessandro Vinciarelli, Tanaya Guha
Detection of depressive symptoms from spoken content has emerged as an efficient Artificial Intelligence (AI) tool for diagnosing this serious mental health condition. Since speech is a highly sensitive form of data, privacy-enhancing measures need to be in place for this technology to be useful. A common approach to enhancing speech privacy is adversarial learning, which conceals a speaker's specific attributes/identity while maintaining performance on the primary task. Although this technique works well for applications such as speech recognition, it is often ineffective for depression detection due to the interplay between certain speaker attributes and depression detection performance. This paper studies that interplay through a systematic investigation of how obfuscating specific speaker attributes (age, education) through adversarial learning impacts the performance of a depression detection model. We highlight the relevance of two previously unexplored speaker attributes to depression detection, while considering a multimodal (audio-lexical) setting to expose the relative vulnerabilities of the modalities under obfuscation. Results on a publicly available, clinically validated depression detection dataset show that attempts to disentangle age/education attributes through adversarial learning result in a large drop in depression detection accuracy, especially for the text modality. This calls for a revisit of how privacy mitigation should be achieved for depression detection, and indeed for any human-centric application.
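A gradient-reversal layer is a common building block for the kind of adversarial attribute obfuscation studied here; the sketch below is a generic illustration of that idea, not the authors' exact pipeline.

```python
# Gradient reversal: the attribute head (age/education) trains normally,
# but the reversed gradient pushes the shared encoder to *remove* attribute
# information, while a separate depression head keeps the primary task.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # flip gradient sign for the encoder

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feats = torch.randn(4, 128, requires_grad=True)          # shared encoder output
attr_head = torch.nn.Linear(128, 3)                      # attribute classifier
attr_logits = attr_head(grad_reverse(feats))             # adversarial branch
```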
Citations: 0
Innovative multi-stage matching for counting anything
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.09.014
Shihui Zhang, Zhigang Huang, Sheng Zhan, Ping Li, Zhiguo Cui, Feiyu Li
Few-shot counting (FSC) is the task of counting the objects in an image that belong to the same category, using a provided exemplar pattern. By replacing the exemplar, we can effectively count anything, even in cases where we have no prior knowledge of that category's exemplar. However, due to variations within the same category and the impact of inter-class similarity, it is challenging to achieve accurate intra-class similarity matching using conventional similarity comparison methods. To tackle these issues, we propose a novel few-shot counting method called Multi-stage Exemplar Attention Match Network (MEAMNet), which increases matching accuracy, reduces the impact of noise, and enhances similarity feature matching. Specifically, we propose a multi-stage matching strategy that obtains more stable and effective matching results by acquiring similar features in different feature spaces. In addition, we propose a novel feature matching module called Exemplar Attention Match (EAM), which enhances the intra-class similarity representation at each stage to achieve better matching of key features. Experimental results indicate that our method not only significantly surpasses the state-of-the-art (SOTA) methods on most evaluation metrics on the FSC-147 dataset but also achieves comprehensive superiority on the CARPK dataset. This highlights the outstanding accuracy and stability of our matching performance, as well as its exceptional transferability. We will release the code at https://github.com/hzg0505/MEAMNet.
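A hedged sketch of a single exemplar-attention matching step consistent with the abstract; the shapes and single-stage form are assumptions (MEAMNet repeats the matching across stages and feature spaces), and the authors' real implementation will be in the linked repository.

```python
# Exemplar features act as queries against image features; the scaled
# similarity map (best exemplar match per location) would feed a counting head.
import torch

def exemplar_attention_match(image_feats, exemplar_feats):
    """image_feats: (B, HW, C); exemplar_feats: (B, E, C) -> (B, HW) score map."""
    sim = image_feats @ exemplar_feats.transpose(1, 2)   # (B, HW, E) similarities
    sim = sim / image_feats.shape[-1] ** 0.5             # scale as in attention
    return sim.max(dim=-1).values                        # best exemplar match per location

img = torch.randn(1, 64 * 64, 256)  # flattened image feature map (assumed shape)
ex = torch.randn(1, 3, 256)         # features pooled from 3 exemplar boxes (assumed)
score_map = exemplar_attention_match(img, ex)
```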
Citations: 0
Evaluation of visual SLAM algorithms in unstructured planetary-like and agricultural environments
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.09.025
Víctor Romero-Bautista, Leopoldo Altamirano-Robles, Raquel Díaz-Hernández, Saúl Zapotecas-Martínez, Nohemí Sanchez-Medel
Given the significant advances in visual SLAM (VSLAM), it might be assumed that the localization and mapping problem has been solved. Nevertheless, VSLAM algorithms may exhibit poor performance in unstructured environments. This paper addresses the problem of VSLAM in unstructured planetary-like and agricultural environments. A performance study of state-of-the-art algorithms in these environments was conducted to evaluate their robustness. Quantitative and qualitative results of the study are reported, which show that the impressive performance of most state-of-the-art VSLAM algorithms does not generally carry over to unstructured planetary-like and agricultural environments. Statistical scene analysis was performed on datasets from well-known structured environments as well as planetary-like and agricultural datasets to identify the visual differences between structured and unstructured environments that cause VSLAM algorithms to fail. In addition, strategies to overcome the VSLAM algorithm limitations in unstructured planetary-like and agricultural environments are suggested to guide future research on VSLAM in these environments.
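The abstract does not name its error metric; absolute trajectory error (ATE) RMSE is the standard choice for quantifying VSLAM accuracy, so a sketch of it is given here purely as an assumption about how such a performance study is scored.

```python
# ATE RMSE: root-mean-square position error between an estimated trajectory
# and ground truth, after the two have been aligned to a common frame.
import numpy as np

def ate_rmse(est, gt):
    """est, gt: (N, 3) aligned camera positions."""
    return np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1)))

gt = np.cumsum(np.random.default_rng(1).standard_normal((100, 3)), axis=0)
est = gt + 0.05 * np.random.default_rng(2).standard_normal((100, 3))
print(ate_rmse(est, gt))
```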
Citations: 0
A histogram-based approach to calculate graph similarity using graph neural networks
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.10.015
Nadeem Iqbal Kajla, Malik Muhammad Saad Missen, Mickael Coustaty, Hafiz Muhammad Sanaullah Badar, Maruf Pasha, Faiza Belbachir
Deep learning has revolutionized the field of pattern recognition and machine learning by exhibiting exceptional efficiency in recognizing patterns. The success of deep learning can be seen in a wide range of applications including speech recognition, natural language processing, video processing, and image classification. Moreover, it has also been successful in recognizing structural patterns, such as graphs. Graph Neural Networks (GNNs) are models that employ message passing between nodes in a graph to capture its dependencies. These networks memorize a state that approximates graph information with greater depth compared to traditional neural networks. Although training a GNN can be challenging, recent advances in GNN variants, including Graph Convolutional Neural Networks, Gated Graph Neural Networks, and Graph Attention Networks, have shown promising results in solving various problems. In this work, we present a GNN-based approach for computing graph similarity and demonstrate its application to a classification problem. Our proposed method converts the similarity of two graphs into a score, and experiments on state-of-the-art datasets show that the proposed technique is effective and efficient. Results are summarized using a confusion matrix and mean square error metric, demonstrating the accuracy of our proposed technique.
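As a loose illustration of the title's idea (the abstract does not specify the GNN or the histogram design, so everything below is an assumption): embed each graph's nodes, histogram the embedding values, and score similarity by comparing histograms.

```python
# One untrained message-passing round stands in for a GNN; histogram
# intersection in [0, 1] stands in for the similarity score.
import numpy as np

def node_embeddings(adj, feats, W):
    return np.tanh(adj @ feats @ W)          # one message-passing round

def graph_histogram(emb, bins=16):
    h, _ = np.histogram(emb, bins=bins, range=(-1, 1), density=True)
    return h / h.sum()

def similarity(h1, h2):
    return np.minimum(h1, h2).sum()          # histogram intersection

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
a1 = (rng.random((6, 6)) < 0.4).astype(float)
a1 = np.maximum(a1, a1.T)                    # random undirected adjacency
a2 = (rng.random((6, 6)) < 0.4).astype(float)
a2 = np.maximum(a2, a2.T)
h1 = graph_histogram(node_embeddings(a1, rng.random((6, 4)), W))
h2 = graph_histogram(node_embeddings(a2, rng.random((6, 4)), W))
print(similarity(h1, h2))
```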
Citations: 0
Local Reference Feature Transfer (LRFT): A simple pre-processing step for image enhancement
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.10.013
Ling Zhou, Weidong Zhang, Yuchao Zheng, Jianping Wang, Wenyi Zhao
Low-light, nighttime hazy, and underwater images captured in harsh environments typically exhibit color deviations and reduced visibility due to light scattering and absorption. Additionally, we observe an almost complete loss of information in at least one color channel of these degraded images. To repair the lost information in each channel, we present an image preprocessing strategy called Local Reference Feature Transfer (LRFT), which employs local features to compensate for the color loss automatically. Specifically, we design a dedicated reference image by fusing the detail, salience, and uniform grayscale images of the raw image, which ensures a balanced chromaticity distribution. Subsequently, we employ the local reference feature transfer strategy to migrate the local mean and variance of the reference image to the raw image, obtaining a color-corrected image. Extensive evaluation experiments demonstrate that our proposed LRFT method offers good preprocessing performance for the subsequent enhancement of images with different degradation types. The code is publicly available at: https://www.researchgate.net/publication/383528251_2024-LRFT.
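The local mean-and-variance migration the abstract describes can be sketched with box filters, as below; the window size is an assumption, and the construction of the fused reference image itself is omitted.

```python
# Per channel: normalize by the raw image's local statistics, then re-scale
# to the reference image's local statistics (local analogue of Reinhard-style
# color transfer).
import numpy as np
from scipy.ndimage import uniform_filter

def local_stats_transfer(raw, ref, win=15, eps=1e-6):
    """raw, ref: float arrays in [0, 1], same shape (one channel)."""
    mu_r, mu_f = uniform_filter(raw, win), uniform_filter(ref, win)
    var_r = uniform_filter(raw ** 2, win) - mu_r ** 2
    var_f = uniform_filter(ref ** 2, win) - mu_f ** 2
    sigma_r = np.sqrt(np.maximum(var_r, 0)) + eps
    sigma_f = np.sqrt(np.maximum(var_f, 0))
    return np.clip((raw - mu_r) / sigma_r * sigma_f + mu_f, 0.0, 1.0)

raw = np.random.default_rng(0).random((64, 64))
ref = np.random.default_rng(1).random((64, 64))
out = local_stats_transfer(raw, ref)
```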
Citations: 0
Label-noise learning via uncertainty-aware neighborhood sample selection
IF 3.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-01 · DOI: 10.1016/j.patrec.2024.09.012
Yiliang Zhang, Yang Lu, Hanzi Wang
Existing deep learning methods often require a large amount of high-quality labeled data. Yet the presence of noisy labels in real-world training data seriously affects the generalization ability of the model. Sample selection techniques, the current dominant approach to mitigating the effects of noisy labels on models, use the consistency between sample predictions and observed labels to make clean selections. However, these methods rely heavily on the accuracy of the sample predictions and inevitably suffer when the model predictions are unstable. To address these issues, we propose an uncertainty-aware neighborhood sample selection method. In particular, it calibrates sample predictions using neighbor predictions and reassigns model attention to the selected samples based on sample uncertainty. By alleviating the influence of prediction bias on sample selection, our proposed method achieves excellent performance in extensive experiments. Notably, we achieve an average improvement of 5% in asymmetric noise scenarios.
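A hedged sketch of the selection idea: smooth each sample's predicted probabilities with those of its feature-space neighbors, keep samples whose calibrated prediction agrees with the observed label, and down-weight high-entropy (uncertain) ones. The neighborhood size and the exact uncertainty weighting are assumptions.

```python
# Neighbor-calibrated, uncertainty-weighted clean-sample selection.
import numpy as np

def neighbor_calibrated_selection(feats, probs, labels, k=5):
    # k-nearest neighbors by Euclidean distance in feature space.
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]                 # exclude self
    calibrated = (probs + probs[nn].mean(axis=1)) / 2.0    # neighbor smoothing
    entropy = -(calibrated * np.log(calibrated + 1e-12)).sum(axis=1)
    weight = 1.0 - entropy / np.log(probs.shape[1])        # low weight if uncertain
    agree = calibrated.argmax(axis=1) == labels            # consistency check
    return agree, weight  # selection mask and per-sample attention weight

rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 16))
probs = rng.dirichlet(np.ones(10), size=50)
labels = rng.integers(0, 10, size=50)
mask, w = neighbor_calibrated_selection(feats, probs, labels)
```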
Citations: 0