Pattern Recognition Letters: latest articles, page 3

DDOWOD: DiffusionDet for open-world object detection

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.10.002

Jiaqi Fan , Enming Zhang , Ying Wei , Yuefeng Wang , Jiakun Xia , Junwei Liu , Xinghong Liu , Shuailei Ma

Open-world object detection (OWOD) poses a significant challenge in computer vision, requiring models to detect unknown objects and incrementally learn new categories. To explore this field, we propose DDOWOD, which is based on DiffusionDet. Because DiffusionDet randomly generates boxes and reconstructs the characteristics of the ground truth (GT) from them, DDOWOD is more likely to cover unknown objects hidden in the background and can reduce the model’s bias towards known-class objects during training. In addition, because low-quality pseudo-labels reduce accuracy in recognizing unknown classes, we use the Segment Anything Model (SAM) as the teacher model in distillation learning to endow DDOWOD with rich visual knowledge. Surprisingly, compared to other existing models, DDOWOD is better suited to using SAM as the teacher. Furthermore, we propose stepwise distillation (SD), a new incremental learning method specialized for DDOWOD that avoids catastrophic forgetting during training. Our approach utilizes all previously trained models from past tasks rather than relying solely on the last one. DDOWOD achieves a U-Recall of 53.2, 51.5, and 50.7 on the OWOD split and a U-AP of 21.9 on IntensiveSet.

Volume : 186 Pages : 170-177

Citations: 0
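
The stepwise distillation described in the DDOWOD abstract above draws on every model trained for earlier tasks, not only the most recent one. A minimal sketch of that idea follows, assuming a mean-squared-error feature loss and equal weights per teacher. The function names, the loss, and the weighting are illustrative assumptions, not the authors' implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def stepwise_distillation_loss(student_feat, teacher_feats, weights=None):
    """Weighted sum of student-vs-teacher MSE over ALL earlier task models.

    teacher_feats holds one feature vector per previously trained model.
    Equal weights are an assumption made for this sketch.
    """
    if not teacher_feats:
        return 0.0  # first task: nothing to distill from
    if weights is None:
        weights = [1.0 / len(teacher_feats)] * len(teacher_feats)
    return sum(w * mse(student_feat, t) for w, t in zip(weights, teacher_feats))


# Hypothetical features from the current model and two earlier task models.
student = [1.0, 2.0]
teachers = [[1.0, 2.0], [3.0, 2.0]]
loss = stepwise_distillation_loss(student, teachers)
```

Distilling only from the last model would correspond to passing a single-element `teacher_feats` list.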

Pseudo-label refinement via hierarchical contrastive learning for source-free unsupervised domain adaptation

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.10.006

Deng Li , Jianguang Zhang , Kunhong Wu , Yucheng Shi , Yahong Han

Source-free unsupervised domain adaptation aims to adapt a source model to an unlabeled target domain without accessing the source data, which is withheld for privacy reasons. Existing works mainly address the problem through self-training and representation learning. However, these works typically learn the representation at a single semantic level and barely exploit the rich hierarchical semantic information needed to obtain clear decision boundaries, which makes it hard for them to achieve satisfactory generalization performance. In this paper, we propose a novel hierarchical contrastive domain adaptation algorithm that exploits self-supervised contrastive learning on both fine-grained instances and coarse-grained cluster semantics. On the one hand, we propose an adaptive prototype pseudo-labeling strategy to obtain much more reliable labels. On the other hand, we propose hierarchical contrastive representation learning at both the fine-grained instance level and the coarse-grained cluster level to reduce the negative effect of label noise and stabilize the whole training procedure. Extensive experiments are conducted on the main unsupervised domain adaptation benchmark datasets, and the results demonstrate the effectiveness of the proposed method.

Volume : 186 Pages : 236-242

Citations: 0
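
The abstract above mentions prototype-based pseudo-labeling without giving its details. As a rough illustration of the general technique, the sketch below averages target features per predicted class into prototypes and relabels each sample with its nearest prototype under cosine similarity. The "adaptive" part of the authors' strategy is not specified in the abstract, so it is not modelled here, and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_prototypes(features, labels):
    """Mean feature vector for each label that occurs in `labels`."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def refine_pseudo_labels(features, labels):
    """Relabel every sample with the class of its most similar prototype."""
    protos = class_prototypes(features, labels)
    return [max(protos, key=lambda y: cosine(f, protos[y])) for f in features]


# Toy 2-D features; the last sample carries a noisy initial label (1).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
noisy = [0, 0, 1, 1, 1]
refined = refine_pseudo_labels(feats, noisy)
```

In this toy case the mislabeled last sample moves to class 0 because it lies closer to that class's prototype.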

Measuring student behavioral engagement using histogram of actions

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.11.002

Ahmed Abdelkawy , Aly Farag , Islam Alkabbany , Asem Ali , Chris Foreman , Thomas Tretter , Nicholas Hindy

In this work, we propose a novel method for assessing students’ behavioral engagement by representing a student’s actions and their frequencies over an arbitrary time interval as a histogram of actions. This histogram and the student’s gaze are used as input to a classifier that determines whether the student is engaged or not. For action recognition, we use students’ skeletons to model their postures and upper-body movements. To learn the dynamics of a student’s upper body, a 3D-CNN model is developed. The trained 3D-CNN model recognizes actions within every 2-minute video segment, and these actions are then used to build the histogram of actions. To evaluate the proposed framework, we build a dataset consisting of 1414 video segments annotated with 13 actions and 963 2-minute video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 86.32% and that the proposed framework can capture the average engagement of the class with a 90% F1-score.

Volume : 186 Pages : 337-344

Citations: 0
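
The histogram of actions in the abstract above is a frequency count over the actions recognized in a time interval, which is then combined with gaze as classifier input. A minimal sketch follows, assuming the recognized actions arrive as a list of labels and gaze is summarized as one number. The action names and the scalar gaze feature are hypothetical, not taken from the paper.

```python
from collections import Counter


def histogram_of_actions(actions, vocabulary):
    """Relative frequency of each vocabulary action within `actions`."""
    unknown = set(actions) - set(vocabulary)
    if unknown:
        raise ValueError(f"actions outside the vocabulary: {sorted(unknown)}")
    if not actions:
        return [0.0] * len(vocabulary)
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in vocabulary]


def classifier_input(actions, vocabulary, gaze_score):
    """Concatenate the histogram with a gaze feature (assumed to be a scalar)."""
    return histogram_of_actions(actions, vocabulary) + [gaze_score]


# Hypothetical action labels recognized within one interval.
vocab = ["writing", "reading", "using_phone"]
recognized = ["writing", "writing", "using_phone", "reading"]
features = classifier_input(recognized, vocab, gaze_score=0.8)
```

Normalizing by the number of recognized actions keeps histograms comparable across intervals of different lengths.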

Design of a differentiable L-1 norm for pattern recognition and machine learning

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.09.020

Min Zhang , Yiming Wang , Hongyu Chen , Taihao Li , Shupeng Liu , Xianfeng Gu , Xiaoyin Xu

In various applications of pattern recognition, feature selection, and machine learning, the L-1 norm is used as either an objective function or a regularizer. Mathematically, the L-1 norm has unique characteristics that make it attractive in machine learning, feature selection, optimization, and regression. Computationally, however, the L-1 norm presents a hurdle because it is non-differentiable, which makes finding a solution difficult. Existing approaches therefore rely on numerical methods. In this work we design an L-1 norm that is differentiable and thus has an analytical solution. The differentiable L-1 norm removes the absolute value sign in the conventional definition and is differentiable everywhere. The new L-1 norm is linear almost everywhere, a desirable feature that is also present in the conventional L-1 norm. Its only limitation is that its behavior near zero is not linear, so we consider the new L-1 norm quasi-linear. Being differentiable, the new L-1 norm and its quasi-linear variation are amenable to analytic solutions and can therefore facilitate the development and implementation of many algorithms involving the L-1 norm. Our tests validate the capability of the new L-1 norm in various applications.

Volume : 186 Pages : 126-132

Citations: 0
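
The abstract above does not state the formula of the proposed norm. To illustrate what "linear away from zero, smooth near zero" can look like, the sketch below uses the well-known Huber-style smoothing of the absolute value. This is a substituted, standard surrogate and an assumption of this example, not the authors' construction; `delta` is a hypothetical smoothing width.

```python
def smooth_abs(x, delta=1e-3):
    """Huber-style surrogate for |x|: quadratic on [-delta, delta], linear outside."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if abs(x) <= delta:
        return x * x / (2.0 * delta)
    return abs(x) - delta / 2.0


def smooth_abs_grad(x, delta=1e-3):
    """Derivative of smooth_abs; defined everywhere, including x = 0."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if abs(x) <= delta:
        return x / delta
    return 1.0 if x > 0 else -1.0


def smooth_l1_norm(v, delta=1e-3):
    """Sum of the smoothed absolute values of the entries of v."""
    return sum(smooth_abs(x, delta) for x in v)


value = smooth_l1_norm([2.0, -2.0, 0.0], delta=1.0)
slope_at_zero = smooth_abs_grad(0.0, delta=1.0)
```

The two branches agree in value and slope at `|x| = delta`, which is what makes the surrogate differentiable where the plain absolute value is not.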

Online probabilistic knowledge distillation on cryptocurrency trading using Deep Reinforcement Learning

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.10.005

Vasileios Moustakidis , Nikolaos Passalis , Anastasios Tefas

Leveraging Deep Reinforcement Learning (DRL) to train agents for financial trading has gained significant attention in recent years. However, training these agents in noisy financial environments remains challenging and unstable, which significantly impacts their performance as trading agents, as the recent literature has also shown. This paper introduces a novel distillation method that aims to improve the training stability of DRL agents. The proposed method transfers knowledge from a teacher ensemble to a student model, incorporating both the action probability distribution from the output layer and the knowledge from the intermediate layers of the teacher’s network. Furthermore, the proposed method works in an online fashion, which eliminates the separate teacher training process involved in many DRL distillation pipelines and simplifies distillation. The proposed method is extensively evaluated on a large-scale cryptocurrency trading setup, demonstrating that it both significantly improves trading accuracy and profit and increases the stability of the training process.

Volume : 186 Pages : 243-249

Citations: 0
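
The output-layer part of the distillation in the abstract above transfers the teacher ensemble's action probabilities to the student. A minimal sketch follows, assuming the ensemble target is the plain average of the teachers' distributions and the loss is a KL divergence. Both choices, the three-action setup, and the function names are assumptions of this example; the intermediate-layer term is not modelled.

```python
import math


def ensemble_mean(distributions):
    """Average several action distributions element-wise."""
    if not distributions:
        raise ValueError("need at least one teacher distribution")
    n = len(distributions)
    return [sum(col) / n for col in zip(*distributions)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_distributions, student_distribution):
    """KL divergence from the averaged teacher policy to the student policy."""
    target = ensemble_mean(teacher_distributions)
    return kl_divergence(target, student_distribution)


# Hypothetical probabilities over three actions (e.g. buy, hold, sell).
teachers = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]
student = [1 / 3, 1 / 3, 1 / 3]
loss = distillation_loss(teachers, student)
```

In an online setting the teacher distributions would come from models trained alongside the student rather than from a separately pre-trained teacher.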

Explainable hypergraphs for gait based Parkinson classification

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.09.026

Anirban Dutta Choudhury , Ananda S. Chowdhury

Parkinson’s disease (PD) classification using Vertical Ground Reaction Force (VGRF) sensors can help in the unobtrusive detection and monitoring of PD patients. State-of-the-art (SOTA) research in PD classification reveals that Deep Learning (DL) performs better than Shallow Learning (SL), at the expense of explainability. In this paper, we introduce a novel explainable weighted hypergraph that exploits the interconnections of the SOTA features, leading to more discriminative derived features and thereby forming an SL arm. In parallel, we create a DL arm consisting of a ResNet architecture to learn the spatio-temporal patterns of the VGRF signals. The PD classification probabilities from the SL and DL arms are adaptively fused to create a hybrid pipeline. The pipeline achieves an AUC of 0.979 on the Physionet Parkinson Dataset, which exceeds the AUCs of the SL and DL arms used in isolation, 0.878 and 0.852 respectively. The proposed pipeline demonstrates explainability through improved permutation feature importance and through contrasting use cases in which a misclassification by the DL arm is rectified by the SL arm and vice versa. We further demonstrate that our solution achieves performance comparable to SOTA methods. To the best of our knowledge, this is the first approach to analyze PD classification with hypergraph-based xAI (Explainable Artificial Intelligence).

Volume : 186 Pages : 1-7

Citations: 0
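
The abstract above says the SL and DL probabilities are adaptively fused but does not give the rule. One simple way to make a fusion adaptive is to weight each arm by how far its probability is from the undecided value 0.5, so that the more confident arm can overrule the other. The sketch below shows that hypothetical rule only; it is an assumption of this example and should not be read as the authors' fusion scheme.

```python
def fuse_probabilities(p_sl, p_dl):
    """Confidence-weighted average of two PD probabilities.

    Each arm's weight is proportional to |p - 0.5| (an assumed rule).
    """
    for p in (p_sl, p_dl):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    conf_sl = abs(p_sl - 0.5)
    conf_dl = abs(p_dl - 0.5)
    total = conf_sl + conf_dl
    if total == 0.0:
        return 0.5  # both arms are undecided
    w_sl = conf_sl / total
    return w_sl * p_sl + (1.0 - w_sl) * p_dl


# A confident SL arm (0.9) outweighs a weakly negative DL arm (0.4).
fused = fuse_probabilities(0.9, 0.4)
```

With these inputs the fused probability stays on the positive side, which mirrors the abstract's use cases where one arm corrects the other.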

Adaptive feature alignment for adversarial training

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.10.004

Kai Zhao , Tao Wang , Ruixin Zhang , Wei Shen

Recent studies reveal that Convolutional Neural Networks (CNNs) are typically vulnerable to adversarial attacks. Many adversarial defense methods have been proposed to improve robustness against adversarial samples. However, these methods can only defend against adversarial samples of a specific strength, which reduces their flexibility against attacks of varying strengths. Moreover, they often enhance adversarial robustness at the expense of accuracy on clean samples. In this paper, we first observe that the features of adversarial images change monotonically and smoothly as the attack strength increases. This intriguing observation suggests that the features of adversarial images at various attack strengths can be approximated by interpolating between the features of adversarial images at the strongest and weakest attack strengths. Owing to this monotonicity, the interpolation weight can be easily learned by a neural network. Based on this observation, we propose adaptive feature alignment (AFA), which automatically aligns features to defend against adversarial attacks of various strengths. During training, our method learns the statistics of adversarial samples at various attack strengths using a dual batch normalization (dual-BN) architecture, in which each batch normalization branch handles samples of a specific attack strength. During inference, our method automatically adjusts to varying attack strengths by linearly interpolating the dual-BN features. Unlike previous methods, which must either retrain the model or manually tune hyper-parameters for a new attack strength, our method handles arbitrary attack strengths with a single model and without introducing any hyper-parameters. Additionally, our method improves model robustness against adversarial samples without incurring much loss of accuracy on clean images. Experiments on the CIFAR-10, SVHN, and tiny-ImageNet datasets demonstrate that our method outperforms the state of the art under various attack strengths and even improves accuracy on clean samples. Code will be made openly available upon acceptance.

Volume : 186 Pages : 184-190

Citations: 0
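
The inference step in the abstract above linearly interpolates the features of two batch normalization branches. A minimal sketch of that interpolation on one feature channel follows. Here the weight `alpha` is passed in by hand, whereas the abstract says it is learned by a network; the statistics, names, and the weak/strong labelling of the branches are assumptions of this example.

```python
import math


def batchnorm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of activations with fixed (inference-time) statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def dual_bn_features(x, weak_stats, strong_stats, alpha):
    """Blend the outputs of two BN branches.

    weak_stats / strong_stats are (mean, var) pairs for the weakest and
    strongest attack strengths; alpha in [0, 1] selects a point between them.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    weak = batchnorm(x, *weak_stats)
    strong = batchnorm(x, *strong_stats)
    return [(1.0 - alpha) * w + alpha * s for w, s in zip(weak, strong)]


# Hypothetical activations and per-branch running statistics.
activations = [0.0, 2.0]
blended = dual_bn_features(activations, (1.0, 1.0), (0.0, 4.0), alpha=0.5)
```

Setting `alpha` to 0 or 1 recovers the weakest-attack or strongest-attack branch, and values in between cover intermediate strengths with the same weights.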

Discrete diffusion models with Refined Language-Image Pre-trained representations for remote sensing image captioning

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.09.019

Guannan Leng , Yu-Jie Xiong , Chunping Qiu , Congzhou Guo

Remote sensing (RS) image captioning (RSIC) uses natural language to describe image content, assisting in the comprehension of object properties and relationships. However, RS images vary in object scale, distribution, and quantity, which makes it challenging to capture global semantic information and the connections between objects. To enhance the accuracy of captions produced from RS images, this paper proposes a novel method, Discrete Diffusion Models with Refined Language-Image Pre-trained representations (DDM-RLIP), which leverages an advanced discrete diffusion model (DDM) for noising and denoising text tokens. DDM-RLIP builds on an advanced DDM-based method designed for natural images. Image representations are refined primarily by fine-tuning a CLIP image encoder on RS images and then adapting the transformer with an additional attention module that focuses on crucial image regions and relevant words. Experiments were conducted on three datasets, Sydney-Captions, UCM-Captions, and NWPU-Captions, and the results demonstrate the superior performance of the proposed method over conventional autoregressive models. On the NWPU-Captions dataset, the CIDEr score improved from 116.4 to 197.7, further validating the efficacy and potential of DDM-RLIP. The implementation code for DDM-RLIP is available at https://github.com/Leng-bingo/DDM-RLIP.
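
The noising of text tokens mentioned in the abstract above can be illustrated with an absorbing-state corruption, a common choice for discrete diffusion in which each token is independently replaced by a mask symbol with a probability that grows with the timestep. The abstract does not say which corruption the authors use, so the linear schedule, the `[MASK]` symbol, and the function name below are assumptions of this sketch.

```python
import random

MASK = "[MASK]"


def noise_tokens(tokens, t, num_steps, rng):
    """Mask each token independently with probability t / num_steps."""
    if num_steps <= 0 or not 0 <= t <= num_steps:
        raise ValueError("require num_steps > 0 and 0 <= t <= num_steps")
    p = t / num_steps
    return [MASK if rng.random() < p else tok for tok in tokens]


# Hypothetical caption; a seeded generator keeps the example repeatable.
caption = ["many", "planes", "are", "parked", "near", "a", "terminal"]
rng = random.Random(0)
half_noised = noise_tokens(caption, t=5, num_steps=10, rng=rng)
```

At `t = 0` the caption is untouched and at `t = num_steps` every token is masked; a denoising model would be trained to recover the original tokens from such corrupted sequences, conditioned on the image.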

Volume 186, Pages 164-169

Citations: 0

Rethinking unsupervised domain adaptation for semantic segmentation

IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.09.022

Zhijie Wang , Masanori Suganuma , Takayuki Okatani

Unsupervised domain adaptation (UDA) adapts a model trained on one domain (the source) to a novel domain (the target) using only unlabeled data. Because annotation for semantic segmentation is costly, researchers have developed many UDA methods for the task, which assume that no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify the model before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to both is to evaluate the model using validation data, i.e., a certain amount of labeled target-domain samples. Questioning this basic assumption leads us to rethink UDA from a data-centric point of view. Specifically, we assume access to a minimal amount of labeled data. We then ask how much is necessary to find good hyper-parameters for existing UDA methods, and what happens if we instead use the same data for supervised training of the same model, e.g., finetuning. We conducted experiments to answer these questions on popular scenarios, {GTA5, SYNTHIA} → Cityscapes. We found that i) choosing good hyper-parameters requires only a few labeled images for some UDA methods but many more for others; and ii) simple finetuning works surprisingly well: it outperforms many UDA methods if only several dozen labeled images are available.
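Selecting hyper-parameters with a small labeled target-domain validation set can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes mean intersection-over-union (mIoU) as the validation metric, represents each label map as a flat list of class indices, and uses hypothetical names (`mean_iou`, `select_config`) and toy data.

```python
def mean_iou(preds, labels, num_classes):
    """Mean IoU over the classes that occur in the predictions or the labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, label in zip(preds, labels):  # one flat label map per image
        for p, y in zip(pred, label):
            if p == y:
                inter[y] += 1  # true positive for class y
                union[y] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[y] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def select_config(candidates, labels, num_classes):
    """Return the candidate name with the highest validation mIoU, plus all scores."""
    scores = {name: mean_iou(preds, labels, num_classes)
              for name, preds in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: one 4-pixel validation image and two hypothetical configurations.
val_labels = [[0, 0, 1, 1]]
candidates = {
    "config_a": [[0, 0, 1, 0]],
    "config_b": [[1, 1, 1, 1]],
}
best, scores = select_config(candidates, val_labels, num_classes=2)
print(best, scores)
```

With very few labeled images the mIoU estimate is noisy, which is one way to read the finding that some methods need many more validation images than others before a good configuration can be identified.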

Volume 186, Pages 119-125

Citations: 0

Motion-guided small MAV detection in complex and non-planar scenes

IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-10-01 DOI: 10.1016/j.patrec.2024.09.013

Hanqing Guo , Canlun Zheng , Shiyu Zhao

In recent years, there has been growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify small MAVs in complex and non-planar scenes. The detector first uses a motion feature enhancement module to capture the motion features of small MAVs. It then applies multi-object tracking and trajectory filtering to eliminate false positives caused by motion parallax. Finally, an appearance-based classifier and an appearance-based detector that operates on the cropped regions produce precise detection results. The proposed method can effectively and efficiently detect extremely small MAVs against dynamic and complex backgrounds because it aggregates pixel-level motion features and eliminates false positives based on the motion and appearance features of MAVs. Experiments on the ARD-MAV dataset demonstrate that the proposed method achieves high performance in small MAV detection under challenging conditions and outperforms other state-of-the-art methods across various metrics.
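The trajectory-filtering step can be pictured with a deliberately simple sketch. The abstract does not give the paper's filtering criteria, so the rule below is an assumption chosen only for illustration: a track is kept if it persists for a minimum number of frames and travels a minimum distance. The function names and both thresholds are hypothetical.

```python
import math


def path_length(track):
    """Total distance travelled along a list of (x, y) box centers."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def filter_tracks(tracks, min_frames=5, min_path=10.0):
    """Keep tracks lasting at least min_frames frames and moving at least min_path pixels.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return {
        track_id: points
        for track_id, points in tracks.items()
        if len(points) >= min_frames and path_length(points) >= min_path
    }


# Toy tracks: id -> list of per-frame centers.
tracks = {
    1: [(0, 0), (3, 4), (6, 8), (9, 12), (12, 16)],  # long, moving track
    2: [(5, 5), (5, 6)],                             # too short
    3: [(1, 1)] * 6,                                 # long but stationary
}
print(sorted(filter_tracks(tracks)))
```

A filter aimed at motion parallax would more plausibly compare each track against the estimated background motion. The thresholds here only show where such a test would sit in the pipeline, between tracking and the appearance-based stages.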

Volume 186, Pages 98-105

Citations: 0