
Latest Publications in Pattern Recognition

Beyond pillars: Advancing 3D object detection with salient voxel enhancement of LiDAR-4D radar fusion
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-08 | DOI: 10.1016/j.patcog.2025.112841
Pengfei Yang, Feng Wu, Minyang Liu, Ting Zhong, Fan Zhou
The fusion of LiDAR and 4D radar has emerged as a promising solution for robust and accurate 3D object detection in complex and adverse conditions. Existing methods typically rely on pillar-based representations, which, although computationally efficient, fail to provide the fine-grained structural details necessary for precise object localization and recognition. In contrast, voxel-based representations offer richer spatial information but face challenges such as background noise and data quality disparity. To address these limitations, we propose SVEFusion, a voxel-based 3D object detection framework that integrates LiDAR and 4D radar data using a salient voxel enhancement mechanism. Our method introduces an adaptive feature alignment module and a novel spatial neighborhood attention module for efficient early-stage multi-modal voxel feature integration. Furthermore, we design a salient voxel enhancement mechanism that assigns higher weights to foreground voxels using a multi-scale weight prediction strategy, progressively refining weight accuracy with a supervision loss. Experimental results demonstrate that SVEFusion significantly outperforms state-of-the-art methods, establishing a new benchmark in multi-modal 3D object detection. The source code and network weights for reproducibility are available at https://github.com/icdm-adteam/SVEFusion.
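The salient voxel enhancement described above can be pictured as a small weighting head that predicts per-voxel foreground weights at more than one scale and is supervised against a foreground mask. The following PyTorch sketch is only an illustration under those assumptions; the module name `SalientVoxelWeighting`, the two-scale design, and the BCE supervision are stand-ins, not the authors' implementation.

```python
# Minimal sketch of a salient-voxel weighting head (illustrative, not the
# paper's implementation). Assumes voxel features come as a dense
# (B, C, D, H, W) tensor and a binary foreground mask is available for
# supervision during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SalientVoxelWeighting(nn.Module):
    """Predicts per-voxel foreground weights at two scales and re-weights features."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1x1 conv per scale produces a single weight logit per voxel.
        self.weight_head_full = nn.Conv3d(channels, 1, kernel_size=1)
        self.weight_head_half = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, voxel_feat: torch.Tensor):
        # Full-resolution weight logits.
        logit_full = self.weight_head_full(voxel_feat)
        # Coarse weights predicted on a downsampled copy, then upsampled back.
        coarse = F.avg_pool3d(voxel_feat, kernel_size=2)
        logit_half = F.interpolate(self.weight_head_half(coarse),
                                   size=voxel_feat.shape[2:], mode="trilinear",
                                   align_corners=False)
        weight = torch.sigmoid(logit_full + logit_half)      # (B, 1, D, H, W)
        enhanced = voxel_feat * (1.0 + weight)                # boost salient voxels
        return enhanced, weight

    @staticmethod
    def supervision_loss(weight: torch.Tensor, fg_mask: torch.Tensor):
        # Binary cross-entropy against a voxel-level foreground mask.
        return F.binary_cross_entropy(weight, fg_mask.float())


if __name__ == "__main__":
    feats = torch.randn(2, 32, 16, 64, 64)
    fg = (torch.rand(2, 1, 16, 64, 64) > 0.9).float()
    head = SalientVoxelWeighting(32)
    enhanced, w = head(feats)
    print(enhanced.shape, head.supervision_loss(w, fg).item())
```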
Citations: 0
Learning clique-based inter-class affinity for compositional zero-shot learning
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-08 | DOI: 10.1016/j.patcog.2025.112819
Chenyi Jiang, Qiaolin Ye, Shidong Wang, Zebin Wu, Haofeng Zhang
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions of objects and states by transferring knowledge from seen compositions. A critical omission in prior studies is the uniform penalty imposed on all incorrect compositions, ignoring their inherent affinities with the ground-truth labels. This oversight leads to severe overfitting on seen classes and impedes the discovery of genuine visual-semantic relationships. To address this, we propose Clique-based Interclass Affinity (CIA), a framework that introduces hierarchical semantic supervision by grouping compositions into affinity cliques. CIA encodes both semantic affinity and visual affinity to construct multi-level cliques. These cliques guide a one-to-many alignment between visual and semantic features, enabling the model to learn generalizable class prototypes through structured constraints, rather than treating all incorrect classes equally. Unlike prior works focusing on direct classification, CIA emphasizes unveiling intrinsic compositional structures by analyzing inter-semantic and visual relationships. Extensive experiments on MIT-States, UT-Zappos, and C-GQA demonstrate CIA’s superiority, showcasing its robustness in both closed-world and open-world settings. Our code is available at https://github.com/LanchJL/CIA-CZSL.
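One way to picture the non-uniform penalty the abstract argues for is to soften the cross-entropy target toward classes that share a clique with the ground truth, so that high-affinity wrong compositions are penalized less than unrelated ones. The sketch below assumes a precomputed class-affinity matrix and a smoothing coefficient `eps`; it illustrates the idea, not the CIA objective itself.

```python
# Illustrative sketch of affinity-aware supervision (not the paper's exact CIA
# objective). Assumes a precomputed class-affinity matrix A of shape
# (num_classes, num_classes), e.g. 1 for compositions sharing a clique with the
# label and 0 otherwise; high-affinity incorrect classes receive part of the
# target mass instead of a uniform penalty.
import torch
import torch.nn.functional as F


def affinity_smoothed_ce(logits: torch.Tensor, labels: torch.Tensor,
                         affinity: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """Cross-entropy against targets softened toward same-clique classes."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(labels, num_classes).float()
    # Affinity row of each ground-truth class, excluding the class itself.
    aff = affinity[labels].clone()
    aff.scatter_(1, labels.unsqueeze(1), 0.0)
    aff = aff / aff.sum(dim=1, keepdim=True).clamp_min(1e-8)
    target = (1.0 - eps) * one_hot + eps * aff
    log_prob = F.log_softmax(logits, dim=1)
    return -(target * log_prob).sum(dim=1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    num_classes = 6
    affinity = (torch.rand(num_classes, num_classes) > 0.5).float()
    logits = torch.randn(4, num_classes)
    labels = torch.tensor([0, 2, 3, 5])
    print(affinity_smoothed_ce(logits, labels, affinity).item())
```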
Citations: 0
Entropy-aware dynamic bias watermarking for LLM-generated emotional content
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-07 | DOI: 10.1016/j.patcog.2025.112866
Dawei Xu, Xuyang Dong, Chunhai Li, Baokun Zheng, Chuan Zhang, Yilin Chen, Liehuang Zhu
The application of large language models (LLMs) in affective computing, ranging from empathetic chatbots to creative writing, has intensified the demand for distinguishing and authenticating AI-generated emotional content. Watermarking, by embedding detectable signals into the outputs of language models, offers a promising solution. However, a critical challenge persists: emotional texts often exhibit low entropy or complex spiky entropy distributions, which severely undermine the performance of existing watermarking methods. Unlike prior works that primarily treated spiky entropy as an external metric, we focus specifically on its role within the text generation process itself. To address the challenges of watermarking under low-entropy and complex entropy distributions, we propose DBW (Dynamic Bias Watermarking), an entropy-aware watermarking algorithm for LLMs. DBW dynamically adjusts the watermarking bias in real time based on the entropy of each token. This ensures a stronger watermark signal (an increased green-token count) in high-entropy contexts, while minimizing interference and quality degradation in fragile low-entropy emotional segments. Experimental results demonstrate that the proposed DBW algorithm outperforms the KGW watermarking method in both complex entropy distribution and low-entropy text generation scenarios. DBW achieves higher detection accuracy without sacrificing text quality. Furthermore, comparative experiments show that our proposed DBW algorithm demonstrates superior robustness under different attacks. Our work provides a reliable and adaptive tool for safeguarding emotion-AI generated content, contributing to the secure and trustworthy deployment of large-scale pre-trained models in affective computing.
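The mechanism can be sketched in the style of KGW-type green-list watermarks: compute the entropy of the next-token distribution and scale the bias added to green-list logits by that entropy, so low-entropy (fragile) positions receive almost no bias. The function below is a hedged illustration; the linear bias schedule, `gamma`, and `delta_max` are assumptions rather than the paper's DBW formulas.

```python
# Minimal sketch of entropy-scaled green-list biasing in the spirit of KGW-style
# watermarks (illustrative; the paper's exact bias schedule is not reproduced
# here). Assumes `logits` is the model's next-token distribution and the green
# list is derived from a hash of the previous token id.
import torch


def entropy_scaled_bias(logits: torch.Tensor, prev_token: int,
                        gamma: float = 0.5, delta_max: float = 4.0) -> torch.Tensor:
    """Return logits with a bias on green tokens that grows with token entropy."""
    vocab = logits.size(-1)
    probs = torch.softmax(logits, dim=-1)
    # Shannon entropy, normalized to [0, 1] by the maximum log(vocab).
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum()
    norm_entropy = (entropy / torch.log(torch.tensor(float(vocab)))).item()

    # Green list: a pseudo-random subset of size gamma * vocab, seeded by the
    # previous token so a detector can recompute it.
    gen = torch.Generator().manual_seed(prev_token)
    green = torch.randperm(vocab, generator=gen)[: int(gamma * vocab)]

    # Dynamic bias: strong in high-entropy positions, nearly zero in low-entropy
    # (fragile) positions. The linear schedule is an assumption for illustration.
    delta = delta_max * norm_entropy
    biased = logits.clone()
    biased[green] += delta
    return biased


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(50257)          # e.g. a GPT-2-sized vocabulary
    out = entropy_scaled_bias(logits, prev_token=1234)
    print((out - logits).abs().max().item())
```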
Citations: 0
DANIM: Domain adaptation network with intermediate domain masking for night-time scene parsing
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-07 | DOI: 10.1016/j.patcog.2025.112796
Qijian Tian, Sen Wang, Ran Yi, Zufeng Zhang, Bin Sheng, Xin Tan, Lizhuang Ma
Night-time scene parsing is important for practical applications such as autonomous driving and robot vision. Since annotation is time-consuming, Unsupervised Domain Adaptation (UDA) is an effective solution for night-time scene parsing. Due to the low illumination, over/under-exposure, and motion blur in night-time scenes, existing methods cannot connect daytime and night-time scenes well, which limits their performance. Some methods rely on day-night paired images, which are costly to collect and therefore impractical. In this paper, we propose DANIM, a self-training UDA network for night-time scene parsing. We introduce an intermediate domain that explicitly models the connection between daytime and night-time scenes in terms of lighting and structure. The intermediate domain shares similar structure information with the night-time target domain and similar lighting information with the daytime source domain. Intermediate-domain images are generated by harnessing the rich prior knowledge of a pre-trained text-driven generative model, and we propose a scoring mechanism for selecting high-quality ones for training. Besides, we propose intermediate domain masking to address the inconsistency between the intermediate domain and the target domain, and further design a coupled mask strategy to make the masking more effective. Extensive experiments show that DANIM achieves first place on the DarkZurich leaderboard and outperforms state-of-the-art methods on other widely used night-time scene parsing benchmarks, i.e., ACDC-night, NightCity, and NighttimeDriving.
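The coupled mask idea can be illustrated by sampling one random patch mask and applying it identically to an intermediate-domain image and its night-time target counterpart, so any consistency loss compares the two views over the same visible regions. The helper below is a minimal sketch under that assumption; the patch size and masking ratio are placeholders, not values from the paper.

```python
# Illustrative sketch of a coupled patch mask shared between an intermediate-
# domain image and a night-time target image (the paper's exact masking and
# coupling rules are not reproduced). Masking the same patches in both views
# lets a consistency loss compare predictions over identical visible regions.
import torch


def coupled_patch_mask(intermediate: torch.Tensor, target: torch.Tensor,
                       patch: int = 32, ratio: float = 0.5):
    """Zero out the same random patches in both images. Inputs: (B, C, H, W)."""
    b, _, h, w = intermediate.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=intermediate.device) > ratio).float()
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return intermediate * mask, target * mask, mask


if __name__ == "__main__":
    inter = torch.randn(2, 3, 256, 256)
    night = torch.randn(2, 3, 256, 256)
    m_inter, m_night, mask = coupled_patch_mask(inter, night)
    print(mask.mean().item())  # roughly 1 - ratio
```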
Citations: 0
Source-free domain adaptation via multimodal space-guided alignment
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-07 | DOI: 10.1016/j.patcog.2025.112827
Lijuan Chen, Yunxiang Bai, Ying Hu, Qiong Wang, Xiaozhi Qi
Conventional Unsupervised Domain Adaptation (UDA) requires access to the source domain, which makes it unsuitable for information-security and privacy-protection scenarios. In contrast, Source-free Domain Adaptation (SFDA) transfers a pre-trained source model to an unlabeled target domain when the source data are unavailable. However, prior methods based on self-supervised learning have struggled to find a high-quality domain-invariant representation space due to the lack of source data. To address this challenge, in this work, we propose leveraging the success of vision-language pre-trained (ViL) models (e.g., CLIP). To integrate the domain generality of the ViL model and the task specificity of the source model more effectively, we introduce a novel MultiModal Space-Guided Alignment (MMGA) approach. Specifically, we start with a multimodal feature calibration that achieves coarse alignment between the target visual domain and the multimodal space. However, although it is trained on a large number of samples, the ViL space is still not a domain-invariant space. To achieve further fine-grained alignment towards the domain-invariant space, we design two methods: potential-category consistency alignment and prediction consistency alignment. These methods push the potential-category distribution and the prediction distribution closer to the pseudo-supervision fused from the ViL model and the adapted source model, respectively. This strategy corrects errors in feature alignment to the ViL space. Extensive experiments show that our MMGA approach significantly outperforms current state-of-the-art alternatives. The code and data are available at https://github.com/YunxiangBai0/MMGA/
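The fused pseudo-supervision step can be sketched as mixing the ViL model's zero-shot probabilities with the adapted source model's probabilities and pulling the target model's prediction toward the mixture. The snippet below is an illustration only; the fusion weight `alpha` and the KL form of the consistency term are assumptions, not the paper's exact losses.

```python
# Minimal sketch of fused pseudo-supervision from a ViL model and the adapted
# source model (illustrative; the fusion weight and loss form are assumptions,
# not the paper's exact formulation).
import torch
import torch.nn.functional as F


def fused_pseudo_supervision(student_logits: torch.Tensor,
                             vil_probs: torch.Tensor,
                             source_probs: torch.Tensor,
                             alpha: float = 0.5) -> torch.Tensor:
    """KL divergence of the student's prediction from the fused pseudo-labels."""
    fused = alpha * vil_probs + (1.0 - alpha) * source_probs      # (B, C)
    fused = fused / fused.sum(dim=1, keepdim=True)
    log_student = F.log_softmax(student_logits, dim=1)
    # KL(fused || student); the fused distribution acts as a soft pseudo-label.
    return F.kl_div(log_student, fused, reduction="batchmean")


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 10)
    vil = torch.softmax(torch.randn(8, 10), dim=1)
    src = torch.softmax(torch.randn(8, 10), dim=1)
    print(fused_pseudo_supervision(logits, vil, src).item())
```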
Citations: 0
MSG-CLIP: Enhancing CLIP’s ability to learn fine-grained structural associations through multi-modal scene graph alignment
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-06 | DOI: 10.1016/j.patcog.2025.112794
Xiaotian Lv, Yue Zhao, Hanlong Yin, Yifei Chen, Jianxing Liu
As a typical representative of vision-language foundation models, the Contrastive Language-Image Pre-training (CLIP) framework has garnered extensive attention due to its cross-modal understanding capabilities. Current methodologies predominantly enhance structured information understanding by adding extra image/text branches and incorporating consistency labels, thereby establishing fine-grained structural associations within or across modalities. However, this approach increases the number of model parameters, introduces consistency errors, and restricts the range of entity types that foundation models can recognize, ultimately limiting subsequent data scalability. To address these challenges, inspired by multi-modal knowledge graph alignment, we propose MSG-CLIP, a novel framework that achieves efficient local vision-language fine-grained structured feature alignment through Multi-modal Scene Graph Alignment (MSGA), operating without reliance on text-image consistency labels. Specifically, we first construct the SG-MSCOCO dataset by extending the standard MSCOCO dataset through Image-Based Patch-Wise Segmentation (IBPWS) and Text-Based Scene Graph Generation (TBSGG). Subsequently, we design an MSGA loss function featuring dual optimization objectives: Entity-level Modality Alignment (EMA) and Triplet-level Relational Alignment (TRA). Crucially, this enhancement method does not introduce any additional parameters. MSG-CLIP outperforms the baseline model on the VG-Attribution and VG-Relation benchmarks by significant margins of 11.2% and 2.5%, respectively. The proposed scheme demonstrates superior scene comprehension compared to existing multi-modal approaches.
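A rough picture of the MSGA objective is a standard image-text contrastive term plus two extra contrastive terms over entity-level and triplet-level embeddings. The sketch below assumes those embeddings are already extracted and L2-normalized, and the loss weights are placeholders; it illustrates the structure of such a combined loss rather than the paper's exact EMA/TRA formulation.

```python
# Illustrative sketch of combining a global CLIP-style objective with entity-
# level (EMA-like) and triplet-level (TRA-like) alignment terms. Embeddings are
# assumed to be precomputed and L2-normalized; the loss weights are
# placeholders, not values from the paper.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss; row i of `a` matches row i of `b`."""
    logits = a @ b.t() / temp
    labels = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


def msga_style_loss(img_emb, txt_emb, ent_vis, ent_txt, tri_vis, tri_txt,
                    w_ema: float = 1.0, w_tra: float = 1.0) -> torch.Tensor:
    global_loss = info_nce(img_emb, txt_emb)          # standard image-text term
    ema_loss = info_nce(ent_vis, ent_txt)             # entity-level alignment
    tra_loss = info_nce(tri_vis, tri_txt)             # triplet-level alignment
    return global_loss + w_ema * ema_loss + w_tra * tra_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    norm = lambda x: F.normalize(x, dim=1)
    B, D = 16, 64
    loss = msga_style_loss(norm(torch.randn(B, D)), norm(torch.randn(B, D)),
                           norm(torch.randn(B, D)), norm(torch.randn(B, D)),
                           norm(torch.randn(B, D)), norm(torch.randn(B, D)))
    print(loss.item())
```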
Citations: 0
Text-Centric multimodal sentiment analysis with asymmetric fine-tuning
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112842
Hanzhao Pan, Gengshen Wu, Yi Liu, Jungong Han
Affective Computing, particularly through Multimodal Sentiment Analysis (MSA), aims to capture and interpret the full spectrum of human emotion by integrating linguistic, visual, and acoustic signals. While large-scale pre-trained models have become foundational in this field, existing methods often suffer from high computational overhead and suboptimal fusion strategies. This paper introduces a novel MSA framework, Text-Centric Asymmetric Multimodal Sentiment Analysis (TAMSA). It builds upon powerful pre-trained encoders and employs an asymmetric fine-tuning strategy. Specifically, the text encoder is fully fine-tuned with LoRA adapters to maximize semantic representation learning, while the visual and acoustic encoders are only partially fine-tuned to balance efficiency with performance. A text-centric fusion mechanism selectively aggregates contextual information from visual and audio modalities, mimicking human cognitive processes. Extensive experiments on the CMU-MOSI and CH-SIMS datasets demonstrate leading prediction accuracy, achieving a 79.7 % Pearson correlation coefficient and 85.2 % binary classification accuracy on CMU-MOSI, and an 87.8 % binary classification accuracy on CH-SIMS. Our framework also extends to cross-lingual joint training, addressing language and label granularity differences.
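The asymmetric fine-tuning strategy can be illustrated by which parameters are left trainable: LoRA-style low-rank adapters on the text branch, and only the final block of the visual and acoustic branches. The stand-in modules, layer choices, and rank below are assumptions for illustration, not the paper's actual encoders or configuration.

```python
# Minimal sketch of the asymmetric fine-tuning idea: the text branch is trained
# through small LoRA-style adapters, while only the last block of the visual
# and acoustic encoders is unfrozen. Encoders here are stand-in modules.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (W + B A)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_a.t() @ self.lora_b.t()


def set_asymmetric_trainability(text_enc: nn.Module, visual_enc: nn.Sequential,
                                audio_enc: nn.Sequential):
    # Visual / acoustic branches: freeze everything, then unfreeze the last block.
    for enc in (visual_enc, audio_enc):
        for p in enc.parameters():
            p.requires_grad = False
        for p in enc[-1].parameters():
            p.requires_grad = True
    # Text branch: only the LoRA parameters remain trainable (set in LoRALinear).
    return [p for p in text_enc.parameters() if p.requires_grad]


if __name__ == "__main__":
    text_enc = nn.Sequential(LoRALinear(nn.Linear(128, 128)), nn.ReLU(),
                             LoRALinear(nn.Linear(128, 64)))
    visual_enc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))
    audio_enc = nn.Sequential(nn.Linear(80, 64), nn.ReLU(), nn.Linear(64, 64))
    trainable = set_asymmetric_trainability(text_enc, visual_enc, audio_enc)
    print(sum(p.numel() for p in trainable))  # trainable text-side (LoRA) parameters
```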
Citations: 0
Regularized evidential neural networks for deep active learning
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112836
Pengju Wang, Sicong Zhang, Rongrong Chen, Jiamin Chen, Yufeng Fan, Lulu Ning, Yongfeng Cao
Deep models heavily rely on large amounts of labeled data. Active learning aims to alleviate this issue by selecting the most informative samples, with uncertainty-based strategies being the most commonly employed approach. Evidential Neural Networks (ENNs) have been proposed to quantify uncertainty in classification predictions, offering a representative way to estimate predictive uncertainty with a single deterministic forward pass. However, our extensive experiments reveal that ENNs tend to be overconfident, and generalize poorly compared to vanilla training (VT). We identify two key factors behind these issues: (1) the unbounded amplification of correct-class evidence and (2) the redundant penalization of incorrect-class evidence. By eliminating the redundant penalization and moderating the evidence for the correct class, we significantly improve the model’s intrinsic calibration and achieve better generalization than VT. We propose a novel loss function that enables controllable regulation of the predicted evidence for the correct class while removing redundant penalties for incorrect classes, which we refer to as Regularized Evidential Neural Networks (RENNs). Leveraging the robust calibration and generalization of RENNs, we further introduce RENN4DAL, a simple yet effective uncertainty-based deep active learning approach. RENN4DAL achieves consistent state-of-the-art performance across a variety of convolutional neural network (CNN) and Transformer-based benchmarks, with particularly strong gains observed on challenging datasets.
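The two fixes the abstract identifies, moderating correct-class evidence and dropping the extra penalty on incorrect-class evidence, can be sketched as a modified evidential loss. The version below caps the correct-class evidence and keeps only the correct-class Bayes-risk term; the cap value and exact form are assumptions, not the RENN equations from the paper.

```python
# A hedged sketch of an evidential loss that (i) caps the correct-class
# evidence and (ii) drops the extra KL-style penalty usually applied to
# incorrect-class evidence. Illustration only, not the paper's loss.
import torch
import torch.nn.functional as F


def capped_evidential_loss(logits: torch.Tensor, labels: torch.Tensor,
                           evidence_cap: float = 20.0) -> torch.Tensor:
    evidence = F.softplus(logits)                       # non-negative evidence
    one_hot = F.one_hot(labels, logits.size(1)).float()
    # Moderate correct-class evidence by capping it; leave other classes as-is
    # and add no explicit penalty on them.
    capped = torch.where(one_hot.bool(),
                         evidence.clamp_max(evidence_cap), evidence)
    alpha = capped + 1.0                                # Dirichlet parameters
    strength = alpha.sum(dim=1, keepdim=True)
    # Bayes-risk term for the correct class only.
    loss = (one_hot * (torch.digamma(strength) - torch.digamma(alpha))).sum(dim=1)
    return loss.mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 10) * 3
    labels = torch.randint(0, 10, (8,))
    print(capped_evidential_loss(logits, labels).item())
```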
Citations: 0
Nested evolution for interactively fusing feature agents and learning ensembled classifier agents
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112837
Qinghua Huang, Haoning Li, Hao Xu, Cong Wang
Currently, feature selection methods combined with classifiers face challenges due to a lack of interactivity and interpretability, which leads to suboptimal classification performance. Inspired by the continuous learning process among agents through mutual feedback, we propose an innovative nested evolutionary framework. In this framework, we propose the concepts of feature agent and classifier agent, representing the feature selection process and the classifier construction and evaluation process, respectively. Furthermore, we introduce macro-evolution and micro-evolution mechanisms to facilitate interactive learning between the two processes. Specifically, during the macro-evolution phase, a multi-objective evolutionary biclustering algorithm is employed to generate multiple biclusters (feature subsets), completing the learning process of the feature agents. Subsequently, classification rules are extracted from these biclusters to construct weak classifiers, which are further evaluated and optimized to realize the learning process of the classifier agents. In the micro-evolution phase, the evaluation results of the weak classifiers are used as feedback to re-evolve the biclusters corresponding to underperforming weak classifiers, yielding improved feature subsets and thereby enhancing the performance of those classifiers. This iterative process achieves interactive learning between the feature agents and classifier agents, simultaneously improving both the feature subsets and the classifiers. Finally, we employ AdaBoost to integrate the weak classifiers into a robust strong classifier. Experimental results demonstrate that it outperforms other methods across multiple binary classification datasets.
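The macro/micro feedback loop can be illustrated with random feature subsets standing in for the evolutionary biclustering step: fit one weak classifier per subset, re-draw the subset of the worst performer, and combine the survivors with an accuracy-weighted vote. The sketch below is a simplified stand-in for the framework, not the paper's evolutionary algorithm or its AdaBoost integration.

```python
# Compact sketch of the macro/micro feedback idea: random feature subsets stand
# in for evolutionary biclustering, decision stumps stand in for rule-based weak
# classifiers, and a simple accuracy-weighted vote stands in for AdaBoost.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_weak(X, y, subset):
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(X[:, subset], y)
    return clf


def nested_evolution_demo(X, y, n_agents=10, subset_size=5, rounds=3, seed=0):
    rng = np.random.default_rng(seed)
    Xtr, Xval, ytr, yval = train_test_split(X, y, test_size=0.3, random_state=seed)
    subsets = [rng.choice(X.shape[1], subset_size, replace=False) for _ in range(n_agents)]
    for r in range(rounds):
        clfs = [fit_weak(Xtr, ytr, s) for s in subsets]
        scores = [c.score(Xval[:, s], yval) for c, s in zip(clfs, subsets)]
        if r < rounds - 1:                       # micro-evolution: re-draw the weakest
            worst = int(np.argmin(scores))
            subsets[worst] = rng.choice(X.shape[1], subset_size, replace=False)
    # Accuracy-weighted vote over the final weak classifiers.
    weights = np.array(scores)
    preds = np.stack([c.predict(Xval[:, s]) for c, s in zip(clfs, subsets)])
    vote = (weights[:, None] * (2 * preds - 1)).sum(axis=0) > 0
    return (vote.astype(int) == yval).mean()


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=20, random_state=0)
    print(f"ensemble validation accuracy: {nested_evolution_demo(X, y):.3f}")
```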
Citations: 0
Developing distance-based genetic programming classifiers by reconstructing datasets for imbalanced binary classification
IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112825
Wenyang Meng, Ying Li, Fan Zhang, Xiaoying Gao, Jianbin Ma
Designing effective Genetic Programming (GP) classifiers for imbalanced data is challenging because classifiers tend to be biased towards the majority class. Fixed-threshold GP classifiers cannot effectively handle imbalanced classification problems, and determining an appropriate threshold for final decision-making is a challenging task when using threshold-independent methods. In this paper, we propose a distance-based GP classifier construction method that does not rely on specific thresholds. The method reconstructs datasets, balances the number of instances in the majority and minority classes, and determines the class labels of unknown instances by distance measurement. Experiments on sixteen imbalanced binary datasets show that our distance-based GP classifier construction method effectively improves performance on imbalanced classification problems. Comparisons with eight GP-based methods for imbalanced data show that the proposed method achieves significantly better performance on most datasets, while comparisons with six traditional Machine Learning (ML) algorithms show that it achieves competitive results.
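The distance-based decision rule can be illustrated as follows: balance the reconstructed dataset by oversampling the minority class, compute a program output per instance (a fixed expression stands in for the GP-evolved tree here), and label a new instance by the nearest class mean of that output. The script below illustrates this mechanism, not the paper's GP system.

```python
# Minimal sketch of a distance-based decision rule over a rebalanced dataset.
# A fixed expression stands in for a GP-evolved program; the oversampling and
# nearest-centroid rule are illustrative assumptions.
import numpy as np


def gp_program(x: np.ndarray) -> np.ndarray:
    # Stand-in for an evolved expression over the feature vector.
    return x[:, 0] * x[:, 1] - np.sin(x[:, 2])


def balance_by_oversampling(X, y, seed=0):
    rng = np.random.default_rng(seed)
    maj, mino = (0, 1) if (y == 0).sum() >= (y == 1).sum() else (1, 0)
    idx_min = np.flatnonzero(y == mino)
    extra = rng.choice(idx_min, (y == maj).sum() - idx_min.size, replace=True)
    keep = np.concatenate([np.flatnonzero(y == maj), idx_min, extra])
    return X[keep], y[keep]


def distance_based_predict(X_train, y_train, X_test):
    Xb, yb = balance_by_oversampling(X_train, y_train)
    out = gp_program(Xb)
    centroids = {c: out[yb == c].mean() for c in (0, 1)}
    test_out = gp_program(X_test)
    d0 = np.abs(test_out - centroids[0])
    d1 = np.abs(test_out - centroids[1])
    return (d1 < d0).astype(int)     # closer to the class-1 centroid -> predict 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (rng.random(200) > 0.8).astype(int)      # imbalanced labels
    print(distance_based_predict(X, y, X[:10]))
```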
Citations: 0