
Latest articles in Pattern Recognition

Test-time adaptive vision-language alignment for zero-shot group activity recognition
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113033
Runhao Zeng , Yirui Wang , Wenfu Peng , Xionglin Zhu , Ronghao Zhang , Zhihua Wang
Zero-shot group activity recognition (ZS-GAR) aims to identify activities unseen during training. However, conventional methods deploy models with parameters frozen at test time. This static nature prevents the model from adapting to the inherent distributional shift of unseen classes, severely impairing its generalization capability. To address this problem, we propose a test-time adaptation (TTA) framework that dynamically adapts the model during inference by employing two synergistic self-supervised mechanisms. First, an Actor-Drop Feature Augmentation strategy leverages group relational structure as a potent self-supervised signal by enforcing predictive consistency on samples where individuals are randomly masked. Second, our Label-Semantic Contrastive Learning mechanism generates pseudo-labels from high-confidence predictions and uses a dynamic memory bank, aligning features with their inferred semantic prototypes. This process not only enhances vision-language alignment for unseen classes but also demonstrates robustness against data corruptions, as validated on two new benchmarks, VD-C and CAD-C, featuring various corruption types. Extensive experiments on standard ZS-GAR benchmarks show our method significantly outperforms existing techniques, validating TTA’s effectiveness for this task.
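The Actor-Drop consistency signal described above can be illustrated with a minimal sketch. All names here (`actor_drop`, `consistency_loss`, the mean-pooled toy classifier in the usage below) are hypothetical, not the paper's implementation: predictions on the full group and on a randomly actor-masked copy are pulled together with a KL term, which serves as the self-supervised test-time adaptation loss.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def actor_drop(actor_feats, drop_prob, rng):
    # Zero out a random subset of actors' feature rows, keeping at least one,
    # to exploit the group relational structure as an augmentation.
    keep = rng.random(actor_feats.shape[0]) >= drop_prob
    if not keep.any():
        keep[rng.integers(actor_feats.shape[0])] = True
    return actor_feats * keep[:, None]

def consistency_loss(logits_full, logits_dropped):
    # KL(full || dropped) between class distributions: the predictive
    # consistency signal that adapts the model at inference time.
    p, q = softmax(logits_full), softmax(logits_dropped)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In use, a toy group classifier could mean-pool the actor features and apply a linear head; the loss is zero when masking leaves the prediction unchanged and grows as the masked prediction drifts.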
Pattern Recognition, Volume 175, Article 113033.
Citations: 0
Prompt-level contrastive learning for context-aware multi-modal image representation in medical diagnosis
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113027
Guowei Dai , Zhimin Tian , Chen Xin , Duwei Dai , Chaoyu Wang , Yi Zhang , Hu Chen , Matthew Hamilton
Accurate early cancer diagnosis from medical images presents significant challenges due to visual ambiguities. While multimodal methods integrating images and clinical data are promising, effectively synthesizing diverse information sources and emulating clinical reasoning remains difficult. This paper introduces PCL-MFP, a novel framework designed to enhance multimodal Oral Cancer (OCA) diagnosis by deeply integrating visual and contextual information. Our approach uniquely leverages three modalities: clinical images, diagnostic descriptions generated by a multi-modal large language model (MLLM), and MLLM-derived classification probability vectors. PCL-MFP features a Prompt-Guided Multimodal Fusion (MFP) module built upon a multi-stage, hierarchical Transformer architecture. This module dynamically fuses the trimodal inputs, capturing complex cross-modal dependencies and higher-order interactions including bilinear and gated trimodal tensor interactions, to generate a contextually rich multimodal prompt. Crucially, we introduce a Prompt-Level Contrastive Learning (PCL) mechanism where this fused multimodal prompt serves as a guiding signal. PCL specifically refines the image representation through contrastive learning, enforcing alignment between visual features and the comprehensive semantic context embodied in the prompt within a shared embedding space using hard-negative mining. This strategy aims to learn highly discriminative, context-aware visual features, effectively bridging the gap between visual evidence and diagnostic interpretation for improved medical diagnosis, with specific applications to oral cancer and breast cancer diagnostic analysis.
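To make the prompt-level contrastive idea concrete, here is a hedged sketch of an in-batch InfoNCE loss aligning image embeddings with their fused multimodal prompt embeddings. The real PCL additionally uses hard-negative mining, which is omitted here, and the function name and shapes are illustrative, not the authors' code.

```python
import numpy as np

def info_nce(image_feats, prompt_feats, tau=0.1):
    # Each image embedding should be closest to its own fused prompt; the
    # other prompts in the batch act as (in-batch, not mined) negatives.
    z = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    p = prompt_feats / np.linalg.norm(prompt_feats, axis=1, keepdims=True)
    logits = (z @ p.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

The loss falls as each image-prompt pair aligns within the shared embedding space and rises when an image sits closer to some other sample's prompt.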
Pattern Recognition, Volume 174, Article 113027.
Citations: 0
Robust graph neural networks via supervised block diagonal regularizer
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113019
Zhi Chen , Libin Wang , Yulong Wang , Han Li , Qiang He , Huiwu Luo , Jinyu Tian , Yuan Yan Tang
Graph Neural Networks (GNNs) are susceptible to structural perturbations in the input graph. To address this, various graph sparsification strategies have been proposed to learn an optimal selection matrix, ideally with a block diagonal structure. However, existing works either overlook this critical prior or rely on indirect structural priors such as sparsity and low-rankness, leading to suboptimal solutions. To bridge this gap, we propose a novel Supervised Block Diagonal Regularizer (SuBDR) with two key advantages: (1) directness: it directly enforces the selection matrix to approximate the optimal block diagonal structure. (2) discriminability: it takes full advantage of the label information of training data to enhance the discriminative ability of the selection matrix. Equipped with SuBDR, we develop a general and robust GNNs framework coined SuBDR-GNNs, which not only directly exploits the block diagonal structure prior but also fully incorporates the training data label information into the learning process of the selection matrix, thereby ensuring more robust and decent node representations. Another contribution of this work is the derivation of an efficient optimization algorithm, with its convergence rigorously proved. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness, robustness and generality of our proposed SuBDR-GNNs framework. The code is available at https://github.com/ccmature/SuBDR-GNNs.
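The regularizer's "directness" can be sketched in a few lines. The code below is an illustrative sketch under our own naming (not the released SuBDR implementation): given training labels, it measures how much mass of a selection matrix falls outside the label-induced block diagonal, which is zero exactly when the matrix only connects same-class nodes.

```python
import numpy as np

def supervised_block_diagonal_penalty(S, labels):
    # Sum of |S[i, j]| over pairs with labels[i] != labels[j]: entries off
    # the ideal block diagonal. Minimizing this directly pushes S toward
    # the block diagonal structure instead of via sparsity/low-rank proxies.
    labels = np.asarray(labels)
    off_block = labels[:, None] != labels[None, :]
    return float(np.abs(S[off_block]).sum())
```

A perfectly block diagonal selection matrix incurs zero penalty; any cross-class edge contributes its magnitude.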
Pattern Recognition, Volume 175, Article 113019.
Citations: 0
Backdoor defense for large language models with weak-to-strong knowledge distillation
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113030
Yuwen Li , Xinyi Wu , Zhongliang Guo , Luwei Xiao , Yanhao Jia , Shuai Zhao
Parameter-efficient fine-tuning (PEFT) can bridge the gap between large language models and downstream tasks. However, PEFT has been proven vulnerable to malicious attacks. Research indicates that poisoned LLMs, even after PEFT, retain the capability to activate internalized backdoors when input samples contain predefined triggers. To defend against backdoor attacks, in this paper, we introduce a novel weak-to-strong defense algorithm based on feature alignment knowledge distillation, named W2SDefense. Specifically, we first train a small-scale language model through full-parameter fine-tuning to serve as the clean teacher model. Then, this teacher model guides the large-scale poisoned student model in decoupling backdoor features by leveraging PEFT. Furthermore, we also propose the W2SDefense-Lite algorithm, which further optimizes W2SDefense. It incorporates a small-scale teacher model based on low-rank adaptation and a lightweight Feature Offset Penalty knowledge distillation algorithm grounded in the Frobenius norm, significantly reducing additional computational overhead and model complexity. Theoretical analysis suggests that W2SDefense and its Lite version have the potential to enhance the ability of the student model to eliminate backdoor features through PEFT. Empirical results demonstrate the outstanding performance of the proposed algorithm in defending against backdoor attacks without compromising model performance.
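As an illustration of the Frobenius-norm Feature Offset Penalty idea, here is a hedged sketch assuming a learnable projection maps student features into the (smaller) teacher feature space; the function name, shapes, and projection are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def feature_offset_penalty(student_feats, teacher_feats, proj):
    # Squared Frobenius distance between projected student features and the
    # clean teacher's features; minimizing it pulls the poisoned student's
    # representation space toward the teacher's backdoor-free one.
    offset = student_feats @ proj - teacher_feats
    return float(np.linalg.norm(offset, ord="fro") ** 2)
```

The penalty vanishes when the projected student features coincide with the teacher's and grows quadratically with the offset, which keeps the distillation term cheap to compute on top of PEFT updates.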
Pattern Recognition, Volume 175, Article 113030.
Citations: 0
Fuzzy rough guided subspace anomaly detection in nominal data
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113024
Qian Hu , Zhong Yuan , Jun Zhang , Jusheng Mi
Classical rough set theory has been successfully applied to anomaly detection tasks in nominal data. However, such detection tasks granulate the universe of discourse based on strict equivalence relations, which are applicable to nominal data but cannot effectively characterize the fuzziness between samples. Recently, fuzzy rough set theory has been used to construct anomaly detectors to deal with mixed-attribute data, which can effectively handle uncertain information such as fuzziness. However, when dealing with nominal data, the fuzzy equivalence relation degrades to the traditional equivalence relation. This causes the fuzzy rough set to degenerate into the classical rough set, resulting in the model’s inability to describe the fuzziness between nominal data samples. Based on these facts, we propose a novel fuzzy rough approximation computational model for decision-free nominal data. First, a fuzzy similarity measure is introduced to construct a fuzzy similarity relation between nominal data samples. Then, a generalized fuzzy granular family is induced based on the constructed fuzzy relations. Further, a fuzzy rough upper and lower approximation computational model for decision-free nominal data is constructed. Meanwhile, the corresponding fuzzy uncertainty measures are also defined, and their related properties are discussed. Finally, the importance of conditional attributes is defined based on the fuzzy uncertainty measures to select the subspace data. Based on the constructed subspace data, a fuzzy rough anomaly score is constructed to detect anomalies in the nominal subspace data. Extensive experimental results show that the proposed anomaly detector outperforms some current mainstream methods and can effectively detect anomalies in nominal data. The code is publicly available online at https://github.com/BELLoney/FRSAD.
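A frequency-aware nominal similarity avoids the degeneration to a crisp equivalence relation described above. The sketch below uses the Eskin measure, one standard choice for categorical data and not necessarily the measure proposed in the paper: matches on an attribute score 1, mismatches score n_k^2/(n_k^2+2) with n_k the attribute's number of categories, and scores are averaged over attributes.

```python
import numpy as np

def eskin_similarity(X):
    # Fuzzy similarity relation for nominal samples: reflexive (diagonal 1),
    # symmetric, and strictly between 0 and 1 on mismatches, so it does not
    # collapse into the 0/1 equivalence relation of classical rough sets.
    X = np.asarray(X, dtype=object)
    n, m = X.shape
    R = np.zeros((n, n))
    for a in range(m):
        col = X[:, a]
        nk = len(set(col))
        mismatch = nk * nk / (nk * nk + 2.0)
        for i in range(n):
            for j in range(n):
                R[i, j] += 1.0 if col[i] == col[j] else mismatch
    return R / m
```

From such a relation, fuzzy granules and upper/lower approximations can be induced per attribute subspace, which is the role the constructed fuzzy relations play in the paper's model.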
Pattern Recognition, Volume 174, Article 113024.
Citations: 0
Empowering 2D neural network for 3D medical image segmentation via neighborhood information fusion
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-02 · DOI: 10.1016/j.patcog.2025.113035
Qiankun Li , Xiaolong Huang , Yani Zhang , Bo Fang , Duo Hong , Junxin Chen
In recent years, 3D volumetric medical images have been widely used in clinical diagnosis. However, popular 2D networks still segment the 3D data slice by slice in isolation, which neglects inter-slice information and has been found to be insufficient. In this paper, we propose a novel plug-and-play neighborhood explorer (PnP-NE) that can improve the performance of 2D networks in 3D medical image segmentation. The PnP-NE includes three components. The slice combination component first combines three neighborhood slices as an input sample. Then the feature extraction component duplicates the encoder of the 2D segmentation network into three copies with shared weights for extracting features of the sample. Finally, a feature fusion component with slice attention weight is constructed to fuse features into the decoder for segmentation. In addition, the proposed weight sharing and feature storage strategies make PnP-NE extremely efficient. Taking advantage of the plug-and-play architecture, powerful neighborhood information exploration capabilities, and efficient optimization strategies, our PnP-NE is able to integrate into existing 2D networks conveniently and enhance their performance of 3D volumetric medical image segmentation. The experimental results with comprehensive ablation studies demonstrate the satisfactory performance of our proposed PnP-NE.
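The slice combination step can be sketched as follows (a minimal illustration with hypothetical naming, assuming the volume's first axis indexes slices and that the border slice is replicated at the volume's ends):

```python
import numpy as np

def neighborhood_slices(volume, k):
    # Pair slice k with its two axial neighbours so every 2D input sample
    # carries inter-slice context; at the ends the border slice repeats.
    d = volume.shape[0]
    idx = [max(k - 1, 0), k, min(k + 1, d - 1)]
    return np.stack([volume[i] for i in idx], axis=0)
```

Each resulting (3, H, W) sample can then be fed to three weight-shared copies of a 2D encoder, mirroring the feature extraction component described above.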
Pattern Recognition, Volume 175, Article 113035.
Citations: 0
AddSR: Accelerating diffusion-based blind super-resolution with adversarial diffusion distillation
IF 7.6 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-01 · DOI: 10.1016/j.patcog.2025.113012
Ying Tai , Rui Xie , Chen Zhao , Kai Zhang , Zhenyu Zhang , Jun Zhou , Jian Yang
Blind super-resolution methods based on Stable Diffusion (SD) demonstrate impressive generative capabilities in reconstructing clear, high-resolution (HR) images with intricate details from low-resolution (LR) inputs. However, their practical applicability is often limited by poor efficiency, as they typically require tens or even dozens of sampling steps. Inspired by Adversarial Diffusion Distillation (ADD), we incorporate this approach to design a highly effective and efficient blind super-resolution method. Nonetheless, two challenges arise: First, the original ADD significantly reduces result fidelity, leading to a perception-distortion imbalance. Second, SD-based methods are sensitive to the quality of the conditioning input, while LR images often have complex degradation, which further hinders effectiveness. To address these issues, we introduce a Timestep-Adaptive ADD (TA-ADD) to mitigate the perception-distortion imbalance caused by the original ADD. Furthermore, we propose a prediction-based self-refinement strategy to estimate HR, which allows for the provision of more high-frequency information without the need for additional modules. Extensive experiments show that our method, AddSR, generates superior restoration results while being significantly faster than previous SD-based state-of-the-art models (e.g., 7× faster than SeeSR). Code is available at https://github.com/NJU-PCALab/AddSR.
Pattern Recognition, Volume 175, Article 113012.
Citations: 0
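The timestep-adaptive idea behind TA-ADD can be illustrated with a toy weighting schedule. This is a hypothetical sketch, not the paper's actual formulation: it assumes the adversarial and distillation losses are blended by a sigmoid gate over the normalised timestep, so noisier steps lean on the fidelity-preserving distillation term and nearly-clean steps on the adversarial term. The function names and the sharpness constant `k` are inventions for illustration only.

```python
import math

def ta_add_weights(t: int, T: int, k: float = 6.0) -> tuple[float, float]:
    """Toy timestep-adaptive blend of adversarial vs. distillation loss.

    s = t / T is the normalised timestep (s = 1 is pure noise).
    Noisy steps (s near 1) emphasise the distillation/fidelity term;
    nearly-clean steps (s near 0) emphasise the adversarial term.
    """
    s = t / T
    w_adv = 1.0 / (1.0 + math.exp(-k * (0.5 - s)))  # sigmoid gate in (0, 1)
    return w_adv, 1.0 - w_adv  # (adversarial weight, distillation weight)

def ta_add_loss(l_adv: float, l_distill: float, t: int, T: int) -> float:
    """Blended objective for one training step (losses as plain floats here)."""
    w_adv, w_dis = ta_add_weights(t, T)
    return w_adv * l_adv + w_dis * l_distill
```

The two weights always sum to one, so the gate only shifts emphasis between the terms; at the midpoint timestep both losses contribute equally.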
Efficient breast cancer segmentation via Brownian Bridge diffusion with semantic fusion strategy
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-01 DOI : 10.1016/j.patcog.2025.112993
Feiyan Feng , Tianyu Liu , Fulin Zheng , Yanshen Sun , Hong Wang
Accurate segmentation and early diagnosis of breast cancer are crucial for reducing the high mortality rate. However, accurately distinguishing lesion areas from similar tissues is challenging. Existing diffusion models approach image segmentation as a conditional generation process, but (1) are severely affected by domain gaps when recovering tumor regions from noise, and (2) overlook the semantic gap between image and noise space, causing the loss of critical information during fusion. In response to these two issues, this paper presents a natural alternative based on the Brownian Bridge diffusion paradigm: BBDiffSF (Efficient Breast Cancer Segmentation via Brownian Bridge Diffusion with Semantic Fusion Strategy), aimed at enhancing breast cancer segmentation. Specifically, (1) we replace traditional diffusion models with the Brownian Bridge diffusion model, discarding the use of Gaussian noise as the initial condition. This modification effectively mitigates cross-domain discrepancies that may arise when tumors are generated directly from Gaussian noise. Additionally, (2) we retain medical images as supplementary input and introduce two novel components: the Semantic Correlation Mapping Attention (SCMA) module and the State Space Fusion (SSF) module. These modules effectively integrate semantic and noise spaces, allowing BBDiffSF to accurately incorporate semantic details during denoising and generate accurate tumor segmentation maps. Extensive experiments have demonstrated that our model outperforms current state-of-the-art methods while achieving segmentation with fewer inference steps. To the best of our knowledge, we are the first to directly apply the Brownian Bridge diffusion model to medical image segmentation.
{"title":"Efficient breast cancer segmentation via Brownian Bridge diffusion with semantic fusion strategy","authors":"Feiyan Feng ,&nbsp;Tianyu Liu ,&nbsp;Fulin Zheng ,&nbsp;Yanshen Sun ,&nbsp;Hong Wang","doi":"10.1016/j.patcog.2025.112993","DOIUrl":"10.1016/j.patcog.2025.112993","url":null,"abstract":"<div><div>Accurate segmentation and early diagnosis of breast cancer are crucial for reducing the high mortality rate. However, accurately distinguishing lesion areas from similar tissues is challenging. Existing diffusion models approach image segmentation as a conditional generation process, but (1) are severely affected by domain gaps when recovering tumor regions from noise, and (2) overlook the semantic gap between image and noise space, causing the loss of critical information during fusion. In response to these two issues, this paper presents a natural alternative based on the Brownian Bridge diffusion paradigm: <em>BBDiffSF</em> (Efficient Breast Cancer Segmentation via <u>B</u>rownian <u>B</u>ridge <u>Diff</u>usion with <u>S</u>emantic <u>F</u>usion Strategy), aimed at enhancing breast cancer segmentation. Specifically, (1) we replace traditional diffusion models with the Brownian Bridge diffusion model, discarding the use of Gaussian noise as the initial condition. This modification effectively mitigates cross-domain discrepancies that may arise when tumors are generated directly from Gaussian noise. Additionally, (2) we retain medical images as supplementary input and introduce two novel components: the Semantic Correlation Mapping Attention (SCMA) module and the State Space Fusion (SSF) module. These modules effectively integrate semantic and noise spaces, allowing BBDiffSF to accurately incorporate semantic details during denoising and generate accurate tumor segmentation maps. Extensive experiments have demonstrated that our model outperforms current state-of-the-art methods while achieving segmentation with fewer inference steps. 
To the best of our knowledge, we are the first to directly apply the Brownian Bridge diffusion model to medical image segmentation.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"174 ","pages":"Article 112993"},"PeriodicalIF":7.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
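The property the BBDiffSF abstract relies on — the diffusion endpoint is the conditioning image rather than Gaussian noise — falls directly out of the Brownian-bridge forward process. Below is a minimal scalar sketch; the linear schedule m_t = t/T and variance 2·s·m_t·(1−m_t) follow the common Brownian Bridge diffusion formulation and are not necessarily the exact schedule used by BBDiffSF.

```python
import math
import random

def brownian_bridge_sample(x0: float, y: float, t: int, T: int,
                           s: float = 1.0, rng=random) -> float:
    """Draw x_t from a Brownian bridge pinned at x0 (t=0) and y (t=T).

    m rises linearly from 0 to 1; the noise variance 2*s*m*(1-m)
    vanishes at both endpoints, so the process starts exactly at the
    target image x0 and ends exactly at the condition y -- no Gaussian
    prior is ever sampled, avoiding the cross-domain gap the paper
    attributes to noise-initialised diffusion.
    """
    m = t / T
    var = 2.0 * s * m * (1.0 - m)
    eps = rng.gauss(0.0, 1.0)
    return (1.0 - m) * x0 + m * y + math.sqrt(var) * eps
```

At t = 0 and t = T the noise term is exactly zero, so sampling is deterministic at both ends of the bridge; in between, the sample fluctuates around the linear interpolation of x0 and y.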
Prompt-guided selective frequency network for real-world scene text image super-Resolution
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-01 DOI : 10.1016/j.patcog.2025.112974
Xiang Yan , Tianqi Shan , Hanlin Qin , Naveed Akhtar , Yu Liu , Hossein Rahmani , Ajmal Mian
Real-world scene text image super-resolution is challenging due to complex writing strokes, random text distribution, and diverse scene degradations. Existing text super-resolution methods focus on pure text images or fixed-size single-line text, which limits their practical utility. To address that, we propose a Prompt-Guided Selective Frequency super-resolution Network (PGSFNet). Our unique bicephalous neural model comprises a super-resolution branch and a prompt guidance branch. The latter specifically helps in leveraging text content-aware information priors. To that end, we propose a Text Information Enhancement module. To exploit selective frequency information present in the image, PGSFNet employs a proposed Adaptive Frequency Modulator fused with multi-attention structures. Considering the criticality of text edges in our task, we also propose a tailored text edge perception loss. Extensive experiments on the standard open real-world scene text image datasets demonstrate remarkable performance of our method, achieving up to 8.75% PSNR gain for ×2 and 2.28% SSIM gain for ×4 super-resolution on the Real-CE dataset. Our code will be made public at https://github.com/holastq/PGSFNet.
{"title":"Prompt-guided selective frequency network for real-world scene text image super-Resolution","authors":"Xiang Yan ,&nbsp;Tianqi Shan ,&nbsp;Hanlin Qin ,&nbsp;Naveed Akhtar ,&nbsp;Yu Liu ,&nbsp;Hossein Rahmani ,&nbsp;Ajmal Mian","doi":"10.1016/j.patcog.2025.112974","DOIUrl":"10.1016/j.patcog.2025.112974","url":null,"abstract":"<div><div>Real-world scene text image super-resolution is challenging due to complex writing strokes, random text distribution, and diverse scene degradations. Existing text super-resolution methods focus on pure text images or fixed-size single-line text, which limits their practical utility. To address that, we propose a Prompt-Guided Selective Frequency super-resolution Network (PGSFNet). Our unique bicephalous neural model comprises a super-resolution branch and a prompt guidance branch. The latter specifically helps in leveraging text content-aware information priors. To that end, we propose a Text Information Enhancement module. To exploit selective frequency information present in the image, PGSFNet employs a proposed Adaptive Frequency Modulator fused with multi-attention structures. Considering the criticality of text edges in our task, we also propose a tailored text edge perception loss. Extensive experiments on the standard open real-world scene text image datasets demonstrate remarkable performance of our method, achieving up to 8.75% PNSR gain for  × 2 and 2.28% SSIM gain for  × 4 super-resolution on the Real-CE dataset. 
Our code will be made public at <span><span>https://github.com/holastq/PGSFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"175 ","pages":"Article 112974"},"PeriodicalIF":7.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
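The abstract describes the Adaptive Frequency Modulator only at a high level. As a rough illustration of "selective frequency" processing in general — not the paper's module, which operates on 2-D feature maps with learned gates — one can scale individual DFT bins of a 1-D signal by a gate vector:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2); fine for a sketch)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; takes .real, which is exact for symmetric gates."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)).real / n
            for j in range(n)]

def selective_frequency_filter(x, gate):
    """Scale each frequency bin of x by the corresponding gate value."""
    return idft([g * c for g, c in zip(gate, dft(x))])
```

An all-ones gate reproduces the input (up to floating-point error), while zeroing high-frequency bins acts as a low-pass filter — the learned analogue being a network that predicts the gate per bin from the input itself.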
Leveraging CV to haptic processing: Cross tactile-visual mapping based on shared information
IF 7.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-31 DOI : 10.1016/j.patcog.2025.113016
Jin Chen , Ying Fang , Yiwen Xu , Qian Liu , Tiesong Zhao
As Artificial Intelligence (AI) transitions from algorithms to embodied intelligence applications, haptic information, including tactile data, plays a crucial role. However, existing AI algorithms rarely process this type of information, posing challenges for related applications. Therefore, we propose a method that leverages generative models to establish a mapping between visual and tactile modalities, facilitating haptic information processing based on Computer Vision (CV). Specifically, we employ a bidirectional network consisting of two autoencoders, with an information theory constraint applied between them to ensure the similarity of latent space features, thereby ensuring cross-modal consistency. Experiments show that this mapping combined with CV analysis outperforms pure tactile recognition, improving tactile data classification accuracy to 99%. Subjective evaluations of the generated quality reflect that our method also aids in perceiving tactile information without specialized haptic tools. Moreover, this work also provides a visual-to-tactile mapping method, which offers a potential approach for generating tactile information in VR and other scenarios.
{"title":"Leveraging CV to haptic processing: Cross tactile-visual mapping based on shared information","authors":"Jin Chen ,&nbsp;Ying Fang ,&nbsp;Yiwen Xu ,&nbsp;Qian Liu ,&nbsp;Tiesong Zhao","doi":"10.1016/j.patcog.2025.113016","DOIUrl":"10.1016/j.patcog.2025.113016","url":null,"abstract":"<div><div>As Artificial Intelligence (AI) transitions from algorithms to embodied intelligence applications, haptic information, including tactile data, plays a crucial role. However, existing AI algorithms rarely process this type of information, posing challenges for related applications. Therefore, we propose a method that leverages generative models to establish a mapping between visual and tactile modalities, facilitating haptic information processing based on Computer Vision (CV). Specifically, we employ a bidirectional network consisting of two autoencoders, with an information theory constraint applied between them to ensure the similarity of latent space features, thereby ensuring cross-modal consistency. Experiments show that this mapping combined with CV analysis outperforms pure tactile recognition, improving tactile data classification accuracy to 99%. Subjective evaluations of the generated quality reflect that our method also aids in perceiving tactile information without specialized haptic tools. 
Moreover, this work also provides a visual-to-tactile mapping method, which offers a potential approach for generating tactile information in VR and other scenarios.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"174 ","pages":"Article 113016"},"PeriodicalIF":7.6,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145939933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
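The shared-information constraint between the two autoencoders' latent spaces can be sketched as an alignment penalty added to the per-modality reconstruction objectives. The λ weighting and the plain MSE alignment below are illustrative assumptions; the paper applies an information-theoretic constraint whose exact form the abstract does not specify.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_modal_loss(z_visual, z_tactile, recon_visual, recon_tactile,
                     lam: float = 0.5) -> float:
    """Total objective: per-modality reconstruction plus latent alignment.

    z_visual / z_tactile are latent codes from the two autoencoders;
    recon_* are their (precomputed) reconstruction losses. The third
    term pulls the two latent spaces toward each other, so the decoder
    of one modality can consume codes inferred from the other -- the
    basis of the visual-to-tactile mapping described in the abstract.
    """
    return recon_visual + recon_tactile + lam * mse(z_visual, z_tactile)
```

When the two latent codes coincide, the alignment term vanishes and the objective reduces to the sum of the two reconstruction losses.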