
Pattern Recognition: Latest Publications

Prioritized scanning: Combining spatial information multiple instance learning for computational pathology
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.patcog.2026.113151
Yuqi Zhang , Jiakai Wang , Baoyu Liang , Yuancheng Yang , Siyang Wu , Chao Tong
Multiple instance learning (MIL) has emerged as a reliable paradigm that has propelled the integration of computational pathology (CPath) into clinical histopathology. However, despite significant advancements, current MIL approaches continue to face challenges due to inadequate spatial information representation, a consequence of the unordered nature of the original whole slide images (WSIs). To address this limitation, we first demonstrate the importance of prioritized scanning within structured state space models (SSMs). We introduce a MIL framework that incorporates spatial information, termed Prioritized Scanning MIL (PSMIL). PSMIL primarily comprises two branches and a fusion block. The first branch, known as the spatial branch, incorporates potential spatial information into the patch sequence using the original 2D positions and employs SSM to model the spatial features of the WSI. The second branch, referred to as the cross-spatial branch, utilizes a significance scoring block along with SSM to harness feature relationships among similar instances across spatial locations. Finally, a lightweight feature fusion block integrates the outputs of both branches, facilitating more comprehensive feature utilization. Extensive experiments on 5 popular datasets and 3 downstream tasks demonstrate that PSMIL surpasses state-of-the-art MIL methods significantly, with up to a 5.26% ACC improvement for cancer sub-typing. Our code is available at https://github.com/YuqiZhang-Buaa/PSMIL.
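A minimal sketch of the two-branch-plus-fusion idea described above (illustrative only: module names and sizes are assumptions, and a plain GRU stands in for the structured state space model, so this is not the authors' PSMIL code):

```python
import torch
import torch.nn as nn

class TwoBranchMIL(nn.Module):
    """Illustrative two-branch MIL aggregator; a GRU stands in for the SSM."""
    def __init__(self, dim=512, hidden=256, n_classes=2):
        super().__init__()
        self.spatial_rnn = nn.GRU(dim, hidden, batch_first=True)   # spatial branch
        self.score = nn.Linear(dim, 1)                              # significance scoring
        self.cross_rnn = nn.GRU(dim, hidden, batch_first=True)     # cross-spatial branch
        self.fuse = nn.Linear(2 * hidden, n_classes)                # lightweight fusion

    def forward(self, feats, coords):
        # feats: (1, N, dim) patch features; coords: (1, N, 2) patch (row, col) positions
        order = torch.argsort(coords[0, :, 0] * 10_000 + coords[0, :, 1])  # raster scan by 2D position
        spatial = self.spatial_rnn(feats[:, order])[0].mean(dim=1)
        rank = torch.argsort(self.score(feats).squeeze(-1), descending=True)  # prioritize salient patches
        cross = self.cross_rnn(feats[:, rank[0]])[0].mean(dim=1)
        return self.fuse(torch.cat([spatial, cross], dim=-1))

bag = torch.randn(1, 300, 512)            # 300 patch embeddings from one WSI
xy = torch.randint(0, 100, (1, 300, 2))
print(TwoBranchMIL()(bag, xy).shape)      # torch.Size([1, 2])
```

The point of the sketch is only the control flow: one branch consumes the patches in their original 2D order, the other in a significance-ranked order, and a small fusion layer combines both bag-level features.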
Citations: 0
Audio-visual perceptual quality measurement via multi-perspective spatio-temporal EEG analysis
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.patcog.2026.113156
Shuzhan Hu , Mingyu Li , Yang Liu , Weiwei Jiang , Bingrui Geng , Wei Zhong , Long Ye
In human-centered communication systems, establishing human perception-aligned audio-visual quality assessment methods is crucial for enhancing multimedia system performance and service quality. However, conventional subjective evaluation methods based on user ratings are susceptible to biases induced by high-level cognitive processes. To address this limitation, we propose an electroencephalography (EEG) feature fusion approach to establish correlations between audio-visual distortions and perceptual experiences. Specifically, we construct an audio-visual degradation-EEG dataset by recording neural responses from subjects exposed to progressively degraded stimuli. Leveraging this dataset, we extract event-related potential (ERP) features to quantify variations in subjects’ perception of audio-visual quality, demonstrating the feasibility of EEG-based perceptual experience assessment. Capitalizing on EEG’s sensitivity to dynamic multimodal perceptual changes, we develop a multi-perspective feature fusion framework, incorporating a spatio-temporal feature fusion architecture and a diffusion-driven EEG augmentation strategy. This framework enables the extraction of experience-related features from single-trial EEG signals, establishing an EEG-based classifier to detect whether distortions induce perceptual experience alterations. Experimental results validate that EEG signals effectively reflect perception changes induced by quality degradation, while the proposed model achieves efficient and dynamic detection of perception alterations from single-trial EEG data.
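For readers unfamiliar with ERP features: an event-related potential is simply the stimulus-locked trial average of EEG epochs. The toy sketch below (synthetic data, assumed epoch shape and window, not the paper's pipeline) shows how a quality-related amplitude shift could be read out of such averages:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250                                        # sampling rate in Hz (assumed)
clean    = rng.normal(0.0, 1.0, (40, 32, fs))   # 40 trials, 32 channels, 1 s epochs
degraded = rng.normal(0.5, 1.0, (40, 32, fs))   # degraded stimuli shift the response

erp_clean = clean.mean(axis=0)                  # ERP = trial average, (channels, samples)
erp_degraded = degraded.mean(axis=0)

window = slice(int(0.3 * fs), int(0.5 * fs))    # e.g. a 300-500 ms post-stimulus window
feature = erp_degraded[:, window].mean() - erp_clean[:, window].mean()
print(f"mean amplitude shift in window: {feature:.3f}")
```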
Citations: 0
Generative model-based mixed-semantic enhancement for transductive zero-shot learning
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.patcog.2026.113124
Huaizhou Qi , Yang Liu , Jungong Han , Lei Zhang
Zero-shot learning (ZSL) addresses the critical challenge of recognizing and classifying instances from categories not seen during training. Although generative model-based approaches have achieved notable success in ZSL, their predominant reliance on forward generation strategies coupled with excessive dependence on auxiliary information hampers model generalization and robustness. To overcome these limitations, we propose a Mixed-Semantic Enhancement framework inspired by interpolation-based feature extraction. This novel approach is designed to synthesize enriched auxiliary information through integrating authentic semantic cues, thereby refining the mapping from semantic descriptions to visual features. The enhanced feature synthesis capability enables better discrimination of ambiguous classes while preserving inter-class relationships. In addition, we establish bidirectional alignment between visual features and auxiliary information. This cross-modal interaction mechanism not only strengthens the generator’s training process through feature consistency constraints but also facilitates dynamic information exchange between modalities. Extensive experiments in a transductive setting across four benchmark datasets demonstrate significant performance gains, highlighting the robustness and effectiveness of our approach in advancing generative ZSL models.
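A hedged sketch of the "interpolation-based" semantic enrichment idea: mix pairs of real class attribute vectors to synthesize additional auxiliary inputs for a conditional generator. The function name, the Beta-distributed mixing weight, and the attribute dimensionality are assumptions, not the paper's formulation:

```python
import torch

def mix_semantics(attrs, alpha=0.2):
    """Interpolate pairs of authentic class attribute vectors (mixup-style)."""
    lam = torch.distributions.Beta(alpha, alpha).sample((attrs.size(0), 1))
    perm = torch.randperm(attrs.size(0))
    return lam * attrs + (1.0 - lam) * attrs[perm]

class_attrs = torch.rand(10, 85)      # e.g. 10 classes with 85-dim attribute vectors
mixed = mix_semantics(class_attrs)
print(mixed.shape)                    # torch.Size([10, 85]) enriched auxiliary vectors
```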
Citations: 0
Learning gated experts for segment anything in the wild
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.patcog.2026.113064
Yizhen Guo , Hang Guo , Tao Dai , Zhi Wang , Bin Chen , Shu-Tao Xia
Segment anything model (SAM) and its variants have recently shown promising performance as foundation models. However, existing SAM-based models can only handle scenarios seen during training, and usually suffer unstable performance when transferring to real-world unseen data, such as low-light, rainy, or blurred images, which is crucial for applications such as autopilot. Therefore, adapting SAM-based models for real-world degradation while not impairing their original ability remains an open challenge. In this work, we propose a novel gated Mixture-of-Experts (MoE) structure, called RouGE, to improve the robustness of SAM-based models. Specifically, RouGE uses multiple lightweight probability gates to decompose complex real-world image conditions and judge whether the feature needs to be adjusted, as well as to what extent the adjustment needs to be done, then handles them differently with a set of low-rank experts. During the inference stage, RouGE processes input images in a completely blind manner, thus improving the model’s performance in real-world scenarios. Extensive experiments demonstrate that RouGE consistently achieves state-of-the-art results on both degraded and clean images compared with other methods while tuning only 1.5% of parameters. Code is available at https://github.com/Guo-Yizhen/RouGE.
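A minimal sketch of one gated low-rank expert in the spirit described above (a probability gate decides how strongly a low-rank adapter adjusts frozen features); the class name, pooling choice, and sizes are assumptions, not the released RouGE code:

```python
import torch
import torch.nn as nn

class GatedLowRankExpert(nn.Module):
    """One probability gate + one low-rank expert applied residually."""
    def __init__(self, dim=256, rank=8):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # lightweight probability gate
        self.down = nn.Linear(dim, rank, bias=False)                # low-rank expert
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, x):
        g = self.gate(x.mean(dim=1, keepdim=True))   # per-image gate from pooled tokens
        return x + g * self.up(self.down(x))         # adjust features only as much as needed

tokens = torch.randn(2, 196, 256)                    # frozen SAM-like image tokens (assumed shape)
print(GatedLowRankExpert()(tokens).shape)            # torch.Size([2, 196, 256])
```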
Citations: 0
Domain generalization via domain uncertainty shrinkage
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113118
Jun-Zheng Chu , Bin Pan , Tian-Yang Shi , Zhen-Wei Shi
Ensuring model robustness against distributional shifts still presents a significant challenge in many machine learning applications. To address this issue, a wide range of domain generalization (DG) methods have been developed. However, these approaches mainly focus on invariant representations by leveraging multiple source domain data, which ignores the uncertainty present across different domains. In this paper, we establish a novel DG framework in the form of evidential deep learning (EDL-DG). To reach the DG objective under a finite set of given domains, we propose a new Domain Uncertainty Shrinkage (DUS) regularization scheme on the output Dirichlet distribution parameters, which achieves better generalization across unseen domains without introducing additional structures. Theoretically, we analyze the convergence of EDL-DG and provide a generalization bound within the framework of PAC-Bayesian learning. We show that the proposed method reduces the PAC-Bayesian bound under certain conditions and thus achieves better generalization across unseen domains. In our experiments, we validate the effectiveness of the proposed method on the DomainBed benchmark across multiple real-world datasets.
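For context, evidential deep learning treats the network output as evidence for a Dirichlet distribution, with vacuity-style uncertainty u = K / sum(alpha). The sketch below shows that standard quantity plus one plausible "shrinkage" penalty over per-domain uncertainties; the exact DUS form is not given in the abstract, so the regularizer here is only an assumed stand-in:

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits):
    """Standard EDL quantities: evidence = softplus(logits), alpha = evidence + 1."""
    alpha = F.softplus(logits) + 1.0
    return logits.size(-1) / alpha.sum(dim=-1)          # u = K / sum(alpha)

def uncertainty_shrinkage(logits, domains):
    """Assumed stand-in: penalize spread of mean uncertainty across source domains."""
    u = dirichlet_uncertainty(logits)
    means = torch.stack([u[domains == d].mean() for d in domains.unique()])
    return ((means - means.mean()) ** 2).mean()

logits = torch.randn(12, 5)                              # 12 samples, 5 classes
dom = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]) # source-domain labels
print(uncertainty_shrinkage(logits, dom))
```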
Citations: 0
GaitMDF: Gait recognition via motion deformation field modeling and knowledge transfer
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113147
Wei Huo , Ke Wang , Jun Tang , Xudong Zhou , Nian Wang
Gait recognition aims to identify target subjects across non-overlapping camera viewpoints according to their unique walking patterns. Motion representation is the core task in constructing an applicable gait recognition system, which is required to characterize fine-grained dynamic posture changes. In current gait recognition research, multi-scale temporal modeling in conjunction with spatial representation learning is the mainstream line. However, such ideas describe walking patterns in an implicit manner, which often leads to missing important motion information. To address these challenges, we model continuous human body movement as motion deformation field sequences with more physical interpretability, and the learned deformation fields are seamlessly integrated into the proposed gait recognition framework GaitMDF. Specifically, we first learn the multi-scale deformation fields from silhouettes using the designed Deformation Field Generation Network (DFGNet) in a self-supervised manner. Then, we develop two powerful feature extraction networks, i.e., the Silhouette Feature Extractor (SFE) and the Deformation Field Feature Extractor (DFFE), for the silhouette and deformation field sequences to obtain discriminative spatial-temporal representations. Furthermore, a two-stage knowledge distillation strategy is developed to transfer the motion features learned from DFFE to the mimetic deformation field features. By applying this strategy, we not only preserve the motion information of the deformation fields but also significantly reduce the computational cost of inference, since DFGNet and DFFE are no longer required. Finally, the silhouette and the mimetic deformation field features are fused for identity recognition. Extensive experiments on three popular gait datasets demonstrate the effectiveness and superiority of the proposed method.
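The key efficiency trick is the distillation step: a small head on silhouette features learns to mimic the deformation-field teacher features so the deformation branch can be dropped at inference. Below is a minimal sketch of that step; the head architecture and plain MSE objective are assumptions, not the paper's exact two-stage scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MimeticHead(nn.Module):
    """Predicts 'mimetic' deformation-field features from silhouette features."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, silhouette_feat):
        return self.proj(silhouette_feat)

student = MimeticHead()
sil_feat = torch.randn(8, 256)            # features from the silhouette extractor (SFE)
with torch.no_grad():
    teacher_feat = torch.randn(8, 256)    # stands in for frozen DFFE outputs

loss = F.mse_loss(student(sil_feat), teacher_feat)
loss.backward()                           # only the mimetic head receives gradients
print(float(loss))
```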
Citations: 0
ParkinsonNet: A unified end-to-end framework for estimating Parkinson’s disease motor symptom severity
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113109
Yande Li , Fang Ba , Minglun Gong , Li Cheng
Parkinson’s Disease (PD) is a progressive neurodegenerative disorder characterized by worsening motor symptoms such as bradykinesia, imbalance, tremor, rigidity, and gait disturbances. Clinician assessments are often time-consuming and costly, and the limited availability of specialists, along with patient mobility issues, complicates frequent evaluations. In this paper, we propose a novel end-to-end network to automatically quantify the severity of motor symptoms in PD, referred to as ParkinsonNet. Unlike most existing methods that focus on isolated tests, ParkinsonNet provides a unified learning framework that is evaluated across multiple PD motor symptoms, as demonstrated on finger tapping and gait. Specifically, to accurately perceive the gradual progression of motor symptoms throughout an entire test cycle (e.g., decrementing amplitude), a temporal self-attention enhancement module is designed by combining temporal compression with long-term temporal dependency modeling. To ease the issues of class imbalance and limited datasets, a similarity matching module is proposed that transforms the conventional classification or regression task into a similarity matching problem, matching the skeleton feature with its most similar texture feature. Additionally, a vector quantization module is incorporated to encode spatiotemporal features into a discrete-valued space, compressing and abstracting motion representations while retaining critical information for more accurate classification. Extensive experiments on two newly identified benchmark datasets demonstrate the superiority of our ParkinsonNet and set new benchmark performance for future algorithm development and evaluation. Our code will be released at this https URL.
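The vector quantization step mentioned above follows the generic nearest-codebook recipe: snap each continuous feature to its closest codebook entry and pass gradients straight through. A self-contained sketch of that generic mechanism (codebook size and dimensions are assumptions, not the paper's module):

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour vector quantization with a straight-through gradient."""
    def __init__(self, num_codes=128, dim=64):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, z):
        # z: (batch, dim) continuous features -> nearest code per row
        dists = torch.cdist(z, self.codebook)            # (batch, num_codes)
        idx = dists.argmin(dim=-1)
        z_q = self.codebook[idx]
        return z + (z_q - z).detach(), idx               # straight-through estimator

vq = VectorQuantizer()
feats = torch.randn(16, 64)
quantized, codes = vq(feats)
print(quantized.shape, codes[:5])
```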
Citations: 0
Towards open-vocabulary semantic segmentation for remote sensing images
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113120
Da Zhang , Mingmin Zeng , Xuelong Li
Open-vocabulary semantic segmentation (OVSS) for remote sensing images (RSI) aims to achieve precise segmentation of arbitrary semantic categories specified within RSI. However, existing mainstream OVSS models are mostly trained on natural images and struggle to handle the rotational diversity and unique characteristics of RSI, resulting in insufficient feature representation and category discrimination capabilities. To ameliorate this challenge, we propose ROSS, an open vocabulary semantic segmentation framework that combines effective feature fusion with dedicated modeling of RSI characteristics. Specifically, ROSS employs a dual-branch image encoder (DBIE): one branch leverages multi-directional augmentation to enhance the representation of rotation-invariant features, while the other incorporates remote sensing (RS) specific knowledge via an encoder pretrained on large-scale RSI data. During feature fusion, ROSS generates cost maps from both branches and designs a spatial-class dual-level cost aggregation (SDCA) module based on spatial and category information, thereby fully integrating global spatial context and category discriminability. Finally, we introduce a RS knowledge transfer upsampling module that efficiently fuses and reconstructs multi-scale features to achieve high-resolution and fine-grained segmentation. Experiments on four open-vocabulary RS datasets demonstrate that ROSS consistently outperforms current state-of-the-art (SOTA) models. This robust performance across different training and evaluation configurations verifies its effectiveness and broad applicability.
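The "cost maps" that SDCA aggregates are typically dense image-text similarity volumes. A minimal sketch of that starting point (cosine similarity between every pixel feature and every class text embedding); the shapes and function name are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def build_cost_map(pixel_feats, text_embeds):
    """Per-class cost map from dense image features and class text embeddings."""
    p = F.normalize(pixel_feats, dim=1)                  # (B, C, H, W)
    t = F.normalize(text_embeds, dim=-1)                 # (K, C)
    return torch.einsum("bchw,kc->bkhw", p, t)           # (B, K, H, W) per-class costs

feats = torch.randn(1, 512, 32, 32)                      # dense image features
texts = torch.randn(6, 512)                              # 6 open-vocabulary class names encoded
print(build_cost_map(feats, texts).shape)                # torch.Size([1, 6, 32, 32])
```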
Citations: 0
Underwater image enhancement via degradation information extraction and guidance
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113121
Fukuan Wang , Fei Li , Chaojun Cen , Zhenbo Li , Qingling Duan
Underwater image degradation, caused by wavelength dependent light attenuation, color distortion, and backscatter, significantly impairs reliable visual perception and downstream analysis. Such degradation is inherently complex and composite, as multiple types and varying degrees often coexist within a single scene, posing major challenges for effective Underwater Image Enhancement (UIE). To achieve enhancement that preserves semantic content in the presence of coexisting degradations, we propose DIE2UIE, a novel enhancement framework that explicitly models and integrates both degradation- and semantic-aware prompts. Specifically, we design a Degradation Pattern Perception Module (DPPM) that leverages prompt learning based on Contrastive Language-Image Pretraining (CLIP) to align vision-language features, enabling semantically grounded degradation modeling and pattern-specific enhancement. Complementarily, a Semantic Information Extraction Module (SIEM) recovers object- and scene-level representations through tag prediction, promoting the preservation of semantic structures essential for downstream tasks. These two information streams are jointly embedded within a Transformer-based Degradation Information Extraction (DIE) module, which serves as a unified reasoning core to adaptively guide the enhancement process. Extensive experiments on benchmark and real-world datasets demonstrate that DIE2UIE consistently outperforms state-of-the-art methods in terms of perceptual quality and task-level performance. The code is available at here.
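One common way to realize CLIP-style degradation prompting is to softmax the similarities between an image embedding and a bank of degradation prompt embeddings, then use the weights to condition the enhancer. The sketch below assumes the embeddings already come from a frozen vision-language encoder and is not the paper's DPPM implementation:

```python
import torch
import torch.nn.functional as F

def degradation_prompt_weights(image_embed, prompt_embeds, tau=0.07):
    """Weights over degradation prompts (e.g. 'hazy', 'color-cast', 'low-light')."""
    sims = F.cosine_similarity(image_embed.unsqueeze(0), prompt_embeds, dim=-1)
    return torch.softmax(sims / tau, dim=0)              # used to condition enhancement

img = torch.randn(512)                                    # one image embedding
prompts = torch.randn(4, 512)                             # 4 degradation prompt embeddings
print(degradation_prompt_weights(img, prompts))
```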
Citations: 0
FProtoSeg: Fine-grained prototype alignment for Weakly Supervised Semantic Segmentation of histopathology images
IF 7.6 Tier 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-23 DOI: 10.1016/j.patcog.2026.113126
Meidan Ding , Wenting Chen , Xiaoling Luo , Haiqin Zhong , Linlin Shen
Weakly Supervised Semantic Segmentation (WSSS) of histopathology tissues has advanced significantly, reducing the annotation burden by relying on class activation maps (CAMs). Nevertheless, accurate segmentation remains challenging due to the high intra-class variability across patients and the subtle inter-class differences, as early-stage abnormal cells often resemble normal ones. Moreover, WSSS methods tend to emphasize the most discriminative features, often neglecting outlier features that arise from less common or more subtle morphological variations within a class. Despite progress in recent approaches, the reliance on a coarse, one-to-many mapping hampers their capacity to capture subtle, pixel-level distinctions. Motivated by this limitation, we hypothesize that adopting a fine-grained, one-to-one alignment will yield more accurate and complete segmentation outcomes. Therefore, we propose a novel fine-grained prototype alignment framework named FProtoSeg, with structure-aware prototype modeling and text-aware prototype alignment to extract more specific features and activate more complete CAMs. Specifically, structure-aware prototype modeling captures class characteristics by employing prototypes, thereby adapting to the semantic attributes of different instances. Text-aware prototype alignment aligns visual and textual features to enhance prototype awareness, ensuring that instance feature distributions are in harmony with text features. Experimental results demonstrate that FProtoSeg achieves state-of-the-art performance, attaining a mean Intersection over Union (mIoU) of 71.21% on the BCSS-WSSS dataset and 76.64% on the LUAD-HistoSeg dataset, significantly outperforming existing methods.
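A generic sketch of prototype-based CAM refinement (not the FProtoSeg modules themselves): build one prototype per class by CAM-weighted average pooling of pixel features, then re-score every pixel against each prototype by cosine similarity; the shapes below are assumptions:

```python
import torch
import torch.nn.functional as F

def prototype_refined_cam(feats, coarse_cam):
    """feats: (B, C, H, W) pixel features; coarse_cam: (B, K, H, W) coarse class maps."""
    b, c, h, w = feats.shape
    k = coarse_cam.shape[1]
    w_cam = coarse_cam.flatten(2).softmax(dim=-1)                   # (B, K, HW) spatial weights
    protos = torch.einsum("bkn,bcn->bkc", w_cam, feats.flatten(2))  # (B, K, C) class prototypes
    sim = torch.einsum("bkc,bcn->bkn",
                       F.normalize(protos, dim=-1),
                       F.normalize(feats.flatten(2), dim=1))
    return sim.view(b, k, h, w)                                     # refined activation maps

feats = torch.randn(1, 256, 28, 28)
coarse = torch.rand(1, 4, 28, 28)                                   # 4 tissue classes
print(prototype_refined_cam(feats, coarse).shape)                   # torch.Size([1, 4, 28, 28])
```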
Citations: 0