
Latest Publications in Pattern Recognition

Prioritized scanning: Combining spatial information multiple instance learning for computational pathology
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-24 · DOI: 10.1016/j.patcog.2026.113151
Yuqi Zhang, Jiakai Wang, Baoyu Liang, Yuancheng Yang, Siyang Wu, Chao Tong
Multiple instance learning (MIL) has emerged as a reliable paradigm that has propelled the integration of computational pathology (CPath) into clinical histopathology. However, despite significant advancements, current MIL approaches continue to face challenges due to inadequate spatial information representation, a consequence of the unordered patch layout of the original whole slide images (WSIs). To address this limitation, we first demonstrate the importance of prioritized scanning within structured state space models (SSM). We introduce a MIL framework that incorporates spatial information, termed Prioritized Scanning MIL (PSMIL). PSMIL primarily comprises two branches and a fusion block. The first branch, known as the spatial branch, incorporates potential spatial information into the patch sequence using the original 2D positions and employs SSM to model the spatial features of the WSI. The second branch, referred to as the cross-spatial branch, utilizes a significance scoring block along with SSM to harness feature relationships among similar instances across spatial locations. Finally, a lightweight feature fusion block integrates the outputs of both branches, facilitating more comprehensive feature utilization. Extensive experiments on 5 popular datasets and 3 downstream tasks demonstrate that PSMIL significantly surpasses state-of-the-art MIL methods, with accuracy (ACC) improvements of up to 5.26% for cancer sub-typing. Our code is available at https://github.com/YuqiZhang-Buaa/PSMIL.
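The serialization step at the heart of such scanning can be illustrated with a minimal sketch: patches are ordered by their 2D grid positions before being fed through a state-space recurrence. Everything below (shapes, the raster ordering, the toy diagonal recurrence `toy_ssm`) is an illustrative assumption, not the authors' PSMIL implementation:

```python
import torch

def raster_scan_order(coords: torch.Tensor) -> torch.Tensor:
    """Return indices that sort patch coordinates (row, col) into a
    row-major raster scan, so the 1D sequence preserves 2D adjacency."""
    # lexicographic sort: primary key row, secondary key col
    key = coords[:, 0] * (coords[:, 1].max() + 1) + coords[:, 1]
    return torch.argsort(key)

def toy_ssm(x: torch.Tensor, a: float = 0.9) -> torch.Tensor:
    """Minimal diagonal state-space recurrence h_t = a*h_{t-1} + x_t,
    standing in for the selective SSM blocks used in such models."""
    h = torch.zeros(x.shape[-1])
    out = []
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out.append(h.clone())
    return torch.stack(out)

# 6 patches with (row, col) positions on the WSI grid and 16-d features
coords = torch.tensor([[2, 1], [0, 0], [1, 3], [0, 2], [2, 0], [1, 1]])
feats = torch.randn(6, 16)
order = raster_scan_order(coords)
seq = feats[order]           # spatially ordered patch sequence
bag_repr = toy_ssm(seq)[-1]  # last state as a crude bag-level feature
print(bag_repr.shape)        # torch.Size([16])
```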
Citations: 0
Audio-visual perceptual quality measurement via multi-perspective spatio-temporal EEG analysis
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-24 · DOI: 10.1016/j.patcog.2026.113156
Shuzhan Hu, Mingyu Li, Yang Liu, Weiwei Jiang, Bingrui Geng, Wei Zhong, Long Ye
In human-centered communication systems, establishing human perception-aligned audio-visual quality assessment methods is crucial for enhancing multimedia system performance and service quality. However, conventional subjective evaluation methods based on user ratings are susceptible to biases induced by high-level cognitive processes. To address this limitation, we propose an electroencephalography (EEG) feature fusion approach to establish correlations between audio-visual distortions and perceptual experiences. Specifically, we construct an audio-visual degradation-EEG dataset by recording neural responses from subjects exposed to progressively degraded stimuli. Leveraging this dataset, we extract event-related potential (ERP) features to quantify variations in subjects’ perception of audio-visual quality, demonstrating the feasibility of EEG-based perceptual experience assessment. Capitalizing on EEG’s sensitivity to dynamic multimodal perceptual changes, we develop a multi-perspective feature fusion framework, incorporating a spatio-temporal feature fusion architecture and a diffusion-driven EEG augmentation strategy. This framework enables the extraction of experience-related features from single-trial EEG signals, establishing an EEG-based classifier to detect whether distortions induce perceptual experience alterations. Experimental results validate that EEG signals effectively reflect perception changes induced by quality degradation, while the proposed model achieves efficient and dynamic detection of perception alterations from single-trial EEG data.
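For readers unfamiliar with ERP extraction, the standard recipe is to cut stimulus-locked epochs out of the continuous recording, baseline-correct them, and average across trials. Below is a minimal numpy sketch under assumed shapes and sampling rate, not the authors' pipeline:

```python
import numpy as np

def erp_features(eeg: np.ndarray, onsets: np.ndarray, sfreq: float,
                 tmin: float = -0.1, tmax: float = 0.6) -> np.ndarray:
    """Average stimulus-locked epochs into an ERP per channel.

    eeg:    (n_channels, n_samples) continuous recording
    onsets: stimulus onset times in samples
    Returns (n_channels, epoch_len) trial-averaged ERP.
    """
    pre, post = int(-tmin * sfreq), int(tmax * sfreq)
    epochs = []
    for s in onsets:
        if s - pre >= 0 and s + post <= eeg.shape[1]:
            ep = eeg[:, s - pre:s + post]
            # subtract the pre-stimulus mean as a baseline correction
            ep = ep - ep[:, :pre].mean(axis=1, keepdims=True)
            epochs.append(ep)
    return np.mean(epochs, axis=0)

# toy example: 32-channel EEG at 250 Hz with 3 stimulus onsets
rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 5000))
erp = erp_features(eeg, np.array([1000, 2000, 3000]), sfreq=250.0)
print(erp.shape)  # (32, 175)
```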
Citations: 0
Domain generalization via domain uncertainty shrinkage
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113118
Jun-Zheng Chu, Bin Pan, Tian-Yang Shi, Zhen-Wei Shi
Ensuring model robustness against distributional shifts still presents a significant challenge in many machine learning applications. To address this issue, a wide range of domain generalization (DG) methods have been developed. However, these approaches mainly focus on invariant representations learned from multiple source domains, ignoring the uncertainty presented by different domains. In this paper, we establish a novel DG framework in the form of evidential deep learning (EDL-DG). To reach the DG objective under a finite set of given domains, we propose a new Domain Uncertainty Shrinkage (DUS) regularization scheme on the output Dirichlet distribution parameters, which achieves better generalization across unseen domains without introducing additional structures. Theoretically, we analyze the convergence of EDL-DG and provide a generalization bound in the framework of PAC-Bayesian learning. We show that our proposed method reduces the PAC-Bayesian bound under certain conditions, and thus achieves better generalization across unseen domains. In our experiments, we validate the effectiveness of our proposed method on the DomainBed benchmark across multiple real-world datasets.
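In evidential deep learning, the network outputs non-negative evidence that parameterizes a Dirichlet with α = evidence + 1, and a standard uncertainty measure is u = K/Σα. The sketch below shows one plausible shrinkage-style penalty that pulls per-domain mean uncertainty toward the global mean; it is an illustrative reading of DUS, not the paper's exact regularizer:

```python
import torch

def dirichlet_uncertainty(evidence: torch.Tensor) -> torch.Tensor:
    """Per-sample uncertainty u = K / sum(alpha), with alpha = evidence + 1."""
    alpha = evidence + 1.0            # (batch, K)
    K = alpha.shape[-1]
    return K / alpha.sum(dim=-1)      # (batch,)

def domain_uncertainty_shrinkage(evidence: torch.Tensor,
                                 domain_ids: torch.Tensor) -> torch.Tensor:
    """Penalize the spread of mean uncertainty across source domains,
    shrinking each domain's uncertainty toward the global mean."""
    u = dirichlet_uncertainty(evidence)
    domains = domain_ids.unique()
    per_domain = torch.stack([u[domain_ids == d].mean() for d in domains])
    return ((per_domain - per_domain.mean()) ** 2).mean()

# toy batch: 8 samples, 5 classes, drawn from 2 source domains
evidence = torch.rand(8, 5) * 3
domain_ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(domain_uncertainty_shrinkage(evidence, domain_ids))
```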
Citations: 0
GaitMDF: Gait recognition via motion deformation field modeling and knowledge transfer
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113147
Wei Huo, Ke Wang, Jun Tang, Xudong Zhou, Nian Wang
Gait recognition aims to identify target subjects across non-overlapping camera viewpoints according to their unique walking patterns. Motion representation, which must characterize fine-grained dynamic posture changes, is the core task in constructing a practical gait recognition system. In current gait recognition research, multi-scale temporal modeling combined with spatial representation learning is the mainstream approach. However, such approaches describe walking patterns implicitly, often losing important motion information. To address these challenges, we model continuous human body movement as motion deformation field sequences with greater physical interpretability. The learned deformation fields are seamlessly integrated into the proposed gait recognition framework, GaitMDF. Specifically, we first learn multi-scale deformation fields from silhouettes using the designed Deformation Field Generation Network (DFGNet) in a self-supervised manner. Then, we develop two powerful feature extraction networks, i.e., the Silhouette Feature Extractor (SFE) and the Deformation Field Feature Extractor (DFFE), for the silhouette and deformation field sequences, to obtain discriminative spatial-temporal representations. Furthermore, a two-stage knowledge distillation strategy is developed to transfer the motion features learned by DFFE to the mimetic deformation field features. By applying this strategy, we not only preserve the motion information of the deformation fields but also significantly reduce inference cost, since DFGNet and DFFE are no longer needed at test time. Finally, the silhouette and mimetic deformation field features are fused for identity recognition. Extensive experiments on three popular gait datasets demonstrate the effectiveness and superiority of the proposed method.
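A minimal sketch of the feature-level transfer step: the mimetic branch is trained to match the frozen deformation-field teacher's features. The shapes, the L2-plus-cosine combination, and the weight `tau` are assumptions for illustration; the paper's two-stage schedule is not reproduced here:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats: torch.Tensor,
                      teacher_feats: torch.Tensor,
                      tau: float = 1.0) -> torch.Tensor:
    """Match student features to frozen teacher features.

    Combines an L2 term with a cosine term so both the magnitude and
    the direction of the teacher's motion features are transferred.
    """
    l2 = F.mse_loss(student_feats, teacher_feats.detach())
    cos = 1.0 - F.cosine_similarity(student_feats,
                                    teacher_feats.detach(), dim=-1).mean()
    return l2 + tau * cos

student = torch.randn(4, 256, requires_grad=True)  # mimetic branch output
teacher = torch.randn(4, 256)                      # DFFE output (frozen)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```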
Citations: 0
ParkinsonNet: A unified end-to-end framework for estimating Parkinson’s disease motor symptom severity
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113109
Yande Li, Fang Ba, Minglun Gong, Li Cheng
Parkinson’s Disease (PD) is a progressive neurodegenerative disorder characterized by worsening motor symptoms such as bradykinesia, imbalance, tremor, rigidity, and gait disturbances. Clinician assessments are often time-consuming and costly, and the limited availability of specialists, along with patient mobility issues, complicates frequent evaluations. In this paper, we propose a novel end-to-end network to automatically qualify the severity of motor symptoms in PD, referred to as ParkinsonNet. Unlike most existing methods that focus on isolated tests, ParkinsonNet provides a unified learning framework that is evaluated across multiple PD motor symptoms, as demonstrated on finger tapping and gait. Specifically, to accurately perceive the gradual progression of motor symptoms throughout an entire test cycle (e.g., decrementing amplitude), a temporal self-attention enhancement module is designed by combining temporal compression with long-term temporal dependency modeling. To ease the issues of class imbalance and limited datasets, a similarity matching module is proposed that transforms the conventional classification or regression task into a similarity matching problem, matching the skeleton feature with its most similar texture feature. Additionally, a vector quantization module is incorporated to encode spatiotemporal features into a discrete-valued space, compressing and abstracting motion representations while retaining critical information for more accurate classification. Extensive experiments on two newly identified benchmark datasets demonstrate the superiority of our ParkinsonNet and set new benchmark performance for future algorithm development and evaluation. Our code will be released at this https URL.
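The quantization step can be sketched in a few lines in the style of a VQ-VAE codebook lookup with a straight-through gradient estimator; the dimensions and codebook size below are hypothetical, and the paper's exact module may differ:

```python
import torch

def vector_quantize(z: torch.Tensor, codebook: torch.Tensor):
    """Snap each feature vector to its nearest codebook entry.

    z:        (batch, dim) continuous spatiotemporal features
    codebook: (num_codes, dim) learnable embedding table
    Returns quantized features (with straight-through gradient) and indices.
    """
    # squared Euclidean distances between features and codes
    d = (z ** 2).sum(1, keepdim=True) \
        - 2 * z @ codebook.t() \
        + (codebook ** 2).sum(1)
    idx = d.argmin(dim=1)
    z_q = codebook[idx]
    # straight-through estimator: forward uses z_q, backward passes grads to z
    z_q = z + (z_q - z).detach()
    return z_q, idx

codebook = torch.randn(128, 64)   # 128 discrete motion tokens
feats = torch.randn(8, 64, requires_grad=True)
z_q, idx = vector_quantize(feats, codebook)
print(z_q.shape, idx[:4])
```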
Citations: 0
Towards open-vocabulary semantic segmentation for remote sensing images
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113120
Da Zhang, Mingmin Zeng, Xuelong Li
Open-vocabulary semantic segmentation (OVSS) for remote sensing images (RSI) aims to achieve precise segmentation of arbitrary semantic categories specified within RSI. However, existing mainstream OVSS models are mostly trained on natural images and struggle to handle the rotational diversity and unique characteristics of RSI, resulting in insufficient feature representation and category discrimination capabilities. To address this challenge, we propose ROSS, an open-vocabulary semantic segmentation framework that combines effective feature fusion with dedicated modeling of RSI characteristics. Specifically, ROSS employs a dual-branch image encoder (DBIE): one branch leverages multi-directional augmentation to enhance the representation of rotation-invariant features, while the other incorporates remote sensing (RS) specific knowledge via an encoder pretrained on large-scale RSI data. During feature fusion, ROSS generates cost maps from both branches and designs a spatial-class dual-level cost aggregation (SDCA) module based on spatial and category information, thereby fully integrating global spatial context and category discriminability. Finally, we introduce an RS knowledge-transfer upsampling module that efficiently fuses and reconstructs multi-scale features to achieve high-resolution, fine-grained segmentation. Experiments on four open-vocabulary RS datasets demonstrate that ROSS consistently outperforms current state-of-the-art (SOTA) models. This robust performance across different training and evaluation configurations verifies its effectiveness and broad applicability.
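The initial cost maps that SDCA aggregates can be understood as dense cosine similarities between pixel embeddings and class text embeddings. A minimal sketch with assumed feature dimensions follows (the aggregation module itself is not shown):

```python
import torch
import torch.nn.functional as F

def cost_map(pixel_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity cost volume between dense image features and
    class text embeddings, the usual starting point for cost aggregation.

    pixel_feats: (B, C, H, W), text_feats: (K, C)
    Returns (B, K, H, W) per-class similarity maps.
    """
    p = F.normalize(pixel_feats, dim=1)
    t = F.normalize(text_feats, dim=1)
    return torch.einsum("bchw,kc->bkhw", p, t)

pix = torch.randn(2, 512, 32, 32)   # dense features from an image encoder
txt = torch.randn(10, 512)          # CLIP-style embeddings for 10 classes
print(cost_map(pix, txt).shape)     # torch.Size([2, 10, 32, 32])
```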
Citations: 0
Underwater image enhancement via degradation information extraction and guidance
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113121
Fukuan Wang, Fei Li, Chaojun Cen, Zhenbo Li, Qingling Duan
Underwater image degradation, caused by wavelength-dependent light attenuation, color distortion, and backscatter, significantly impairs reliable visual perception and downstream analysis. Such degradation is inherently complex and composite, as multiple degradation types of varying severity often coexist within a single scene, posing major challenges for effective Underwater Image Enhancement (UIE). To achieve enhancement that preserves semantic content in the presence of coexisting degradations, we propose DIE2UIE, a novel enhancement framework that explicitly models and integrates both degradation- and semantic-aware prompts. Specifically, we design a Degradation Pattern Perception Module (DPPM) that leverages prompt learning based on Contrastive Language-Image Pretraining (CLIP) to align vision-language features, enabling semantically grounded degradation modeling and pattern-specific enhancement. Complementarily, a Semantic Information Extraction Module (SIEM) recovers object- and scene-level representations through tag prediction, promoting the preservation of semantic structures essential for downstream tasks. These two information streams are jointly embedded within a Transformer-based Degradation Information Extraction (DIE) module, which serves as a unified reasoning core to adaptively guide the enhancement process. Extensive experiments on benchmark and real-world datasets demonstrate that DIE2UIE consistently outperforms state-of-the-art methods in terms of perceptual quality and task-level performance. The code is available here.
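Prompt-based degradation scoring in the CLIP style reduces to cosine similarity between an image embedding and a set of prompt embeddings, softened into weights. In the sketch below the encoders are assumed to be precomputed and frozen, and names such as `degradation_weights` are illustrative, not the paper's API:

```python
import torch
import torch.nn.functional as F

def degradation_weights(img_emb: torch.Tensor,
                        prompt_embs: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Score an image against learned degradation prompts, CLIP-style.

    img_emb:     (B, C) image embeddings
    prompt_embs: (P, C) one embedding per degradation prompt
                 (e.g. color cast, haze, low light)
    Returns (B, P) soft weights over degradation types.
    """
    img = F.normalize(img_emb, dim=-1)
    prm = F.normalize(prompt_embs, dim=-1)
    return F.softmax(img @ prm.t() / temperature, dim=-1)

img_emb = torch.randn(4, 512)      # from a frozen CLIP image encoder
prompt_embs = torch.randn(3, 512)  # learned prompts for 3 degradation types
print(degradation_weights(img_emb, prompt_embs))
```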
Citations: 0
FProtoSeg: Fine-grained prototype alignment for Weakly Supervised Semantic Segmentation of histopathology images
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.patcog.2026.113126
Meidan Ding, Wenting Chen, Xiaoling Luo, Haiqin Zhong, Linlin Shen
Weakly Supervised Semantic Segmentation (WSSS) of histopathology tissue, typically driven by class activation maps (CAMs), has advanced significantly in reducing the annotation burden. Nevertheless, accurate segmentation remains challenging due to high intra-class variability across patients and subtle inter-class differences, as early-stage abnormal cells often resemble normal ones. Moreover, WSSS methods tend to emphasize the most discriminative features, often neglecting outlier features that arise from less common or more subtle morphological variations within a class. Despite progress in recent approaches, their reliance on a coarse, one-to-many mapping hampers their capacity to capture subtle, pixel-level distinctions. Motivated by this limitation, we hypothesize that adopting a fine-grained, one-to-one alignment will yield more accurate and complete segmentation outcomes. Therefore, we propose a novel fine-grained prototype alignment framework named FProtoSeg, with structure-aware prototype modeling and text-aware prototype alignment to extract more specific features and activate more complete CAMs. Specifically, structure-aware prototype modeling captures class characteristics by employing prototypes, thereby adapting to the semantic attributes of different instances. Text-aware prototype alignment aligns visual and textual features to enhance prototype awareness, ensuring that instance feature distributions are in harmony with text features. Experimental results demonstrate that FProtoSeg achieves state-of-the-art performance, attaining a mean Intersection over Union (mIoU) of 71.21% on the BCSS-WSSS dataset and 76.64% on the LUAD-HistoSeg dataset, significantly outperforming existing methods.
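One way to read the fine-grained, one-to-one alignment is that each pixel is scored against several prototypes per class and takes its best match, rather than a single coarse class average. The sketch below is an illustrative simplification with assumed shapes, not the FProtoSeg modules themselves:

```python
import torch
import torch.nn.functional as F

def prototype_cam(feats: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Fine-grained prototype activation maps.

    feats:      (B, C, H, W) dense pixel features
    prototypes: (K, M, C), M prototypes per class to cover intra-class variation
    Each pixel is scored by its best-matching prototype within a class.
    """
    f = F.normalize(feats, dim=1)
    p = F.normalize(prototypes, dim=-1)
    sim = torch.einsum("bchw,kmc->bkmhw", f, p)  # (B, K, M, H, W)
    return sim.max(dim=2).values                 # (B, K, H, W)

feats = torch.randn(2, 256, 28, 28)
prototypes = torch.randn(4, 8, 256)  # 4 tissue classes, 8 prototypes each
print(prototype_cam(feats, prototypes).shape)  # torch.Size([2, 4, 28, 28])
```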
Citations: 0
Attribute graph adjusted trace ratio linear discriminant analysis for feature extraction
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.patcog.2026.113136
Quan Wang, Hao Lei, Fei Wang, Xinpei Wen, Zhiping Lin, Feiping Nie
Trace Ratio Linear Discriminant Analysis (TRLDA) is an appealing supervised feature extraction method because it explicitly reflects the Euclidean distances between and within classes of projected samples while preserving data similarity through its orthogonal constraint. However, TRLDA fails to account for inter-attribute correlations, which may limit its discriminant capability. To overcome this limitation, we propose Attribute Graph Adjusted Trace Ratio Linear Discriminant Analysis (AGATRLDA), a novel method that incorporates attribute-level relationships into the discriminant projection matrix. In our approach, each attribute is represented as a point formed by the values of that attribute across all samples. An attribute graph is then constructed by connecting these attribute points with edges weighted according to their pairwise similarity. By integrating the Laplacian matrix of this attribute graph into the optimization framework, AGATRLDA adjusts the discriminant projection matrix to account for inter-attribute correlations. This adjustment encourages attributes with higher similarity to have more aligned coefficients in the projection matrix, thereby improving discriminative performance. Experimental results demonstrate that AGATRLDA consistently outperforms the original TRLDA method as well as several state-of-the-art feature extraction techniques, validating the benefit of incorporating inter-attribute correlations in the discriminant learning process.
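The attribute graph itself is easy to make concrete: treat each column of the data matrix as a point in R^n (its values across all samples), weight edges between attribute points with a similarity kernel, and form the Laplacian L = D − W that enters the optimization. The Gaussian kernel and bandwidth below are illustrative assumptions:

```python
import numpy as np

def attribute_laplacian(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Graph Laplacian over attributes (columns of X).

    Each attribute is a point in R^n formed by its values across all
    n samples; edge weights use a Gaussian kernel on pairwise distances.
    X: (n_samples, d_attributes). Returns the (d, d) Laplacian L = D - W.
    """
    A = X.T                                            # (d, n): one row per attribute
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                           # no self-loops
    return np.diag(W.sum(1)) - W

X = np.random.default_rng(0).standard_normal((100, 8))
L = attribute_laplacian(X)
print(L.shape, np.allclose(L.sum(1), 0))  # (8, 8) True
```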
Citations: 0
Lifelong content-based histopathology image retrieval via bilevel coreset selection and distance consistency rehearsal
IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.patcog.2026.113135
Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng
Content-based histopathological image retrieval (CBHIR) has shown strong performance on static databases by retrieving whole slide images (WSIs) with content similar to query images. However, in clinical settings, the rapid growth of WSI databases challenges current CBHIR methods, which either require costly retraining or suffer performance degradation on new data; e.g., simple fine-tuning causes a 37.5% drop in mAP@5 compared to joint training. To address this, we propose a Lifelong Content-based Histopathology Image Retrieval (LCBHIR) framework that mitigates catastrophic forgetting in continual retrieval, where models lose prior knowledge when updated on expanding databases. The central challenge is balancing stability and plasticity. To enhance plasticity, we design a local memory bank with bilevel coreset sampling, formulating instance selection as a two-level optimization problem. This assigns higher weights to informative or hard-to-learn samples, refining decision boundaries in the feature space. To preserve stability, we introduce a distance consistency rehearsal (DCR) module, which maintains the relative feature distances of old samples, ensuring consistency across retrieval tasks and improving reliability in clinical applications. We validate our method on a large-scale continual WSI dataset from TCGA projects, comprising approximately 7400 WSIs across 6 primary sites and 19 cancer subtypes. Experimental results demonstrate that the proposed method is effective and superior to state-of-the-art approaches, achieving 5.7–19.4% higher mAP than existing continual learning methods. The code is available at https://github.com/OliverZXY/LCBHIR.
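The rehearsal objective can be sketched as matching pairwise distance matrices of memory-bank samples between the frozen previous model and the current one. The MSE form and the shapes below are illustrative assumptions, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def pairwise_sq_dist(z: torch.Tensor) -> torch.Tensor:
    """(N, C) embeddings -> (N, N) squared Euclidean distances.
    The squared form avoids the non-differentiable sqrt at zero distance."""
    sq = (z ** 2).sum(dim=1)
    return sq[:, None] - 2.0 * z @ z.t() + sq[None, :]

def dcr_loss(old_emb: torch.Tensor, new_emb: torch.Tensor) -> torch.Tensor:
    """Distance consistency rehearsal: keep the relative distances among
    rehearsed memory-bank samples stable as the model is updated.

    old_emb: embeddings from the frozen previous model (treated as targets)
    new_emb: embeddings of the same samples from the current model
    """
    return F.mse_loss(pairwise_sq_dist(new_emb),
                      pairwise_sq_dist(old_emb).detach())

old = torch.randn(16, 128)                      # previous model, frozen
new = torch.randn(16, 128, requires_grad=True)  # current model
loss = dcr_loss(old, new)
loss.backward()
print(loss.item())
```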
Citations: 0