
Latest Articles in Pattern Recognition

DEER: Diffusion-empowered efficient restoration for underwater images
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.patcog.2026.113167
Cheng Wang , Junyang Chen , Weichen Zhao , Wanhui Gao , Ge Jiao
Underwater images suffer from severe degradation caused by light attenuation and noise interference. Existing methods struggle to balance performance and inference efficiency. To address this problem, we propose a Diffusion-Empowered Efficient Restoration (DEER) framework comprising an enhancement network and a restoration network. The former incorporates two key modules: the High-Frequency Detail Enhancement (HFDE) module introduces a min-pooling channel to recover dark details suppressed by medium absorption, complementing max-pooling and average-pooling to capture comprehensive physical edge characteristics; meanwhile, the Multi-Scale Fusion (MSF) module uses multi-scale analysis to address the spatial non-uniformity of color casts. Together, they provide rich frequency-domain priors for the subsequent restoration. For the latter, unlike previous approaches that directly use a diffusion model for generation, we employ it as a diffusion-guided learned prior. By providing dynamic gradient guidance during training, the lightweight network learns the natural image manifold while avoiding the smoothing artifacts induced by pixel-wise mimicry. During inference, the diffusion model is discarded, allowing the lightweight restoration model to run faster. Experimental results demonstrate that DEER outperforms state-of-the-art approaches, achieving improvements of 0.7% to 5.6% across nearly all metrics on the LSUI and UIEB datasets. Our code is available here.
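The HFDE pooling idea, combining min-, max-, and average-pooled channels so that both bright and absorption-suppressed dark details contribute to the edge representation, can be sketched as follows. This is an illustrative NumPy toy, not the authors' learned module; the function name and window size are assumptions for demonstration.

```python
import numpy as np

def pooled_edge_features(img, k=3):
    """Illustrative sketch of the HFDE pooling idea: stack min-, max-,
    and average-pooled channels so both dark (absorption-suppressed)
    and bright details feed the edge representation. Not the paper's
    implementation, which is a learned network component."""
    h, w = img.shape
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    mins = np.empty_like(img)
    maxs = np.empty_like(img)
    avgs = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            win = p[i:i + k, j:j + k]
            mins[i, j] = win.min()
            maxs[i, j] = win.max()
            avgs[i, j] = win.mean()
    # Stack as three channels; (max - min) would act as a local contrast cue.
    return np.stack([mins, maxs, avgs], axis=0)
```

The min-pooling channel is the distinctive piece: standard edge descriptors rely on max/average statistics, which discard exactly the dark structure that medium absorption suppresses.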
Pattern Recognition, Volume 176, Article 113167.
Citations: 0
Semi-MedSAM: Adapting SAM-assisted semi-supervised multi-modality learning for medical endoscopic image segmentation
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.patcog.2026.113206
Junhao Li , Yun Li , Junhao Wu , Chaojie Ji , Zhijie Chen , Wenbin Lei , Ruxin Wang
Accurate recognition of lesions in endoscopic images is essential for effective diagnosis and treatment. Multi-modal learning effectively utilizes complementary clues derived from multiple modalities, which can promote performance in lesion area detection. However, obtaining a sufficient amount of annotated paired images for multi-modal learning is time-consuming and costly. The Segment Anything Model (SAM) is a powerful vision foundation model that excels in natural image segmentation, but its performance degrades in endoscopic scenes due to a lack of medical-specific knowledge. Moreover, the simple structure of the SAM decoder fails to effectively capture fine-grained details among complex lesion structures and low-contrast tissue organs in endoscopic images. To utilize the powerful feature extraction capability of the foundation model and address the scarcity of medical image annotations, we present a novel prompt-free SAM-assisted framework, Semi-MedSAM, for semi-supervised multi-modal learning. The proposed Semi-MedSAM integrates an effective SAM-based backbone, comprising a purpose-designed multi-expert-instructed encoder and a hierarchical prototypical decoder, into a prompt-free semi-supervised framework. Extensive experiments on three multi-modal endoscopic datasets demonstrate the superior segmentation performance of our Semi-MedSAM.
Pattern Recognition, Volume 176, Article 113206.
Citations: 0
UPGP: Backdoor defense via unlearning perturbation and orthogonality-constraint gradient projection
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.patcog.2026.113211
Jingtai Li , Xiujiu Yuan , Jiwei Tian , Shiwei Lu , Dengxiu Yu
In the field of artificial intelligence, the wide adoption of third-party data has heightened the risk of backdoor attacks based on data poisoning. Although post-training defenses can mitigate such attacks, aggressive strategies often degrade the model’s performance on its main task. To address this, a novel method that combines backdoor detection and elimination through machine unlearning is proposed. Specifically, unlearning perturbation is first defined to capture the parameter variation induced by forgetting a subset of samples. Subsequently, experiments confirm that backdoor samples exhibit lower sensitivity to perturbations generated from normal samples. In addition, a learning-dynamics analysis attributes this discrepancy to unlearning sensitivity, which is defined as the inner product between the gradients of normal and backdoor samples. This analysis further demonstrates that this metric quantifies the extent to which backdoor removal perturbs the model’s main task. Leveraging this insight, an orthogonality-constrained gradient projection method projects the unlearning gradient onto the null space of the normal-sample gradient, thereby eliminating the aforementioned unlearning sensitivity and preserving the accuracy of normal samples. The proposed method is evaluated across six backdoor attack scenarios and two network architectures, reducing the average attack success rate by 96.34 percentage points and improving robust accuracy by 83.68 percentage points, while maintaining the model’s performance on the main task.
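The orthogonality-constrained projection step admits a compact sketch: projecting the unlearning gradient onto the orthogonal complement of the normal-sample gradient drives their inner product, the paper's "unlearning sensitivity", to zero. A single-vector NumPy illustration, with `project_unlearning_gradient` a hypothetical name:

```python
import numpy as np

def project_unlearning_gradient(g_unlearn, g_normal, eps=1e-12):
    """Project the unlearning gradient onto the orthogonal complement
    (null-space direction) of the normal-sample gradient, so forgetting
    the backdoor does not move the weights along directions that alter
    normal-sample behaviour. Single-vector sketch of the idea."""
    g_u = np.asarray(g_unlearn, dtype=float)
    g_n = np.asarray(g_normal, dtype=float)
    coeff = (g_u @ g_n) / (g_n @ g_n + eps)
    return g_u - coeff * g_n  # inner product with g_n is now ~0
```

In practice the normal-sample gradient is a subspace spanned by many batch gradients rather than a single vector, so the projection would target the null space of that whole subspace.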
Pattern Recognition, Volume 176, Article 113211.
Citations: 0
PSR: Proactive soft-orthogonal regulation for long-tailed class-incremental learning
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.patcog.2026.113207
Zhihan Fu, Zhiqi Zhang, Shipeng Liao, Zhengyu Huang, Zerun Chen, Tianyu Shen
Long-tailed class-incremental learning (LT-CIL) faces the challenge of imbalanced data streams that can weaken tail-class representations due to the combined impact of inherent bias toward head classes and catastrophic forgetting. Existing methods typically employ passive adjustments to the feature space to alleviate conflicts, while tail classes often suffer from inadequate learning capacity, particularly under extreme imbalance. To address this limitation, this paper proposes a proactive soft-orthogonal regulation strategy, which reserves embedding space for future classes during the base phase and guides new classes to occupy these spaces while maintaining clear inter-class boundaries during the incremental phases. In contrast to hard-orthogonal or rigid constraints, our proposed soft-orthogonal strategy preserves the semantic continuity of the feature space while enforcing the necessary separation between old and new classes, thereby facilitating the natural embedding of tail classes. The proposed method demonstrates state-of-the-art performance across multiple benchmarks, exhibiting notable robustness, strong generalization capabilities, and scalability to varying task complexities.
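One way to picture "reserving embedding space for future classes" is to pre-allocate orthonormal anchor directions and then penalize only pairwise overlaps beyond a small margin, so separation is enforced softly rather than rigidly. The sketch below is a hedged illustration of that idea; the function names and the margin value are assumptions, not the paper's formulation.

```python
import numpy as np

def reserve_class_anchors(dim, n_total, seed=0):
    """Reserve one unit-norm anchor direction per (current + future)
    class by orthonormalising a random matrix (requires n_total <= dim).
    Future classes are later guided toward their reserved directions.
    This hard-orthogonal initialisation is only a starting point."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, n_total)))
    return q.T  # shape (n_total, dim); rows are orthonormal anchors

def soft_orthogonal_penalty(anchors, margin=0.1):
    """Penalise pairwise cosine similarities only above a small margin,
    leaving slack for semantic continuity (the 'soft' part)."""
    sims = anchors @ anchors.T
    off = sims - np.eye(len(anchors))  # zero out self-similarity
    return float(np.clip(np.abs(off) - margin, 0.0, None).sum())
```

During incremental phases, such a penalty would be added to the classification loss so new (often tail) classes settle into reserved directions instead of crowding head-class regions.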
Pattern Recognition, Volume 176, Article 113207.
Citations: 0
Frequency-enhanced wavelet transformer based decoder for medical image segmentation
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113198
Haitao Yin, Yongchang Xu
The efficiency and flexibility of the decoding method are crucial to the application of pre-trained models in medical image segmentation. However, most conventional decoding methods operate only in a single-scale spatial domain, resulting in limited representation of multi-scale long-range dependencies and degraded amplitude-phase spectrum consistency. Consequently, they tend to generate inaccurate segmentations for low-contrast and complex anatomical structures. To tackle these issues, we propose a Frequency-Enhanced Wavelet Transformer based Decoder (FEWTD) to fully capture multi-scale features and highlight the informative frequency spectrum. The core component of FEWTD is the Wavelet-Fourier Mixer (WF-Mixer), which consists of a Multi-Directional Wavelet Attention (MDWA) module and a Fourier Self-Adjustment (FSA) module. Specifically, the MDWA module applies multi-scale sub-band deformable self-attention and directional sub-band cross-attention in the wavelet domain, endowing it with the capability of capturing multi-directional and multi-scale relationships. The FSA module jointly adjusts the Fourier-domain components through a shared adjuster and filters out uninformative content for segmentation by utilizing channel attention. FEWTD is a plug-and-play decoder that can be flexibly integrated into various pre-trained encoders. Extensive experiments on the ISIC2018, Synapse, and ACDC datasets verify that the combination of FEWTD and pre-trained PVTv2-B2 (PVT-FEWTD-B2) is superior to state-of-the-art methods. The source code will be available at https://github.com/yongchangxu/FEWTD.
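A minimal sketch of an FSA-style operation, scaling each Fourier component of a feature map with a shared per-frequency gain and transforming back, might look like this. Here `gain` is a fixed array standing in for the learned shared adjuster, and the channel-attention filtering described in the abstract is omitted.

```python
import numpy as np

def fourier_self_adjust(feat, gain):
    """Sketch of a Fourier Self-Adjustment-style step: modulate the
    2D spectrum of a feature map with a shared per-frequency gain,
    then transform back. With a real, conjugate-symmetric gain this
    rescales amplitudes while preserving phase; taking .real discards
    any residual asymmetry. Illustrative only."""
    spec = np.fft.fft2(feat)
    spec = spec * gain                # per-frequency modulation
    return np.fft.ifft2(spec).real   # back to the spatial domain
```

A learned version would make `gain` a trainable parameter shared across channels, which is one plausible reading of the "shared adjuster" in the abstract.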
Pattern Recognition, Volume 176, Article 113198.
Citations: 0
GrassNet: State space model meets graph neural network
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113197
Gongpei Zhao , Tao Wang , Yi Jin , Congyan Lang , Yidong Li , Haibin Ling
Designing spectral convolutional networks presents a significant challenge in graph learning. In traditional spectral graph neural networks (GNNs), polynomial-based methods are commonly used to design filters via the Laplacian matrix. In practical applications, however, these polynomial methods encounter inherent limitations that primarily arise from the low-order truncation of polynomial filters and the lack of holistic modeling of the graph spectrum. This leads to poor performance of existing spectral approaches on real-world graph data, especially when the spectrum is highly concentrated or contains many numerically identical eigenvalues, since such filters apply exactly the same modulation to signals at the same frequency. To overcome these issues, in this paper we propose the Graph State Space Network (GrassNet), a novel graph neural network with theoretical support that provides a simple but effective scheme to design and learn arbitrary graph spectral filters. In particular, GrassNet introduces structured state space models (SSMs) to model the correlations of graph signals at different frequencies and derives a unique rectification for each frequency in the graph spectrum. To the best of our knowledge, our work is the first to employ SSMs in the design of GNN spectral filters, and it theoretically offers greater expressive power than polynomial filters. Extensive experiments on nine public benchmarks reveal that GrassNet achieves superior performance in real-world graph modeling tasks. The code is available at: https://github.com/Graph-ZKY/grassnet.
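The contrast with polynomial filters can be made concrete: full spectral filtering modulates each Laplacian frequency individually via x' = U h(Lambda) U^T x, whereas a low-order polynomial forces equal eigenvalues to receive equal modulation. Below is a dense NumPy sketch of exact spectral filtering; it is not GrassNet's SSM, which additionally models correlations along the ordered spectrum.

```python
import numpy as np

def spectral_filter(adj, signal, h):
    """Filter a graph signal in the spectral domain of the symmetric
    normalized Laplacian: x' = U h(lam) U^T x, where h maps the
    eigenvalue array to per-frequency gains. Dense eigendecomposition,
    so only suitable for small graphs; illustrative sketch."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    lam, u = np.linalg.eigh(lap)          # graph frequencies and modes
    return u @ (h(lam) * (u.T @ signal))  # per-frequency modulation
```

A degree-K polynomial filter would constrain h to h(lam) = sum_k c_k lam^k, which is exactly where the low-order truncation and the "identical eigenvalues get identical modulation" limitation come from.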
Pattern Recognition, Volume 176, Article 113197.
Citations: 0
Amplitude-guided deep reinforcement learning for semi-supervised layer segmentation
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-30 | DOI: 10.1016/j.patcog.2026.113204
Enting Gao , Zian Zha , Yonggang Li , Junhui Zhu , Yong Wang , Xinjian Chen , Naihui Zhou , Dehui Xiang
Accurate segmentation of scalp tissue layers is essential for mechanistic studies and staging of androgenetic alopecia (AGA), a common form of hair loss that impacts quality of life and mental health. High-resolution magnetic resonance imaging (HR-MR) offers a promising assessment tool. However, accurate segmentation remains challenging due to the lack of large-scale annotated datasets, structural deformation, and low image quality. To address these issues, an Amplitude-guided Deep Reinforcement Learning (ADRL) framework is designed to decouple the data distribution of images and adaptively fuse into the distribution of unlabeled images. This enables effective feature learning of lamellar and asymmetrically thickened structures from both labeled and unlabeled data. Then, phase component alignment (PHA) is imposed to mitigate the adverse impacts of noise or artifacts. To further enhance the discriminative capability of this network, a Cross-Power Spectrum Correlation (CPSC) module is proposed to mitigate inaccurate segmentation of layer structures. Comprehensive experiments on a scalp HR-MR image dataset and a publicly available retinal OCT image dataset demonstrate that our method significantly outperforms state-of-the-art methods in semi-supervised layer segmentation.
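The notion of "decoupling the data distribution" in the frequency domain can be illustrated with a classic amplitude swap: blend the Fourier amplitudes of two images while keeping the source phase, which carries the structural content. This FDA-style sketch is only an analogy for the amplitude guidance; ADRL learns the fusion adaptively rather than using a fixed blend.

```python
import numpy as np

def mix_amplitude(src, tgt, alpha=0.5):
    """Blend a fraction of the source image's Fourier amplitude toward
    the target's while keeping the source phase. Amplitude roughly
    encodes appearance/style statistics; phase encodes structure, so
    the layer layout of `src` is preserved. Illustrative sketch."""
    fs, ft = np.fft.fft2(src), np.fft.fft2(tgt)
    amp = (1 - alpha) * np.abs(fs) + alpha * np.abs(ft)  # blended amplitude
    phase = np.angle(fs)                                  # source phase kept
    return np.fft.ifft2(amp * np.exp(1j * phase)).real
```

With `alpha=0` the operation is an identity; increasing `alpha` moves the source's intensity statistics toward the unlabeled target distribution without disturbing its anatomy.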
Pattern Recognition, Volume 176, Article 113204.
Citations: 0
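The amplitude-guided idea above rests on a standard frequency-domain decomposition: an image's Fourier amplitude spectrum carries appearance statistics, while its phase carries structure. As a rough sketch of that decoupling (not the paper's ADRL framework, which learns the fusion adaptively via reinforcement learning), the snippet below mixes the amplitude spectra of a labeled and an unlabeled image at a fixed ratio; the blend ratio `lam` is our assumption, not a value from the paper:

```python
import numpy as np

def amplitude_mix(img_labeled, img_unlabeled, lam=0.5):
    """Blend the Fourier amplitude of a labeled image with that of an
    unlabeled image, keeping the labeled image's phase (structure).

    A simplified, fixed-ratio stand-in for amplitude-guided fusion;
    the paper's ADRL learns the mixing adaptively.
    """
    fft_l = np.fft.fft2(img_labeled)
    fft_u = np.fft.fft2(img_unlabeled)
    amp_l, phase_l = np.abs(fft_l), np.angle(fft_l)
    amp_u = np.abs(fft_u)
    # Blend amplitudes; phase (structural content) stays labeled.
    amp_mix = (1 - lam) * amp_l + lam * amp_u
    mixed = amp_mix * np.exp(1j * phase_l)
    # Imaginary residue is numerical noise for real inputs.
    return np.real(np.fft.ifft2(mixed))
```

Varying `lam` controls how much of the unlabeled image's appearance statistics is injected while the labeled anatomy, encoded in the phase, is preserved; `lam=0` returns the labeled image unchanged.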
Causal direction discovery via related conditional residual
IF 7.6 Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-30 DOI: 10.1016/j.patcog.2026.113196
Shaofan Chen, Guoyuan He, Wentao Ma, Hao Zhang, Tongqing Zhou, Siwei Wang, Lichuan Gu
Causal discovery is a critical problem with broad applications across diverse research disciplines, where determining causal directions remains a major challenge. Most bi-variate causal direction discovery methods depend strongly on assumptions about data distributions and lack scalability to multivariate causal direction discovery. To this end, we propose Related Conditional Residual Proportion Causal Discovery (RPCD), a novel method for inferring causal directions from joint observations. Our general idea is that, in the presence of causality without confounding, selection bias, or feedback, the prediction error in the true causal direction is smaller. Specifically, we calculate the conditional expectation of the Related Conditional Residual Proportion (RCRP) under the aforementioned assumptions to distinguish the cause from the effect. Beyond bi-variate causal direction discovery, we extend RPCD to multivariate causal discovery by embedding it within constraint-based methods, which enables efficient causal direction discovery in complex causal structures. We theoretically analyze the conditions under which RCRP can reliably identify causal directions in both bi-variate and multivariate scenarios. Extensive experiments on synthetic and real-world datasets demonstrate that RPCD achieves state-of-the-art accuracy (0.84 average) on benchmark bi-variate datasets and significantly improves orientation accuracy in multivariate causal discovery. Our source code is available at https://github.com/CSF819/RPCD.
{"title":"Causal direction discovery via related conditional residual","authors":"Shaofan Chen ,&nbsp;Guoyuan He ,&nbsp;Wentao Ma ,&nbsp;Hao Zhang ,&nbsp;Tongqing Zhou ,&nbsp;Siwei Wang ,&nbsp;Lichuan Gu","doi":"10.1016/j.patcog.2026.113196","DOIUrl":"10.1016/j.patcog.2026.113196","url":null,"abstract":"<div><div>Causal discovery is a critical problem with broad applications across diverse research disciplines, where determining causal directions remains a major challenge. Most bi-variate causal direction discovery methods strongly depend on assumptions about data distributions and lack scalability to multivariate causal direction discovery. To this end, we propose Related Conditional Residual Proportion Causal Discovery (RPCD), a novel method for inferring causal directions from joint observation. Our general idea is that under the presence of causality without confounding, selection bias and feedback, the prediction error in the true causal direction is smaller. Specifically, we calculate the conditional expectation of the Related Conditional Residual Proportion (RCRP) under the aforementioned assumptions to distinguish the cause from the effect. In addition to bi-variate causal direction discovery, we extend RPCD to multivariate causal discovery by embedding it within constraint-based methods, which enables efficient causal direction discovery in complex causal structures. We theoretically analyze the conditions under which RCRP can reliably identify causal directions in both bi-variate and multivariate scenarios. To evaluate RPCD’s effectiveness, we conduct extensive experiments on synthetic and real-world datasets. Extensive experiments demonstrate that RPCD achieves state-of-the-art accuracy (0.84 average) on benchmark bi-variate datasets and significantly improves orientation accuracy in multivariate causal discovery. 
Our source code is available at <span><span>https://github.com/CSF819/RPCD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113196"},"PeriodicalIF":7.6,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
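RPCD's premise, that prediction error is smaller in the true causal direction, can be illustrated with a deliberately simplified bivariate check. The sketch below compares mean squared residuals of a low-degree polynomial regression in both directions; it is not the paper's RCRP statistic (which takes a conditional expectation of residual proportions), and the choice of polynomial regressor and degree is our assumption:

```python
import numpy as np

def infer_direction(x, y, degree=3):
    """Guess the causal direction between two variables by comparing
    regression residuals in both directions.

    A simplified stand-in for RPCD: we compare mean squared residuals
    of a polynomial fit after standardizing both variables, whereas the
    paper uses a conditional-expectation statistic (RCRP).
    """
    def mse(a, b):
        # Fit b = f(a) with a low-degree polynomial; return residual MSE.
        coeffs = np.polyfit(a, b, degree)
        return np.mean((b - np.polyval(coeffs, a)) ** 2)

    # Standardize so residual scales are comparable across directions.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return "x->y" if mse(x, y) < mse(y, x) else "y->x"
```

For a non-invertible mechanism such as y = x² + noise, the backward regression cannot predict x from y at all, so the forward residual is clearly smaller; real bivariate cases are subtler, which is why the paper develops a dedicated statistic and identifiability conditions.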
Class-imbalanced graph contrastive clustering for sleep apnea prediction in mental health
IF 7.6 Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-30 DOI: 10.1016/j.patcog.2026.113157
Xin Liu, Xinke Wang, Jinyang Huang, Dan Guo, Meng Wang
Obstructive Sleep Apnea (OSA), as a prevalent sleep disorder, has impacts that extend far beyond the physiological level. It disrupts the neural mechanisms underlying affective processing and cognitive control, leading to emotional dysregulation, anxiety, and depressive symptoms, which pose significant risks to mental health. Research in this area provides key insights into emotional dynamics and mental health assessment. In this paper, we consider class imbalance in OSA data and rethink the homophilous graph assumption, observing that tail-class samples can form identifiable subgraphs and that the message passing mechanism enhances their representational capacity. Hence, we construct a homophilous graph over respiratory samples using dynamic time warping. Building on this, we propose a Class-Aware Graph Contrastive Clustering (GA-GCC) framework to address class imbalance. Specifically, first, a locally weighted graph contrastive learning module is utilized to mine same-class samples within low-order neighborhoods. Second, to further emphasize tail classes, we design a frequency-weighted graph semantic clustering module that generates frequency-based weights using pseudo-labels derived from semantic clustering. Finally, a globally class-aware weight optimization module is developed to integrate local and frequency-based weights, enabling GA-GCC to adaptively balance class distributions and mitigate intra-class redundancy. Since the lack of public non-contact OSA datasets limits multimodal affective research, we release ROSA, a radar-based dataset for three-class sleep apnea prediction. Extensive experiments on ROSA and other long-tailed datasets demonstrate the superior performance of GA-GCC, highlighting its potential for clinical and mental health applications.
{"title":"Class-imbalanced graph contrastive clustering for sleep apnea prediction in mental health","authors":"Xin Liu ,&nbsp;Xinke Wang ,&nbsp;Jinyang Huang ,&nbsp;Dan Guo ,&nbsp;Meng Wang","doi":"10.1016/j.patcog.2026.113157","DOIUrl":"10.1016/j.patcog.2026.113157","url":null,"abstract":"<div><div>Obstructive Sleep Apnea (OSA), as a prevalent sleep disorder, has impacts that extend far beyond the physiological level. It disrupts the neural mechanisms underlying affective processing and cognitive control, leading to emotional dysregulation, anxiety, and depressive symptoms, which pose significant risks to mental health. Research in this area provides key insights into emotional dynamics and mental health assessment. In this paper, we consider class imbalance in OSA data and rethink the homophilous graph assumption, observing that tail-class samples can form identifiable subgraphs and the message passing mechanism enhances their representational capacity. Hence, we construct a homophilous graph based on respiratory samples using dynamic time warping. Building on this, we propose a Class-Aware Graph Contrastive Clustering (GA-GCC) framework to address class imbalance. Specifically, at first, a locally weighted graph contrastive learning module is utilized to mine same-class samples within low-order neighborhoods. Second, to further emphasize tail class, we design a frequency-weighted graph semantic clustering module that generates frequency-based weights using pseudo-labels derived from semantic clustering. Finally, a globally class-aware weight optimization module is develop to integrate local and frequency-based weights, enabling GA-GCC to adaptively balances class distributions and mitigates intra-class redundancy. Since the lack of public non-contact OSA datasets limits multimodal affective research, we release ROSA, a radar-based dataset for three-class sleep apnea prediction. 
Extensive experiments on ROSA and other long-tailed datasets demonstrate the superior performance of GA-GCC, highlighting its potential for clinical and mental health applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113157"},"PeriodicalIF":7.6,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
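The homophilous graph above is built from pairwise similarities between respiratory waveforms under dynamic time warping (DTW). A minimal sketch of such a construction, assuming a plain O(nm) DTW over 1-D sequences and a k-nearest-neighbour edge rule (the paper's exact graph construction may differ):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two 1-D sequences, with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_graph(sequences, k=2):
    """Adjacency matrix linking each sequence to its k DTW-nearest
    neighbours -- a minimal stand-in for the paper's homophilous graph."""
    n = len(sequences)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(sequences[i], sequences[j])
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        # Skip index 0 of the sort (self, distance 0) when picking neighbours.
        nbrs = np.argsort(dist[i])[1:k + 1]
        adj[i, nbrs] = 1
    return adj
```

DTW is a natural choice here because respiratory cycles from different subjects are misaligned in time; warping lets two similar breathing patterns match even when their phases and rates differ.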
Lifelong scene graph generation
IF 7.6 Tier 1, Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-30 DOI: 10.1016/j.patcog.2026.113132
Tao He, Xin Hu, Tongtong Wu, Dongyang Zhang, Ming Li, Yuan-Fang Li, Fei Richard Yu
Scene Graph Generation (SGG) aims to predict visual relationships between object pairs in an image. Existing SGG approaches typically adopt a one-time training paradigm, which requires retraining on the entire dataset when new relationship types emerge, an impractical solution that leads to catastrophic forgetting. In this work, we introduce Lifelong Scene Graph Generation (LSGG), a challenging and practical setting where predicates arrive sequentially in a streaming fashion. We propose ICSGG, a novel in-context learning framework that reformulates visual features into symbolic textual tokens compatible with pre-trained language models. To retain prior knowledge while adapting to new tasks, ICSGG employs a knowledge-aware prompt retrieval strategy that selects relevant exemplars as in-context demonstrations for each query. This enables effective continual learning through prompt-based reasoning. Extensive experiments on two large-scale benchmarks, Visual Genome (VG) and Open Images v6, demonstrate that our method significantly outperforms existing SGG models in both lifelong and conventional settings, e.g., about 4 to 5 percentage points better than the state-of-the-art PGSG.
{"title":"Lifelong scene graph generation","authors":"Tao He ,&nbsp;Xin Hu ,&nbsp;Tongtong Wu ,&nbsp;Dongyang Zhang ,&nbsp;Ming Li ,&nbsp;Yuan-Fang Li ,&nbsp;Fei Richard Yu","doi":"10.1016/j.patcog.2026.113132","DOIUrl":"10.1016/j.patcog.2026.113132","url":null,"abstract":"<div><div>Scene Graph Generation (SGG) aims to predict visual relationships between object pairs in an image. Existing SGG approaches typically adopt a one-time training paradigm, which requires retraining on the entire dataset when new relationship types emerge-an impractical solution that leads to catastrophic forgetting. In this work, we introduce Lifelong Scene Graph Generation (LSGG), a challenging and practical setting where predicates arrive sequentially in a streaming fashion. We propose ICSGG, a novel in-context learning framework that reformulates visual features into symbolic textual tokens compatible with pre-trained language models. To retain prior knowledge while adapting to new tasks, ICSGG employs a knowledge-aware prompt retrieval strategy that selects relevant exemplars as in-context demonstrations for each query. This enables effective continual learning through prompt-based reasoning. 
Extensive experiments on two large-scale benchmarks-Visual Genome (VG) and Open Images v<sub>6</sub>-demonstrate that our method significantly outperforms existing SGG models in both lifelong and conventional settings, e.g., with about 4 ∼ 5% points better than the state-of-the-art PGSG.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113132"},"PeriodicalIF":7.6,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
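ICSGG's knowledge-aware prompt retrieval selects the exemplars most relevant to a query as in-context demonstrations. A generic nearest-neighbour version of that step, assuming precomputed embeddings and cosine similarity (the paper's actual scoring function is not reproduced here, and the predicate strings below are hypothetical):

```python
import numpy as np

def retrieve_exemplars(query_vec, exemplar_vecs, exemplar_prompts, top_k=2):
    """Return the top_k exemplar prompts whose embeddings are most
    cosine-similar to the query embedding.

    A generic sketch of in-context exemplar retrieval; ICSGG's
    knowledge-aware scoring is more involved than plain cosine similarity.
    """
    E = np.asarray(exemplar_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Cosine similarity; the epsilon guards against zero-norm vectors.
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)[:top_k]
    return [exemplar_prompts[i] for i in order]
```

The retrieved prompts would then be concatenated, as demonstrations, ahead of the query's symbolic tokens before being fed to the frozen language model.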