
Pattern Recognition Letters: Latest Publications

Tadmo: A tabular distance measure with move operations
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.patrec.2025.11.009 | Pattern Recognition Letters 199, pp. 212-218
Dirko Coetsee, Steve Kroon, Ralf Kistner, Adem Kikaj, McElory Hoffmann, Luc De Raedt
Tabular data is ubiquitous in pattern recognition, yet accurately measuring differences between tables remains challenging. Conventional methods rely on cell substitutions and row/column insertions and deletions, often overestimating the difference when cells are simply repositioned. We propose a distance metric that considers move operations, capturing structural changes more faithfully. Although exact computation is NP-complete, a greedy approach computes an effective approximation in practice. Experimental results on real-world datasets demonstrate that our approach yields a more compact and intuitive measure of table dissimilarity, enhancing applications such as clustering, table extraction evaluation, and version history recovery.
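The move-aware idea can be illustrated with a minimal greedy sketch (an illustrative approximation, not the paper's algorithm, whose exact computation is NP-complete). Here a table is a dict mapping (row, col) to a cell value, and a repositioned cell is charged one move instead of a deletion plus an insertion:

```python
from collections import Counter

def greedy_table_distance(a, b, move_cost=1):
    """Greedy sketch of a table edit distance that charges `move_cost`
    for a cell whose value reappears at a different position, instead
    of a deletion plus an insertion (cost 2) as in move-free edits.

    Tables are dicts mapping (row, col) -> value. This is a toy
    illustration of the move-aware idea, not the paper's method.
    """
    # Cells matching in both value and position cost nothing.
    unmatched_a = {p: v for p, v in a.items() if b.get(p) != v}
    unmatched_b = {p: v for p, v in b.items() if a.get(p) != v}
    cost = 0
    # Greedily pair leftover cells with equal values: these are moves.
    pool = Counter(unmatched_b.values())
    for v in unmatched_a.values():
        if pool[v] > 0:          # same value exists elsewhere in b
            pool[v] -= 1
            cost += move_cost    # one move instead of delete + insert
        else:
            cost += 1            # deletion
    cost += sum(pool.values())   # remaining cells in b are insertions
    return cost
```

On a pair of tables that differ only by one repositioned cell, this returns 1, where a substitution/insertion/deletion-only distance would charge 2, mirroring the overestimation the abstract describes.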
Citations: 0
Discriminative response pruning for robust and efficient deep networks under label noise
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-13 | DOI: 10.1016/j.patrec.2025.11.025 | Pattern Recognition Letters 199, pp. 170-177
Shuwen Jin, Junzhu Mao, Zeren Sun, Yazhou Yao
Pruning is widely recognized as a promising approach for reducing the computational and storage demands of deep neural networks, facilitating lightweight model deployment on resource-limited devices. However, most existing pruning techniques assume the availability of accurate training labels, overlooking the prevalence of noisy labels in real-world settings. Deep networks have strong memorization capability, making them prone to overfitting noisy labels and thereby sensitive to the removal of network parameters. As a result, existing methods often encounter limitations when directly applied to the task of pruning models trained with noisy labels. To this end, we propose Discriminative Response Pruning (DRP) to robustly prune models trained with noisy labels. Specifically, DRP begins by identifying clean and noisy samples and reorganizing them into class-specific subsets. Then, it estimates the importance of model parameters by evaluating their responses to each subset, rewarding parameters exhibiting strong responses to clean data and penalizing those overfitting to noisy data. A class-wise reweighted aggregation strategy is then employed to compute the final importance score, which guides the pruning decisions. Extensive experiments across various models and noise conditions are conducted to demonstrate the efficacy and robustness of our method.
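The reward-and-penalize scoring described above can be sketched as follows. This is a hypothetical reading of the idea, not the paper's exact formulation: the per-class response arrays (e.g., mean absolute gradients per subset), the `alpha` noise-penalty weight, and the magnitude-based pruning threshold are all assumptions:

```python
import numpy as np

def drp_importance(clean_resp, noisy_resp, class_weights=None, alpha=1.0):
    """Sketch of DRP-style importance: reward a parameter's response to
    per-class clean subsets, penalize its response to noisy subsets,
    then aggregate with class-wise weights.

    clean_resp, noisy_resp: arrays of shape (num_classes, num_params),
    e.g., mean absolute gradients on each class-specific subset.
    `alpha` (an assumed knob, not from the paper) scales the penalty.
    """
    num_classes = clean_resp.shape[0]
    if class_weights is None:
        class_weights = np.full(num_classes, 1.0 / num_classes)
    per_class = clean_resp - alpha * noisy_resp  # reward clean, penalize noisy
    return class_weights @ per_class             # class-wise reweighted aggregation

def prune_mask(importance, sparsity):
    """Keep parameters above the sparsity-quantile importance threshold."""
    k = int(len(importance) * sparsity)
    if k == 0:
        return np.ones_like(importance, dtype=bool)
    thresh = np.partition(importance, k - 1)[k - 1]
    return importance > thresh
```

Parameters that respond strongly to clean data score high and survive pruning; those whose response is dominated by noisy samples score low and are removed first.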
Citations: 0
SAM-guided prompt learning for Multiple Sclerosis lesion segmentation
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-17 | DOI: 10.1016/j.patrec.2025.11.018 | Pattern Recognition Letters 199, pp. 205-211
Federica Proietto Salanitri, Giovanni Bellitto, Salvatore Calcagno, Ulas Bagci, Concetto Spampinato, Manuela Pennisi
Accurate segmentation of Multiple Sclerosis (MS) lesions remains a critical challenge in medical image analysis due to their small size, irregular shape, and sparse distribution. Despite recent progress in vision foundation models, such as SAM and its medical variant MedSAM, these models have not yet been explored in the context of MS lesion segmentation. Moreover, their reliance on manually crafted prompts and high inference-time computational cost limit their applicability in clinical workflows, especially in resource-constrained environments. In this work, we introduce a novel training-time framework for effective and efficient MS lesion segmentation. Our method leverages SAM solely during training to guide a prompt learner that automatically discovers task-specific embeddings. At inference, SAM is replaced by a lightweight convolutional aggregator that maps the learned embeddings directly into segmentation masks, enabling fully automated, low-cost deployment. We show that our approach significantly outperforms existing specialized methods on the public MSLesSeg dataset, establishing new performance benchmarks in a domain where foundation models had not previously been applied. To assess generalizability, we also evaluate our method on pancreas and prostate segmentation tasks, where it achieves competitive accuracy while requiring an order of magnitude fewer parameters and computational resources compared to SAM-based pipelines. By eliminating the need for foundation models at inference time, our framework enables efficient segmentation without sacrificing accuracy. This design bridges the gap between large-scale pretraining and real-world clinical deployment, offering a scalable and practical solution for MS lesion segmentation and beyond. Code is available at https://github.com/perceivelab/MS-SAM-LESS.
Citations: 0
Transformer-based dynamic cell bounding box refinement for end-to-end Table Structure Recognition
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-10 | DOI: 10.1016/j.patrec.2025.11.011 | Pattern Recognition Letters 199, pp. 106-112
Yang Xue, Haosheng Cai, Zhuoming Li, Lianwen Jin
Table Structure Recognition (TSR) can adopt image-to-sequence solutions to predict both logical and physical structure simultaneously. However, while these models excel at identifying the logical structure, they often struggle with accurate cell detection. To address this challenge, we propose a Transformer-based Dynamic cell bounding Box refinement for end-to-end TSR, named DynamicBoxTransformer. Specifically, we incorporate a cell bounding box regression decoder, which takes the output of the HTML sequence decoder as input. The cell regression decoder uses reference bounding box coordinates to create spatial queries that provide explicit guidance to key areas and enhance the accuracy of cell bounding boxes layer by layer. To mitigate error accumulation, we introduce denoising training, particularly focusing on the offset of rows and columns. In addition, we design masks that enable the model to make full use of contextual information. Experimental results show that our DynamicBoxTransformer achieves competitive performance on natural scene table datasets. Compared to previous image-to-sequence approaches, DynamicBoxTransformer demonstrates significant improvements in accurate cell detection.
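The layer-by-layer refinement of reference boxes amounts to a residual update per decoder layer, which can be sketched as follows (the offsets here are stand-ins for the model's per-layer predictions, and the box format is an assumed (x1, y1, x2, y2) convention):

```python
import numpy as np

def refine_boxes(ref_boxes, offset_layers):
    """Sketch of layer-by-layer box refinement: each decoder layer
    predicts a small offset that is added to the current reference
    boxes, so later layers correct earlier estimates.

    ref_boxes: (N, 4) array of (x1, y1, x2, y2) reference coordinates.
    offset_layers: one hypothetical (N, 4) offset array per layer.
    Returns the final boxes and the per-layer history.
    """
    boxes = ref_boxes.astype(float).copy()
    history = [boxes.copy()]
    for offsets in offset_layers:   # one offset prediction per decoder layer
        boxes = boxes + offsets     # residual update of the references
        history.append(boxes.copy())
    return boxes, history
```

The history makes it easy to inspect how each layer nudges the cell boxes toward their final positions, which is the behavior the abstract describes as enhancing accuracy "layer by layer".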
Citations: 0
Additive decomposition of one-dimensional signals using Transformers
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-17 | DOI: 10.1016/j.patrec.2025.11.002 | Pattern Recognition Letters 199, pp. 239-245
Samuele Salti, Andrea Pinto, Alessandro Lanza, Serena Morigi
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. It serves as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques often rely on mathematical models, recent research suggests that applying the latest deep learning models to this very ill-posed inverse problem represents an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piecewise constant, smooth (trend), highly-oscillatory, and noise components. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.
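The four-component setup lends itself to synthetic training data whose ground-truth decomposition is known by construction. A minimal generator in that spirit (amplitudes, frequencies, and segment counts are illustrative choices, not the paper's) might look like:

```python
import numpy as np

def synth_signal(n=256, seed=0):
    """Generate one synthetic example: a piecewise-constant part, a
    smooth trend, a high-frequency oscillation, and noise, returned
    separately so their sum is the model input and the parts are the
    supervision targets. All magnitudes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    # Piecewise constant: random levels held over equal-length segments.
    steps = np.repeat(rng.normal(size=4), n // 4)
    trend = 0.5 * np.sin(2 * np.pi * t)        # smooth (trend) component
    osc = 0.1 * np.sin(2 * np.pi * 40 * t)     # highly oscillatory component
    noise = 0.05 * rng.normal(size=n)          # noise component
    signal = steps + trend + osc + noise
    return signal, (steps, trend, osc, noise)
```

Because the components are generated and then summed, a decomposition model trained on such data can be scored directly against the known parts.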
Citations: 0
Multi-Modal masked autoencoder and parallel Mamba for 3D brain tumor segmentation
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-01 | DOI: 10.1016/j.patrec.2025.10.020 | Pattern Recognition Letters 199, pp. 40-46
Yaya Huang, Litong Liu, Tianzhen Zhang, Sisi Wang, Chee-Ming Ting
Accurate segmentation of brain tumors from multimodal MRI is essential for diagnosis and treatment planning. However, most existing approaches can only process a single data modality, without exploiting the complementary information across different modalities. To overcome this limitation, a novel framework called MFMamba is proposed, which integrates modality-aware masked autoencoder pretraining, a gated fusion strategy, and a Mamba-based backbone for efficient long-range modeling. In this design, one modality is fully masked while others are partially masked, forcing the network to reconstruct missing data through cross-modal learning. The gated fusion module then selectively incorporates generative priors into task-specific features, enhancing multimodal representations. Experimental results on the BraTS 2023 dataset show that MFMamba achieves Dice scores of 93.77% for Whole Tumor and 92.69% for Tumor Core, corresponding to 1.6-2.1% improvements over state-of-the-art baselines. The gains are statistically significant (p < 0.05), indicating the framework's ability to deliver more precise tumor boundary delineation. Overall, the results suggest that modality-aware fusion can enhance segmentation quality while maintaining computational efficiency, underscoring its potential application for clinical image analysis. The implementation is publicly available at https://github.com/ministerhuang/MFMamba.
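The masking scheme (one modality fully masked, the rest partially masked) can be sketched as follows; the masking fraction is an assumed hyperparameter and the voxel-wise masking granularity is an illustrative choice:

```python
import numpy as np

def mask_modalities(volumes, full_idx, partial_ratio=0.5, seed=0):
    """Sketch of the masking scheme described above: fully mask one
    modality and partially mask the others, so a reconstruction
    objective forces cross-modal learning.

    volumes: (M, ...) array of M co-registered modality volumes.
    full_idx: index of the modality to mask entirely.
    partial_ratio: assumed fraction to mask in the other modalities.
    Returns the masked volumes and the boolean keep-mask.
    """
    rng = np.random.default_rng(seed)
    masked = volumes.copy()
    keep = np.ones_like(volumes, dtype=bool)
    keep[full_idx] = False                      # fully mask one modality
    for m in range(volumes.shape[0]):
        if m == full_idx:
            continue
        drop = rng.random(volumes.shape[1:]) < partial_ratio
        keep[m][drop] = False                   # partially mask the rest
    masked[~keep] = 0.0
    return masked, keep
```

Cycling `full_idx` over the modalities during pretraining exposes the network to every "missing modality" configuration it may face.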
Citations: 0
Explainable multimodal brain imaging through a multiple-branch neural network
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-22 | DOI: 10.1016/j.patrec.2025.11.030 | Pattern Recognition Letters 199, pp. 254-260
Giuseppe Placidi, Alessia Cipriani, Michele Nappi, Matteo Polsinelli
Brain studies require the use of several complementary imaging modalities. When a modality is unavailable, Artificial Intelligence (AI) has recently provided ways to estimate it. Radiologists modulate the use of the available modalities depending on the task they have to perform. We aim to artificially trace this radiological process through a multibranch neural network architecture, the StarNet. The goal is to explain how and where different imaging modalities, whether actually acquired or artificially reconstructed, are used in different radiological tasks by reading inside the structure of the network. To do so, StarNet includes several satellite networks, one per source modality, connected at each layer by a central unit. This design enables us to assess the contribution of each imaging modality, identify where the contribution occurs, and quantify the variations when certain modalities are substituted with AI-generated counterparts. The ultimate goal is to enable data-related and task-related ablation studies through the complete explainability of StarNet, thus offering radiologists clear guidance on which imaging sequences contribute to the task, to what extent, and at which stages of the process. As an example, we applied the proposed architecture to 2D slices extracted from 3D volumes acquired with multimodal magnetic resonance imaging (MRI), to assess: 1. the role of the imaging modalities used; 2. how that role changes when the radiological task changes; 3. the effects of synthetic data on the process. The results are presented and discussed.
Citations: 0
Mind the data: Evaluating data quality sensitivity in medical LLMs
IF 3.3 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-04 | DOI: 10.1016/j.patrec.2025.11.007 | Pattern Recognition Letters 199, pp. 68-74
Xiaodong Han, Yibing Zhan, Jun Ni, Baosheng Yu, Dapeng Tao
Large language models (LLMs) are increasingly deployed in medical applications, yet the sensitivity of these systems to input data quality has been underexplored. To address this issue, this paper constructs a low-quality medical records (LQMR) dataset to systematically simulate three common categories of structured data anomalies: missing values, plausibility errors, and conformance violations. This resource is employed to evaluate the performance of both general-purpose LLMs (e.g., GPT, DeepSeek) and medicine-specific LLMs on diagnostic tasks under controlled data degradation. Our experiments show that data anomalies significantly degrade diagnostic accuracy, with plausibility errors having the most detrimental effect. For instance, performance drops by up to 16.79% when plausibility errors are introduced, while conformance violations cause an 11.45% drop. Moreover, we find that current LLMs, including domain-specific models, struggle to detect subtle yet critical errors in clinical records, often leading to incorrect diagnoses. These findings underscore the critical importance of data quality in medical AI applications. We also explore future directions, including the need for anomaly-aware training, data quality conditioning, and integrating symbolic medical knowledge to enhance model robustness and error detection in real-world clinical settings. We hope our findings could contribute to the development of more reliable and resilient medical AI systems.
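The three anomaly categories can be illustrated on a toy structured record (the field names and specific corruptions are hypothetical examples, not the LQMR dataset's actual schema):

```python
import copy
import random

def corrupt_record(record, kind, rng=None):
    """Toy sketch of the three anomaly categories described above,
    applied to a structured record represented as a dict. The fields
    and corruption values are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    rec = copy.deepcopy(record)   # leave the original record untouched
    if kind == "missing":
        key = rng.choice(sorted(rec))
        rec[key] = None                      # missing value
    elif kind == "plausibility":
        if "age" in rec:
            rec["age"] = -5                  # well-typed but impossible value
    elif kind == "conformance":
        if "temperature_c" in rec:
            rec["temperature_c"] = "98.6F"   # wrong unit/format for the field
    return rec
```

Applying such corruptions to otherwise clean records yields paired clean/degraded inputs, which is what a controlled data-degradation evaluation like the one above requires.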
{"title":"Mind the data: Evaluating data quality sensitivity in medical LLMs","authors":"Xiaodong Han ,&nbsp;Yibing Zhan ,&nbsp;Jun Ni ,&nbsp;Baosheng Yu ,&nbsp;Dapeng Tao","doi":"10.1016/j.patrec.2025.11.007","DOIUrl":"10.1016/j.patrec.2025.11.007","url":null,"abstract":"<div><div>Large language models (LLMs) are increasingly deployed in medical applications, yet the sensitivity of these systems to input data quality has been underexplored. To address this issue, this paper constructs a low-quality medical records (LQMR) dataset to systematically simulate three common categories of structured data anomalies: missing values, plausibility errors, and conformance violations. This resource is employed to evaluate the performance of both general-purpose LLMs (e.g., GPT, DeepSeek) and medicine-specific LLMs on diagnostic tasks under controlled data degradation. Our experiments show that data anomalies significantly degrade diagnostic accuracy, with plausibility errors having the most detrimental effect. For instance, performance drops by up to 16.79% when plausibility errors are introduced, while conformance violations cause an 11.45% drop. Moreover, we find that current LLMs, including domain-specific models, struggle to detect subtle yet critical errors in clinical records, often leading to incorrect diagnoses. These findings underscore the critical importance of data quality in medical AI applications. We also explore future directions, including the need for anomaly-aware training, data quality conditioning, and integrating symbolic medical knowledge to enhance model robustness and error detection in real-world clinical settings. We hope our findings can contribute to the development of more reliable and resilient medical AI systems.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 68-74"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Collaborative feature alignment with global–local fusion for fine-grained sketch-based image retrieval
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-01 Epub Date: 2025-11-06 DOI: 10.1016/j.patrec.2025.11.003
Xuan Zhang , Ming Zhao , Lixiang Ma
The demand for fine-grained sketch-based image retrieval is rapidly growing. However, it faces two major challenges: the difficulty in capturing fine-grained details and the large domain gap between modalities. To address these challenges, we propose a novel framework, a collaborative feature alignment with global–local fusion network, comprising a fine-grained mask-based feature extraction module, a global–local adaptive normalization feature fusion module, a feature completion and augmentation module, and a collaborative feature alignment strategy. Specifically, we introduce a channel-attention-based mask that directs the network's focus towards detailed regions to capture fine-grained information. Then, a dual-level adaptive normalization fusion mechanism is employed to align style discrepancies at both the global and local levels, facilitating more consistent representations. Features are disentangled into style-related and structure-related representations, and style-related information is supplemented across modalities to enhance feature expressiveness. Additionally, an alignment loss is introduced, enabling efficient retrieval while avoiding additional alignment during inference. Extensive experiments conducted on the QMUL-ShoeV2 and QMUL-ChairV2 datasets validate the effectiveness of the proposed method.
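The abstract does not specify the channel-attention mask in detail; the following is a generic squeeze-and-excitation-style sketch of channel attention over a feature map, with the bottleneck weights `w1`/`w2` assumed purely for illustration:

```python
import numpy as np

def channel_attention_mask(feat, w1, w2):
    """SE-style channel attention over a (C, H, W) feature map.

    Global-average-pool each channel, pass the pooled vector through a
    two-layer bottleneck (ReLU then sigmoid), and rescale each channel
    by the resulting gate. The bottleneck weights w1 (C//r, C) and
    w2 (C, C//r) are assumed here; the paper's actual mask design is
    not given in the abstract.
    """
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)       # (C,) global average pool
    hidden = np.maximum(w1 @ squeeze, 0.0)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]                # reweight channels

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))                   # toy feature map
w1 = rng.standard_normal((2, 8))                     # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
y = channel_attention_mask(x, w1, w2)
```

Because the gate lies strictly between 0 and 1, the mask can only attenuate channels; the network learns to attenuate uninformative channels less, which is one simple way to steer focus toward detail-bearing regions.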
{"title":"Collaborative feature alignment with global–local fusion for fine-grained sketch-based image retrieval","authors":"Xuan Zhang ,&nbsp;Ming Zhao ,&nbsp;Lixiang Ma","doi":"10.1016/j.patrec.2025.11.003","DOIUrl":"10.1016/j.patrec.2025.11.003","url":null,"abstract":"<div><div>The demand for fine-grained sketch-based image retrieval is rapidly growing. However, it faces two major challenges: the difficulty in capturing fine-grained details and the large domain gap between modalities. To address these challenges, we propose a novel framework: collaborative feature alignment with global–local fusion network, including fine-grained mask-based feature extraction module, global–local adaptive normalization feature fusion module, feature completion and augmentation module and collaborative feature alignment strategy. Specifically, we introduce a channel attention based mask to direct the network’s focus towards detailed regions to capture fine-grained information. Then, the dual-level adaptive normalization fusion mechanism is employed to align style discrepancies at both global and local levels, facilitating more consistent representations. Features are disentangled into style-related representations and structure-related representations, and style-related information is cross-modal supplemented to enhance feature expressiveness. Additionally, an alignment loss is introduced, enabling efficient retrieval while avoiding additional alignment during inference. Extensive experiments conducted on the QMUL-ShoeV2 and QMUL-ChairV2 datasets validate the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 135-141"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AdaPL: Adaptive Pseudo Labeling for deep active learning in image classification
IF 3.3 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-01 Epub Date: 2025-11-11 DOI: 10.1016/j.patrec.2025.11.024
Qiang Fang, Xin Xu
Deep supervised learning has achieved remarkable success in many fields, but it often relies on a large amount of annotated data, leading to high costs. An alternative solution is active learning, which aims to enable models to achieve optimal performance with less annotated data. Most standard active learning methods focus on proposing better selection strategies for labeling representative samples while ignoring other unlabeled samples. Inspired by the fact that the reasonable utilization of unlabeled data can improve model performance, we present a novel framework for active learning with pseudo-labeling in this paper. The core of our approach is a novel pseudo-labeling method with an adaptive threshold. Extensive experiments on three typical image classification tasks demonstrate that our approach achieves state-of-the-art performance compared to existing baseline methods. Moreover, our approach is efficient, flexible, and task-agnostic, making it compatible with most standard active learning strategies. Our code will be available at https://github.com/nudtqiangfang/AdaPL.
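AdaPL's exact thresholding rule is not given in the abstract; the sketch below shows one common form of adaptive pseudo-label selection, where each class's threshold scales with the model's current mean confidence on that class. All names and the scaling rule are illustrative assumptions:

```python
import numpy as np

def select_pseudo_labels(probs, base_tau=0.9):
    """Pick pseudo-labels with a per-class adaptive threshold.

    Each class's threshold is the base threshold scaled by the mean
    top-1 confidence of samples currently predicted as that class, so
    classes the model finds hard face a lower bar. This is a generic
    sketch in the spirit of the abstract, not AdaPL's actual rule.
    """
    preds = probs.argmax(axis=1)                 # hard predictions
    conf = probs.max(axis=1)                     # top-1 confidences
    n_classes = probs.shape[1]
    taus = np.full(n_classes, base_tau)
    for c in range(n_classes):
        mask = preds == c
        if mask.any():
            taus[c] = base_tau * conf[mask].mean()   # class-wise scaling
    keep = conf >= taus[preds]                   # compare to own class's bar
    return preds[keep], np.flatnonzero(keep)

probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.20, 0.80]])
labels, idx = select_pseudo_labels(probs)        # labels [0, 1] at rows 0 and 2
```

Scaling the threshold per class, rather than fixing one global cutoff, lets under-confident classes still contribute pseudo-labels early in training, which is the usual motivation for making the threshold adaptive.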
{"title":"AdaPL: Adaptive Pseudo Labeling for deep active learning in image classification","authors":"Qiang Fang,&nbsp;Xin Xu","doi":"10.1016/j.patrec.2025.11.024","DOIUrl":"10.1016/j.patrec.2025.11.024","url":null,"abstract":"<div><div>Deep supervised learning has achieved remarkable success in many fields, but it often relies on a large amount of annotated data, leading to high costs. An alternative solution is active learning, which aims to enable models to achieve optimal performance with less annotated data. Most standard active learning methods focus on proposing better selection strategies for labeling representative samples while ignoring other unlabeled samples. Inspired by the fact that the reasonable utilization of unlabeled data can improve model performance, we present a novel framework for active learning with pseudo-labeling in this paper. The core of our approach is a novel pseudo-labeling method with an adaptive threshold. Extensive experiments on three typical image classification tasks demonstrate that our approach achieves state-of-the-art performance compared to existing baseline methods. Moreover, our approach is efficient, flexible, and task-agnostic, making it compatible with most standard active learning strategies. Our code will be available at <span><span>https://github.com/nudtqiangfang/AdaPL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 185-190"},"PeriodicalIF":3.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0