Identifying real changes for height displaced buildings to aid in deep learning training sample generation
Haiyan Xu, Min Wang, Gang Xu, Qian Shen
Pub Date : 2025-11-25 DOI: 10.1016/j.patrec.2025.11.035
Deep learning-based change detection methods often rely on large numbers of annotations, and automated sample generation methods for change detection are usually implemented via pixelwise comparison after bitemporal image classification. In bitemporal images, high-rise buildings exhibit height displacements in different directions because of differing viewing angles, which generally causes serious false alarms in the samples generated automatically by post-classification comparison (PCC). In this study, automatic high-rise building change discrimination is implemented by exploiting features of bitemporal buildings such as roof textures and facade geometry: building roofs are matched by their features, and height displacement triangles are compared, which eliminates the false changes caused by building height displacement while preserving the true changes. Validation experiments conducted on high-resolution images of two Chinese cities, Nanjing and Suzhou, verify that the proposed method can automatically generate high-quality samples of height-displaced buildings, facilitating the training of deep learning-based change detection models.
{"title":"Identifying real changes for height displaced buildings to aid in deep learning training sample generation","authors":"Haiyan Xu , Min Wang , Gang Xu , Qian Shen","doi":"10.1016/j.patrec.2025.11.035","DOIUrl":"10.1016/j.patrec.2025.11.035","url":null,"abstract":"<div><div>Deep learning-based change detection methods often rely on many annotations, and automated sample generation methods for change detection are usually implemented via pixelwise comparisons after performing bitemporal image classification. In bitemporal images, high-rise buildings have different directional height displacements caused by different viewing angles; this issue generally causes serious false alarms in the automatic samples generated by post-classification comparison (PCC). In this study, by utilizing features such as the roof textures and facade geometry features of bitemporal buildings, automatic high-rise building change discrimination is implemented by matching the features of the building roofs and conducting height displacement triangle comparisons, which eliminates the false changes caused by building height displacements and preserves the true changes. Furthermore, method validation experiments were conducted on high-resolution images of Nanjing and Suzhou, two Chinese cities, and the results verify that the proposed method can automatically generate high-quality building samples with height displacement, which facilitates the training of deep learning-based change detection models.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 269-277"},"PeriodicalIF":3.3,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid CNN and SVM model for Alzheimer’s disease classification using categorical focal loss function
Wided Hechkel, Rim Missaoui, Abdelhamid Helali, Marco Leo
Pub Date : 2025-11-24 DOI: 10.1016/j.patrec.2025.11.031
Alzheimer’s disease (AD) is the leading cause of dementia worldwide. It primarily affects the elderly, causing dangerous cognitive decline and memory loss due to the degeneration and atrophy of brain neurons. Recent developments in machine learning techniques for AD detection and classification support early diagnosis and make it possible to slow the disease through preclinical treatment. However, a major drawback of these techniques is their highly complex architectures and limited generalizability, which hinder clinical integration. This paper presents a new approach that combines a convolutional neural network (CNN) and a support vector machine (SVM) for AD detection. The CNN stage enhances system accuracy because it is an excellent feature extractor; the SVM stage handles classification by optimizing decision boundaries while requiring fewer hyperparameter updates than an end-to-end CNN with a softmax classifier, reducing the computational cost of training. Experiments are conducted on the Kaggle dataset of magnetic resonance imaging (MRI) brain images for AD. The hybrid model achieved accuracies of 98.52%, 97.71%, and 97.58% on the training, validation, and testing sets, respectively, with per-sample inference times of 0.0588 s, 0.0586 s, and 0.0592 s on the same three sets. These results confirm the high effectiveness and promise of the developed CNN-SVM model for early AD diagnosis with reduced implementation complexity.
{"title":"Hybrid CNN and SVM model for Alzheimer’s disease classification using categorical focal loss function","authors":"Wided Hechkel , Rim Missaoui , Abdelhamid Helali , Marco Leo","doi":"10.1016/j.patrec.2025.11.031","DOIUrl":"10.1016/j.patrec.2025.11.031","url":null,"abstract":"<div><div>Alzheimer’s disease (AD) is the leading cause of dementia worldwide. It attacks the elderly population, causing a dangerous cognitive decline and memory loss due to the degeneration and atrophy of brain neurons. Recent developments in machine learning techniques for the detection and classification of AD boost the early diagnosis and enable slowing the disease by adopting preclinical treatments. However, a major defect of these techniques is their high complexity architectures and their less generalizability, which provokes difficulties in clinical integration. This paper presents a new approach that combines convolutional neural network (CNN) and support vector machines (SVM) for the detection of AD. CNN stage enhances the accuracy of the system because it is an excellent feature extractor. SVM stage handles classification performance by optimizing the decision boundaries; meanwhile, it requires fewer hyperparameter updates compared to end-to-end CNN with Softmax classifier. SVM reduces the computational cost of the training. Experiments are conducted on the Kaggle dataset for Magnetic Resonance Imaging (MRI) brain images of AD. The hybrid model achieved accuracy scores of 98.52 %, 97.71 %, and 97.58 % for the training set, validation set, and testing set respectively, inference times per sample of 0.0588s, 0.0586s, and 0.0592s on the above three sets respectively. Obtained results confirm high effectiveness and potential prospect of the developed CNN-SVM model in early diagnosis of AD with reduced implementation complexity.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 261-268"},"PeriodicalIF":3.3,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable multimodal brain imaging through a multiple-branch neural network
Giuseppe Placidi, Alessia Cipriani, Michele Nappi, Matteo Polsinelli
Pub Date : 2025-11-22 DOI: 10.1016/j.patrec.2025.11.030
Brain studies require several complementary imaging modalities. When a modality is unavailable, Artificial Intelligence (AI) has recently provided ways to estimate it. Radiologists modulate their use of the available modalities depending on the task at hand. We aim to artificially trace this radiological process through a multibranch neural network architecture, StarNet. The goal is to explain how and where different imaging modalities, either actually acquired or artificially reconstructed, are used in different radiological tasks by reading inside the structure of the network. To do so, StarNet includes several satellite networks, one per source modality, connected at each layer by a central unit. This design enables us to assess the contribution of each imaging modality, to identify where the contribution occurs, and to quantify the variations when certain modalities are substituted with AI-generated counterparts. The ultimate goal is to enable data-related and task-related ablation studies through the complete explainability of StarNet, offering radiologists clear guidance on which imaging sequences contribute to a task, to what extent, and at which stages of the process. As an example, we applied the proposed architecture to 2D slices extracted from 3D volumes acquired with multimodal magnetic resonance imaging (MRI) to assess: 1. the role of the imaging modalities used; 2. how that role changes when the radiological task changes; 3. the effects of synthetic data on the process. The results are presented and discussed.
{"title":"Explainable multimodal brain imaging through a multiple-branch neural network","authors":"Giuseppe Placidi , Alessia Cipriani , Michele Nappi , Matteo Polsinelli","doi":"10.1016/j.patrec.2025.11.030","DOIUrl":"10.1016/j.patrec.2025.11.030","url":null,"abstract":"<div><div>Brain studies require the use of several complementary imaging modalities. When some modality is unavailable, Artificial Intelligence (AI) has recently provided ways to estimate them. Radiologists modulate the use of the available modalities depending on the task they have to perform. We aim to trace artificially the radiological process through a multibranch neural network architecture, the StarNet. The goal is to explain how and where different imaging modalities, either really collected or artificially reconstructed, are used in different radiological tasks by reading inside the structure of the network. To do that, StarNet includes several satellite networks, one per source modality, connected at each layer by a central unit. This design enables us to assess the contribution of each imaging modality, identifying where the contribution occurs, and to quantify the variations if certain modalities are substituted with AI-generated counterparts. The ultimate goal is to enable data-related and task-related ablation studies through the complete explainability of StarNet, thus offering radiologists clear guidance on which imaging sequences contribute to the task, to what extent, and at which stages of the process. As an example, we applied the proposed architecture to the 2D slices extracted from 3D volumes acquired with multimodal magnetic resonance imaging (MRI), to assess: 1. The role of the used imaging modalities; 2. The change in role when the radiological task changes; 3. The effects of synthetic data on the process. The results are presented and discussed.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 254-260"},"PeriodicalIF":3.3,"publicationDate":"2025-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mitigating task randomness in graph few-shot learning
Shuzhen Rao, Jun Huang
Pub Date : 2025-11-20 DOI: 10.1016/j.patrec.2025.11.022
In graph few-shot learning, meta-training tasks are sampled to improve the model’s ability to learn from limited nodes. Existing methods adapted from computer vision generally employ random task sampling, which can lead to excessive task randomness. This hinders effective training on the graph, as models struggle to adapt to tasks with substantial variations in classes and nodes. To address this issue, we propose a novel method called TRARM (Task RAndomness Reduced graph Meta-learning) to mitigate the adverse effects of excessive task randomness. First, we design progressive grouping-based sampling to adjust combinations of classes and nodes in stages, enabling more focused and efficient meta-training. Second, complementing the sampling, a unified memory-based meta-update module is deployed to effectively accumulate cross-task knowledge, improving both the efficiency and stability of meta-learning. Despite its simplicity, comprehensive experiments demonstrate the superior performance of TRARM on four widely used benchmarks.
{"title":"Mitigating task randomness in graph few-shot learning","authors":"Shuzhen Rao , Jun Huang","doi":"10.1016/j.patrec.2025.11.022","DOIUrl":"10.1016/j.patrec.2025.11.022","url":null,"abstract":"<div><div>In graph few-shot learning, meta-training tasks are sampled to improve the model’s ability to learn from limited nodes. Existing methods adapted from computer vision, generally employ random task sampling, which can lead to excessive task randomness. This hinders effective training on the graph as models struggle to adapt to tasks with substantial variations in classes and nodes. To address this issue, we propose a novel method called TRARM, i.e., <strong>T</strong>ask <strong>RA</strong>ndomness <strong>R</strong>educed graph <strong>M</strong>eta-learning to mitigate adverse effects of excessive task randomness. Firstly, we design progressive grouping-based sampling to adjust combinations of classes and nodes by stages, thereby enabling more focused and efficient meta-training. Secondly, complementing sampling, a unified memory-based meta-update module is first deployed to effectively accumulate cross-task knowledge, improving both efficiency and stability of meta-learning. Despite its simplicity, comprehensive experiments demonstrate the superior performance of TRARM on four widely used benchmarks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 232-238"},"PeriodicalIF":3.3,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAM-guided prompt learning for Multiple Sclerosis lesion segmentation
Federica Proietto Salanitri, Giovanni Bellitto, Salvatore Calcagno, Ulas Bagci, Concetto Spampinato, Manuela Pennisi
Pub Date : 2025-11-17 DOI: 10.1016/j.patrec.2025.11.018
Accurate segmentation of Multiple Sclerosis (MS) lesions remains a critical challenge in medical image analysis due to their small size, irregular shape, and sparse distribution. Despite recent progress in vision foundation models, such as SAM and its medical variant MedSAM, these models have not yet been explored in the context of MS lesion segmentation. Moreover, their reliance on manually crafted prompts and their high inference-time computational cost limit their applicability in clinical workflows, especially in resource-constrained environments. In this work, we introduce a novel training-time framework for effective and efficient MS lesion segmentation. Our method leverages SAM solely during training to guide a prompt learner that automatically discovers task-specific embeddings. At inference, SAM is replaced by a lightweight convolutional aggregator that maps the learned embeddings directly into segmentation masks, enabling fully automated, low-cost deployment. We show that our approach significantly outperforms existing specialized methods on the public MSLesSeg dataset, establishing new performance benchmarks in a domain where foundation models had not previously been applied. To assess generalizability, we also evaluate our method on pancreas and prostate segmentation tasks, where it achieves competitive accuracy while requiring an order of magnitude fewer parameters and computational resources than SAM-based pipelines. By eliminating the need for foundation models at inference time, our framework enables efficient segmentation without sacrificing accuracy. This design bridges the gap between large-scale pretraining and real-world clinical deployment, offering a scalable and practical solution for MS lesion segmentation and beyond. Code is available at https://github.com/perceivelab/MS-SAM-LESS.
{"title":"SAM-guided prompt learning for Multiple Sclerosis lesion segmentation","authors":"Federica Proietto Salanitri , Giovanni Bellitto , Salvatore Calcagno , Ulas Bagci , Concetto Spampinato , Manuela Pennisi","doi":"10.1016/j.patrec.2025.11.018","DOIUrl":"10.1016/j.patrec.2025.11.018","url":null,"abstract":"<div><div>Accurate segmentation of Multiple Sclerosis (MS) lesions remains a critical challenge in medical image analysis due to their small size, irregular shape, and sparse distribution. Despite recent progress in vision foundation models — such as SAM and its medical variant MedSAM — these models have not yet been explored in the context of MS lesion segmentation. Moreover, their reliance on manually crafted prompts and high inference-time computational cost limits their applicability in clinical workflows, especially in resource-constrained environments. In this work, we introduce a novel training-time framework for effective and efficient MS lesion segmentation. Our method leverages SAM solely during training to guide a prompt learner that automatically discovers task-specific embeddings. At inference, SAM is replaced by a lightweight convolutional aggregator that maps the learned embeddings directly into segmentation masks—enabling fully automated, low-cost deployment. We show that our approach significantly outperforms existing specialized methods on the public MSLesSeg dataset, establishing new performance benchmarks in a domain where foundation models had not previously been applied. To assess generalizability, we also evaluate our method on pancreas and prostate segmentation tasks, where it achieves competitive accuracy while requiring an order of magnitude fewer parameters and computational resources compared to SAM-based pipelines. By eliminating the need for foundation models at inference time, our framework enables efficient segmentation without sacrificing accuracy. This design bridges the gap between large-scale pretraining and real-world clinical deployment, offering a scalable and practical solution for MS lesion segmentation and beyond. Code is available at <span><span>https://github.com/perceivelab/MS-SAM-LESS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 205-211"},"PeriodicalIF":3.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Additive decomposition of one-dimensional signals using Transformers
Samuele Salti, Andrea Pinto, Alessandro Lanza, Serena Morigi
Pub Date : 2025-11-17 DOI: 10.1016/j.patrec.2025.11.002
One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields, serving as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques rely on mathematical models, recent research suggests that applying modern deep learning models to this severely ill-posed inverse problem is an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piecewise constant, smooth (trend), highly oscillatory, and noise. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.
{"title":"Additive decomposition of one-dimensional signals using Transformers","authors":"Samuele Salti , Andrea Pinto , Alessandro Lanza , Serena Morigi","doi":"10.1016/j.patrec.2025.11.002","DOIUrl":"10.1016/j.patrec.2025.11.002","url":null,"abstract":"<div><div>One-dimensional signal decomposition is a well-established and widely used technique across various scientific fields. It serves as a highly valuable pre-processing step for data analysis. While traditional decomposition techniques often rely on mathematical models, recent research suggests that applying the latest deep learning models to this very ill-posed inverse problem represents an exciting, unexplored area with promising potential. This work presents a novel method for the additive decomposition of one-dimensional signals. We leverage the Transformer architecture to decompose signals into their constituent components: piecewise constant, smooth (trend), highly-oscillatory, and noise components. Our model, trained on synthetic data, achieves excellent accuracy in modeling and decomposing input signals from the same distribution, as demonstrated by the experimental results.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 239-245"},"PeriodicalIF":3.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAMIRO: Spatial Attention Mutual Information Regularization with a pre-trained model as Oracle for lane detection
Hyunjong Lee, Jangho Lee, Jaekoo Lee
Pub Date : 2025-11-17 DOI: 10.1016/j.patrec.2025.10.013
Lane detection is an important topic in future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant obstacles to effective lane detection, particularly for data-driven approaches that require substantial effort and cost for data collection and annotation. To address these issues, lane detection methods must leverage contextual and global information from surrounding lanes and objects. In this paper, we propose Spatial Attention Mutual Information Regularization with a pre-trained model as an Oracle, called SAMIRO. SAMIRO enhances lane detection performance by transferring knowledge from a pre-trained model while preserving domain-agnostic spatial information. Leveraging SAMIRO’s plug-and-play characteristic, we integrate it into various state-of-the-art lane detection approaches and conduct extensive experiments on major benchmarks such as CULane, TuSimple, and LLAMAS. The results demonstrate that SAMIRO consistently improves performance across different models and datasets. The code will be made available upon publication.
{"title":"SAMIRO: Spatial Attention Mutual Information Regularization with a pre-trained model as Oracle for lane detection","authors":"Hyunjong Lee , Jangho Lee , Jaekoo Lee","doi":"10.1016/j.patrec.2025.10.013","DOIUrl":"10.1016/j.patrec.2025.10.013","url":null,"abstract":"<div><div>Lane detection is an important topic in the future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant obstacles to effective lane detection, particularly when relying on data-driven approaches that require substantial effort and cost for data collection and annotation. To address these issues, lane detection methods must leverage contextual and global information from surrounding lanes and objects. In this paper, we propose a <em>Spatial Attention Mutual Information Regularization with a pre-trained model as an Oracle</em>, called <em>SAMIRO</em>. SAMIRO enhances lane detection performance by transferring knowledge from a pre-trained model while preserving domain-agnostic spatial information. Leveraging SAMIRO’s plug-and-play characteristic, we integrate it into various state-of-the-art lane detection approaches and conduct extensive experiments on major benchmarks such as CULane, Tusimple, and LLAMAS. The results demonstrate that SAMIRO consistently improves performance across different models and datasets. The code will be made available upon publication.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 198-204"},"PeriodicalIF":3.3,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tadmo: A tabular distance measure with move operations
Dirko Coetsee, Steve Kroon, Ralf Kistner, Adem Kikaj, McElory Hoffmann, Luc De Raedt
Pub Date : 2025-11-15 DOI: 10.1016/j.patrec.2025.11.009
Tabular data is ubiquitous in pattern recognition, yet accurately measuring differences between tables remains challenging. Conventional methods rely on cell substitutions and row/column insertions and deletions, often overestimating the difference when cells are simply repositioned. We propose a distance metric that considers move operations, capturing structural changes more faithfully. Although exact computation is NP-complete, a greedy approach computes an effective approximation in practice. Experimental results on real-world datasets demonstrate that our approach yields a more compact and intuitive measure of table dissimilarity, enhancing applications such as clustering, table extraction evaluation, and version history recovery.
{"title":"Tadmo: A tabular distance measure with move operations","authors":"Dirko Coetsee , Steve Kroon , Ralf Kistner , Adem Kikaj , McElory Hoffmann , Luc De Raedt","doi":"10.1016/j.patrec.2025.11.009","DOIUrl":"10.1016/j.patrec.2025.11.009","url":null,"abstract":"<div><div>Tabular data is ubiquitous in pattern recognition, yet accurately measuring differences between tables remains challenging. Conventional methods rely on cell substitutions and row/column insertions and deletions, often overestimating the difference when cells are simply repositioned. We propose a distance metric that considers move operations, capturing structural changes more faithfully. Although exact computation is NP-complete, a greedy approach computes an effective approximation in practice. Experimental results on real-world datasets demonstrate that our approach yields a more compact and intuitive measure of table dissimilarity, enhancing applications such as clustering, table extraction evaluation, and version history recovery.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 212-218"},"PeriodicalIF":3.3,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning and multi-modal MRI for the segmentation of sub-acute and chronic stroke lesions
Alessandro Di Matteo, Youwan Mahé, Stéphanie Leplaideur, Isabelle Bonan, Elise Bannier, Francesca Galassi
Pub Date : 2025-11-14 DOI: 10.1016/j.patrec.2025.11.017
Stroke is a leading cause of morbidity and mortality worldwide. Accurate segmentation of post-stroke lesions on MRI is crucial for assessing brain damage and informing rehabilitation. Manual segmentation, however, is time-consuming and prone to error, motivating the development of automated approaches. This study investigates how deep learning with multimodal MRI can improve automated lesion segmentation in sub-acute and chronic stroke. A single-modality baseline was trained on the public ATLAS v2.0 dataset (655 T1-w scans) using the nnU-Net v2 framework and evaluated on an independent clinical cohort (45 patients with paired T1-w and FLAIR MRI). On this internal dataset, we conducted a systematic ablation comparing (i) direct transfer of the ATLAS baseline, (ii) fine-tuning using T1-w only, and (iii) fusion of T1-w and FLAIR inputs through early, mid, and late fusion strategies, each tested with metric averaging and ensembling.
The ATLAS baseline model achieved a mean Dice score of 0.64 and a lesion-wise F1 score of 0.67. On the clinical dataset, ensembling improved performance (Dice 0.70 vs. 0.68; F1 0.79 vs. 0.73), while fine-tuning on T1-w data further increased accuracy (Dice 0.72; F1 0.78). The best overall results were obtained with a T1+FLAIR late-fusion ensemble (Dice 0.75; F1 0.80; Average Surface Distance (ASD) 2.94 mm), with statistically significant improvements, especially for small and medium lesions.
These results show that fine-tuning and multimodal fusion — particularly late fusion — improve generalization for post-stroke lesion segmentation, supporting robust, reproducible quantification in clinical settings.
{"title":"Deep learning and multi-modal MRI for the segmentation of sub-acute and chronic stroke lesions","authors":"Alessandro Di Matteo , Youwan Mahé , Stéphanie Leplaideur , Isabelle Bonan , Elise Bannier , Francesca Galassi","doi":"10.1016/j.patrec.2025.11.017","DOIUrl":"10.1016/j.patrec.2025.11.017","url":null,"abstract":"<div><div>Stroke is a leading cause of morbidity and mortality worldwide. Accurate segmentation of post-stroke lesions on MRI is crucial for assessing brain damage and informing rehabilitation. Manual segmentation, however, is time-consuming and prone to error, motivating the development of automated approaches. This study investigates how deep learning with multimodal MRI can improve automated lesion segmentation in sub-acute and chronic stroke. A single-modality baseline was trained on the public ATLAS v2.0 dataset (655 T1-w scans) using the nnU-Net v2 framework and evaluated on an independent clinical cohort (45 patients with paired T1-w and FLAIR MRI). On this internal dataset, we conducted a systematic ablation comparing (i) direct transfer of the ATLAS baseline, (ii) fine-tuning using T1-w only, and (iii) fusion of T1-w and FLAIR inputs through early, mid, and late fusion strategies, each tested with metric averaging and ensembling.</div><div>The ATLAS baseline model achieved a mean Dice score of 0.64 and a lesion-wise F1 score of 0.67. On the clinical dataset, ensembling improved performance (Dice 0.70 vs. 0.68; F1 0.79 vs. 0.73), while fine-tuning on T1-w data further increased accuracy (Dice 0.72; F1 0.78). The best overall results were obtained with a T1+FLAIR late-fusion ensemble (Dice 0.75; F1 0.80; Average Surface Distance (ASD) 2.94 mm), with statistically significant improvements, especially for small and medium lesions.</div><div>These results show that fine-tuning and multimodal fusion — particularly late fusion — improve generalization for post-stroke lesion segmentation, supporting robust, reproducible quantification in clinical settings.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 225-231"},"PeriodicalIF":3.3,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145617713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy
Samuel Maddox, Lemuel Puglisi, Fatemeh Darabifard, Alzheimer’s Disease Neuroimaging Initiative, Australian Imaging Biomarkers and Lifestyle flagship study of aging, Saber Sami, Daniele Ravi
Pub Date : 2025-11-14 DOI: 10.1016/j.patrec.2025.11.027
Accurate brain age prediction from MRI is a promising biomarker for brain health and neurodegenerative disease risk, but current deep learning models often lack anatomical specificity and clinical insight. We present a regional patch-based ensemble framework that uses 3D Convolutional Neural Networks (CNNs) trained on bilateral patches from ten subcortical structures, enhancing anatomical sensitivity. Ensemble predictions are combined with cognitive assessments to derive a cognitively informed proxy for cognitive reserve (CR-Proxy), quantifying resilience to age-related brain changes. We train our framework on a large, multi-cohort dataset of healthy controls and test it on independent samples that include individuals with Alzheimer’s disease and mild cognitive impairment. The results demonstrate that our method achieves robust brain age prediction and provides a practical, interpretable CR-Proxy capable of distinguishing diagnostic groups and identifying individuals with high or low cognitive reserve. This pipeline offers a scalable, clinically accessible tool for early risk assessment and personalized brain health monitoring.
{"title":"Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy","authors":"Samuel Maddox , Lemuel Puglisi , Fatemeh Darabifard , Alzheimer’s Disease Neuroimaging Initiative , Australian Imaging Biomarkers and Lifestyle flagship study of aging , Saber Sami , Daniele Ravi","doi":"10.1016/j.patrec.2025.11.027","DOIUrl":"10.1016/j.patrec.2025.11.027","url":null,"abstract":"<div><div>Accurate brain age prediction from MRI is a promising biomarker for brain health and neurodegenerative disease risk, but current deep learning models often lack anatomical specificity and clinical insight. We present a regional patch-based ensemble framework that uses 3D Convolutional Neural Networks (CNNs) trained on bilateral patches from ten subcortical structures, enhancing anatomical sensitivity. Ensemble predictions are combined with cognitive assessments to derive a cognitively informed proxy for cognitive reserve (CR-Proxy), quantifying resilience to age-related brain changes. We train our framework on a large, multi-cohort dataset of healthy controls and test it on independent samples that include individuals with Alzheimer’s disease and mild cognitive impairment. The results demonstrate that our method achieves robust brain age prediction and provides a practical, interpretable CR-Proxy capable of distinguishing diagnostic groups and identifying individuals with high or low cognitive reserve. This pipeline offers a scalable, clinically accessible tool for early risk assessment and personalized brain health monitoring.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"199 ","pages":"Pages 219-224"},"PeriodicalIF":3.3,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145579805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}