
Machine learning with applications: latest articles

Comparing model-specific and model-agnostic features importance methods using machine learning with technical indicators: A NASDAQ sector-based study
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-11-25 DOI: 10.1016/j.mlwa.2025.100799
Jeonghoe Lee, Lin Cai
Predicting stock prices is crucial for making informed investment decisions, as stock markets significantly influence the global economy. Although previous studies have explored feature importance methods for stock price prediction, comprehensive comparisons of those methods have been limited. This study aims to provide a detailed comparison of different feature importance methods for selecting technical indicators to predict stock prices. Specifically, this research analyzed financial data from the 11 sectors of the NASDAQ. A moving-window forecasting framework was implemented to dynamically capture the evolving patterns in financial markets over time. Model-specific feature importance methods were compared with model-agnostic approaches. Multiple machine learning algorithms, including Random Forest (RF) and Multi-layer Neural Networks (MNNs), were employed to forecast stock prices. Additionally, extensive hyperparameter tuning was conducted to improve model explainability, contributing to the field of Explainable Artificial Intelligence (XAI). The results highlight the predictive effectiveness of different feature importance methods in selecting optimal technical indicators, thereby offering valuable insights for enhancing stock price forecasting accuracy and model transparency. In summary, this research offers a comprehensive comparison of feature importance methods, emphasizing their application in the selection of technical indicators in a dynamic, rolling prediction setting.
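The core comparison in the abstract, model-specific versus model-agnostic importance, can be sketched with scikit-learn. The data below is a synthetic illustration (a target driven by one of three toy features), not the study's NASDAQ data or technical indicators:

```python
# Sketch: model-specific (impurity-based) vs. model-agnostic (permutation)
# feature importance. Synthetic data, purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))                 # three toy "indicators"
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=n)  # only feature 0 drives the target

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Model-specific: impurity-based importances stored on the fitted forest.
specific = rf.feature_importances_

# Model-agnostic: permutation importance, usable with any fitted estimator.
agnostic = permutation_importance(
    rf, X, y, n_repeats=10, random_state=0
).importances_mean

best_specific = int(np.argmax(specific))
best_agnostic = int(np.argmax(agnostic))
```

Both rankings should agree on the informative feature here; on real market data the two families of methods can diverge, which is what the study compares.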
Machine learning with applications, Volume 23, Article 100799.
Citations: 0
A hybrid DEA–fuzzy clustering approach for accurate reference set identification
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-09 DOI: 10.1016/j.mlwa.2025.100818
Sara Fanati Rashidi , Maryam Olfati , Seyedali Mirjalili , Crina Grosan , Jan Platoš , Vaclav Snášel
This study integrates Data Envelopment Analysis (DEA) with Machine Learning (ML) to address key limitations of traditional DEA in identifying reference sets for inefficient Decision-Making Units (DMUs). In DEA, inefficient units are evaluated against benchmark units; however, some benchmarks may be inappropriate or even outliers, which can distort the efficiency frontier. Moreover, when a new DMU is added, the entire model must be recalculated, resulting in high computational costs for large datasets. To overcome these issues, we propose a hybrid approach that combines Fuzzy C-Means (FCM) and Possibilistic Fuzzy C-Means (PFCM) clustering. By leveraging Euclidean distance and membership degrees, the method identifies closer and more relevant reference units, while a sensitivity threshold is introduced to control the number of benchmarks according to practical requirements. The effectiveness of the proposed method is validated on two datasets: a banking dataset and a banknote authentication dataset with 1,372 samples. Results show that the reference sets derived from this ML-based framework achieve 71.6%–98.3% agreement with DEA, while overcoming two major drawbacks: (1) sensitivity to dataset size and (2) inclusion of inappropriate reference units. Furthermore, statistical analyses, including confidence intervals and McNemar’s test, confirm the robustness and practical significance of the findings.
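The membership degrees that drive reference-unit selection come from fuzzy clustering. A minimal Fuzzy C-Means sketch (illustrative only; the paper's DEA integration, PFCM variant, and sensitivity threshold are not reproduced here) looks like this:

```python
# Minimal Fuzzy C-Means: soft memberships U (rows sum to 1) are the
# quantities one would use to pick closer, more relevant reference units.
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)       # standard FCM update
    return U, centers

# Two well-separated toy blobs standing in for groups of DMUs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
U, centers = fcm(X, c=2)
labels = U.argmax(axis=1)
```

Because memberships are graded rather than hard, a threshold on U (as the paper's sensitivity threshold does) can control how many benchmark units each inefficient unit receives.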
Machine learning with applications, Volume 23, Article 100818.
Citations: 0
Decoding vision transformer variations for image classification: A guide to performance and usability
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-14 DOI: 10.1016/j.mlwa.2026.100844
João Montrezol , Hugo S. Oliveira , Hélder P. Oliveira
With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition. This has led to the development of numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN–ViT models, along with 5 top Convolutional Neural Network (CNN) models. These were selected based on their new architecture, relevance to benchmarks, and overall impact. The models are organised using a defined taxonomy formed by CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, while assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we look at real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs are still very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN–ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.
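All ViT variants in the survey's taxonomy share the same front end: the image is cut into fixed-size patches, each flattened and linearly projected into a token. A minimal sketch of that step (shapes and projection width are illustrative, not any particular surveyed model):

```python
# Patch embedding, the step that turns an image into a token sequence
# for a Vision Transformer. Purely illustrative shapes.
import numpy as np

def patchify(img, patch=16):
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    p = img.reshape(H // patch, patch, W // patch, patch, C)
    # reorder so each row is one flattened patch of patch*patch*C values
    return p.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))                  # ImageNet-sized input
tokens = patchify(img)                           # 14x14 = 196 patches of 768 values
proj = rng.normal(size=(tokens.shape[1], 192))   # learned in a real model
embedded = tokens @ proj                         # token sequence fed to attention
```

The architectural families the survey compares (pure ViT, CNN, hybrid CNN–ViT) differ mainly in what happens before and after this tokenization.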
Machine learning with applications, Volume 23, Article 100844.
Citations: 0
ExpressNet-MoE: A hybrid deep neural network for emotion recognition
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-02 DOI: 10.1016/j.mlwa.2025.100830
Deeptimaan Banerjee, Prateek Gothwal, Ashis Kumer Biswas
In many domains, including online education, healthcare, security, and human–computer interaction, facial emotion recognition (FER) is essential. Real-world FER is still difficult because of factors like head positions, occlusions, illumination shifts, and demographic diversity. Engagement detection systems, which are essential in virtual learning platforms, are severely challenged by these factors. In this article, we propose ExpressNet-MoE, a novel hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs) with a Mixture of Experts (MoE) framework to address these challenges. The proposed model dynamically selects the most relevant expert networks for each input, thereby improving generalization and adaptability across diverse datasets. Our methodology involves training ExpressNet-MoE independently on several benchmark datasets after preprocessing facial images using BlazeFace for face detection and alignment. To maintain class distribution, stratified sampling is used to divide each dataset into training and testing groups. Our model improves on the accuracy of emotion recognition by utilizing multi-scale feature extraction to collect both global and local facial features. ExpressNet-MoE includes numerous CNN-based feature extractors, a MoE module for adaptive feature selection, and finally a residual network backbone for deep feature learning. To demonstrate the efficacy of our proposed model, we evaluated it on four widely used datasets: AffectNet7, AffectNet8, RAF-DB, and FER-2013, and compared it with current state-of-the-art methods. Our model achieves accuracies of 74.40% ± 0.45 on AffectNet7, 71.98% ± 0.66 on AffectNet8, 83.41% ± 1.06 on RAF-DB, and 67.05% ± 2.08 on FER-2013.
Overall, the findings indicate that adaptive expert selection and multi-scale feature extraction significantly enhance the robustness of facial emotion recognition across diverse real-world conditions, and show how they may be used to develop end-to-end emotion recognition systems in practical settings. Reproducible code and results are publicly accessible at https://github.com/DeeptimaanB/ExpressNet-MoE.
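The Mixture-of-Experts mechanism can be sketched as a gating network that softmax-weights several expert sub-networks per input. The feature size, expert count, and linear experts below are assumptions for illustration, not ExpressNet-MoE's actual CNN experts:

```python
# MoE sketch: a gate produces per-input weights over experts; the final
# prediction is the weighted mixture of expert outputs. Illustrative sizes.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, n_experts, n_classes = 64, 4, 7               # 7 = basic emotion classes
x = rng.normal(size=(1, d))                      # one feature vector

W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, n_classes)) for _ in range(n_experts)]

gate = softmax(x @ W_gate)                       # (1, n_experts), sums to 1
outputs = np.stack([x @ W for W in experts])     # (n_experts, 1, n_classes)
combined = (gate.T[:, :, None] * outputs).sum(axis=0)  # weighted mixture
```

In the full model the gate makes this selection input-dependent, which is what lets different experts specialize in different poses, occlusions, or lighting conditions.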
Machine learning with applications, Volume 23, Article 100830.
Citations: 0
Enhanced Heart disease prediction using LLM ranked feature selection, Dynamic custom Kernel
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-31 DOI: 10.1016/j.mlwa.2026.100860
Nikesh P.L. , Sebastian Terence , Anishin Raj , Jude Immaculate , Deepak Mishra
Heart disease, a major cause of death worldwide, accounts for millions of deaths each year. This makes it critical to detect heart disease at an earlier stage so that a treatment plan, including medications and counseling, can be started. Machine learning (ML) algorithms trained on large datasets have made it possible to predict heart disease more effectively. Traditional machine learning approaches provide statistical correlations but often lack explicit integration of clinical knowledge, which limits their usefulness in real-world scenarios. This paper investigates the use of a Large Language Model (LLM) combined with Retrieval-Augmented Generation (RAG) to derive clinically grounded feature relevance based on medical guidelines. A curated corpus of medical guidelines and practice protocols from internationally approved organizations was used to train the RAG pipeline. The features were ranked using the RAG-powered LLM, and the most important features were selected and used in a Support Vector Machine (SVM) with a custom kernel. A custom formulation combining linear and non-linear functions was explored as an auxiliary modeling component. This enables the model to retain the clinical importance of the features and linear transparency, while also capturing complex interactions through a polynomial function. The approach is evaluated on the UCI Heart Disease dataset, which includes data from Cleveland, Hungary, Switzerland, and the VA Medical Center in Long Beach. The study was conducted in two parts: one using the Cleveland subset alone and one using the full dataset covering all four regions. This integration of statistical learning with LLM-driven reasoning supports cardiovascular risk assessment in a clinically informed manner and helps identify clinically relevant features for the learning process.
On the Cleveland dataset, the model achieved an accuracy of 95%, an F1 score of 0.936, and an AUC-ROC of 0.973, although, given the dataset's small size, its performance was comparable to that of traditional models and of the unweighted kernel. When applied to the combined dataset, using the entire UCI dataset, the model achieved an accuracy of 93.3%, an F1 score of 0.923, and an AUC-ROC of 0.961. Statistical testing showed that the weighted and unweighted kernels performed similarly, suggesting that the primary contribution arises from clinically guided feature selection rather than kernel weighting. The combination of statistical methods and reasoning from LLM models improves both the effectiveness and clarity of predictions. This process helps develop clinically informed AI systems for cardiovascular risk assessment. This paper also includes a comparative study of logistic regression, decision tree, random forest, gradient boosting, and support vector machines with RBF, sigmoid, linear, and polynomial kernels.
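A custom SVM kernel mixing a linear and a polynomial term, as the abstract describes, can be supplied to scikit-learn's SVC as a callable returning the Gram matrix. The mixing weight and polynomial degree here are illustrative assumptions, and the data is synthetic, not the UCI Heart Disease dataset:

```python
# Custom mixed kernel for an SVM: alpha * linear + (1 - alpha) * polynomial.
# A sum of valid (PSD) kernels is itself a valid kernel.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def mixed_kernel(A, B, alpha=0.5, degree=2):
    dot = A @ B.T                       # Gram matrix of the linear kernel
    return alpha * dot + (1 - alpha) * (1.0 + dot) ** degree

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = SVC(kernel=mixed_kernel).fit(X, y)
acc = clf.score(X, y)                   # training accuracy on the toy data
```

In the paper's setup, the features fed to this SVM would first be filtered by the LLM/RAG ranking step; here the input is just random synthetic data to show the kernel mechanics.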
Machine learning with applications, Volume 23, Article 100860.
Citations: 0
Machine learning based adaptive soft error mitigation efficiency
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-11-25 DOI: 10.1016/j.mlwa.2025.100797
Nicholas Maurer, Mohammed Abdallah
This work presents a novel adaptive framework for soft error mitigation in space-based systems, designed to resolve the fundamental conflict between system performance and radiation protection. By leveraging a Long Short-Term Memory (LSTM) model to predict real-time solar particle flux, our approach dynamically enables or disables software-based mitigation techniques. This contrasts with the static, "always-on" methods of existing systems, offering a significant improvement in computational efficiency. The proposed LSTM model was trained on NASA solar particle flux data, achieving a mean average error of 7.65e-6, demonstrating its high accuracy in predicting nonlinear particle events. Our simulation, which applies this predictive model to a tiered system of redundant processing, checkpointing, and watchdog timers, shows a substantial reduction in overhead. During the 18,414-second test period, the combined adaptive mitigation methods introduced only 20.75–51.6 s of overhead, representing a 99.4 % reduction in overhead compared to continuous, static mitigation. This research's primary contribution is a demonstrated proof-of-concept for an intelligent, self-adaptive system that can maintain high reliability while drastically improving performance. This approach provides a pathway for utilizing more cost-effective commercial-off-the-shelf (COTS) processors in radiation-intensive environments.
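The adaptive on/off decision can be sketched as a simple threshold on the forecast flux. The flux series and threshold below are synthetic assumptions; in the paper the forecast comes from the trained LSTM, and "mitigation" stands for the redundancy, checkpointing, and watchdog tier:

```python
# Adaptive mitigation toggle: protection is enabled only for the time steps
# where predicted particle flux crosses a threshold, saving overhead the
# rest of the time. Values are synthetic, purely illustrative.
def mitigation_schedule(predicted_flux, threshold):
    """Return a per-step boolean schedule: True means mitigation on."""
    return [flux >= threshold for flux in predicted_flux]

forecast = [0.1, 0.2, 5.0, 7.5, 0.3, 0.1]    # hypothetical LSTM flux forecast
schedule = mitigation_schedule(forecast, threshold=1.0)
duty_cycle = sum(schedule) / len(schedule)    # fraction of time protected
```

The overhead reduction the paper reports comes from exactly this effect: a low duty cycle during quiet periods versus the 100% duty cycle of static, always-on mitigation.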
Machine learning with applications, Volume 23, Article 100797.
Citations: 0
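The gating idea in the abstract above (enable software mitigation only when the predicted particle flux is high) can be sketched briefly. A stub flux series stands in for the LSTM's predictions, and the threshold and per-step overhead values are illustrative placeholders, not figures from the paper.

```python
# Minimal sketch of adaptive soft-error mitigation: a predicted flux series
# gates mitigation on/off, so overhead accrues only during predicted
# high-flux windows. Threshold and overhead values are invented examples.

def adaptive_overhead(predicted_flux, threshold, overhead_per_step):
    """Return (adaptive_seconds, static_seconds) of mitigation overhead."""
    static = overhead_per_step * len(predicted_flux)    # always-on baseline
    adaptive = overhead_per_step * sum(
        1 for f in predicted_flux if f >= threshold     # mitigate only when risky
    )
    return adaptive, static

if __name__ == "__main__":
    # Synthetic flux trace: mostly quiet, with a short solar-event spike.
    flux = [1e-7] * 90 + [5e-5] * 10
    adaptive, static = adaptive_overhead(flux, threshold=1e-5,
                                         overhead_per_step=0.5)
    print(adaptive, static)  # 5.0 50.0
```

On this synthetic trace the adaptive controller pays overhead for only the 10 high-flux steps, mirroring the paper's reported reduction relative to continuous, static mitigation.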
TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-11-26 DOI: 10.1016/j.mlwa.2025.100800
Ali Asghari
As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. In this process, data is partitioned to maximize the internal similarity of members within the same cluster and the external distance from other clusters. Clustering has been widely applied across disciplines, including business analytics, healthcare, and economics. Extracting practical knowledge from large datasets relies on an effective clustering technique. The main challenges in clustering are processing speed, especially for large datasets, handling noisy data and outliers, and ensuring high accuracy. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. The proposed approach, TQC (Tree-Queue Clustering), addresses these problems by combining the Trees Social Relation Algorithm (TSR) with the Queue Learning (QL) algorithm. The TSR method accelerates clustering, while the QL algorithm enhances clustering accuracy. The approach first divides the data into smaller groups. Then, by effectively computing group memberships, TSR's migration process grows clusters progressively. By handling noise and outliers, the QL algorithm avoids local optima and improves clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The method is validated across several real-world datasets of varying sizes and properties. Experimental results, evaluated using five performance metrics (MICD, ARI, NMI, ET, and ODR) and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.
{"title":"TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data","authors":"Ali Asghari","doi":"10.1016/j.mlwa.2025.100800","DOIUrl":"10.1016/j.mlwa.2025.100800","url":null,"abstract":"<div><div>As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. In this process, data is partitioned based on the internal similarity of members within the same cluster and the maximum external distance from other clusters. Beyond business analytics, healthcare, economics, and other fields, clustering has been widely applied across disciplines. Extracting practical knowledge from large datasets relies on an effective clustering technique. Processing speed, especially for large datasets, handling noisy data and outliers, and ensuring high accuracy are the main challenges in clustering. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. Combining the Trees Social Relation Algorithm (TSR) with the Queue Learning (QL) algorithm, the proposed approach, TQC (Tree-Queue Clustering), addresses these problems. While the QL algorithm enhances clustering accuracy, the TSR method focuses on accelerating clustering. The suggested approach first divides the data into smaller groups. Then, by effectively computing group memberships, TSR's migration process causes clusters to develop progressively. Handling noise and outliers helps the QL algorithm prevent local optima and improve clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The suggested method is validated across several real-world datasets of varying sizes and properties. 
Experimental results, evaluated using five performance metrics — MICD, ARI, NMI, ET, and ODR — and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100800"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
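The divide-then-merge strategy described above can be illustrated generically for one-dimensional data: summarize small groups by their means, then progressively fuse the closest summaries until the desired number of clusters remains. This is a sketch of the general strategy only; it implements neither TSR's migration process nor QL's noise handling.

```python
# Generic divide-then-merge clustering sketch for 1-D data: chunk the
# sorted points, summarize each chunk by (mean, count), then greedily
# merge the closest adjacent summaries (count-weighted) down to k clusters.

def divide_then_merge(points, group_size, k):
    """Cluster 1-D points by chunking, then merging chunk means; returns k means."""
    points = sorted(points)
    # Divide: summarize each chunk by (mean, count).
    groups = []
    for i in range(0, len(points), group_size):
        chunk = points[i:i + group_size]
        groups.append((sum(chunk) / len(chunk), len(chunk)))
    # Merge: repeatedly fuse the two closest adjacent means, weighted by size.
    while len(groups) > k:
        j = min(range(len(groups) - 1),
                key=lambda i: groups[i + 1][0] - groups[i][0])
        (m1, n1), (m2, n2) = groups[j], groups[j + 1]
        groups[j:j + 2] = [((m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2)]
    return [m for m, _ in groups]

if __name__ == "__main__":
    data = [0.9, 1.0, 1.1, 1.2, 9.9, 10.0, 10.1, 10.2]
    print(divide_then_merge(data, group_size=2, k=2))
```

Working on small group summaries rather than raw points is what makes this family of methods cheap on large datasets: the merge loop touches only the summaries, never the full data again.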
Machine-interactive decision-assistance using a pre-trained natural language processing model for 4D printing technique selection
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-30 DOI: 10.1016/j.mlwa.2025.100833
Chandramohan Abhishek , Nadimpalli Raghukiran
The present research showcases a machine-interactive approach to decision making using a pre-trained natural language processing (NLP) model. The method is developed for 4D (four-dimensional) printing technique selection, where many variables are involved, such as process, material, design, and sequence selections. Because numerous options are available, arriving at a preferred technique requires expertise and time. The developed method provides this assistance from a single source. The approach incorporates bidirectional encoder representations from transformers (BERT), which accommodates parallel meanings in user requests, such as synonyms and adjectives. The closed-loop system is programmed with a set of 7 prompts. It also introduces additional affirmation prompts to navigate both ambiguous phrasing and out-of-scope detection so that the machine can return a meaningful recommendation. The rule-governed technique (a lightweight rule set) guides the selection of the conforming request at each prompt. The inference-based approach takes user requests, performs objective classification using BERT according to selected criteria, dynamically filters the data, and recommends suggestions, with an inference time of 0.79 s. The modified model also establishes multi-level relationships among prompts for text classification. k-fold validation reached its highest accuracy when trained with optimal hyperparameters. The fine-tuned method, developed in a Python environment, can be generalized to other systems. The present research demonstrates the possibility of adapting an openly accessible model for developing a decision-assistance system with minimal personal computational resources.
{"title":"Machine-interactive decision-assistance using a pre-trained natural language processing model for 4D printing technique selection","authors":"Chandramohan Abhishek ,&nbsp;Nadimpalli Raghukiran","doi":"10.1016/j.mlwa.2025.100833","DOIUrl":"10.1016/j.mlwa.2025.100833","url":null,"abstract":"<div><div>The present research showcases a machine-interactive approach for making decisions using a pre-trained natural language processing (NLP) model. The method is developed for 4D (4-dimensional) printing technique selection, as a plurality of variables is involved, such as process, material, design, and sequence selections. Due to the availability of numerous options, arriving at a preferred choice of technique requires expertise and time. The developed method aids in finding assistance from a single source. The approach incorporates bidirectional encoder representations from transformers (BERT), which accommodates parallel meanings of user requests, such as synonyms and adjectives, among others. The closed-loop system is programmed with a set of 7 prompts. It also introduces additional affirmation prompts to navigate both ambiguous phrasing and out-of-scope detection in order to receive a meaningful recommendation from the machine. The rule-governed technique (lightweight rule set) guides the selection of the conformable request during each prompt. The inference-based approach takes user requests, performs objective classification using BERT according to selected criteria, then dynamically filters the data, and recommends suggestions, with an inference time of 0.79 s. The modified model also establishes multi-level relationships among prompts for text classification. k-fold validation reached highest possible accuracy upon training with optimal hyperparameters. The fine-tuned method developed in Python environment can be generalized for other systems. 
The present research demonstrates the possibility of adapting an openly accessible model for developing a decision-assistance system with minimal personal computational resources.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100833"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145925172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
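The closed-loop prompt flow can be sketched as follows. A simple keyword matcher stands in for the fine-tuned BERT classifier, the prompts and option lists are invented for illustration, and out-of-scope replies fall through to a re-ask message rather than a selection.

```python
# Toy sketch of a rule-governed prompt loop: each free-text reply is
# classified against that prompt's allowed options; unrecognized replies
# are flagged as out of scope. The prompt names and options are invented.

PROMPTS = {
    "process":  ["fdm", "sla", "sls"],
    "material": ["pla", "hydrogel", "smp"],
}

def classify(reply, options):
    """Return the matching option, or None if the reply is out of scope."""
    reply = reply.lower()
    for opt in options:
        if opt in reply:
            return opt
    return None

def run_dialogue(replies):
    """Walk the prompt sequence; mark out-of-scope replies for a re-ask."""
    selection = {}
    for (prompt, options), reply in zip(PROMPTS.items(), replies):
        choice = classify(reply, options)
        selection[prompt] = choice if choice else "please rephrase"
    return selection

if __name__ == "__main__":
    print(run_dialogue(["I'd like an FDM printer", "something edible"]))
```

In the paper's setup the keyword matcher would be replaced by BERT-based classification, which is what lets the system handle synonyms and paraphrases instead of exact keyword hits.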
Synonym extraction from Japanese patent documents using term definition sentences
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-21 DOI: 10.1016/j.mlwa.2026.100848
Koji Marusaki , Seiya Kawano , Asahi Hentona , Hirofumi Nonaka
Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.
In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.
Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.
{"title":"Synonym extraction from Japanese patent documents using term definition sentences","authors":"Koji Marusaki ,&nbsp;Seiya Kawano ,&nbsp;Asahi Hentona ,&nbsp;Hirofumi Nonaka","doi":"10.1016/j.mlwa.2026.100848","DOIUrl":"10.1016/j.mlwa.2026.100848","url":null,"abstract":"<div><div>Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.</div><div>In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. 
The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.</div><div>Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100848"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
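The core idea of comparing terms through their definition sentences can be sketched with a stand-in embedding: a bag-of-words vector replaces the learned sentence embeddings of the paper, and candidate synonyms are ranked by cosine similarity of their definitions. The tiny glossary is invented for illustration.

```python
# Sketch of definition-based synonym ranking: embed each term's definition
# sentence (here, a crude bag-of-words vector) and rank other terms by
# cosine similarity of their definition embeddings.
from collections import Counter
from math import sqrt

def embed(definition):
    """Stand-in for a sentence embedding: word-count vector of the definition."""
    return Counter(definition.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_synonyms(term, glossary):
    """Rank the other glossary terms by similarity of their definitions."""
    target = embed(glossary[term])
    scored = [(other, cosine(target, embed(d)))
              for other, d in glossary.items() if other != term]
    return sorted(scored, key=lambda x: -x[1])

if __name__ == "__main__":
    glossary = {
        "accumulator": "a register that stores intermediate arithmetic results",
        "register":    "a small fast storage location that stores intermediate results",
        "compiler":    "a program that translates source code into machine code",
    }
    print(rank_synonyms("accumulator", glossary)[0][0])  # register
```

Comparing definitions rather than the terms themselves is what lets the approach handle compound technical terms and coinages whose surface forms share no tokens.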
PRCSL: A privacy-preserving continual split learning framework for decentralized medical diagnosis
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-29 DOI: 10.1016/j.mlwa.2025.100828
Jungmin Eom , Minjun Kang , Myungkeun Yoon , Nikil Dutt , Jinkyu Kim , Jaekoo Lee
Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.
This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.
We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62 percentage points higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.
{"title":"PRCSL: A privacy-preserving continual split learning framework for decentralized medical diagnosis","authors":"Jungmin Eom ,&nbsp;Minjun Kang ,&nbsp;Myungkeun Yoon ,&nbsp;Nikil Dutt ,&nbsp;Jinkyu Kim ,&nbsp;Jaekoo Lee","doi":"10.1016/j.mlwa.2025.100828","DOIUrl":"10.1016/j.mlwa.2025.100828","url":null,"abstract":"<div><div>Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.</div><div>This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. 
Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.</div><div>We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62%p higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100828"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
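The nearest-mean-of-exemplars (NME) step mentioned above can be sketched directly: each class is summarized by the mean of its stored exemplars, and a sample takes the label of the nearest mean. The optional Gaussian noise hints at the differential-privacy exemplar sharing, but the noise model, scale, and the toy feature vectors are illustrative, not PRCSL's.

```python
# Sketch of a nearest-mean-of-exemplars classifier: class means are computed
# from stored exemplar vectors (optionally perturbed, as a crude stand-in
# for DP noise on shared statistics) and samples take the nearest mean's label.
import random

def class_means(exemplars, noise_scale=0.0, rng=None):
    """exemplars: {label: [feature vectors]} -> {label: (noisy) mean vector}."""
    rng = rng or random.Random(0)
    means = {}
    for label, vecs in exemplars.items():
        dim = len(vecs[0])
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        if noise_scale:  # illustrative Gaussian perturbation, not calibrated DP
            mean = [m + rng.gauss(0.0, noise_scale) for m in mean]
        means[label] = mean
    return means

def nme_predict(x, means):
    """Label of the nearest class mean (squared Euclidean distance)."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

if __name__ == "__main__":
    exemplars = {"benign": [[0.1, 0.2], [0.2, 0.1]],
                 "lesion": [[0.9, 1.0], [1.0, 0.8]]}
    print(nme_predict([0.85, 0.9], class_means(exemplars)))  # lesion
```

Because the classifier has no trainable parameters, adding a new class only requires storing its exemplars and recomputing one mean, which is why NME-style heads are a common remedy for task-recency bias in rehearsal-based continual learning.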