
Latest articles in Machine learning with applications

BCLSA: Advancing Bangla sentiment analysis with concept-level reasoning and efficiency
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-11-08 DOI: 10.1016/j.mlwa.2025.100793
Mohammad Aman Ullah
Accurate sentiment analysis in Bangla remains a significant research challenge due to limited annotated corpora, complex morphology, insufficient linguistic resources, and the absence of interpretable concept-level knowledge bases. Existing approaches often struggle to capture context-dependent sentiment, idiomatic expressions, and domain adaptability, further constrained by the low-resource nature of the language. To address these limitations, this study proposes the Bangla Concept-Level Sentiment Analysis (BCLSA) framework, introducing two dedicated algorithms: a Bangla-specific concept extraction method and the Concept-Level Sentiment Analysis for Bangla (CLSA-Bn) weighted scoring algorithm. The first extracts sentiment-bearing concepts through syntactic pattern recognition, multiword expression detection, and affective lexicon mapping, while the second refines polarity estimation via negation handling, modifier scaling, and weighted aggregation for interpretable classification. To mitigate data scarcity and morphological variation, BCLSA applies language-specific preprocessing, including Unicode normalization, phonetic correction, and lemmatization. Evaluations on 10,243 formal news articles and 12,084 informal social media comments show that CLSA-Bn outperforms the Bi-LSTM and SVM baselines, achieving 90.2 % Accuracy, 90 % Macro-F1, 85 % Matthews Correlation Coefficient (MCC), and 94 % Area Under the Curve (AUC) for formal text, and 86.8 % Accuracy, 86 % Macro-F1, and 91 % AUC for informal text. The proposed Concept-Level Polarity Accuracy (CLPA) metric confirmed semantic fidelity above 88 %. Efficiency analysis revealed that CLSA-Bn requires only 30 s initialization, 5 ms inference, and a 50 MB model. Error rate analysis further confirmed robustness with the lowest misclassification ratios (9.8 % formal, 13.2 % informal), demonstrating balanced improvement in performance and error minimization.
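The paper's actual CLSA-Bn algorithm is not reproduced here; the toy sketch below only illustrates the general idea of weighted polarity scoring with negation handling and modifier scaling. The lexicon entries, modifier weights, and classification threshold are all invented for illustration.

```python
# Hypothetical sketch of weighted concept-level polarity scoring:
# negation flips sign, modifiers scale intensity, scores are
# aggregated with weights proportional to modifier strength.
LEXICON = {"good": 0.8, "bad": -0.7, "slow": -0.4}   # toy affective lexicon
MODIFIERS = {"very": 1.5, "slightly": 0.5}           # toy intensity scalers
NEGATIONS = {"not", "never"}

def score(tokens):
    total, weight = 0.0, 0.0
    scale, flip = 1.0, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            flip = -1.0
            continue
        if tok in MODIFIERS:
            scale = MODIFIERS[tok]
            continue
        if tok in LEXICON:
            total += flip * scale * LEXICON[tok]
            weight += abs(scale)
        scale, flip = 1.0, 1.0   # reset context after each concept word
    return total / weight if weight else 0.0

def classify(tokens, threshold=0.1):
    s = score(tokens)
    return "positive" if s > threshold else "negative" if s < -threshold else "neutral"
```

For example, `classify(["very", "good"])` yields "positive" while the negated `["not", "good"]` flips to "negative".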
Machine learning with applications, Vol. 22, Article 100793.
Citations: 0
Integrating deep learning and econometrics for stock price prediction: A comprehensive comparison of LSTM, transformers, and traditional time series models
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-09-10 DOI: 10.1016/j.mlwa.2025.100730
Eyas Gaffar A. Osman, Faisal A. Otaibi
This study presents a comprehensive empirical comparison of state-of-the-art deep learning models, namely Long Short-Term Memory (LSTM) networks and Transformer architectures, against traditional econometric models (ARIMA and VAR) for stock price prediction, with particular focus on performance during the COVID-19 pandemic crisis. Using daily S&P 500 data from 2015 to 2020, we rigorously evaluate model performance across multiple metrics including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Our findings demonstrate that while Transformer models achieve the best overall performance with an RMSE of 41.87 and directional accuracy of 69.1 %, LSTM networks provide an optimal balance between performance (RMSE: 43.25) and computational efficiency. Both deep learning approaches significantly outperform traditional econometric methods, with LSTM achieving a 53.3 % reduction in RMSE compared to ARIMA models. During the COVID-19 crisis period, deep learning models demonstrated exceptional robustness, with Transformers showing only 45 % performance degradation compared to over 100 % degradation in traditional models. Through comprehensive attention analysis, we provide insights into model interpretability, revealing adaptive behavior across market regimes. The study contributes to the growing literature on artificial intelligence applications in finance by providing rigorous empirical evidence for the superiority of modern deep learning approaches, while addressing the critical need for comparison with cutting-edge Transformer architectures that have revolutionized machine learning in recent years.
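The evaluation metrics named in the abstract are standard; a minimal pure-Python version of RMSE, MAE, MAPE, and directional accuracy (the helper names are illustrative, not taken from the paper):

```python
import math

def rmse(y, yhat):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Mean Absolute Percentage Error (skips zero actuals)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat) if a != 0) / len(y)

def directional_accuracy(y, yhat):
    """Fraction of steps where the predicted move matches the actual move."""
    hits = sum((y[i] - y[i - 1]) * (yhat[i] - yhat[i - 1]) > 0
               for i in range(1, len(y)))
    return hits / (len(y) - 1)
```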
Machine learning with applications, Vol. 22, Article 100730.
Citations: 0
Forecasting political voting: A high dimensional machine learning approach
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-09-23 DOI: 10.1016/j.mlwa.2025.100739
Pedro Caiua Campelo Albuquerque, Daniel Oliveira Cajueiro
We present a novel machine learning approach to predict voting patterns in Brazil’s Chamber of Deputies. Using a high-dimensional dataset and a time-series methodology, our models aim to accurately forecast legislative decisions. Unlike prior studies that often focus on single ideological dimensions, our approach integrates a broad feature set, including party guidelines, proposition characteristics, and deputy voting history, to improve predictive power. We train time-series models for each legislature, comparing ensembles like Random Forests and Gradient Boosting, which are validated using three-fold chronological splits to ensure temporal integrity. Our analysis highlights the significant influence of party guidelines and pork-barrel politics on voting behavior. Additionally, we identify key predictors, including the theme and source of the legislative proposition, as well as the deputies’ voting history. This work demonstrates the feasibility of accurately forecasting legislative votes, offering a valuable tool for stakeholders to anticipate legislative outcomes and enhancing the transparency of the political process.
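The three-fold chronological validation described above can be sketched as an expanding-window scheme, where each fold trains only on data preceding its validation block so no future information leaks backwards. The exact fold sizes used in the paper are not stated; this partitioning is illustrative.

```python
def chronological_folds(n, k=3):
    """Expanding-window chronological splits over n time-ordered samples.

    Fold i trains on all indices before its validation block and
    validates on the next contiguous block, preserving temporal order.
    """
    fold = n // (k + 1)
    splits = []
    for i in range(1, k + 1):
        train_end = i * fold
        val_end = min((i + 1) * fold, n)
        splits.append((list(range(train_end)),
                       list(range(train_end, val_end))))
    return splits
```

With `n=8, k=3` this yields train/validation index pairs `([0,1],[2,3])`, `([0..3],[4,5])`, and `([0..5],[6,7])`.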
Machine learning with applications, Vol. 22, Article 100739.
Citations: 0
Multi-source plume tracing via multi-agent reinforcement learning under common UAV-faults
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-09-18 DOI: 10.1016/j.mlwa.2025.100737
Pedro Antonio Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang
Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.
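The paper's set encoder is a learned network; the minimal sketch below only illustrates why mean pooling over per-teammate embeddings is permutation-invariant and size-agnostic. The element-doubling "embedding" is a stand-in for a learned one.

```python
def encode_teammates(feature_sets):
    """Permutation-invariant set encoding: embed each teammate's feature
    vector independently, then mean-pool across teammates so that neither
    ordering nor team size changes the output dimensionality."""
    def embed(v):
        # Toy per-teammate embedding; a real encoder would be learned.
        return [x * 2.0 for x in v]

    embedded = [embed(v) for v in feature_sets]
    dim = len(embedded[0])
    return [sum(e[j] for e in embedded) / len(embedded) for j in range(dim)]
```

Shuffling the teammate order leaves the encoding unchanged, which is the property that lets the policy tolerate agents dropping out mid-mission.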
Machine learning with applications, Vol. 22, Article 100737.
Citations: 0
Time series modeling of Monkeypox incidence in Central Africa’s endemic regions
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-11-01 DOI: 10.1016/j.mlwa.2025.100778
Chidozie Williams Chukwu, George Obaido, Ibomoiye Domor Mienye, Kehinde Aruleba, Ebenezer Esenogho, Cameron Modisane
Monkeypox is a re-emerging zoonotic viral disease endemic to the Democratic Republic of Congo (DRC) and other Central African countries, with recurrent outbreaks posing persistent public health threats. Accurate short-term forecasts of Mpox incidence are essential for guiding surveillance, preparedness, and timely interventions. In this study, we analyzed daily confirmed Mpox case data from May 1, 2022 to May 31, 2025, using autoregressive integrated moving average (ARIMA) and Prophet models, complemented by wavelet analysis. Model performance varied by country, reflecting differences in reporting quality and outbreak intensity. Prophet generally outperformed ARIMA in settings with smoother incidence trajectories, while ARIMA was more effective in capturing abrupt local fluctuations. Wavelet analysis further revealed country-specific temporal–frequency patterns, highlighting differences in epidemic periodicity across the region. These findings underscore the utility of combining statistical and decompositional approaches for Mpox forecasting in resource-limited settings. Strengthening surveillance systems, improving data quality, and adopting flexible, country-specific forecasting frameworks will be critical for developing effective early-warning systems and guiding evidence-based interventions in Central Africa.
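The study fits full ARIMA models with standard tooling; as a much-reduced illustration of the autoregressive core only, here is a least-squares AR(1) fit with a recursive multi-step forecast. Function names and data are illustrative, not from the paper.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (AR(1) only; a real
    ARIMA model adds differencing and moving-average terms)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

def forecast(series, steps, c, phi):
    """Iterate the fitted recurrence forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

On a series generated exactly by x[t] = 2 + 0.5·x[t-1], the fit recovers c = 2 and phi = 0.5.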
Machine learning with applications, Vol. 22, Article 100778.
Citations: 0
Implementation of knowledge distillation for onboard defect detection on an Unmanned Aircraft System for light aircraft general visual inspections
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-11-03 DOI: 10.1016/j.mlwa.2025.100782
Luke Connolly, James Garland, Diarmuid O’Gorman, Edmond F. Tobin
Visual inspections of aircraft are a vital part of routine procedures for maintenance personnel in the aviation industry. However, these inspections take up a considerable amount of time to perform and are susceptible to human error. To mitigate this, this study proposes image classification for defect detection, leveraging transfer learning and knowledge distillation within MATLAB to develop an efficient and deployable model. Transfer learning is applied to a ResNet-50 model, adapting it to classify aircraft defects using a curated dataset. This fine-tuned model is then utilised as a teacher in the knowledge distillation process, where a compact SqueezeNet model (the student) learns from both hard and soft labels to replicate its performance while significantly reducing computational demands. This allows for optimising deep-learning models for deployment on smaller hardware, making the student model suitable for use on an Unmanned Aircraft System (UAS) to filter out images that do not contain a defect, reducing workload for ground personnel. The proposed method offers a solution for improving the efficiency and accuracy of defect detection during a general visual inspection in the aviation industry. Targeted defects here are damaged_skin, missing_or_damaged_rivets, and panel_missing alongside a class denoting no_defect. The knowledge-distilled SqueezeNet model achieves 95.37% validation accuracy and 90.72% inference accuracy, with a 96.9% reduction in model size compared to ResNet-50. The teacher model has a size of 85.77 MB, while the student model is significantly smaller at 2.66 MB, making it ideal for deployment on embedded systems with limited resources.
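Training on "both hard and soft labels", as described, commonly means a temperature-scaled soft-target term plus a hard-label cross-entropy (Hinton-style distillation). A small sketch under that assumption; the temperature, mixing weight, and logits are illustrative, not the paper's settings.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.7):
    """Weighted sum of a temperature-scaled soft-target cross-entropy
    (scaled by T^2 to keep gradient magnitudes comparable) and the
    ordinary hard-label cross-entropy on the student."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = -sum(t * math.log(s) for t, s in zip(p_t, p_s)) * T * T
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits match the teacher's on the correct class incurs a strictly lower loss than one whose logits disagree.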
Machine learning with applications, Vol. 22, Article 100782.
Citations: 0
A comparative framework for multi-horizon time series forecasting: Neural networks with adaptive preprocessing
IF 4.9 Pub Date: 2025-12-01 Epub Date: 2025-11-01 DOI: 10.1016/j.mlwa.2025.100781
Ana Lazcano, Julio E. Sandubete, Miguel A. Jaramillo-Morán
Accurate multi-horizon time series forecasting remains a major challenge in predictive modeling due to cumulative error propagation over extended horizons. This study proposes a unified and reproducible framework that integrates adaptive signal decomposition with neural network architectures under a Multiple-Input Multiple-Output (MIMO) strategy, effectively removing recursive dependencies and improving training stability. Three representative neural models, the Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM), are systematically combined with both classical and adaptive preprocessing techniques, namely trend–fluctuation separation, Empirical Mode Decomposition (EMD), Variational Mode Decomposition (VMD), and Empirical Wavelet Transform (EWT). The framework is validated on three economic and energy-related datasets (electricity demand, natural gas prices, and CO₂ emission allowances), generating forecasts up to ten steps ahead and evaluated through RMSE, MAPE, and R² metrics. Experimental results show that adaptive decomposition, particularly VMD and EMD, yields the highest accuracy and stability across prediction horizons, while EWT provides consistent intermediate improvements and classical trend-based methods offer only marginal benefits. Moreover, computational analysis demonstrates that the proposed approach remains lightweight and efficient, with training and inference times suitable for real-world deployment. Overall, the findings confirm that coupling adaptive preprocessing with MIMO-based neural forecasting enhances accuracy, robustness, and interpretability without increasing architectural complexity, establishing a practical foundation for multi-horizon forecasting in economic and financial domains.
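The MIMO strategy mentioned above means each training sample targets all H future steps jointly, so inference never feeds predictions back into the model. A minimal windowing sketch of that supervision scheme (function and parameter names are illustrative):

```python
def make_mimo_windows(series, lookback, horizon):
    """Build (X, Y) pairs for direct multi-horizon (MIMO) forecasting.

    Each X is a lookback-length input window; each Y holds ALL `horizon`
    future values at once, unlike recursive forecasting where a one-step
    model is iterated and errors compound across steps.
    """
    X, Y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        Y.append(series[t:t + horizon])
    return X, Y
```

For `series = [0..9]`, `lookback=3`, `horizon=2`, the first pair is `([0, 1, 2], [3, 4])` and six pairs are produced in total.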
A comparative framework for multi-horizon time series forecasting: Neural networks with adaptive preprocessing
IF 4.9 Pub Date : 2025-12-01 DOI: 10.1016/j.mlwa.2025.100781
Ana Lazcano, Julio E. Sandubete, Miguel A. Jaramillo-Morán
Accurate multi-horizon time series forecasting remains a major challenge in predictive modeling due to cumulative error propagation over extended horizons. This study proposes a unified and reproducible framework that integrates adaptive signal decomposition with neural network architectures under a Multiple-Input Multiple-Output (MIMO) strategy, effectively removing recursive dependencies and improving training stability. Three representative neural models, the Multilayer Perceptron (MLP), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM), are systematically combined with both classical and adaptive preprocessing techniques, namely trend–fluctuation separation, Empirical Mode Decomposition (EMD), Variational Mode Decomposition (VMD), and Empirical Wavelet Transform (EWT). The framework is validated on three economic and energy-related datasets (electricity demand, natural gas prices, and CO₂ emission allowances), generating forecasts up to ten steps ahead and evaluated through RMSE, MAPE, and R² metrics. Experimental results show that adaptive decomposition, particularly VMD and EMD, yields the highest accuracy and stability across prediction horizons, while EWT provides consistent intermediate improvements and classical trend-based methods offer only marginal benefits. Moreover, computational analysis demonstrates that the proposed approach remains lightweight and efficient, with training and inference times suitable for real-world deployment. Overall, the findings confirm that coupling adaptive preprocessing with MIMO-based neural forecasting enhances accuracy, robustness, and interpretability without increasing architectural complexity, establishing a practical foundation for multi-horizon forecasting in economic and financial domains.
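The MIMO strategy the abstract describes, emitting all forecast horizons jointly so no prediction is fed back as an input, can be sketched with a plain linear model; the toy ramp series and least-squares fit below are illustrative assumptions, not the paper's neural architectures.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, multi-step target) pairs."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        Y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X), np.array(Y)

# Toy series: a noiseless ramp, so the linear MIMO map can fit it exactly.
series = np.arange(40, dtype=float)
X, Y = make_windows(series, lookback=5, horizon=3)

# MIMO strategy: one map emits all 3 horizons at once, so no predicted
# value is recycled as an input (no recursive error accumulation).
X1 = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # joint least-squares fit

last = np.append(series[-5:], 1.0)
forecast = last @ W                          # 3 steps ahead in one shot
print(np.round(forecast, 2))                 # → [40. 41. 42.]
```

A recursive strategy would instead call a one-step model three times, feeding each output back in, which is exactly the error-compounding loop the MIMO formulation avoids.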
Citations: 0
Deep learning-based 3D reconstruction of dentate nuclei in Friedreich’s ataxia from T2*-weighted MR images
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-07 DOI: 10.1016/j.mlwa.2025.100790
Trushal Sardhara , Ravi Dadsena , Roland C. Aydin , Ralf-Dieter Hilgers , Leon Horn , Jörg B. Schulz , Kathrin Reetz , Sandro Romanzetti , Imis Dogan
Dentate nucleus (DN) degeneration is a key neuropathological feature in Friedreich’s ataxia (FRDA), and its accurate quantification is critical for understanding disease progression. However, its visualization and volumetry require iron-sensitive imaging techniques and time-consuming segmentation procedures, posing challenges for conventional ML approaches due to small datasets typical of rare diseases. We present a transfer learning–based machine learning pipeline for automated DN segmentation that directly uses standard T2*-weighted Magnetic Resonance Imaging (MRI), which highlights the DN without additional processing, and is designed to perform robustly with limited annotated data. Using 38 manually labeled subjects (18 FRDA, 20 controls), the model was validated via five-fold cross-validation and an independent hold-out test set, achieving Dice scores of 0.81–0.87 and outperforming classical atlas-based methods. Pretraining improved performance by ∼10% in patients and >5% in controls. Applied to 181 longitudinal scans from 33 FRDA patients and 33 controls, the model revealed significantly reduced DN volumes in FRDA, with reductions correlating with disease duration and clinical severity over time. Our approach provides a scalable and reproducible segmentation framework, requiring minimal annotated data and no preprocessing, while demonstrating robust performance across cross-validation and independent testing. Additionally, it enables the first longitudinal volumetric analysis of DN in FRDA using standard T2*-weighted MRI, demonstrating its practical utility for monitoring neurodegenerative changes. Overall, this work illustrates how transfer learning can overcome data scarcity in rare diseases and provides a robust methodology for automated MRI segmentation in both research and clinical applications.
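The Dice scores reported above (0.81–0.87) measure voxel overlap between the predicted and manually labeled masks; a minimal sketch of the metric, where the toy 2-D masks are invented for illustration and stand in for 3-D DN segmentations:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy masks: two 3x3 squares offset by one voxel, overlapping in 2x2.
target = np.zeros((6, 6)); target[1:4, 1:4] = 1  # 9 voxels (ground truth)
pred = np.zeros((6, 6));   pred[2:5, 2:5] = 1    # 9 voxels, shifted
print(round(dice_score(pred, target), 3))         # → 0.444
```

The `eps` term is a common guard against division by zero when both masks are empty; a perfect overlap yields 1.0.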
Citations: 0
Phase-based physics-informed sequence-to-sequence neural network for typhoon intensity and trajectory prediction
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-01 DOI: 10.1016/j.mlwa.2025.100774
Ying-Yi Hong, Daryll John Medina
Typhoons are among the most destructive natural disasters in the Asia-Pacific region, causing extensive damage to infrastructure, agriculture, and human lives. Accurate and timely prediction of a typhoon's intensity and trajectory is crucial for effective disaster response and risk mitigation. However, existing forecasting models often struggle to capture the highly dynamic and heterogeneous nature of a typhoon’s lifecycle, leading to inaccuracies during periods of rapid intensification or abrupt trajectory shifts. This study proposes a novel phase-based, physics-informed sequence-to-sequence neural network to address these limitations. The framework decomposes the forecasting task into distinct lifecycle phases—formation, intensification, and dissipation—using specialized deep learning models for each. A key innovation is the integration of physical constraints derived from the Navier–Stokes equations directly into the model’s loss function, ensuring that predictions are both data-driven and physically consistent. Using the Digital Typhoon Dataset, the proposed method achieves a mean absolute error (MAE) of 17.83 km for trajectory prediction and 7.28 kt for intensity forecasting, representing a significant improvement over existing approaches. Moreover, the proposed method also requires only 3 h of historical data to generate forecasts, compared to more (e.g., 48) hours needed by entire-time series approaches, providing earlier inference critical for disaster preparedness and protection of infrastructure such as offshore wind farms.
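The paper's key idea, penalizing violations of physical constraints inside the loss function, can be illustrated with a simpler 1-D advection residual in place of the Navier–Stokes terms; the equation, grid, and weighting below are stand-in assumptions, not the authors' actual constraint.

```python
import numpy as np

def physics_informed_loss(pred, target, c, dx, dt, lam=0.1):
    """Data-fit MSE plus a penalty on the finite-difference residual of
    the 1-D advection equation du/dt + c * du/dx = 0 (a simple stand-in
    for the Navier-Stokes constraints used in the paper)."""
    data_loss = np.mean((pred - target) ** 2)
    dudt = (pred[1:, :-1] - pred[:-1, :-1]) / dt  # forward diff in time
    dudx = (pred[:-1, 1:] - pred[:-1, :-1]) / dx  # forward diff in space
    phys_loss = np.mean((dudt + c * dudx) ** 2)
    return data_loss + lam * phys_loss

# u(t, x) = x - c*t solves the advection equation exactly, so a
# prediction matching it incurs (near-)zero total loss.
c, dx, dt = 2.0, 0.1, 0.05
x = np.arange(8) * dx
t = np.arange(6) * dt
field = x[None, :] - c * t[:, None]
loss = physics_informed_loss(field, field, c, dx, dt)
print(loss)  # ~0, up to floating-point error
```

A prediction that fits the data but violates the PDE picks up a nonzero `phys_loss`, which is how the penalty steers training toward physically consistent forecasts; `lam` trades off the two terms.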
Citations: 0
Conformal validation: A deferral policy using uncertainty quantification with a human-in-the-loop for model validation
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-09-15 DOI: 10.1016/j.mlwa.2025.100733
Paul Horton, Alexandru Florea, Brandon Stringfield
Validating performance is a key challenge facing the adoption of machine learning models in high risk applications. Current validation methods assess performance marginally over the entire testing dataset, which can fail to identify regions in the distribution with insufficient performance. In this paper, we propose Conformal Validation, a systems-based approach with a calibrated form of uncertainty quantification using a conformal prediction framework as a part of the validation process to reduce performance gaps. Specifically, the policy defers a subset of observations for which the predictive model is most uncertain and provides a human with informative prediction sets to make the ancillary decision. We evaluate this policy on an image classification task where images are distorted with varying levels of gaussian blur for a quantifiable measure of added difficulty. The model is compared to human performance on the most difficult observations, i.e., those where the model is most uncertain, to simulate the scenario when a human is the alternative decision-maker. We evaluate performance on three arms: the model independently, humans with access to a set of classes the model is most confident in, and humans independently. The deferral policy is simple to understand, applicable to any predictive model, and easy to implement while, in this case, keeping humans in the loop for improved trustworthiness. Conformal Validation incorporates a risk assessment that is conditioned on the prediction set length and can be tuned to the needs of the application.
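The informative prediction sets described above are the standard output of split conformal prediction; a generic sketch of that framework follows, where the calibration data and the size-based deferral rule are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification: calibrate a
    threshold on the score 1 - p(true class), then keep every class
    whose score clears it. Larger sets flag uncertain inputs."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    return [np.flatnonzero(1.0 - p <= q) for p in test_probs]

# Synthetic calibration set: 19 points, true class 0, with model
# confidence ranging from 0.30 to 0.95.
p_true = np.linspace(0.30, 0.95, 19)
cal_probs = np.column_stack([p_true, (1 - p_true) / 2, (1 - p_true) / 2])
cal_labels = np.zeros(19, dtype=int)

test_probs = np.array([[0.90, 0.07, 0.03],   # confident input
                       [0.40, 0.35, 0.25]])  # uncertain input
sets = conformal_sets(cal_probs, cal_labels, test_probs)
for s in sets:
    print(list(s), "defer to human" if len(s) > 1 else "accept model")
```

Here the confident input yields a singleton set and is kept for the model, while the uncertain one yields a multi-class set and would be deferred, with the set itself shown to the human as the candidate shortlist.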
Citations: 0