
Machine learning with applications: latest publications

Intrinsic Dimension Estimating Autoencoder (IDEA) using CancelOut layer and a projected loss
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-20 DOI: 10.1016/j.mlwa.2026.100850
Antoine Oriou, Philipp Krah, Julian Koellermeier
This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the projected reconstruction loss term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension.
We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset’s intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
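The projected reconstruction loss can be illustrated with a toy sketch: measure how much the reconstruction degrades when one latent dimension is zeroed before decoding. If the data truly lies on a lower-dimensional manifold, dropping a spurious latent dimension costs nothing, while dropping a needed one raises the error. The linear decoder, the names, and the data below are illustrative assumptions, not the IDEA architecture itself:

```python
# Toy sketch of a "projected reconstruction loss" (illustrative only;
# IDEA uses a trained nonlinear autoencoder with CancelOut layers).

def decode(z, W):
    """Toy linear decoder: x_hat = W @ z, with W given as a list of rows."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def projected_loss(x, z, W, drop_dim):
    """Reconstruction error after zeroing one latent dimension."""
    z_proj = [0.0 if i == drop_dim else v for i, v in enumerate(z)]
    return mse(x, decode(z_proj, W))

# Example: a 2-D latent code for data that truly lies on a 1-D manifold,
# so the second latent dimension carries no signal.
W = [[1.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
z = [1.5, 0.7]
x = decode([1.5, 0.0], W)            # ground-truth sample on the manifold
full = mse(x, decode(z, W))          # loss with the full latent code
dropped = projected_loss(x, z, W, drop_dim=1)  # unchanged: dim 1 is spurious
```

Here `dropped` equals `full`, signaling that the latent space can shrink by one dimension, whereas dropping the informative dimension 0 increases the error.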
Citations: 0
Estimation of the remaining charge retention time of an electric vehicle battery
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-05 DOI: 10.1016/j.mlwa.2025.100813
Chourik Fousseni, Martin Otis, Khaled Ziane
Accurately estimating the remaining driving time (RDT) of an electric vehicle (EV) battery is essential for optimizing energy management and enhancing user experience. However, traditional estimation methods do not adequately account for the influence of temperature, driving characteristics, and vehicle driving time, leading to less accurate predictions and suboptimal range management. To address these limitations, this study presents a method for estimating the remaining charge retention time by integrating temperature and driving characteristics, which refines predictions and improves model reliability. Furthermore, data from the National Big Data Alliance for New Energy Vehicles (NDANEV) were employed to develop a predictive model based on machine learning (ML) models. The ML models compared in this study are Linear Regression, LSTM, RF, Prophet, LightGBM, and XGBoost. Model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), the coefficient of determination (R²), and the prediction runtime to assess prediction accuracy. The results show that the R² values for Prophet, Random Forest, LSTM, XGBoost, and LightGBM are 0.91, 0.94, 0.95, 0.94, and 0.94, respectively. This suggests that XGBoost outperforms the other models, providing the most accurate estimate of the remaining driving time. In addition, the result confirms that considering driving characteristics and ambient temperature improves the reliability and robustness of estimations. These advancements contribute to more efficient energy management and optimized charging strategies.
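The evaluation metrics named in this abstract (MAE, RMSE, R²) have standard definitions, sketched below in plain Python; the example values are made up for illustration and are not from the study's dataset:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical remaining-driving-time values in minutes (illustrative).
y_true = [30.0, 45.0, 60.0, 75.0]
y_pred = [32.0, 44.0, 58.0, 76.0]
```

RMSE penalizes large deviations more heavily than MAE, which is why the two are usually reported together for regression models like these.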
Citations: 0
Predictive modeling of error categories in English-Slovak machine translation using automatic evaluation metrics
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-04 DOI: 10.1016/j.mlwa.2025.100810
Dasa Munkova, Lucia Benkova, Michal Munk, Lubomir Benko, Petr Hajek
This paper presents a language-specific adaptation for automatic identification of machine translation (MT) errors using a comprehensive set of open-source evaluation metrics. The approach focuses on the English–Slovak translation direction, addressing challenges posed by Slovak’s highly inflectional and low-resource nature. Predictive models were developed for five key error categories (Predication, Modal and communication sentence framework, Syntactic-semantic correlativeness, Compound/complex sentences, and Lexical semantics) by employing forward stepwise regression and validated through bootstrapping techniques. The models estimate the probability of error occurrence in MT segments, demonstrating stable and comparable performance across training and test datasets, as measured by Somers’ D. While human expert evaluation remains essential for verifying flagged segments, the proposed approach significantly reduces evaluator workload by prioritizing likely error-containing segments. This methodology offers a scalable and adaptable framework for MT quality assessment across languages and text styles, with potential to improve automated translation evaluation and post-editing processes.
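Somers' D, the validation statistic used above, is a standard rank measure of association between predicted probabilities and binary outcomes, computed from concordant and discordant pairs. A minimal pairwise sketch (the outcome/probability values are illustrative, not the study's MT segments):

```python
def somers_d(y, p):
    """Somers' D of predictions p with respect to binary outcomes y:
    (concordant - discordant) over all pairs with differing outcomes."""
    conc = disc = ties = 0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # only pairs with different outcomes count
            # orient the pair so `hi` is the prediction for the positive outcome
            hi, lo = (p[i], p[j]) if y[i] > y[j] else (p[j], p[i])
            if hi > lo:
                conc += 1
            elif hi < lo:
                disc += 1
            else:
                ties += 1
    total = conc + disc + ties
    return (conc - disc) / total if total else 0.0

# 1 = segment contains the error category, 0 = it does not (hypothetical data)
d = somers_d([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2])
```

A value of 1.0 means the model ranks every error-containing segment above every clean one; 0.0 means no better than chance, which is why comparable Somers' D on training and test sets indicates stable models.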
Citations: 0
Leveraging multimodal large language models to extract mechanistic insights from biomedical visuals: A case study on COVID-19 and neurodegenerative diseases
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-06 DOI: 10.1016/j.mlwa.2025.100816
Elizaveta Popova, Marc Jacobs, Martin Hofmann-Apitius, Negin Sadat Babaiha

Background

The COVID-19 pandemic has intensified concerns about its long-term neurological impact, with evidence linking SARS-CoV-2 infection to neurodegenerative diseases (NDDs) such as Alzheimer’s (AD) and Parkinson’s (PD). Patients with these conditions face higher risk of severe COVID-19 outcomes and may experience accelerated cognitive or motor decline following infection. Proposed mechanisms—including neuroinflammation, blood–brain barrier (BBB) disruption, and abnormal protein aggregation—closely mirror core features of neurodegenerative pathology. However, current knowledge remains fragmented across text, figures, and pathway diagrams, limiting integration into computational models that could reveal systemic patterns.

Results

To address this gap, we applied GPT-4 Omni (GPT-4o), a multimodal large language model (LLM), to extract mechanistic insights from biomedical figures. Over 10,000 images were retrieved through targeted searches on COVID-19 and neurodegeneration; after automated and manual filtering, a curated subset was analyzed. GPT-4o extracted biological relationships as semantic triples, grouped into six mechanistic categories—including microglial activation and barrier disruption—using ontology-guided similarity and assembled into a Neo4j knowledge graph (KG). Accuracy was evaluated against a gold-standard dataset of expert-annotated images using Biomedical Bidirectional Encoder Representations from Transformers (BioBERT)–based semantic matching. This evaluation enabled prompt tuning, threshold optimization, and hyperparameter assessment. Results show that GPT-4o successfully recovers both established and novel mechanisms, yielding interpretable outputs that illuminate complex biological links between SARS-CoV-2 and neurodegeneration.

Conclusions

This study demonstrates the potential of multimodal LLMs to mine biomedical visual data at scale. By complementing text mining and integrating figure-derived knowledge, our framework advances understanding of COVID-19–related neurodegeneration and supports future translational research.
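The triple-extraction step described in the Results can be sketched as collecting (subject, predicate, object) triples under mechanistic categories before loading them into a graph store such as Neo4j. The triples and category labels below are illustrative placeholders, not the paper's curated output:

```python
# Minimal sketch: group extracted semantic triples by mechanistic category.
# Contents are hypothetical examples, not the study's annotated data.
from collections import defaultdict

def build_graph(triples):
    """Map each mechanistic category to its (subject, predicate, object) triples."""
    graph = defaultdict(list)
    for subj, pred, obj, category in triples:
        graph[category].append((subj, pred, obj))
    return dict(graph)

triples = [
    ("SARS-CoV-2", "activates", "microglia", "microglial activation"),
    ("cytokines", "disrupt", "blood-brain barrier", "barrier disruption"),
    ("SARS-CoV-2", "induces", "cytokines", "neuroinflammation"),
]
kg = build_graph(triples)
```

In the actual pipeline each triple would additionally carry provenance (source figure, ontology match score) before being written to the Neo4j knowledge graph.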
Citations: 0
Oncology data extraction with large language models from real-world breast cancer electronic health records in Spanish
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-05 DOI: 10.1016/j.mlwa.2026.100837
Julio Montes-Torres, Francisco J. Moreno-Barea, Leonardo Franco, Nuria Ribelles, Emilio Alba, José M. Jerez
The integration of Artificial Intelligence (AI) in healthcare systems has the potential to significantly enhance patient care and streamline clinical processes. This research investigates the use of generative AI and large language models (LLMs) for oncological information extraction (IE) from real Spanish electronic health records (EHRs) to enhance clinical decision-making and research. We conducted a comparative analysis of GPT-4.5 and 11 state-of-the-art, locally executable LLM-based chatbots, including Llama 3.2, Mistral-Small 3.2, and Phi-4, to extract specific clinical entities from real EHR narratives. Our evaluation workflow aimed to assess the performance of these models under computational constraints, specifically targeting the extraction of breast cancer prognostic factors. Initial findings indicate that while open-source LLMs are improving, they are not yet equivalent to human specialists in terms of Named Entity Recognition (NER) accuracy. The language of the clinical records notably influences performance, revealing that smaller models particularly struggle with Spanish text. However, with careful model selection and output post-processing, Mistral-Small 3.2 achieved a detection F1 score of over 74.7% for critical TNM information. This study highlights significant potential for generative AI in clinical IE but underscores the need for ongoing improvements, particularly in handling linguistic diversity. Locally managed open-source models are still far from performing like a human specialist, but addressing common model shortcomings can facilitate the integration of AI-driven solutions into public healthcare systems, thereby improving patient outcomes and fostering efficient data utilisation.
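The detection F1 score cited above combines precision and recall in the usual way; a minimal sketch with hypothetical entity counts (not the study's actual confusion matrix):

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for TNM entity detection (illustrative only).
f1 = f1_score(75, 30, 20)
```

Because F1 is the harmonic mean, a model cannot score well by over-extracting (inflating recall at the cost of precision) or by extracting only safe entities (the reverse), which makes it a common headline metric for NER.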
Citations: 0
A traffic-aware federated learning prediction framework with custom aggregation
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-29 DOI: 10.1016/j.mlwa.2025.100829
Seerat Kaur, Sukhjit Singh Sehra, Darisuh Ebrahimi, Emad A. Mohammed
Reliable traffic predictions are essential for managing congestion, optimizing routes, improving commuter safety, and advancing the performance of intelligent transportation systems (ITS). However, existing centralized systems often lack adaptability to real-world traffic patterns and fail to capture spatio-temporal variability and client-level heterogeneity. These systems require large amounts of sensitive data to be collected on central servers, intensifying privacy risks. This study proposes a privacy-preserving Federated Learning (FL) framework for traffic flow and speed prediction (5 to 60 minutes ahead) using non-independent and identically distributed (non-IID) traffic data. The objectives of this study are threefold: (1) design a client-aware custom FL aggregation strategy that accounts for traffic heterogeneity and client-specific dynamics, which are ignored in standard FL methods; (2) improve personalization by grouping clients based on real-world traffic pattern similarity via a clustering-based approach; and (3) enhance the convergence and predictive performance of global aggregation using dynamic, traffic-aware aggregation scores. The proposed framework designs a hybrid FL long short-term memory (FedLSTM) model augmented with an attention mechanism to effectively model both temporal and spatial traffic variations across junctions, while ensuring that all raw data remains local. To improve learning under traffic diversity and imbalanced traffic distribution patterns, we propose a custom traffic-aware aggregation strategy that dynamically weighs client contributions based on six traffic-based metrics. Evaluations on clustered client partitions demonstrate that our custom aggregation consistently outperformed the baseline strategies across multiple evaluation metrics. These results highlight the effectiveness of integrating traffic-aware aggregation in enhancing the performance and generalization capability of FL-based traffic prediction frameworks.
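The custom aggregation idea, weighting each client's parameters by a traffic-derived score, can be sketched as score-weighted federated averaging. The scores and parameter vectors below are placeholders; the paper's six traffic-based metrics are not reproduced here:

```python
# Minimal sketch of score-weighted federated averaging: the server combines
# client model parameters using per-client aggregation scores. In the paper
# these scores are derived from traffic metrics; here they are arbitrary.

def aggregate(client_params, scores):
    """Weighted average of flat parameter vectors (one list per client)."""
    total = sum(scores)
    weights = [s / total for s in scores]          # normalize scores to sum to 1
    n_params = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(n_params)
    ]

# Two clients with 2-parameter models (hypothetical values).
clients = [[1.0, 2.0], [3.0, 4.0]]
global_model = aggregate(clients, scores=[1.0, 3.0])  # client 2 weighted 3x
```

With equal scores this reduces to plain FedAvg over equally sized clients; the traffic-aware scores let clients whose data better reflects current conditions pull the global model toward them.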
Citations: 0
Causal Digital Twins for cyber–physical security in water systems: A framework for robust anomaly detection
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-17 DOI: 10.1016/j.mlwa.2025.100824
Mohammadhossein Homaei, Mehran Tarif, Pablo García Rodríguez, Andrés Caro, Mar Ávila
Industrial Control Systems (ICS) in water distribution and treatment face cyber–physical attacks exploiting network and physical vulnerabilities. Current water system anomaly detection methods rely on correlations, yielding high false alarms and poor root cause analysis. We propose a Causal Digital Twin (CDT) framework for water infrastructures, combining causal inference with digital twin modeling. CDT supports association for pattern detection, intervention for system response, and counterfactual analysis for water attack prevention. Evaluated on the water-related datasets SWaT, WADI, and HAI, CDT shows high compliance with physical constraints (90.8% for SWaT, 87.4%–90.8% across datasets) and a structural Hamming distance of 0.133 ± 0.02. F1-scores are 0.944 ± 0.014 (SWaT), 0.902 ± 0.021 (WADI), and 0.923 ± 0.018 (HAI, p < 0.0024). Multi-scale temporal detection strategies (τ ∈ {5, 10, 20}) enable 91.7% detection of stealthy attacks through cumulative causal discrepancy analysis. CDT reduces false positives by 48% compared to state-of-the-art methods (70% vs. statistical baselines), achieves 78.4% root cause accuracy, and enables counterfactual defenses reducing attack success by up to 89.1%. Real-time performance at 3.2 ms latency ensures safe and interpretable operation for medium-scale water systems.
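The structural Hamming distance reported above counts the edge edits needed to turn one causal graph into another. A minimal sketch over adjacency matrices, using the common convention that an edge reversal counts once (the paper's exact convention and normalization are assumptions here, not stated in the abstract):

```python
def shd(a, b):
    """Structural Hamming distance between two directed graphs given as
    adjacency matrices (no self-loops assumed): number of edge insertions,
    deletions, or reversals needed to turn `a` into `b`, with a reversal
    counted as a single edit."""
    n = len(a)
    d = 0
    for i in range(n):
        for j in range(i + 1, n):
            # compare the (i, j) node pair's edge state in both graphs
            e1 = (a[i][j], a[j][i])
            e2 = (b[i][j], b[j][i])
            if e1 != e2:
                d += 1
    return d

# Hypothetical 3-node causal graphs: a has edge 0->1; b has 1->0 and 1->2.
a = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
b = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]
d_ab = shd(a, b)  # one reversal (0->1 vs 1->0) plus one insertion (1->2)
```

A low SHD against a reference structure, as reported for CDT, indicates that the discovered causal graph closely matches the ground-truth process topology.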
Machine learning with applications, Volume 23, Article 100824.
Citations: 0
SAFE AI metrics: An integrated approach
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-15 DOI: 10.1016/j.mlwa.2025.100821
Paolo Giudici, Vasily Kolesnikov
We contribute to the field of AI governance with the development of a unified compliance metric that integrates three key dimensions of SAFE Artificial Intelligence: Security, Accuracy, and Explainability. While these aspects are typically assessed in isolation, the proposed approach integrates them into a single, interpretable metric grounded in a consistent mathematical structure. To develop an integrated framework, the outputs of machine learning models are evaluated under three risk dimensions that correspond to different input data perturbations: data removal (for accuracy); data poisoning (for security); and feature removal (for explainability). Experiments with the methodology on both real and simulated datasets show that the integrated metric improves compliance monitoring and enables a consistent evaluation of AI risks.
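The paper's actual aggregation formula is not given in the abstract. The sketch below only illustrates the three named perturbations — data removal, label poisoning, and feature removal — applied to a toy nearest-centroid classifier, with the mean accuracy retained relative to the unperturbed model standing in for the unified score. All function names, the perturbation fractions, and the aggregation rule are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    # toy "model": one mean vector per class (nearest-centroid classifier)
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def accuracy(model, X, y):
    classes = np.array(sorted(model))
    cents = np.stack([model[c] for c in classes])
    pred = classes[np.argmin(((X[:, None, :] - cents) ** 2).sum(-1), axis=1)]
    return (pred == y).mean()

def safe_score(X_tr, y_tr, X_te, y_te, frac=0.2):
    """Mean accuracy retained under the three SAFE perturbations."""
    base = accuracy(fit_centroids(X_tr, y_tr), X_te, y_te)
    n = len(y_tr)
    # accuracy dimension: remove a random fraction of the training data
    keep = rng.permutation(n)[int(frac * n):]
    acc_removal = accuracy(fit_centroids(X_tr[keep], y_tr[keep]), X_te, y_te)
    # security dimension: poison the training set by flipping binary labels
    flip = rng.permutation(n)[:int(frac * n)]
    y_poisoned = y_tr.copy()
    y_poisoned[flip] = 1 - y_poisoned[flip]
    acc_poison = accuracy(fit_centroids(X_tr, y_poisoned), X_te, y_te)
    # explainability dimension: drop the first input feature entirely
    acc_feature = accuracy(fit_centroids(np.delete(X_tr, 0, axis=1), y_tr),
                           np.delete(X_te, 0, axis=1), y_te)
    # integrate: mean accuracy retained relative to the unperturbed model
    return np.mean([acc_removal, acc_poison, acc_feature]) / base

# two well-separated Gaussian classes, shuffled into train/test splits
X = np.vstack([rng.normal(size=(200, 4)), rng.normal(size=(200, 4)) + 2.0])
y = np.repeat([0, 1], 200)
idx = rng.permutation(400)
X_tr, y_tr, X_te, y_te = X[idx[:300]], y[idx[:300]], X[idx[300:]], y[idx[300:]]
score = safe_score(X_tr, y_tr, X_te, y_te)
print(round(float(score), 2))
```

A score near 1 indicates the model degrades little under any of the three perturbations; a robust aggregation would likely also weight or threshold the individual dimensions rather than average them.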
Machine learning with applications, Volume 23, Article 100821.
Citations: 0
Trust but verify: Image-aware evaluation of radiology report generators
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-20 DOI: 10.1016/j.mlwa.2026.100851
Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier
Large language and vision-language models have greatly advanced automated chest X-ray report generation (RRG), yet current evaluation practices remain largely text-based and detached from image evidence. Traditional machine translation metrics fail to determine whether generated findings are clinically correct or visually grounded, limiting their suitability for medical applications.
This study introduces a comprehensive, image-aware evaluation framework that integrates the VICCA (Visual Interpretation and Comprehension of Chest X-ray Anomalies) protocol with the domain-specific semantic metric MCSE (Medical Corpus Similarity Evaluation). VICCA combines visual grounding and text-guided image generation to assess visual-textual consistency, while MCSE measures semantic and factual fidelity through clinically meaningful entities, negations, and modifiers. Together, they provide a unified, semi-reference-free assessment of pathology-level accuracy, semantic coherence, and visual consistency.
Five representative RRG models, R2Gen, M2Trans, CXR-RePaiR, RGRG, and MedGemma, are benchmarked on 2461 MIMIC-CXR studies using a standardized pipeline. Results reveal systematic trade-offs: models with high pathology agreement often generate semantically weak or visually inconsistent reports, whereas textually fluent models may lack proper image grounding. By integrating clinical semantics and visual reliability within a single multimodal framework, VICCA establishes a robust paradigm for evaluating the trustworthiness and interpretability of AI-generated radiology reports.
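MCSE itself is not specified in the abstract beyond scoring "clinically meaningful entities, negations, and modifiers". As a toy illustration of that idea, the sketch below scores a generated report against a reference by F1 overlap of (finding, polarity) pairs, using a crude three-token negation window — the vocabularies, window size, and F1 aggregation are all assumptions, not the paper's method:

```python
FINDINGS = {"effusion", "pneumothorax", "consolidation", "cardiomegaly"}
NEGATORS = {"no", "without", "absent"}

def entity_set(report):
    """Extract (finding, polarity) pairs with a toy negation window."""
    tokens = report.lower().replace(".", " ").replace(",", " ").split()
    out = set()
    for i, tok in enumerate(tokens):
        if tok in FINDINGS:
            # finding is negated if a negator appears in the 3 prior tokens
            negated = any(t in NEGATORS for t in tokens[max(0, i - 3):i])
            out.add((tok, "neg" if negated else "pos"))
    return out

def entity_f1(reference, generated):
    ref, gen = entity_set(reference), entity_set(generated)
    if not ref and not gen:
        return 1.0
    tp = len(ref & gen)
    prec = tp / len(gen) if gen else 0.0
    rec = tp / len(ref) if ref else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

ref = "No pleural effusion. Mild cardiomegaly."
hyp = "Cardiomegaly is present. There is a small effusion."
print(entity_f1(ref, hyp))  # → 0.5: cardiomegaly matches, effusion polarity flips
```

Polarity-aware matching is what separates this family of metrics from surface n-gram overlap: the hypothesis above mentions both findings, but asserting an effusion the reference negates is a factual error, not a paraphrase.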
Machine learning with applications, Volume 23, Article 100851.
Citations: 0
A causal machine learning framework for analyzing the heterogeneous effects of weather conditions on arrival delays in public rail transportation
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-02-04 DOI: 10.1016/j.mlwa.2026.100854
Elham Ahmadi, André Ludwig, Henrik Leopold
Extreme weather is often perceived as a major threat to the reliability of rail transportation. This study investigates regional rail operations in central Germany and examines whether severe weather conditions causally affect train arrival delays. We combine several years of operational data (2017–2022) with weather observations from the German Weather Service and define four component-specific treatments for adverse conditions: extreme temperature, strong wind, heavy rainfall, and snow presence. To estimate causal effects, we employ a Double Machine Learning framework based on partial linear regression (PLR–DML) and Causal Forest–DML. By reframing the weather–delay relationship as a causal question rather than a purely predictive one, the study provides evidence on whether extreme weather constitutes a materially relevant source of arrival delays. Across all four weather components, estimated average treatment effects are small in magnitude; analyses of individualized and group-level effects reveal no consistent or robust patterns of heterogeneity across lines, seasons, weekdays, or hours of the day. Robustness checks indicate that these operationally negligible estimates are not sensitive to outliers, rare-event imbalance, or to reasonable perturbations consistent with plausible unobserved confounding. Exploratory mediation analyses are consistent with the interpretation that passenger-flow variables do not materially amplify the already negligible estimated effects of severe weather on delays. Overall, the results suggest that, in this regional network, severe weather does not materially increase arrival delays. The findings underscore the importance of rigorous causal diagnostics for distinguishing perceived from materially relevant sources of delay in transportation reliability studies.
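The partialling-out PLR–DML estimator the abstract names can be sketched compactly: residualize the outcome and the treatment on the covariates with cross-fitted nuisance models, then regress residual on residual. Below, ordinary least squares stands in for the ML nuisance learners the authors would use in practice (libraries such as DoubleML implement the full procedure), and the data-generating process is synthetic:

```python
import numpy as np

def dml_plr(y, t, X, n_folds=2, rng=None):
    """Double ML for the partially linear model y = theta*t + g(X) + e.

    Nuisances E[y|X] and E[t|X] are fit with plain least squares here
    (stand-ins for ML learners), with cross-fitting: residuals for each
    fold come from models trained on the other folds. theta is the
    slope of the y-residuals on the t-residuals.
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    folds = rng.permutation(n) % n_folds          # balanced random folds
    ry, rt = np.empty(n), np.empty(n)
    Xb = np.column_stack([np.ones(n), X])         # add an intercept
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        by, *_ = np.linalg.lstsq(Xb[tr], y[tr], rcond=None)
        bt, *_ = np.linalg.lstsq(Xb[tr], t[tr], rcond=None)
        ry[te] = y[te] - Xb[te] @ by
        rt[te] = t[te] - Xb[te] @ bt
    return float(rt @ ry / (rt @ rt))             # orthogonalized estimate

# synthetic check: true treatment effect is 2.0, treatment confounded by X
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
t = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=2000)
y = 2.0 * t + X @ np.array([0.5, 0.5, 0.5]) + rng.normal(size=2000)
print(round(dml_plr(y, t, X), 2))
```

The orthogonalization is what makes the estimate robust to regularization bias in the nuisance fits; a naive regression of y on t here would absorb the confounding through X and overstate the effect.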
Machine learning with applications, Volume 23, Article 100854.
Citations: 0