Machine Learning with Applications: Latest Articles

Enhancing IDS performance through a comparative analysis of Random Forest, XGBoost, and Deep Neural Networks
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-09-27 DOI: 10.1016/j.mlwa.2025.100738
Sow Thierno Hamidou, Adda Mehdi
Intrusion Detection Systems (IDS) face major challenges in network security, notably the need to combine a high detection rate with reliable performance. This reliability is often affected by class imbalances and inadequate hyperparameter optimization. This article addresses the issue of improving the detection rate of IDS by evaluating and comparing three machine learning algorithms: Random Forest (RF), XGBoost, and Deep Neural Networks (DNN), using the NSL-KDD dataset. In our methodology, we integrate SMOTE (Synthetic Minority Oversampling Technique) to tackle the unbalanced nature of the data, ensuring a more balanced representation of the different classes. This approach helps optimize model performance, reduce bias, and enhance robustness. Additionally, hyperparameter optimization is performed using Optuna, ensuring that each algorithm operates at its optimal level. The results show that our model, using the Random Forest algorithm, achieves an accuracy of 99.80%, surpassing the performance of XGBoost and Deep Neural Networks (DNN). This makes our approach a true asset for intrusion detection methods in computer networks.
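To make the balancing step concrete, here is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the implementation used in the paper: the toy points, the function name `smote_sketch`, and the choice `k=2` are assumptions for illustration, and the Optuna tuning step is not shown.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical toy data: 3 minority samples against 8 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority_count = 8
new_points = smote_sketch(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority data already occupies.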
Machine Learning with Applications, vol. 22, Article 100738.
Citations: 0
Neural-Enhanced Two-Step Modified Newton–Lavrentiev Method: A structure-preserving deep learning approach for ill-posed inverse problems
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-10-17 DOI: 10.1016/j.mlwa.2025.100761
Suresan Pareth
Ill-posed inverse problems frequently arise in scientific and medical imaging, where recovering stable and high-fidelity solutions from incomplete or noisy data remains a central challenge. Motivated by this need, we propose a novel hybrid solver framework, the Neural-Enhanced Two-Step Modified Newton–Lavrentiev Method (NE-TSMNLM), which integrates deep neural corrections into the classical Two-Step Modified Newton–Lavrentiev Method for solving nonlinear inverse problems. Unlike black-box neural operators, our design preserves the convergence structure of the classical iteration while embedding neural modules for adaptive correction, regularization, and convergence prediction.
We establish theoretical guarantees on stability and convergence: under mild assumptions, the NE-TSMNLM method inherits the convergence of the classical TSMNLM and improves the effective convergence rate to $\tilde{q} = q^{1+\beta}$ with $\beta > 0$. This demonstrates the acceleration effect due to neural corrections, which has been theoretically proven.
We validate the proposed framework on synthetic and medical inverse problems, including low-dose Computed Tomography (CT) reconstruction, where NE-TSMNLM achieves a 50% radiation dose reduction while maintaining structural fidelity. Initial implementations show promising results with slight degradation (e.g., 17.3% error increase) due to untrained modules and data scarcity. We identify clear pathways for improvement using Transformer-based modules, residual-aware training, and scalable synthetic data.
These results position NE-TSMNLM as a structure-preserving neural framework with rigorous mathematical guarantees, bridging classical regularization theory and deep learning for stable, efficient, and interpretable scientific machine learning.
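The practical meaning of the improved rate can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that $q$ acts as a linear contraction factor with $0 < q < 1$, so the error after $k$ steps is bounded by $q^k$ times the initial error; the abstract does not state this explicitly, and the values of `q`, `beta`, and `tol` are made up.

```python
import math

def iterations_to_tolerance(q, tol):
    """Smallest k with q**k <= tol, for a contraction factor 0 < q < 1."""
    return math.ceil(math.log(tol) / math.log(q))

q, beta, tol = 0.5, 0.25, 1e-6      # illustrative values, not from the paper
q_tilde = q ** (1 + beta)           # improved factor stated in the abstract

k_classical = iterations_to_tolerance(q, tol)
k_enhanced = iterations_to_tolerance(q_tilde, tol)
print(k_classical, k_enhanced)
```

Under this assumption the iteration count shrinks by roughly a factor of $1/(1+\beta)$, since $\tilde{q}^{\,k} = q^{(1+\beta)k}$.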
Machine Learning with Applications, vol. 22, Article 100761.
Citations: 0
Toward clinical reliability: Visualizing and interpreting AI-based classification in peripheral blood smear analysis
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-10-31 DOI: 10.1016/j.mlwa.2025.100780
Hiroaki Iwata, Tsukie Shibayama, Miku Watanabe, Hisashi Shimohiro
With the advent of digital microscopy, the International Council for Standardization in Hematology recommends digital imaging and artificial intelligence (AI) algorithms for automatically classifying blood cells in peripheral blood smears to enhance diagnostic efficiency and accuracy. Nevertheless, while early AI studies have shown promising results in classifying white blood cells, the prediction process often remains unclear. Herein, we aimed to build a highly accurate model and visualize the basis of its predictions. The dataset comprised peripheral blood smear images of normal cells from individuals without infections, hematological disorders, or tumors, who were not undergoing any drug treatment at the time of blood collection. The images were obtained using a CellaVision DM96 analyzer at the Core Laboratory of Hospital Clínic de Barcelona. We used VGG16 and ResNet50 with transfer learning on ImageNet and applied the Grad-CAM method to visualize the image regions on which the model focused for classification. The model effectively recognized features, such as nuclear indentation and cytoplasmic color, which are crucial for classifying promyelocytes, myelocytes, and metamyelocytes. Traditionally, the basis of AI model predictions has been opaque, posing a challenge for medical applications. Our visualized classification basis clarifies the decision-making process of the model. These insights suggest that understanding these features can make the predictions of AI models more reliable and interpretable. Our findings improve diagnostic efficiency and suggest the potential of AI-based diagnostic support systems. Future research should validate this model’s performance using more extensive datasets and different cell types to enhance its reliability and practicality.
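The Grad-CAM computation mentioned above reduces to a small amount of arithmetic once the feature maps and gradients of a convolutional layer are available. The following sketch works on hand-written nested lists rather than a real VGG16 or ResNet50; the 2x2 maps and gradient values are hypothetical and serve only to show the weighting and ReLU steps.

```python
def grad_cam(activations, gradients):
    """activations, gradients: lists of K feature maps, each an HxW nested list.
    Returns the HxW heatmap ReLU(sum_k alpha_k * A_k), where alpha_k is the
    spatial mean of the class-score gradient for channel k."""
    height, width = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in heatmap]

# Hypothetical 2-channel, 2x2 feature maps and their gradients.
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],        # channel 0: alpha = 0.5
             [[-1.0, -1.0], [-1.0, -1.0]]]    # channel 1: alpha = -1.0
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In a real pipeline the resulting map is upsampled to the input resolution and overlaid on the cell image, which is how regions such as the nucleus or cytoplasm become visible as the basis of a prediction.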
Machine Learning with Applications, vol. 22, Article 100780.
Citations: 0
Feature importance analysis of optimized machine learning modeling for predicting customers satisfaction at the United States Airlines
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-09-10 DOI: 10.1016/j.mlwa.2025.100734
Hamid Mirzahossein, Soheil Rezashoar
Customer experience is crucial in the airline industry, as understanding passenger satisfaction helps airlines improve service quality. This study evaluates hyperparameter optimization and feature interpretability in machine learning models for predicting airline passenger satisfaction. Support Vector Machine (SVM) and Multilayer Perceptron (MLP) models were tested for binary classification, labeling passengers as ‘Satisfied’ or ‘Neutral or Dissatisfied’ using a Kaggle dataset with ∼104,000 training and ∼26,000 test records. Hyperparameter tuning used grid search with 10-fold cross-validation. For SVM, the optimal setup included the RBF kernel, C = 10, and gamma = ‘auto’, achieving a mean score of 0.9606. For MLP, the best configuration used no regularization, "he" initialization, ReLU activation, 30 epochs, batch size of 32, two hidden layers with 32 neurons each, and a learning rate of 0.001, yielding a mean score of 0.9556. Performance metrics included accuracy, precision, recall, and F1-Score, with SVM achieving a test accuracy of 0.96, precision of 0.97, and F1-Score of 0.95, slightly outperforming MLP by <1 %, though MLP was faster at 0.3 s versus SVM’s 18 s. Both models surpassed baseline models and prior studies, benefiting from robust preprocessing and a large dataset. Permutation importance analysis identified Type of Travel, Inflight Wi-Fi Service, Customer Type, and Online Boarding as key predictors, emphasizing passenger needs for digital connectivity and personalized services. These insights guide airlines to prioritize reliable Wi-Fi and efficient online boarding to enhance satisfaction, loyalty, and competitive positioning.
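Permutation importance, used above to rank predictors, can be sketched in a few lines: shuffle one feature column, re-score the model, and record how much accuracy drops. The example below is an assumption-laden toy, not the study's pipeline: the data are random, the "model" is a fixed threshold rule standing in for the trained SVM or MLP, and all names are hypothetical.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: column 0 determines the label, column 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
model = lambda row: int(row[0] > 0.5)   # stand-in for a trained classifier

importances = permutation_importance(model, X, y)
print(importances)
```

A feature the model ignores gets an importance of exactly zero, while a feature the model depends on shows a large drop, which is the logic behind ranking Type of Travel or Inflight Wi-Fi Service as key predictors.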
Machine Learning with Applications, vol. 22, Article 100734.
Citations: 0
Machine learning approaches to traffic accident severity prediction: Addressing class imbalance
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-07 DOI: 10.1016/j.mlwa.2025.100792
Mohammad Amin Amiri, Saeid Afshari, Ali Soltani
Road traffic injuries continue to pose a significant public health challenge in Australia, with pedestrians representing one of the most vulnerable road user groups. Accurate prediction of injury severity, particularly fatal outcomes, is essential for improving road safety interventions and resource allocation. This study applies advanced machine learning techniques to predict pedestrian crash severity using national hospitalization and mortality data collected from 2011 to 2021. The analysis focuses on addressing class imbalance, a common issue in injury data, by evaluating the impact of several data balancing methods, including SMOTE, ADASYN, Random Oversampling (ROS), and Threshold Moving. We implement and compare four supervised learning algorithms: Logistic Regression, Support Vector Machine (SVM), Decision Tree, and XGBoost. Model performance is assessed using F1-score and macro-accuracy, with a focus on the minority (fatality) class. Results show that XGBoost combined with Threshold Moving achieves the highest performance, yielding an F1-score of 72% for fatality classification and a macro-accuracy of 84%. Additionally, feature importance analysis using SHAP values reveals age, gender, road user type, and crash location as key predictors of injury severity. The study highlights the critical role of data balancing strategies in enhancing predictive accuracy for rare but high-impact outcomes. These findings provide actionable insights for transport authorities and policymakers seeking to develop data-driven, targeted safety measures to protect pedestrians and reduce the severity of crash outcomes.
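Threshold Moving, the balancing method that performed best here, leaves the training data untouched and instead lowers the probability cut-off used to flag the rare class. A minimal sketch follows, assuming the classifier already outputs fatality probabilities; the probabilities, labels, and candidate grid are invented for illustration and do not come from the study.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1-score of the positive (minority) class at a given cut-off."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(probs, labels, candidates):
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))

# Hypothetical predicted probabilities: 9 non-fatal cases, then 3 fatal cases.
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.40, 0.32, 0.45, 0.70]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
candidates = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95

f1_default = f1_at_threshold(probs, labels, 0.5)
threshold = best_threshold(probs, labels, candidates)
f1_moved = f1_at_threshold(probs, labels, threshold)
print(threshold, f1_default, f1_moved)
```

In practice the threshold should be chosen on a validation split rather than on the test data, otherwise the reported F1-score is optimistic.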
Machine Learning with Applications, vol. 22, Article 100792.
Citations: 0
Machine learning-driven predictive modeling of temperature-dependent mechanical properties in austenitic stainless steels
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-06 DOI: 10.1016/j.mlwa.2025.100786
Movaffaq Kateb, Sahar Safarian
This work demonstrates that modern tree‑based models can effectively model complex, temperature-dependent mechanical responses, including highly nonlinear and even non-monotonic trends, in austenitic stainless steel and highlights limitations of composition‑only empirical models. To ensure robust model evaluation, we employed multiple validation strategies including repeated random train and test partitions and leave-one-out cross-validation. While one might assume that steel grade is fully captured by its composition, local assessments within narrower compositional ranges reveal different feature importance rankings than those observed in the full dataset. Grade-specific (AISI 304, 316, 321 and 347) feature importance analysis offered deeper insights into local alloy behavior and demonstrated the advantage of disaggregated modeling in avoiding misleading conclusions. Clustering and SHAP analyses further revealed a temperature-sensitive role of nitrogen, which strengthens the alloy through interstitial and fine precipitate mechanisms at lower temperatures but loses effectiveness at elevated temperatures due to precipitate coarsening. This highlights how data-driven methods can uncover metallurgically consistent, temperature-dependent strengthening behaviors not captured by simpler models. Our results confirm that temperature governs the mechanical performance of austenitic stainless steels, with other features contributing marginally, particularly for UTS. Additionally, the model achieved a notably high score for elongation, highlighting the critical role of testing temperature in addressing the long-standing challenge of poor elongation predictions in composition-only or composition-processing models. This suggests that low accuracy in previous studies is more likely due to dataset limitations rather than shortcomings of tree-based models.
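Leave-one-out cross-validation, one of the validation strategies named above, can be sketched as follows. This is only an illustration of the procedure: the temperature and strength numbers are invented, and a one-nearest-neighbour rule stands in for the tree-based models of the study so that the example needs no external libraries.

```python
def loocv_mae(X, y, fit_predict):
    """Leave-one-out CV: hold out each sample once, train on the rest,
    and average the absolute error on the held-out sample."""
    errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        errors.append(abs(fit_predict(X_train, y_train, X[i]) - y[i]))
    return sum(errors) / len(errors)

def mean_predictor(X_train, y_train, x):
    # Baseline that ignores temperature entirely.
    return sum(y_train) / len(y_train)

def nearest_neighbour(X_train, y_train, x):
    # Predict the target of the training sample closest in temperature.
    idx = min(range(len(X_train)), key=lambda j: abs(X_train[j] - x))
    return y_train[idx]

# Hypothetical numbers: test temperature (deg C) and a strength-like target.
temperature = [20, 100, 200, 300, 400, 500, 600, 700]
strength = [600, 560, 520, 500, 490, 470, 430, 350]

mae_baseline = loocv_mae(temperature, strength, mean_predictor)
mae_nn = loocv_mae(temperature, strength, nearest_neighbour)
print(mae_baseline, mae_nn)
```

Every sample is used for testing exactly once, which makes the estimate well suited to small datasets, at the cost of fitting the model as many times as there are samples.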
Machine Learning with Applications, vol. 22, Article 100786.
Citations: 0
Robust anomaly detection through multi-modal autoencoder fusion for small vehicle damage detection
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-12 DOI: 10.1016/j.mlwa.2025.100794
Sara Khan, Mehmed Yüksel, Frank Kirchner
Wear and tear detection in fleet and shared vehicle systems is a critical challenge, particularly in rental and car-sharing services, where minor damage, such as dents, scratches, and underbody impacts, often goes unnoticed or is detected too late. Currently, manual inspection methods are the default approach, but are labour-intensive and prone to human error. In contrast, state-of-the-art image-based methods are less reliable when the vehicle is moving, and they cannot effectively capture underbody damage due to limited visual access and spatial coverage. This work introduces a novel multi-modal architecture based on anomaly detection to address these issues. Sensors such as Inertial Measurement Units (IMUs) and microphones are integrated into a compact device mounted on the vehicle’s windshield. This approach supports real-time damage detection while avoiding the need for highly resource-intensive sensors. We developed multiple variants of multi-modal autoencoder-based architectures and evaluated them against unimodal and state-of-the-art methods. Our multi-modal ensemble model with pooling achieved the highest performance, with a Receiver Operating Characteristic-Area Under Curve (ROC-AUC) of 92%, demonstrating its effectiveness in real-world applications. This approach can also be extended to other applications, such as improving automotive safety. It can integrate with airbag systems for efficient deployment and help autonomous vehicles by complementing other sensors in collision detection.
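One way to picture the fusion and evaluation steps is to treat each modality's autoencoder reconstruction error as an anomaly score, pool the scores per sample, and compute ROC-AUC on the result. The sketch below rests on several assumptions: the reconstruction errors are made-up numbers rather than autoencoder outputs, and max pooling is only one guess at what an ensemble "with pooling" might do; the paper's architecture is not reproduced.

```python
def roc_auc(scores, labels):
    """Probability that a random anomalous sample scores higher than a random
    normal one (ties count as 0.5); equals the area under the ROC curve."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fuse(imu_err, audio_err, pool=max):
    """Combine per-modality anomaly scores sample by sample."""
    return [pool(a, b) for a, b in zip(imu_err, audio_err)]

# Hypothetical normalised reconstruction errors; label 1 marks a damage event.
labels = [0, 0, 0, 0, 1, 1, 1, 0]
imu_err = [0.10, 0.20, 0.15, 0.30, 0.90, 0.25, 0.80, 0.12]
audio_err = [0.12, 0.10, 0.35, 0.20, 0.30, 0.85, 0.70, 0.18]

auc_imu = roc_auc(imu_err, labels)
auc_audio = roc_auc(audio_err, labels)
auc_fused = roc_auc(fuse(imu_err, audio_err), labels)
print(auc_imu, auc_audio, auc_fused)
```

In this toy case each modality misses one event that the other catches, so the pooled score separates the classes better than either sensor alone.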
Citations: 0
On the latent distribution of logistic regression — An empirical study on spectroscopic profiling datasets
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-09-04 DOI: 10.1016/j.mlwa.2025.100712
Yinsheng Zhang, Mingming He, Haiyan Wang
Logistic regression is a simple yet widely used classification model in spectroscopic profiling analysis. Considering the model’s output represents a probability, this paper will investigate its latent distribution assumption, i.e., its inner linear regressor unit follows a standard logistic distribution. An empirical study on five spectroscopic profiling open datasets, i.e., wine, coffee, olive oil, cheese, and milk powder, was conducted to verify this latent distribution assertion. This paper measured the GoF (Goodness of Fit) of each dataset’s latent variable from three aspects, i.e., curve fitting, P–P and Q–Q plots, and K–S test. After hyper-parameter optimization and proper training, the latent variable, as a weighted sum of the original features, has demonstrated a high level of GoF on all the five datasets. This study verifies the suitability of logistic regression in spectroscopic profiling analysis and answers why the model output can be interpreted as a conditional probability.
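For context, in the usual latent-variable view a label is 1 when w·x + ε > 0 with ε drawn from a standard logistic distribution, which gives P(y = 1 | x) = σ(w·x). The sketch below shows how a one-sample K–S statistic against the standard logistic CDF could be computed for a set of latent values. How the paper builds its latent sample is not stated in the abstract, so the `latent` list here is a hypothetical stand-in placed at exact logistic quantiles.

```python
import math


def logistic_cdf(z):
    """CDF of the standard logistic distribution (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-z))


def ks_statistic(sample, cdf=logistic_cdf):
    """One-sample K-S statistic: the largest gap between the empirical CDF
    of the sample and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before the step at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical latent values sitting exactly at logistic quantiles.
n = 100
latent = [math.log(p / (1 - p)) for p in ((i - 0.5) / n for i in range(1, n + 1))]

d_good = ks_statistic(latent)                  # close fit to the logistic CDF
d_bad = ks_statistic([3 * z for z in latent])  # rescaled values fit poorly
```

A small statistic indicates that the sample is compatible with the standard logistic shape; turning it into a p-value requires the K–S distribution, which is left out of this sketch.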
Citations: 0
A machine learning approach to vulnerability detection combining software metrics and topic modelling: Evidence from smart contracts
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-10-17 DOI: 10.1016/j.mlwa.2025.100759
Giacomo Ibba, Rumyana Neykova, Marco Ortu, Roberto Tonelli, Steve Counsell, Giuseppe Destefanis
This paper introduces a methodology for software vulnerability detection that combines structural and semantic analysis through software metrics and topic modelling. We evaluate the approach using smart contracts as a case study, focusing on their structural properties and the presence of known security vulnerabilities. We identify the most relevant metrics for vulnerability detection, evaluate multiple machine learning classifiers for both binary and multi-label classification, and improve classification performance by integrating topic modelling techniques.
Our analysis shows that metrics such as cyclomatic complexity, nesting depth, and function calls are strongly associated with vulnerability presence. Using these metrics, the Random Forest classifier achieved strong performance in binary classification (AUC: 0.982, accuracy: 0.977, F1-score: 0.808) and multi-label classification (AUC: 0.951, accuracy: 0.729, F1-score: 0.839). The addition of topic modelling using Non-Negative Matrix Factorisation further improved results, increasing the F1-score to 0.881. The evaluation is conducted on Ethereum smart contracts written in Solidity.
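The binary results pair a high accuracy with a noticeably lower F1-score, a pattern that often appears when the positive class is rare (the abstract does not give the class balance, so this is only a plausible reading). The sketch below computes both metrics from a confusion matrix; the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced set: 40 vulnerable contracts out of 1000.
metrics = binary_metrics(tp=30, fp=10, fn=10, tn=950)
```

With these counts the many true negatives lift accuracy to 0.98, while F1, which ignores true negatives, stays at 0.75.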
Citations: 0
Uncertainty quantification by large language models
IF 4.9 Pub Date : 2025-12-01 Epub Date: 2025-11-01 DOI: 10.1016/j.mlwa.2025.100773
Dorianis M. Perez, Bryan E. Kaiser, Ismael Boureima
As reasoning capabilities of large language models (LLMs) continue to advance, they are being integrated into increasingly complex scientific workflows, with the goal of developing agents capable of generating evidence-based explanations and testing hypotheses and theories. However, despite their rapid progress, most existing evaluations of LLM reasoning focus on accuracy or consistency rather than on uncertainty quantification (UQ), which is essential for evidence-based reasoning because it quantifies the degree of trustworthiness of evidence-based explanations. Current approaches to LLM uncertainty remain fragmented, often lacking standardized benchmarks that test models under varying task complexities. To address this gap, we introduce the first benchmark suite designed to evaluate UQ by LLM-based agents and tools. The benchmark targets one of the most fundamental UQ problems: estimating whether one quantity is probably larger than another under uncertainty. It includes two progressively complex tasks: a simple inequality test, where models judge whether one of two sets of samples is “larger,” “smaller,” or “uncertain” with 95% confidence, and a complex inequality test, where models assess interventional probabilities requiring multiple intermediate calculations. We found that reasoning models are generally capable of UQ (scores ≳70%) in the simple inequality case but do not score appreciably better than random guessing (scores ∼33%) for the complex inequality case if the UQ method and intermediate steps are not provided in the prompt. Our implementation is available at https://github.com/bekaiser-LANL/tether.
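To make the simple inequality task concrete, here is a minimal sketch of one way such a three-way judgement could be made: build an approximate 95% confidence interval for the difference of the two sample means and answer "uncertain" when it straddles zero. The abstract does not say which reference method the benchmark uses, so the normal approximation (z = 1.96) and the name `compare_samples` are assumptions; this is not the code in the linked repository.

```python
import math
from statistics import mean, variance


def compare_samples(a, b, z=1.96):
    """Classify mean(a) relative to mean(b) using an approximate 95%
    confidence interval on the difference of means (normal approximation)."""
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    lo, hi = diff - z * se, diff + z * se
    if lo > 0:
        return "larger"
    if hi < 0:
        return "smaller"
    return "uncertain"


tight_high = [10.1, 9.9, 10.0, 10.2, 9.8]
tight_low = [5.0, 5.1, 4.9, 5.2, 4.8]
noisy_high = [9.0, 11.0, 10.5, 9.5, 10.0]

verdict_clear = compare_samples(tight_high, tight_low)
verdict_unclear = compare_samples(tight_high, noisy_high)
```

For samples this small a Student-t quantile would be more appropriate than 1.96; the normal value is used here only to keep the sketch dependency-free.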
Citations: 0