
Machine learning with applications: latest publications

Benchmarking and validation of prompting techniques for AI-assisted industrial PLC programming
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-11-27 DOI: 10.1016/j.mlwa.2025.100804
Ketut Adnyana , Andreas Schwung
Industrial automation in Industry 5.0 demands deterministic, safety-compliant PLC code across heterogeneous vendor ecosystems. Prompt-engineered large language models (LLMs) offer a path forward but require reproducible methods and rigorous validation. This study introduces LLM-PLC-AS, a hybrid, prompt-invariant framework for IEC 61131-3 PLC code generation that addresses these needs. We benchmark 21 fixed prompting techniques on 25 real-world use cases (simple, medium, complex), using a standardized dataset and workflow spanning Siemens TIA Portal and Beckhoff TwinCAT. The quality of the generated code is evaluated through a layered validation pipeline: Bilingual Evaluation Understudy (BLEU) for lexical similarity, LLM-in-the-Loop (LITL) for scalable semantic checks across four dimensions (functional correctness, readability, safety compliance, and modularity), and Human-in-the-Loop (HITL) for expert safety-critical review. DeepSeek and Gemini 2.5 Pro generate ST/IL; syntax is cross-checked by ChatGPT-4o and Copilot Pro. The framework achieved a very high degree of accuracy, with Structured Text (ST) programs reaching near-perfect scores and Instruction List (IL) programs also performing exceptionally well on our scoring rubric. This resulted in a substantial reduction in manual correction effort, cutting it by nearly half compared to ad-hoc methods. Across tasks, our approach led to a more than twofold increase in Safety Compliance and a significant improvement in Functional Correctness against unstructured baselines. A key finding is that the structure of the prompt itself has a greater influence on determinism and correctness than the choice of LLM. Fixed-prompt reasoning combined with the BLEU/LITL/HITL validation stack provides a scalable, reproducible, and safety-aware method for PLC code generation.
BLEU is utilized for rapid lexical triage and regression tracking, LITL provides structured semantic verification, and HITL ensures final compliance. The framework establishes a standardized basis for AI-assisted PLC programming and transparent benchmarking. Future work will extend the pipeline to include graphical languages, such as Ladder Diagram (LAD) and Function Block Diagram (FBD), using multimodal/graph-aware models, and will incorporate runtime validation to further close the gap to real-world deployment. Safety verification in this study is limited to logical and semantic validation. Real-time behavior, communication latency, and physical safety-fault recovery require Hardware-in-the-Loop (HIL) simulation or deployment on industrial test benches, which is identified as future work.
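The lexical-triage layer can be illustrated with a minimal sentence-level BLEU (a hand-rolled sketch with simple smoothing, not the pipeline's actual implementation; the function name and smoothing constant are illustrative):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    c, r = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(c, n)), Counter(ngrams(r, n))
        overlap = sum((cand & ref).values())          # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        # smooth zero counts so a missing higher-order match does not zero BLEU
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Comparing a generated ST line against a reference line this way gives a fast, purely lexical score suitable for triage and regression tracking before the semantic LITL stage.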
Cited by: 0
Helicopter turboshaft modeling via mixtures of experts
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-04 DOI: 10.1016/j.mlwa.2025.100835
Aurelio Raffa Ugolini , Francesco Aldo Tucci , Damiano Paniccia , Luigi Capone , Mara Tanelli
A reliable and robust engine model is critical for helicopter design, operation, and maintenance, given the centrality of this sub-system. Several sources of uncertainty can limit the reliability and fidelity of first-principles models, necessitating data-driven solutions. Given the safety and security issues inherent to aircraft operation, however, fully black-box models may be unsuited to the challenge because they lack explainability. In this work, we propose a multi-model approach that combines multiple physics-based descriptions, achieving a learning architecture that incorporates, in a data-driven setting, the existing knowledge of the engine’s dynamics, maximizing interpretability and facilitating model validation and diagnostics. Enabled by recent advances in onboard data collection, we learn the model directly on realistic operating conditions by leveraging recorded flight information. The benefits include a high degree of local interpretability as well as minimal requirements in terms of input signals, as empirically demonstrated in a real-world use case. We compare our approach against SINDy on a real helicopter dataset, showcasing its advantages over that well-known interpretable approach to nonlinear system identification.
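The multi-model idea, blending several physics-based experts through an input-dependent gate, can be sketched as follows (a toy NumPy mixture of experts; the expert functions and gate parameters are invented for illustration and are not the paper's models):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_predict(x, experts, gate_w, gate_b):
    """Blend per-regime expert predictions with input-dependent softmax gates."""
    preds = np.stack([f(x) for f in experts], axis=1)  # (N, K) expert outputs
    gates = softmax(x @ gate_w + gate_b)               # (N, K) mixing weights
    return (gates * preds).sum(axis=1)                 # (N,) blended prediction

# Two toy "physics" experts for different operating regimes (illustrative only).
experts = [lambda x: 2.0 * x[:, 0],    # e.g. a low-power linear model
           lambda x: x[:, 0] + 3.0]    # e.g. a high-power affine model
x = np.array([[1.0, 0.0]])
gate_w, gate_b = np.zeros((2, 2)), np.zeros(2)  # zero gate -> uniform average
y = moe_predict(x, experts, gate_w, gate_b)
```

Because each expert stays a named physical model and the gate exposes which regime is active for a given flight condition, the combined predictor remains locally interpretable.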
Cited by: 0
AI-driven modern slavery detection for supply chain: A cross-jurisdictional legal text analysis
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-23 DOI: 10.1016/j.mlwa.2025.100827
Jaqueline Damacena Duarte , Elena Javidi da Costa , Joao Paulo Javidi da Costa , Ana Sofia Schweizer Silvestre , Edna Dias Canedo , Hernany Silveira Rocha
This study addresses a significant gap in Supply Chain Management (SCM) research by investigating the applicability of machine learning (ML) techniques, including state-of-the-art Large Language Models (LLMs), to the analysis of legal documents for supply chain-related narratives of modern slavery. We developed a dataset of 1714 court opinions from the USA and 436 legal dockets from Indian jurisdictions, meticulously annotated and curated with three global labels and thirteen factual labels. We benchmarked context-aware classifiers using traditional ML, deep learning (DL), and transfer learning, and also tested a zero-shot prompt-based model (Gemini 1.5-Flash). Various vectorization strategies and classifiers were compared for performance. Our findings reveal that the fine-tuned domain-specific BERT (LegalBERT/CASEHOLD) model achieved superior results, with an 89.55% F1-score and 90.93% accuracy in identifying relevant cases. Gemini 1.5-Flash achieved comparable results (86.97% F1-score, 86.1% accuracy), outperforming traditional ML/DL baselines. This work provides empirical evidence of how advanced analytical techniques can be leveraged, from knowledge discovery to risk assessment, by efficiently scanning large volumes of legal texts for relevant warnings. As one of the first studies to apply a context-aware approach to identifying modern slavery in supply chains through legal records, an under-explored area, this research makes a significant contribution to discussions on improving supply chain modern slavery risk assessment and audit practices.
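A traditional ML baseline of the kind benchmarked here can be sketched with TF-IDF vectorization and logistic regression (the toy corpus and labels below are invented for illustration; the paper's data are expert-annotated court records):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus: 1 = modern-slavery-relevant narrative, 0 = not.
docs = [
    "allegations of forced labour at a supplier factory",
    "bonded labour uncovered in the supply chain audit",
    "quarterly revenue and earnings report filed",
    "stock dividend growth beats earnings forecast",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word unigram + bigram features
    LogisticRegression(max_iter=1000),
)
clf.fit(docs, labels)
pred = clf.predict(docs)
```

In the paper this class of vectorizer-plus-classifier pipeline serves as the baseline that the fine-tuned LegalBERT/CASEHOLD model and Gemini 1.5-Flash are compared against.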
Cited by: 0
Spectrogram-Based Deep Learning Models for Acoustic Identification of Honey Bees in Complex Environmental Noises
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-12 DOI: 10.1016/j.mlwa.2025.100807
Muhammad Anus Khan , Bilal Hassan Khan , Shafiq ur Rehman Khan , Ali Raza , Asif Raza , Shehzad Ashraf Chaudhry
The rapid decline of honey bee populations presents an urgent ecological and agricultural concern, demanding innovative and scalable monitoring solutions. This study proposes a deep learning-based system for non-invasive classification of honey bee buzzing sounds to distinguish bee activity from complex environmental noise—a fundamental challenge for real-world acoustic monitoring. Traditional machine learning models using features like Mel Frequency Cepstral Coefficients (MFCCs) and spectral statistics performed well on curated datasets but failed under natural conditions due to overlapping acoustic signatures and inconsistent recordings.
To address this gap, we built a diverse dataset combining public bee audio with recordings from the Honeybee Research Center at the National Agricultural Research Centre (NARC), Pakistan, captured with various devices in natural environments. Audio signals were converted into mel spectrograms and chromagrams, enabling pattern learning via pre-trained convolutional neural networks. Among the tested architectures (EfficientNetB0, ResNet50, and MobileNetV2), MobileNetV2 achieved the highest generalization, with 95.29% accuracy on spectrograms and over 90% on chromagrams under an 80% confidence threshold.
Data augmentation improved robustness to noise, while transfer learning enhanced adaptability. This work forms part of a broader project to develop a mobile application for real-time hive health monitoring in natural environments, where distinguishing bee buzzing from other sounds is the crucial first step. Beyond binary classification, the proposed approach offers potential for detecting hive health issues through acoustic patterns, supporting early interventions and contributing to global bee conservation efforts.
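The mel-spectrogram front end can be sketched without an audio library (a minimal hand-rolled filterbank; the FFT size, hop, and mel-band count are illustrative defaults, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK-style mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, cen, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, cen):                      # rising edge
            fb[m - 1, k] = (k - lo) / max(cen - lo, 1)
        for k in range(cen, hi):                      # falling edge
            fb[m - 1, k] = (hi - k) / max(hi - cen, 1)
    return fb

def mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Windowed power spectrogram projected onto a mel filterbank."""
    window = np.hanning(n_fft)
    frames = np.stack([signal[i:i + n_fft] * window
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (frames, n_fft//2+1)
    return power @ mel_filterbank(n_mels, n_fft, sr).T   # (frames, n_mels)
```

The resulting (time, mel-band) matrix is the 2-D "image" that the pre-trained CNNs (EfficientNetB0, ResNet50, MobileNetV2) consume after resizing and channel replication.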
Cited by: 0
SK-DGCNN: Human activity recognition from point cloud data with skeleton transformation
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-16 DOI: 10.1016/j.mlwa.2026.100847
Zihan Zhang, Aman Anand, Farhana Zulkernine
Human Activity Recognition (HAR) has become a prominent research topic in artificial intelligence, with applications in surveillance, healthcare, and human–computer interaction. Among various data modalities used for HAR, skeleton and point cloud data offer strong potential due to their privacy-preserving and environment-agnostic properties. However, point cloud-based HAR faces challenges like data sparsity, high computation cost, and a lack of large annotated datasets. In this paper, we propose a novel two-stage framework that first transforms radar-based point cloud data into skeleton data using a Skeletal Dynamic Graph Convolutional Neural Network (SK-DGCNN), and then classifies the estimated skeletons using an efficient Spatial Temporal Graph Convolutional Network++ (ST-GCN++). The SK-DGCNN leverages dynamic edge convolution, attention mechanisms, and a custom loss function that combines Mean Square Error and Kullback–Leibler divergence to preserve the structural integrity of the human pose. Our pipeline achieves state-of-the-art performance on the MMActivity and DGUHA datasets, with Top-1 accuracy of 99.73% and 99.25%, and F1-scores of 99.62% and 99.25%, respectively. The proposed method provides an effective, lightweight, and privacy-conscious solution for real-world HAR applications using radar point cloud data.
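The custom loss combining Mean Square Error and Kullback-Leibler divergence can be sketched as follows (a NumPy stand-in; `alpha`, the joint count, and the choice of per-joint distributions are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with numerical smoothing."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def pose_loss(pred_joints, true_joints, pred_dist, true_dist, alpha=0.5):
    """Hypothetical combined objective in the spirit of SK-DGCNN's loss:
    MSE on 3-D joint coordinates plus a KL term on a per-joint distribution,
    so the estimator is penalized both for coordinate error and for
    distorting the overall structure of the pose."""
    mse = float(np.mean((pred_joints - true_joints) ** 2))
    return mse + alpha * kl_div(true_dist, pred_dist)

joints = np.zeros((17, 3))          # 17 joints x (x, y, z), a common skeleton size
dist = np.full(17, 1 / 17)          # uniform per-joint distribution
loss = pose_loss(joints, joints, dist, dist)
```

A perfect prediction drives both terms to zero; the KL term is what encourages the estimated skeleton to preserve structural integrity rather than only minimizing per-joint distance.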
Cited by: 0
Machine learning for labor market matching
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-02-04 DOI: 10.1016/j.mlwa.2026.100861
Sabrina Mühlbauer, Enzo Weber
This paper develops a large-scale machine learning framework to improve labor market matching using rich administrative data. Matching is defined as a job seeker entering employment in a specific occupational field. We exploit comprehensive employment biographies from Germany, covering individual characteristics and job-related information, to estimate employment probabilities across occupations and generate personalized job recommendations. The contribution lies in demonstrating why machine learning methods are particularly well suited for administrative labor market data and outperform traditional statistical approaches. We compare logit, ordinary least squares (OLS), k-nearest neighbors, and random forest (RF). RF consistently achieves the highest predictive performance. Its advantage is rooted in key methodological properties: RF builds an ensemble of decision trees trained on bootstrap samples, introduces random feature selection at each split, and aggregates predictions through majority voting. This enables RF to capture nonlinear relationships and complex interactions, remain robust in high-dimensional settings, and reduce overfitting — features that are particularly relevant for heterogeneous and imbalanced administrative data. Compared to conventional models, RF better exploits the full informational content of employment histories, especially when estimating on all employment spells rather than restricting the sample to unemployment-to-employment transitions. The sample comprises approximately 55 million spells, representing about 6 percent of the German workforce from 2012 to 2018. Our results suggest that ML-based matching, relative to standard statistical approaches, could hypothetically reduce the unemployment rate by up to 0.3 percentage points, highlighting the practical relevance of RF-based decision support for labor market policy.
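The methodological properties credited to RF (trees trained on bootstrap samples, random feature selection at each split, majority voting) map directly onto standard library settings (a sketch on synthetic data; the features and label rule are invented stand-ins for the administrative biographies):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                  # stand-in for biography features
# Synthetic label with a nonlinear interaction, the regime where RF shines.
y = ((X[:, 0] * X[:, 1] + X[:, 2] ** 2) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200,       # ensemble of trees trained on bootstrap samples
    max_features="sqrt",    # random feature subset considered at each split
    random_state=0,
).fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)[:, 1]   # per-person employment probability
acc = rf.score(X_te, y_te)             # accuracy of the majority vote
```

Ranking the predicted probabilities across occupational fields for one job seeker is what yields the personalized recommendations described above.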
Cited by: 0
DefMoN: A reproducible framework for theory-grounded synthetic data generation in affective AI
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-13 DOI: 10.1016/j.mlwa.2025.100817
Ryan SangBaek Kim
Modern NLP systems excel when labels are abundant but struggle with high-inference constructs that are costly to annotate and risky to synthesize without constraints. We present the Defensive Motivational Node framework, henceforth DefMoN (formerly DMN), which operationalizes Vaillant's hierarchy of ego defenses and Plutchik's psychoevolutionary emotions into a controllable generative process for text. We release DMN-Syn v1.0, a quadri-lingual (EN/KO/FR/KA) corpus of 300 theory-constrained utterances, together with a complete, versioned research compendium (data, code, seeds, QC manifests, and evaluation scripts) archived on Zenodo (Kim, 2025). The full package is permanently available at https://doi.org/10.5281/zenodo.17101927.
On the modeling side, we treat defense recognition as 10-way sentence classification and fine-tune a multilingual Transformer (XLM-R) only on DMN-Syn v1. In-domain performance is high (EN macro-F1 = 0.97, MCC = 0.96; KO macro-F1 = 0.96), and zero-shot transfer is strong (EN→KO macro-F1 = 0.81). When evaluated on a small, anonymized real-world benchmark, the model reaches macro-F1 = 0.62 with zero real training data, then rises to 0.76 with only k = 64 supervised examples per class. Human annotators on that same benchmark agree with each other at κ = 0.68, α = 0.66. This shows that DefMoN is not a turnkey classifier but a theory-grounded primer that enables data-efficient alignment to a schema-coded human benchmark without large-scale annotation. We additionally quantify reliability, reporting ECE/MCE and coverage-performance curves for selective prediction, and show robustness under group-aware splits (template/scenario disjoint) and cue ablations, establishing structural coherence in the absence of large-scale human trials.
Beyond raw scores, we foreground auditability. Each instance in DMN-Syn v1 carries fixed seeds, grouped splits, and guardrails against label leakage and construct drift; validators, manifests, and code are released for byte-exact replication. The results support theory-constrained synthesis as a practical middle path between costly expert labeling and unconstrained LLM generation, particularly for low-resource and cross-lingual settings. By using psychological theory as an explicit generative constraint rather than a post-hoc interpretation, DefMoN reframes synthetic-data work as the operationalization of constructs for machine learning. The framework (i) standardizes guardrails that minimize bias amplification and drift, (ii) provides small but theory-dense corpora that train reliable, uncertainty-aware classifiers, and (iii) ships auditable artifacts (seeds, manifests, validators) so it can be reproduced and extended to new defenses, languages, and dialogue-level settings.
Terminology and branding: earlier versions and repositories used the acronym DMN for Defensive Motivational Node. To avoid confusion with the "default mode network" of neuroscience, this paper adopts DefMoN for the overall framework, while legacy dataset and repository names (e.g., DMN-Syn v1) are retained for continuity and reproducibility with prior work. Throughout the paper, DefMoN denotes the framework and method, DMN-Syn v1 the released dataset and its artifacts, and "DMN node" an individual (defense, emotion, scenario) tuple in that dataset. This split is intentional.
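The reliability metric the abstract reports, expected calibration error (ECE), is straightforward to compute. As a minimal illustrative sketch (not the paper's released evaluation scripts), binned ECE compares mean confidence against empirical accuracy within each confidence bin and weights the gaps by bin mass:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-mass-weighted mean of |accuracy - confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # empty bins contribute nothing
        acc = correct[mask].mean()    # empirical accuracy in the bin
        conf = confidences[mask].mean()  # mean predicted confidence in the bin
        ece += mask.mean() * abs(acc - conf)
    return ece

# A perfectly calibrated toy case: 80% confidence, 80% empirical accuracy.
conf = np.full(10, 0.8)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
ece = expected_calibration_error(conf, hits)
```

Replacing the weighted sum with the maximum over bins gives MCE, the companion metric named in the abstract.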
Deep learning and the geometry of compactness in stability and generalization
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-13 DOI: 10.1016/j.mlwa.2025.100820
Mohammad Meysami , Ali Lotfi , Sehar Saleem
Deep learning models often continue to generalize well even when they have far more parameters than available training examples. This observation naturally leads to two questions: why does training remain stable, and why do the resulting predictors generalize at all? To address these questions, we return to the classical Extreme Value Theorem and interpret modern training as optimization over compact sets in parameter space or function space. Our main results show that continuity together with coercive or Lipschitz-based regularization yields existence of minimizers and uniform control of the excess risk by bounding rare high-loss events. We apply this framework to weight decay, gradient penalties, and spectral normalization, and we introduce simple diagnostics that monitor compactness in parameter space, representation space, and function space. Experiments on synthetic examples, standard image datasets (MNIST, CIFAR-10, Tiny ImageNet), and the UCI Adult tabular task are consistent with the theory: mild regularization leads to smoother optimization, reduced variation across random seeds, and better robustness and calibration while preserving accuracy. Taken together, these results highlight compactness as a practical geometric guideline for training stable and reliable deep networks.
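Of the regularizers the abstract names, spectral normalization has a particularly compact form. The NumPy fragment below (an illustrative sketch of the usual power-iteration scheme, not the authors' code) rescales a weight matrix so its top singular value is approximately 1, which bounds the Lipschitz constant of the corresponding linear layer:

```python
import numpy as np

def spectral_normalize(W, n_iter=50, eps=1e-12):
    """Divide W by its largest singular value, estimated by power iteration."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v  # converged estimate of the top singular value
    return W / sigma

W = np.diag([5.0, 2.0, 0.5])   # top singular value is 5
W_sn = spectral_normalize(W)
sigma_after = np.linalg.svd(W_sn, compute_uv=False)[0]  # ~1 after normalization
```

Since ||W_sn x|| <= ||x|| for all x once the top singular value is 1, composing such layers keeps the network's end-to-end Lipschitz constant controlled, which is one way the compactness argument in the abstract is realized in practice.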
Vision-language zero-shot models for radiographic image classification: A systematic review
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-23 DOI: 10.1016/j.mlwa.2025.100826
Ana Guerrero-Tamayo, Ibon Oleagordia-Ruiz, Begonya Garcia-Zapirain
Zero-shot Vision-Language Models (VLMs) link visual and textual features, enabling generalization to unseen domains. This makes them promising for radiographic diagnosis, although clinical adoption remains limited.
This systematic review examines zero-shot VLMs applied to radiographic image classification, following the PRISMA methodology. Articles were identified from IEEE, PubMed, Scopus, and Web of Science, with 16 selected after exhaustive screening. The analysis addressed five research questions (RQ1–RQ5) covering dataset characteristics, model attributes, natural language integration, reported limitations, and hyperparameter tuning.
Geographically, China (37%) and the United States (38%) contributed 75% of the reviewed studies, with no EU-led research identified, highlighting the need for increased European engagement in this field.
Architecturally (RQ2), high heterogeneity exists, with dual-encoder (43.75%) and attention-based fusion models most common. Most models (81.25%) employ a Joint Embedding Space for multimodal alignment.
Regarding datasets and natural language use (RQ1, RQ3), VLMs rely on a small number of large but semantically narrow datasets, limiting generalizability and amplifying bias. Real clinical reports (direct supervision) and implicit pretrained textual embeddings each account for 37.5% of strategies, yet unstructured clinical text remains underutilized. Limited vision-language integration negatively affects both performance and explainability (RQ4). Hyperparameter tuning (RQ5) is rarely reported, with 9 of 16 studies not specifying methods, compromising reproducibility.
There is an urgent need for open, multilingual, multimodal datasets reflecting clinical and geographic diversity. Clinically useful zero-shot VLMs require transparent evaluation, including explainability metrics. Future models should adopt a multidisciplinary approach, combining technical innovation with usability, data representativeness, and methodological transparency to ensure diagnostic robustness.
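At inference time, the dual-encoder Joint Embedding Space design that dominates the reviewed models reduces to a cosine-similarity comparison between one image embedding and a set of class-prompt embeddings. The toy sketch below uses invented labels and vectors purely for illustration (no reviewed model or real encoder is involved):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Cosine similarity in a shared embedding space; the best-matching prompt wins."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                 # one cosine score per class prompt
    return int(np.argmax(sims)), sims

labels = ["no finding", "pneumonia", "fracture"]   # hypothetical prompt classes
text_embs = np.array([[1.0, 0.0, 0.0],             # hypothetical text-encoder outputs
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
image_emb = np.array([0.1, 0.9, 0.2])              # closest to the second prompt
pred, sims = zero_shot_classify(image_emb, text_embs)
```

No per-class training occurs: adding a new class means adding a new text prompt, which is exactly why the zero-shot setting is attractive when radiographic labels are scarce.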
An explainable ensemble machine learning approach for multi-domain, multiclass sentiment analysis in Amazon product reviews
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-20 DOI: 10.1016/j.mlwa.2025.100825
Kamogelo Mokgwatjane, Thulane Paepae
Sentiment analysis (SA) of online reviews is pivotal for e-commerce platforms, yet challenges such as massive user-generated content volumes and class imbalance hinder accurate multiclass predictions and model interpretability. This study introduces a novel explainable ensemble learning framework for multiclass SA (positive, neutral, negative) across three Amazon product domains: appliances, groceries, and clothing. The framework integrates diverse supervised classifiers in a stacking ensemble. SHapley Additive exPlanations (SHAP) are employed not only to elucidate feature contributions but also to rank and interpret the impact of each base classifier on the ensemble's predictions, a pioneering application in domain-specific SA: it yields global insight into model dynamics and base-model selection, addressing gaps in prior studies that relied on local explanations such as LIME (Local Interpretable Model-agnostic Explanations). Evaluated with imbalance-sensitive metrics (weighted/macro F1-score, Matthews Correlation Coefficient, Cohen's Kappa, Geometric Mean), the ensemble surpasses the individual classifiers and achieves higher macro F1 and G-Mean than the transformer-based ALBERT model, while ALBERT excels in weighted F1, MCC, and Cohen's Kappa. Extra Trees notably excelled in G-Mean for the minority classes. SHAP analysis uncovers domain-specific drivers and base-model roles, enhancing transparency. The results underscore the framework's efficacy in delivering robust performance and actionable insights for trust modelling, automated analytics, and personalized recommendations. This work lays the groundwork for extensions to low-resource domains, multimodal data, and finer rating scales, advancing interpretable SA in e-commerce.
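The stacking idea at the core of the framework can be sketched with scikit-learn on synthetic imbalanced data. This is a minimal illustration, not the paper's Amazon-review pipeline; the base learners, class weights, and all parameters below are assumptions chosen for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem with an imbalance reminiscent of review sentiment
# (majority positive, minority negative).
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           weights=[0.6, 0.3, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Base learners feed out-of-fold predictions to a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("et", ExtraTreesClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

# Two of the imbalance-sensitive metrics the study reports.
macro_f1 = f1_score(y_te, pred, average="macro")
mcc = matthews_corrcoef(y_te, pred)
```

In the paper's variant, SHAP values are additionally computed over the meta-learner's inputs to rank how much each base classifier contributes to the stacked prediction; that step is omitted here to keep the sketch dependency-light.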