
Machine learning with applications: Latest articles

PRCSL: A privacy-preserving continual split learning framework for decentralized medical diagnosis
IF 4.9 Pub Date : 2025-12-29 DOI: 10.1016/j.mlwa.2025.100828
Jungmin Eom, Minjun Kang, Myungkeun Yoon, Nikil Dutt, Jinkyu Kim, Jaekoo Lee
Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.
This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.
We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62 percentage points higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.
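As an illustration of the parameter-free nearest-mean-of-exemplars idea mentioned in the abstract, the sketch below builds class prototypes by averaging stored exemplar embeddings and assigns a query to the nearest prototype. It is a generic NME classifier in NumPy, not code from the PRCSL repository; all function and variable names are illustrative.

```python
import numpy as np

def nme_predict(exemplar_embeddings, exemplar_labels, query_embeddings):
    """Nearest-mean-of-exemplars: classify each query by the closest class-mean embedding."""
    classes = np.unique(exemplar_labels)
    # Class prototypes: mean of the (optionally DP-noised) exemplar embeddings per class.
    prototypes = np.stack([exemplar_embeddings[exemplar_labels == c].mean(axis=0) for c in classes])
    # L2-normalise so the comparison is by direction rather than magnitude.
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    queries = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    # Distance of every query to every prototype; pick the nearest class.
    dists = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]

# Toy usage: 2-D embeddings, two classes.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
lab = np.array([0, 0, 1, 1])
print(nme_predict(emb, lab, np.array([[0.8, 0.2], [0.2, 0.8]])))  # -> [0 1]
```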
Citations: 0
A traffic-aware federated learning prediction framework with custom aggregation
IF 4.9 Pub Date : 2025-12-29 DOI: 10.1016/j.mlwa.2025.100829
Seerat Kaur, Sukhjit Singh Sehra, Darisuh Ebrahimi, Emad A. Mohammed
Reliable traffic predictions are essential for managing congestion, optimizing routes, improving commuter safety, and advancing the performance of intelligent transportation systems (ITS). However, existing centralized systems often lack adaptability to real-world traffic patterns and fail to capture spatio-temporal variability and client-level heterogeneity. These systems require large amounts of sensitive data to be collected on central servers, intensifying privacy risks. This study proposes a privacy-preserving Federated Learning (FL) framework for traffic flow and speed prediction (5 to 60 minutes ahead) using non-independent and identically distributed (non-IID) traffic data. The objectives of this study are threefold: (1) design a client-aware custom FL aggregation strategy that accounts for traffic heterogeneity and client-specific dynamics, which standard FL methods ignore; (2) improve personalization by grouping clients according to real-world traffic-pattern similarity via a clustering-based approach; and (3) enhance the convergence and predictive performance of global aggregation using dynamic, traffic-aware aggregation scores. The proposed framework designs a hybrid FL long short-term memory (FedLSTM) model augmented with an attention mechanism to effectively model both temporal and spatial traffic variations across junctions, while ensuring that all raw data remains local. To improve learning under traffic diversity and imbalanced traffic distribution patterns, we propose a custom traffic-aware aggregation strategy that dynamically weighs client contributions based on six traffic-based metrics. Evaluations on clustered client partitions demonstrate that our custom aggregation consistently outperformed the baseline strategies across multiple evaluation metrics. These results highlight the effectiveness of integrating traffic-aware aggregation in enhancing the performance and generalization capability of FL-based traffic prediction frameworks.
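The custom aggregation the abstract describes weighs each client's update by a traffic-derived score rather than by sample count alone. The following sketch shows such score-weighted averaging in generic form; the scores, layer structure, and function names are placeholders, and the six traffic metrics from the paper are not reproduced here.

```python
import numpy as np

def traffic_aware_aggregate(client_weights, traffic_scores):
    """Weighted FedAvg-style aggregation: each client's parameters are scaled by a
    normalised traffic-awareness score instead of its raw sample count."""
    scores = np.asarray(traffic_scores, dtype=float)
    alphas = scores / scores.sum()                      # normalise to a convex combination
    # client_weights: one list of layer tensors per client
    aggregated = []
    for layer_idx in range(len(client_weights[0])):
        layer = sum(a * cw[layer_idx] for a, cw in zip(alphas, client_weights))
        aggregated.append(layer)
    return aggregated

# Toy usage: two clients, one "layer" each, client 2 deemed twice as informative.
w_clients = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
print(traffic_aware_aggregate(w_clients, traffic_scores=[1.0, 2.0]))  # -> weighted mean ~2.33
```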
Citations: 0
Real-time wheat growth stage detection via improved Swin transformer for edge devices
IF 4.9 Pub Date : 2025-12-29 DOI: 10.1016/j.mlwa.2025.100831
Xianyuan Zhu
Accurate identification of crop growth stages is crucial for precision agriculture and automated field management. This study designed and developed an improved Swin Transformer-based detection system for wheat growth stages, with an emphasis on real-time deployment on embedded edge devices. Specifically, we incorporate a Progressive Transfer Learning strategy to ensure robust generalization on agricultural data and introduce an Ordinal Regression Loss to effectively mitigate misclassifications in transitional growth stages. The proposed approach integrates a hierarchical Transformer backbone with an optimized deployment pipeline for NVIDIA Jetson Orin NX, supporting gallery images, video streams, and live camera inputs. Experimental evaluation demonstrated that the system achieves consistently high recognition accuracy (above 93%) while maintaining real-time performance (above 12 FPS) across the different input modes, with moderate power consumption (6–8 W). Compared with baseline CNNs (ResNet-50, MobileNetV3) and Transformer models (ViT), the proposed design achieves a favorable balance among accuracy, efficiency, and robustness. These results suggest that the system can contribute to the development of practical agricultural monitoring and provide a step toward intelligent control strategies in precision farming.
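Ordinal regression losses for ordered classes are commonly built by decomposing stage k into cumulative binary targets ("stage > j") and summing a binary cross-entropy per threshold, which penalizes adjacent-stage confusions less than distant ones. The sketch below shows that generic construction in NumPy; it is an assumption about the family of loss the paper uses, not its exact formulation.

```python
import numpy as np

def ordinal_targets(stage, num_stages):
    """Encode ordinal label `stage` as K-1 cumulative binary targets: t[j] = 1 iff stage > j."""
    return (stage > np.arange(num_stages - 1)).astype(float)

def ordinal_bce_loss(logits, stage, num_stages):
    """Sum of per-threshold binary cross-entropies; adjacent-stage mistakes violate fewer
    threshold targets than distant ones, so they incur a smaller loss."""
    t = ordinal_targets(stage, num_stages)
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid probability per threshold
    eps = 1e-9
    return float(-(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)).sum())

# Toy usage with 4 stages: logits favouring "stage > 0" and "stage > 1" but not "stage > 2".
print(ordinal_targets(2, 4))                                      # -> [1. 1. 0.]
print(round(ordinal_bce_loss(np.array([2.0, 1.0, -1.5]), 2, 4), 3))
```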
Citations: 0
Algorithmic red teaming approaches to secure LLMs
IF 4.9 Pub Date : 2025-12-27 DOI: 10.1016/j.mlwa.2025.100815
Shaurya Jauhari
Algorithmic red teaming for Large Language Models (LLMs) is a crucial practice for proactively ensuring their safety and robustness. This process involves using an LLM as an adversary to test the vulnerabilities of a target LLM, which is essential for identifying and mitigating potential security risks before the model is deployed. Automated methodologies, which surpass the constraints of human creativity, utilize a triad of models: an attacker, a target, and a judge. This primer provides a concise summary and comparison of several state-of-the-art algorithmic red-teaming approaches, including TAP, PAIR, Crescendo, and AutoDAN-Turbo. The goal of these techniques, such as prompt injection and jailbreaking, is to push LLMs beyond their intended safe behavior. Critically, the non-deterministic nature of LLMs presents a key challenge when they are utilized as assessors or judges, potentially rendering evaluations unreliable. The paper stresses that red teaming is not a one-time exercise and is particularly vital for AI agents that use LLMs as components, as a single failure can lead to significant public scrutiny.
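The attacker-target-judge triad described in the abstract can be organized as an iterative loop in which the attacker refines its adversarial prompt using the judge's score as feedback. The skeleton below is a deliberately generic sketch; the three callables are hypothetical placeholders for LLM calls and do not correspond to the actual APIs of TAP, PAIR, Crescendo, or AutoDAN-Turbo.

```python
def red_team_loop(attacker, target, judge, goal, max_turns=10, threshold=0.8):
    """Generic attacker-target-judge loop: the attacker refines a prompt, the target
    responds, and the judge scores how far the response strays from safe behavior."""
    history = []
    prompt = attacker(goal=goal, history=history)                 # initial adversarial prompt
    for turn in range(max_turns):
        response = target(prompt)                                  # query the model under test
        score = judge(goal=goal, prompt=prompt, response=response) # 0 (safe) .. 1 (jailbroken)
        history.append({"turn": turn, "prompt": prompt, "response": response, "score": score})
        if score >= threshold:                                     # successful attack found
            return history
        prompt = attacker(goal=goal, history=history)              # refine using feedback
    return history

# Toy usage with stub callables standing in for the three LLMs.
runs = red_team_loop(
    attacker=lambda goal, history: f"attempt {len(history)}: {goal}",
    target=lambda p: "I cannot help with that.",
    judge=lambda goal, prompt, response: 0.0,
    goal="elicit disallowed content",
    max_turns=3,
)
print(len(runs))  # -> 3 turns, no jailbreak found by the stub judge
```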
Citations: 0
Explainable deepfake detection: A multi-model framework with human-interpretable rationales for legal investigation purposes
IF 4.9 Pub Date : 2025-12-24 DOI: 10.1016/j.mlwa.2025.100819
Nitu Bharati, Patrick Wong, Soraya Kouadri Mostéfaoui, Dhouha Kbaier, Jan Collie
The growing spread of deepfake images, combined with the sophistication of the machine learning tools and techniques used to produce them, poses serious threats to the integrity of information, individual privacy, and the preservation of public trust. Detecting these deepfake images for legal investigation purposes requires advanced detection mechanisms that not only achieve high accuracy but also provide transparent and understandable explanations of the decisions made.
This paper presents a new framework for deepfake detection, which not only pursues accuracy but, more crucially, prioritises the explainability of detection, a critical need in legal investigation contexts such as policing and digital forensics. The framework combines advanced machine learning models, an explainable AI (XAI) component, and three commonly used image processing methods to detect and explain manipulations in deepfake images of human faces. Four independently trained CNN models were developed for the original and processed images, and through decision fusion achieved an overall detection accuracy of 97%. Moreover, the framework achieved an F1 score of 92% on a hidden test dataset used in the UK Home Office’s Deepfake Detection Challenge 2024, placing it third among the competing teams in the image deepfake category. Shapley values were also used to identify the facial features that influenced the models’ detection decisions. This information enabled us to home in on various areas of the face to find features more likely to occur in deepfake images. Through Bayes’ theorem, we presented a human-understandable detection method, achieving 85% detection accuracy on the test images while maintaining explainability of the detection rationales. Our work demonstrates that combining machine learning, image processing, and XAI with human-understandable rationales results in a demonstrably effective and practical deepfake detection system that could significantly streamline criminal investigations as performed in policing and digital forensics. Future research will explore the interplay between psychological factors and the acceptance and trust of such frameworks, and will extend the framework by incorporating additional image processing techniques to enhance detection accuracy.
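The Bayes-theorem step described above updates the probability that an image is a deepfake from the likelihood of observing particular facial-feature cues in fake versus real images. The sketch below shows that posterior update chained over independent cues; the prior and the cue likelihoods are hypothetical numbers for illustration, not values from the paper.

```python
def posterior_fake(prior_fake, likelihood_fake, likelihood_real):
    """Bayes' theorem for one observed cue:
    P(fake | cue) = P(cue | fake) P(fake) / [P(cue | fake) P(fake) + P(cue | real) P(real)]."""
    num = likelihood_fake * prior_fake
    den = num + likelihood_real * (1.0 - prior_fake)
    return num / den

# Toy usage with hypothetical numbers: a cue seen in 60% of deepfakes but only 10% of real faces,
# followed by a second independent cue; each posterior becomes the prior for the next cue.
p = 0.5
for cue_fake, cue_real in [(0.6, 0.1), (0.7, 0.2)]:
    p = posterior_fake(p, cue_fake, cue_real)
print(round(p, 3))  # -> 0.955
```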
Citations: 0
AI-driven modern slavery detection for supply chain: A cross-jurisdictional legal text analysis
IF 4.9 Pub Date : 2025-12-23 DOI: 10.1016/j.mlwa.2025.100827
Jaqueline Damacena Duarte, Elena Javidi da Costa, Joao Paulo Javidi da Costa, Ana Sofia Schweizer Silvestre, Edna Dias Canedo, Hernany Silveira Rocha
This study addresses a significant gap in Supply Chain Management (SCM) research by investigating the applicability of machine learning (ML) techniques, including state-of-the-art Large Language Models (LLMs), providing a foundational exploration into analysing legal documents to identify supply chain-related narratives of modern slavery. We developed a dataset of 1714 court opinions from the USA and 436 legal dockets from Indian jurisdictions, meticulously annotated and curated with three global labels and thirteen factual labels. We benchmarked context-aware classifiers using traditional ML, deep learning (DL), and transfer learning, and also tested a zero-shot prompt-based model (Gemini 1.5-Flash). Various vectorization strategies and classifiers were compared for performance. Our findings reveal that the fine-tuned domain-specific BERT (LegalBERT/CASEHOLD) model achieved superior results, with an 89.55% F1-Score and 90.93% accuracy in identifying relevant cases. Also relevant, Gemini 1.5-Flash achieved comparable results (86.97% F1-Score, 86.1% accuracy), outperforming traditional ML/DL baselines. This work provides empirical evidence of how advanced analytical techniques can be leveraged, from knowledge discovery to risk assessment, by efficiently scanning large volumes of legal text for relevant warnings. As one of the first studies to apply a context-aware approach to identifying modern slavery in supply chains through legal records, an under-explored area, this research makes a significant contribution to discussions on improving supply chain modern slavery risk assessment and audit practices.
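For context on the "traditional ML" baselines the study benchmarks against its fine-tuned LegalBERT model, a minimal text-classification pipeline of this kind can be set up with TF-IDF features and logistic regression, as sketched below. The documents, labels, and hyperparameters are toy placeholders; this is not the authors' pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = modern-slavery-related narrative, 0 = unrelated legal matter.
docs = [
    "workers were recruited under bonded labour in the supplier's factory",
    "contract dispute over delayed delivery of goods",
    "victims were trafficked and forced to work without pay",
    "patent infringement claim regarding electronic components",
]
labels = [1, 0, 1, 0]

# TF-IDF vectorization followed by a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)

print(clf.predict(["labourers held against their will by the contractor"]))
print(f1_score(labels, clf.predict(docs)))  # training-set F1 on the toy corpus
```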
Citations: 0
Vision-language zero-shot models for radiographic image classification: A systematic review
IF 4.9 Pub Date : 2025-12-23 DOI: 10.1016/j.mlwa.2025.100826
Ana Guerrero-Tamayo, Ibon Oleagordia-Ruiz, Begonya Garcia-Zapirain
Zero-shot Vision-Language Models (VLMs) link visual and textual features and can generalize to unseen domains, which makes them promising for radiographic diagnosis, although clinical adoption remains limited.
This systematic review examines zero-shot VLMs applied to radiographic image classification, following the PRISMA methodology. Articles were identified from IEEE, PubMed, Scopus, and Web of Science, with 16 selected after exhaustive screening. The analysis addressed five research questions (RQ1–RQ5) covering dataset characteristics, model attributes, natural language integration, reported limitations, and hyperparameter tuning.
Geographically, China (37%) and the United States (38%) contributed 75% of the reviewed studies, with no EU-led research identified, highlighting the need for increased European engagement in this field.
Architecturally (RQ2), high heterogeneity exists, with dual-encoder (43.75%) and attention-based fusion models most common. Most models (81.25%) employ a Joint Embedding Space for multimodal alignment.
Regarding datasets and natural language use (RQ1, RQ3), VLMs rely on a few large but semantically narrow datasets, limiting generalizability and amplifying bias. Real clinical reports (direct supervision) and implicit pretrained textual embeddings each account for 37.5% of the strategies, yet unstructured clinical text is underutilized. Limited vision-language integration negatively affects performance and explainability (RQ4). Hyperparameter tuning (RQ5) is rarely reported, with 9 of 16 studies not specifying methods, compromising reproducibility.
There is an urgent need for open, multilingual, multimodal datasets reflecting clinical and geographic diversity. Clinically useful zero-shot VLMs require transparent evaluation, including explainability metrics. Future models should adopt a multidisciplinary approach, combining technical innovation with usability, data representativeness, and methodological transparency to ensure diagnostic robustness.
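Most of the reviewed dual-encoder models classify radiographs zero-shot by comparing an image embedding with text-prompt embeddings in a joint embedding space. The sketch below shows that cosine-similarity decision rule with made-up vectors standing in for encoder outputs; the embeddings, prompts, and class names are purely illustrative.

```python
import numpy as np

def zero_shot_classify(image_embedding, prompt_embeddings, class_names):
    """Zero-shot classification in a joint embedding space: pick the class whose text-prompt
    embedding has the highest cosine similarity with the image embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = prompt_embeddings / np.linalg.norm(prompt_embeddings, axis=1, keepdims=True)
    sims = txt @ img                              # cosine similarity per candidate prompt
    return class_names[int(np.argmax(sims))], sims

# Toy usage with made-up 3-D embeddings; a real dual-encoder VLM would produce these from an
# X-ray and prompts such as "a chest radiograph showing pneumonia" / "a normal chest radiograph".
image_vec = np.array([0.9, 0.1, 0.2])
prompts = np.array([[0.8, 0.2, 0.1],   # "pneumonia" prompt embedding
                    [0.1, 0.9, 0.3]])  # "normal" prompt embedding
print(zero_shot_classify(image_vec, prompts, ["pneumonia", "normal"]))
```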
Citations: 0
An explainable ensemble machine learning approach for multi-domain, multiclass sentiment analysis in Amazon product reviews
IF 4.9 Pub Date : 2025-12-20 DOI: 10.1016/j.mlwa.2025.100825
Kamogelo Mokgwatjane, Thulane Paepae
Sentiment analysis (SA) of online reviews is pivotal for e-commerce platforms, yet challenges such as massive volumes of user-generated content and class imbalances hinder accurate multiclass predictions and model interpretability. This study introduces a novel explainable ensemble learning framework for multiclass SA (positive, neutral, negative) across three Amazon product domains: appliances, groceries, and clothing. The framework integrates diverse supervised classifiers in a stacking ensemble. SHapley Additive exPlanations (SHAP) are employed not only to elucidate feature contributions but also to rank and interpret the individual impacts of the base classifiers on ensemble predictions, a pioneering application in domain-specific SA that enables global insights into model dynamics and base-model selection and addresses gaps in prior studies that relied on local explanations such as LIME (Local Interpretable Model-agnostic Explanations). Evaluated using imbalance-sensitive metrics (weighted/macro F1-score, Matthews Correlation Coefficient, Cohen's Kappa, Geometric Mean), the ensemble surpasses individual classifiers and demonstrates higher macro F1 and G-Mean than the transformer-based ALBERT model, while ALBERT excels in weighted F1, MCC, and Cohen's Kappa. Extra Trees notably excelled in the G-Mean for minority classes. SHAP analysis uncovers domain-specific drivers and base-model roles, enhancing transparency. The results underscore the framework's efficacy in delivering robust performance and actionable insights for trust modelling, automated analytics, and personalized recommendations. This work lays the groundwork for extensions to low-resource domains, multimodal data, and finer rating scales, advancing interpretable SA in e-commerce.
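A stacking ensemble of the general kind described above can be assembled with scikit-learn, together with several of the imbalance-sensitive metrics the study reports (macro F1, Matthews Correlation Coefficient, Cohen's Kappa). The sketch uses synthetic imbalanced data and placeholder base learners, and it omits the SHAP analysis; it is not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class problem standing in for positive/neutral/negative reviews.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stacking: tree-based base learners feed a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[("et", ExtraTreesClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

print(f1_score(y_te, pred, average="macro"),
      matthews_corrcoef(y_te, pred),
      cohen_kappa_score(y_te, pred))
```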
Citations: 0
Causal Digital Twins for cyber–physical security in water systems: A framework for robust anomaly detection
IF 4.9 Pub Date : 2025-12-17 DOI: 10.1016/j.mlwa.2025.100824
Mohammadhossein Homaei, Mehran Tarif, Pablo García Rodríguez, Andrés Caro, Mar Ávila
Industrial Control Systems (ICS) in water distribution and treatment face cyber–physical attacks exploiting network and physical vulnerabilities. Current water system anomaly detection methods rely on correlations, yielding high false alarms and poor root cause analysis. We propose a Causal Digital Twin (CDT) framework for water infrastructures, combining causal inference with digital twin modeling. CDT supports association for pattern detection, intervention for system response, and counterfactual analysis for water attack prevention. Evaluated on the water-related datasets SWaT, WADI, and HAI, CDT shows high compliance with physical constraints (90.8% for SWaT, 87.4%–90.8% across datasets) and a structural Hamming distance of 0.133 ± 0.02. F1-scores are 0.944 ± 0.014 (SWaT), 0.902 ± 0.021 (WADI), and 0.923 ± 0.018 (HAI, p < 0.0024). Multi-scale temporal detection strategies (τ ∈ {5, 10, 20}) enable 91.7% detection of stealthy attacks through cumulative causal discrepancy analysis. CDT reduces false positives by 48% compared to state-of-the-art methods (70% vs. statistical baselines), achieves 78.4% root cause accuracy, and enables counterfactual defenses reducing attack success by up to 89.1%. Real-time performance at 3.2 ms latency ensures safe and interpretable operation for medium-scale water systems.
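The multi-scale cumulative-discrepancy idea above can be pictured as accumulating the residual between observed sensor readings and the digital twin's prediction over several window lengths, so that slow, stealthy deviations eventually trip an alarm at the longer scales. The sketch below is a generic NumPy illustration with an arbitrary threshold, not the CDT detection rule itself.

```python
import numpy as np

def multiscale_discrepancy_alarm(observed, predicted, windows=(5, 10, 20), threshold=3.0):
    """Flag time steps where the cumulative discrepancy between sensor readings and the
    model prediction exceeds a threshold at any of several window scales."""
    residual = np.abs(np.asarray(observed) - np.asarray(predicted))
    alarms = np.zeros(len(residual), dtype=bool)
    for tau in windows:
        # Trailing-window rolling sum of the residual over the last `tau` steps.
        cum = np.convolve(residual, np.ones(tau), mode="full")[:len(residual)]
        alarms |= cum > threshold
    return alarms

# Toy usage: a small persistent deviation from step 30 on that only the longest window
# accumulates into an alarm, mimicking a stealthy attack.
obs = np.zeros(60)
obs[30:] = 0.2
pred = np.zeros(60)
print(np.where(multiscale_discrepancy_alarm(obs, pred))[0][:3])  # first alarmed steps
```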
Citations: 0
Beyond text: Multimodal stance detection in Arabic tweets
IF 4.9 Pub Date : 2025-12-16 DOI: 10.1016/j.mlwa.2025.100823
Nouf AlShenaifi, Nourah Alangari
Despite the growing importance of multimodal signals on social media, Arabic stance detection has remained largely text-only, overlooking the visual context that often accompanies user posts. To bridge this gap, we present MAWQIF-MM, the first publicly available Arabic multimodal stance detection corpus of tweet–image pairs annotated with three stance labels: Favor, Against, and Neutral. Building on this resource, we propose a novel attention-based cross-modal fusion model that jointly encodes text and images. Textual content is processed using AraBERT v2, a transformer-based language model optimized for Arabic, while visual features are extracted using BLIP with a ViT-B backbone, a state-of-the-art vision-language model. These two modalities are integrated via multi-head cross-attention to capture cross-modal interactions. Experimental results demonstrate the effectiveness of our approach: on a held-out test set, the model achieves 88% accuracy, outperforming a text-only AraBERT baseline by 12 percentage points and an image-only BLIP baseline by 4 points. To further probe large vision–language models (VLMs) in low-resource settings, we benchmark Gemini 2.5 Flash and GPT-4o under zero-shot and few-shot prompting. While these models show promising generalization, they struggle with nuanced stances without fine-tuning, underscoring the value of domain-specific supervised training.
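The fusion step the abstract describes lets text tokens attend over image-patch features. A single-head, NumPy-only version of such cross-attention is sketched below to show the mechanics (queries from text, keys and values from the image); the proposed model uses multi-head attention over AraBERT and BLIP features, which is not reproduced here, and all shapes and values are toy placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_patches):
    """Single-head cross-attention: text tokens act as queries over image-patch keys/values,
    so each token is enriched with the visual evidence most relevant to it."""
    d = text_tokens.shape[-1]
    scores = text_tokens @ image_patches.T / np.sqrt(d)  # (n_text, n_patches) attention logits
    attn = softmax(scores, axis=-1)                      # how much each token looks at each patch
    return attn @ image_patches                          # fused, image-aware token representations

# Toy usage: 3 text-token embeddings attending over 4 image-patch embeddings (dimension 8).
rng = np.random.default_rng(0)
fused = cross_attention(rng.normal(size=(3, 8)), rng.normal(size=(4, 8)))
print(fused.shape)  # -> (3, 8)
```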
Citations: 0