
Latest Publications in IEEE Access

Channeling Fairness: Class Imbalance-Aware Skin Disease Recognition via Fair Channel Enhancement Module
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3665831
Joseph Thomas Año;Arren Matthew C. Antioquia
Skin disease classification presents significant challenges due to the class imbalance, low inter-class variability, and high intra-class variation present in most clinical image datasets. To address these issues, we propose Fair Channel Enhancement (FCE), a novel module that improves fine-grained feature representation without requiring additional annotations or complex architectures. FCE allocates feature channels proportionally based on class frequency, which ensures fairer representation of underrepresented classes. FCE is coupled with CutMix augmentation and label smoothing to enhance model robustness and generalization. Extensive experiments on three dermatology benchmark datasets (SD-128, SD-198, and SD-260) demonstrate that our approach achieves up to a 7.13% accuracy improvement over baseline models and outperforms state-of-the-art methods by a significant margin. FCE also boosts the average accuracy of low- and high-frequency classes by up to 8.60% and 10.48%, respectively. Furthermore, our method generalizes effectively to other medical image datasets, including ISIC 2018 and Hyper-Kvasir, and performs well on smaller dataset subsets. These results highlight FCE as a simple and effective solution for imbalanced classification problems.
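Of the techniques named in the abstract, label smoothing has a standard closed form and can be sketched in a few lines; the 4-class target and epsilon value below are illustrative, not taken from the paper:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: y' = (1 - eps) * y + eps / K,
    pulling a one-hot target toward the uniform distribution
    so the model is not rewarded for overconfident predictions."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

# Hypothetical 4-class one-hot target; epsilon is illustrative.
smoothed = smooth_labels([0, 0, 1, 0], epsilon=0.1)
# The true class becomes 0.925 and the others share 0.025 each.
```

The smoothed vector still sums to 1, so it remains a valid target distribution for a cross-entropy loss.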
Citations: 0
Policy-Bound, Verifier-Pluggable Smart Contract Framework for Auditable Healthcare Analytics
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3665093
Muhammad Saeed Javed;Ali Hennache;Muhammad Imran;Saddam Hussain Abbasi
Healthcare analytics involving sensitive patient data demand robust statistical safeguards alongside verifiable compliance mechanisms that allow regulators to independently reconstruct query operations, applied policies, and reported outcomes without accessing raw medical information. We present a governance-first, upgradeable on-chain framework that anchors complete request lifecycles to a public blockchain using keccak256 cryptographic commitments. This approach delineates a process wherein researchers submit structured queries, a controller captures active policy states, and hospitals provide result digests. A comprehensive and immutable event trail for this entire process is maintained on the Ethereum Sepolia blockchain. External auditors can then reconstruct complete timelines and verify consistent binding between requests, policies, and outcomes. The proposed modular verifier interface, currently implemented via a configurable MockVerifier, maintains stable Application Binary Interface compatibility for future production verifier integration while validating end-to-end governance. We demonstrate the framework’s practicality through a detailed diabetes prevalence analytics case study. We provide detailed gas consumption profiles, request latency measurements, and observable failure modes that demonstrate how governance rules translate into enforceable reversions. The architecture maintains minimal on-chain state with Layer-2 readiness, offering immediate regulatory accountability while preserving straightforward upgrade paths to advanced cryptographic components such as zk verifiers (Groth16), verifiable differential privacy through VRF-seeded randomness, content-addressed artifact storage, and versioned policy management. This approach effectively separates initial deployment feasibility from computationally intensive cryptography while delivering immediately actionable, externally verifiable evidence of policy compliance.
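The commit-and-audit pattern the abstract describes can be sketched briefly. Note the hedge: Ethereum's keccak256 uses pre-standard Keccak padding, so Python's stdlib `hashlib.sha3_256` (NIST SHA-3) is only a stand-in here and its digests would not match on-chain ones; the payload fields are invented for illustration:

```python
import hashlib
import json

def commit(payload):
    """Hash a canonical JSON serialization of a request payload.
    The paper anchors keccak256 commitments on-chain; hashlib.sha3_256
    (NIST SHA-3) is a stdlib stand-in whose padding differs from
    Ethereum's keccak256, so digests will NOT match on-chain values."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode()).hexdigest()

# Hypothetical request: an auditor holding the same fields can
# re-derive the digest without any raw medical data appearing on-chain.
request = {"query": "diabetes_prevalence", "policy_version": 3}
digest = commit(request)
```

Canonical serialization (sorted keys, fixed separators) is what makes the commitment reproducible by an independent auditor: the same fields always hash to the same digest regardless of insertion order.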
Citations: 0
Novelty Detection in Event Surveillance Documents
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3666022
Edmond Menya;Roberto Interdonato;Dickson Owuor;Mathieu Roche
Event Based Surveillance (EBS) monitors online sources such as broadcast, print, and web news and generates early warning and response (EWAR) signals for use in disaster mitigation. These online sources provide a dynamic data source allowing for potential real-time EBS updates. However, in dealing with news articles, fragmented information exists across varied sources and redundant information is known to overburden EBS. In this study we propose a Large Language Model based approach that filters out redundancies while learning novel information from event-centric online news corpora. We study this novelty task for events covering the animal health, food security, and climate change surveillance domains. Our approach focuses on features integrating spatio-temporal information (such as the location and date of an event) and thematic information (such as the name of a disease, food insecurity triggers, or climate change magnitude). We characterize novelty as the presence of new and additional information (e.g., a newly mentioned disease name or additional location information) as distinguished from duplicate (e.g., an already seen disease name) and missing (expected but absent) information. In this regard, our approach proposes fine-grained classification of novelty in event surveillance and adopts language modeling with a multi-class classification objective to learn to classify event information. Our LLM adoption strategy uses question-based prompts whose extracted answers map to predefined feature types (e.g., location, date, name of disease) in order to enrich our classifier. In our empirical studies, we present a comparative analysis of language models and large language models for state-of-the-art performance in the event novelty classification task. Our findings demonstrate cross-domain novelty classification ability, with our model EpidGPT (few-shot) achieving F1 scores of 82.3%, 85.49% and 88.97% in the animal health, food security and climate change domains, while fine-tuned EpidGPT achieves F1 scores of 96.02%, 86.0% and 88.45% on the same domains, respectively.
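The three-way characterization of novelty in the abstract (new, duplicate, missing) can be illustrated with a toy labeler over extracted features; the feature names and values below are invented, not the authors' schema:

```python
def label_features(extracted, seen, expected):
    """Toy fine-grained novelty labeler: each extracted feature value
    (e.g. a disease name or a location) is 'novel' if unseen so far,
    'duplicate' if already seen, and expected-but-absent features are
    labeled 'missing'."""
    labels = {}
    for feat, value in extracted.items():
        labels[feat] = "duplicate" if value in seen.get(feat, set()) else "novel"
    for feat in set(expected) - set(extracted):
        labels[feat] = "missing"
    return labels

# Hypothetical running state of previously seen feature values.
seen = {"disease": {"avian influenza"}, "location": {"Nairobi"}}
doc = {"disease": "avian influenza", "location": "Kisumu"}
labels = label_features(doc, seen, expected={"disease", "location", "date"})
# disease -> duplicate, location -> novel, date -> missing
```

In the paper's pipeline this labeling would operate on answers extracted by the question-based LLM prompts rather than on hand-built dictionaries.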
Citations: 0
SHIELD: System for Harmful Explicit-Content Identification and Evaluation Through LLM-Driven Approach
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3667099
Dishant Kapoor;Karan Ahuja;Deepika Kumar;Paanav Puri;Srinivas Jangirala;Vedika Gupta;Anandadeep Mandal
The surge in access to explicit content across various platforms has sparked major concerns, yet existing content filtering systems struggle to analyze different media formats, allowing harmful content to spread unchecked. To tackle these shortcomings, the authors propose SHIELD, an optimized end-to-end pipeline to detect and analyze explicit content using a large-language-model (LLM) driven approach. SHIELD processes multimedia inputs by segregating and preprocessing them, converting all formats into text through advanced models, extracting meaningful textual context, and subjecting the resulting data to two parallel evaluation mechanisms: an LLM-based classifier for contextual analysis, and a semantic vector-based scoring system for quantitative measurement. Explicitness classifications are output in JSON format, which allows easy integration into real-world systems. When benchmarked against a manually curated ground-truth dataset, the LLM-based system surpasses the vector-based approach, with an accuracy of 93.32% versus 67.81%. The pipeline shows robustness across all media types and file sizes, confirming its viability as a scalable, context-aware solution.
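The semantic vector-based scoring arm could, for instance, rank text against a reference embedding by cosine similarity; a minimal sketch with made-up 3-dimensional vectors (the actual embedding model and scoring rule are not specified in the abstract):

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def explicitness_score(doc_vec, explicit_centroid):
    """Rescale cosine similarity from [-1, 1] to a [0, 1] score."""
    return (cosine(doc_vec, explicit_centroid) + 1) / 2

# Made-up 3-d embeddings; a real system would use a sentence encoder.
centroid = [1.0, 0.0, 0.0]
aligned = explicitness_score([1.0, 0.0, 0.0], centroid)   # maximally similar
opposed = explicitness_score([-1.0, 0.0, 0.0], centroid)  # maximally dissimilar
```

A fixed threshold on this score would then yield the quantitative explicit/non-explicit decision that runs in parallel with the LLM classifier.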
Citations: 0
A Computationally Efficient Multi-Objective Design Optimization of SRM Using K-Means Clustering and Artificial Neural Networks
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3664958
Sandesh Bhaktha B;P. N. Sarath Kannan;Jeyaraj Pitchaimani;K. V. Gangadharan
This paper presents a novel methodology for realizing a computationally efficient multi-objective design optimization (CE-MODO) of switched reluctance motors (SRMs). Existing MODO approaches rely heavily on static and dynamic finite element analysis (FEA) to evaluate the electromagnetic performance parameters, which, despite their accuracy, incur significant computational costs. While analytical models reduce the dependence on static FEA, their limited accuracy constrains their applicability. Similarly, existing machine learning methods are inadequate for mapping multiple geometric parameters (GPs) to static characteristics across varied SRM designs. To overcome these limitations, this study proposes a novel integrated approach that combines K-means clustering with an artificial neural network (ANN) to enable accurate and efficient prediction of static characteristics. This technique led to a 52.16% decrease in the total computation time for determining the static characteristics of the SRM designs. Further, dynamic performance is evaluated using a MATLAB/Simulink-based SRM drive model, offering a computationally lightweight alternative to dynamic FEA. The proposed CE-MODO framework is applied to a four-phase 8/6 SRM topology designed for an electric three-wheeler, with average torque and electromagnetic losses as optimization objectives. Optimization is carried out by coupling the nondominated sorting genetic algorithm II (NSGA-II) with Kriging surrogate models, significantly reducing the computational load. The proposed methodology achieved an 11.01% improvement in average torque and a 4.56% reduction in electromagnetic losses compared to the initial design. The FEA models corresponding to both static and dynamic analyses employed in this study are further validated through experimental testing on a fabricated SRM prototype.
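The K-means step can be sketched with a bare-bones Lloyd's algorithm; the 1-D points and initial centroids below are made up, standing in for SRM geometric parameters:

```python
def kmeans(points, centroids, iters=20):
    """Minimal Lloyd's K-means: alternate between assigning each point
    to its nearest centroid and recomputing centroids as cluster means.
    Points and centroids are sequences of equal-length coordinate tuples."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the nearest centroid by squared Euclidean distance.
            j = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Recompute each centroid; keep it unchanged if its cluster is empty.
        centroids = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else list(centroids[i])
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Invented 1-D "geometric parameter" values forming two obvious groups.
pts = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (4.9,)]
cents, cls = kmeans(pts, centroids=[(0.0,), (5.0,)])
```

In the paper's pipeline each resulting cluster would then get its own ANN mapping geometric parameters to static characteristics, replacing repeated static FEA runs.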
Citations: 0
Satellite-Based Rainfall Datasets: A Global Systematic Review of Applications, Accuracy, and Research Gaps
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3667060
Luiza Chiarelli Conte;Rutineia Tassi;Débora Missio Bayer
In the context of increasing climate variability and the gradual decline of ground-based observation networks, satellite-based rainfall estimates (SREs) have become indispensable tools for hydrological monitoring, disaster preparedness, and climate modeling. Satellite technology has evolved rapidly in recent years, with new missions, sensors, and techniques implemented by agencies and researchers to improve SRE products. This study presents a global systematic review explicitly applying the PRISMA methodology, offering a structured and reproducible framework for evidence synthesis. It evaluates the performance of the most widely used SREs across all continents from January 2018 to November 2025, with particular emphasis on their application in data-scarce and hydrologically complex regions. Drawing from 636 peer-reviewed studies, the review identifies key factors affecting the accuracy of SREs, including topography, rainfall type, and seasonality. Notably, products that integrate satellite data with ground-based observations consistently demonstrate superior performance compared to satellite-only estimates. Among them, IMERG-Final and CHIRPS stand out as the most widely used datasets worldwide, with IMERG-Final showing particularly promising performance across most continents. The findings highlight the need for future research to prioritize the development of advanced bias correction algorithms, region-specific calibration methods, and hybrid models that incorporate additional meteorological variables. Although previous reviews have addressed this approach, the present synthesis offers an updated and concise reference for selecting suitable SREs across diverse environmental and operational contexts.
Citations: 0
Anticipating Financial Risk: Machine Learning for Debt Management in Telecommunications
IF 3.6 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-23 | DOI: 10.1109/ACCESS.2026.3665976
Filipe O. F. Arsénio;António Raimundo;João Pedro C. B. B. Pavia
The telecommunications industry is characterized by intense competition and rapid technological evolution, making financial stability a critical factor for sustained growth. This work focuses on leveraging machine learning techniques to analyze and predict customer payment behavior within a Portuguese telecommunications company, aiming to reduce financial losses associated with unpaid debts. Using the CRISP-DM methodology, the project first develops supervised learning models to predict whether customers will remain good payers, based solely on internal data. Among the algorithms tested, Random Forest achieved the highest accuracy of 99%, enabling early identification of potential defaulters. Complementing this, unsupervised learning methods, specifically Principal Component Analysis for dimensionality reduction and K-Means clustering, uncover hidden behavioral segments within the customer base. The optimal clustering identified five distinct groups, some of which show near-homogeneous target values (close to 0 or 1), allowing for strong characterization of compliant and non-compliant profiles. The findings demonstrate the effectiveness of combining supervised and unsupervised learning for risk analysis. Supervised models allow scenario testing by altering feature values to simulate changes in payment behavior. In unsupervised learning, analyzing ambiguous clusters through comparison with more definitive ones helps estimate likely client outcomes and supports proactive management. Future work may explore focused clustering of non-compliant clients, alternative data preprocessing, and time series forecasting to further improve predictive accuracy and operational utility.
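The scenario testing the abstract mentions (altering feature values to simulate changes in payment behavior) can be illustrated with a toy linear risk score; the feature names, weights, and logistic link are invented for illustration and do not represent the company's Random Forest model:

```python
import math

def default_risk(features, weights, bias=0.0):
    """Toy linear score with a logistic link, standing in for the
    trained classifier (feature names and weights are invented)."""
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

def scenario(features, **changes):
    """What-if analysis: copy a client profile and override features."""
    return {**features, **changes}

weights = {"late_payments": 0.8, "tenure_years": -0.3}
client = {"late_payments": 1.0, "tenure_years": 4.0}

base = default_risk(client, weights)
stressed = default_risk(scenario(client, late_payments=4.0), weights)
# Simulating more late payments should raise the predicted risk.
```

The same pattern applies to any fitted model: hold the model fixed, perturb one feature, and compare predictions to gauge how sensitive the default risk is to that behavior change.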
IEEE Access, vol. 14, pp. 29523–29538. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11407965
Citations: 0
Improving Contextual Biasing in Chinese ASR via Multimodal Large Language Models
IF 3.6 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-02-23 · DOI: 10.1109/ACCESS.2026.3667399
Kai Zhang;Qiuxia Zhang;Chung-Che Wang;Jyh-Shing Roger Jang
This study addresses the challenge in Chinese automatic speech recognition (ASR) systems of accurately recognizing proper nouns such as place names, personal names, song titles, and movie or TV show titles, which is often hindered by tonal features and abundant homophones. We propose an integrated contextual biasing framework centered on multimodal large language models (MLLMs) to enhance the system’s context awareness and task adaptability. The core of this framework is an intent-driven dynamic contextual biasing mechanism: first, a fine-tuned MLLM performs end-to-end intent recognition, achieving an 81.82% relative error rate reduction compared to the model without fine-tuning and a 66.71% reduction relative to a cascaded model; subsequently, based on the highly accurate intent predictions, context-relevant keyword prompts are dynamically generated to guide speech recognition. Models fine-tuned using this strategy demonstrate significant improvements in both character error rate (CER) and keyword error rate (KER), with a 41.48% relative error reduction in KER. To address the cold-start problem, we also develop an automated data generation pipeline that requires only a domain-specific list of proper nouns to generate natural sentences using a small language model, followed by speech synthesis to produce training audio. Experiments show that models fine-tuned with synthetic data achieve a 41.91% relative error reduction in keyword recognition, nearly matching the performance of models trained on real annotated data. Overall, this work provides an innovative framework for contextual biasing in Chinese ASR and demonstrates, through open-source code and evaluation standards, the potential of multimodal large language models in integrating speech understanding and recognition tasks.
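The relative error-rate reductions quoted above follow the standard formula (baseline − improved) / baseline. A one-line sketch, with the metric values purely illustrative rather than the paper's raw numbers:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage relative reduction of an error metric (e.g. CER or KER)."""
    return 100.0 * (baseline - improved) / baseline

# Illustrative values only: a KER falling from 27.0% to 15.8% is a
# 41.48% relative reduction, the same scale of improvement reported above.
print(round(relative_reduction(27.0, 15.8), 2))
```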
IEEE Access, vol. 14, pp. 29706–29728. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11408189
Citations: 0
GDNConv: A Novel Graph Deformation Network for Robust Representation Learning on Noisy Graph Structures
IF 3.6 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-02-20 · DOI: 10.1109/ACCESS.2026.3666584
Vinay Santhosh Chitla;Hemantha Kumar Kalluri;Satya Krishna Nunna;Mahesh Kumar Morampudi
Graph Neural Networks have emerged as powerful tools for analyzing graph-structured data. However, their performance often varies across datasets due to challenges such as noisy edges, sparse connectivity, and over-smoothing in deep layers. To address these limitations, Graph Deformation Network Convolution (GDNConv) is proposed as a novel graph convolution model that incorporates four key innovations: dynamic edge weight learning to filter noisy connections, graph attention deformation to prioritize relevant neighbors, multi-level aggregation to capture multi-scale patterns, and self-regularization to stabilize training. The proposed model demonstrates robustness and scalability, particularly for real-world applications involving complex and noisy graph structures, such as social networks and recommendation systems. Its ability to dynamically adapt graph topology during training and its superior performance on both dense and sparse datasets highlight its potential as a versatile solution for graph-based learning tasks. Additionally, GDNConv’s computational efficiency and self-regularization mechanisms make it suitable for large-scale applications where resource constraints are a concern. The proposed model is evaluated on four benchmark datasets—Cora, CiteSeer, PubMed, and ogbn-arxiv—and compared with several state-of-the-art models, including Graph Convolutional Network, Graph Attention Network, and Graph Sample and Aggregate. The experimental results demonstrate that the proposed model consistently outperforms these baseline approaches, achieving improvements of 4.7% and 4.2% in accuracy and F1 score, respectively.
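The dynamic edge-weight idea can be illustrated with a minimal NumPy sketch: each edge is scored from its endpoint features and the scores are softmax-normalised per node, so low-scoring (noisy) edges contribute little to the aggregation. The scoring vector `w_score` and the toy graph are illustrative assumptions, not GDNConv's actual parameterisation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def weighted_aggregate(h, neighbors, w_score):
    """One propagation step with per-edge learned weights.

    h         : (N, d) node feature matrix
    neighbors : dict mapping node id -> list of neighbour ids
    w_score   : (2d,) vector scoring an edge from the concatenated
                endpoint features (stand-in for a learned scorer)
    """
    out = np.zeros_like(h)
    for i, nbrs in neighbors.items():
        # score each incident edge, then normalise per node
        scores = np.array([w_score @ np.concatenate([h[i], h[j]]) for j in nbrs])
        alpha = softmax(scores)  # noisy edges get small alpha
        out[i] = sum(a * h[j] for a, j in zip(alpha, nbrs))
    return out

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}
w_score = np.ones(4)
agg = weighted_aggregate(h, neighbors, w_score)
```

Because the weights per node sum to one, each aggregated feature is a convex combination of neighbour features, which keeps the update stable regardless of degree.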
IEEE Access, vol. 14, pp. 29610–29627. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11404168
Citations: 0
Beyond Frames: 3D-CoAtNet for Generalizable Deepfake Video Detection
IF 3.6 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-02-20 · DOI: 10.1109/ACCESS.2026.3666623
Eman Alattas;John Clark;Bassma Alsulami;Salma Kammoun Jarraya
Deepfakes pose a growing risk to digital integrity and public trust, driving the need for robust video-level forgery-detection methods. Many existing approaches analyse individual frames independently and overlook temporal dependencies, thereby weakening the generalisation to unseen manipulation techniques. This paper introduces 3D-CoAtNet, a spatiotemporal architecture for deepfake video detection that processes multiple frames simultaneously, thereby reducing reliance on single-frame artefacts. The model inflates CoAtNet’s 2D convolutional, residual, pooling, and self-attention layers into their 3D counterparts to learn spatial and temporal representations from multiple frames. We evaluated two input modalities: RGB 15-frame clips sampled from each video, and 15-frame optical-flow sequences that capture motion cues. Extensive experiments on FaceForensics++ (FF++), DFDC, and Celeb-DF under intra- and cross-dataset settings show that 3D-CoAtNet is competitive in intra-dataset evaluations (best in the DeepFakes dataset) and transfers well to Celeb-DF. Moreover, although frame-based CoAtNet16A achieves strong within-dataset accuracy, 3D-CoAtNet improves cross-dataset generalisation. These findings highlight the importance of the proposed 3D-CoAtNet model for deepfake forensics.
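The 2D-to-3D layer inflation mentioned above is commonly done I3D-style: replicate the 2D kernel along a new temporal axis and rescale so that a temporally constant clip reproduces the original 2D response. A minimal NumPy sketch (the kernel shapes are illustrative, and the paper's exact inflation scheme may differ):

```python
import numpy as np

def inflate_2d_kernel(k2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D conv kernel (out_ch, in_ch, kh, kw) into a 3D kernel
    (out_ch, in_ch, t, kh, kw): replicate along a new temporal axis and
    divide by t, so summing the copies recovers the original 2D kernel."""
    return np.repeat(k2d[:, :, None, :, :], t, axis=2) / t

k2d = np.arange(9, dtype=float).reshape(1, 1, 3, 3)
k3d = inflate_2d_kernel(k2d, 5)  # temporal extent of 5 frames
```

The division by `t` is the key detail: without it, a static input would produce responses `t` times larger than the pretrained 2D network expects.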
IEEE Access, vol. 14, pp. 29692–29705. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11404125
Citations: 0