
Frontiers in Big Data: latest articles

Adaptive model for rate of penetration prediction based on the dynamic correlation of influencing factors.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1676054
Yonggang Deng, Xiaojing Zhou, Zixuan Feng, Xin Li, Hui Li

Introduction: The rate of penetration (ROP) is a critical benchmark for evaluating the efficiency of drilling operations, so an accurate ROP prediction model is needed to optimize drilling parameters. At present, the correlations between drilling parameters and the ROP are usually assessed statically, which overlooks how those correlations change during drilling.

Method: An adaptive ROP prediction model that incorporates depth-varying correlations of influential parameters is constructed. This model can automatically identify the dynamic correlations of the modeling parameters at different depths of well sections, and the optimal modeling parameters for adaptive training are selected based on the ranking of the correlation coefficients.

Results: The analysis covered 33 drilling parameters across 4,837 datasets collected from 4 wellbores in Sichuan. The comparison showed that, in different well sections, the dynamic correlation coefficient of each parameter deviates significantly from the overall correlation coefficient. The proposed model dynamically selects key parameters and updates itself from real-time data streams, avoiding the weakness of traditional fixed-parameter models, which ignore changes across well sections.

Discussion: A modeling comparison showed that, over multiple rounds of prediction, the model based on dynamic correlations was more accurate than the one based on the overall correlation in 93% of rounds, indicating that the adaptive ROP prediction model with dynamic correlations has high application value.
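
The parameter-selection step described in the Method paragraph can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Pearson correlation, fixed-size depth windows, and a top-k cutoff, none of which the abstract specifies. The function name `rank_parameters_by_window` and the synthetic data are hypothetical.

```python
import numpy as np


def rank_parameters_by_window(X, rop, window, top_k):
    """Rank features by |Pearson r| with ROP inside each depth window.

    X   : (n_samples, n_features) array, rows ordered by depth
    rop : (n_samples,) array of rate-of-penetration values
    Returns a list of (window_start_index, selected_feature_indices).
    """
    selections = []
    for start in range(0, len(rop) - window + 1, window):
        xs = X[start:start + window]
        ys = rop[start:start + window]
        # Centre the data, then compute Pearson r for every column at once.
        xc = xs - xs.mean(axis=0)
        yc = ys - ys.mean()
        denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
        r = np.divide(xc.T @ yc, denom,
                      out=np.zeros(xs.shape[1]), where=denom > 0)
        # Keep the top_k features with the largest absolute correlation.
        order = np.argsort(-np.abs(r))[:top_k]
        selections.append((start, order.tolist()))
    return selections


# Synthetic example (assumption): feature 0 drives ROP in the shallow half,
# feature 1 drives it in the deep half, feature 2 is irrelevant.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
rop = np.where(np.arange(n) < 100, X[:, 0], X[:, 1]) + 0.1 * rng.normal(size=n)

selections = rank_parameters_by_window(X, rop, window=100, top_k=1)
print(selections)
```

On this synthetic data the shallow window selects feature 0 and the deep window selects feature 1, while a single correlation computed over all 200 samples would rate both features as only moderately related to ROP.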

Frontiers in Big Data, volume 8, article 1676054. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812677/pdf/
Citations: 0
Application of artificial intelligence in cervical cytology: a systematic review of deep learning models, datasets, and reported metrics.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-02 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1678863
Miguel Angel Valles-Coral, Lloy Pinedo, Ciro Rodríguez, Diego Rodríguez, Keller Sánchez-Dávila, Lolita Arévalo-Fasanando, Nelly Reátegui-Lozano

Introduction: The use of artificial intelligence (AI) in cervical cytology has increased substantially due to the need for automated tools that support the early detection of precancerous lesions.

Methods: This systematic review examined deep learning models applied to cervical cytology images, focusing on the architectures used, the datasets employed, and the performance metrics reported. Articles published between 2022 and 2025 were retrieved from Scopus using PRISMA methodology. After applying inclusion criteria and full-text screening, 77 studies were included for RQ1 (models), 75 for RQ2 (datasets), and 71 for RQ3 (metrics).

Results: Hybrid models were the most prevalent (56%), followed by convolutional neural networks (CNNs) and a growing number of Vision Transformer (ViT)-based approaches. SIPaKMeD and Herlev were the most frequently used datasets, although the use of private datasets is increasing. Accuracy was the most commonly reported metric (mean 87.76%), followed by precision, recall, and F1-score. Several hybrid and ViT-based models exceeded 92% accuracy. Identified limitations included limited cross-validation, reduced clinical representativeness of datasets, and inconsistent diagnostic criteria.

Discussion: This review synthesizes current trends in AI-based cervical cytology, highlights common methodological limitations, and proposes directions for future research to enhance clinical applicability and standardization.
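
For readers comparing the metrics named in the Results, the following sketch computes accuracy, precision, recall, and F1-score from binary confusion counts. The counts are invented for illustration and do not come from any study in the review; the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return the four metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative, made-up counts for an imbalanced screening set:
# 120 abnormal samples and 880 normal samples.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)  # accuracy 0.96, precision 0.90, recall 0.75, f1 about 0.818
```

With these counts accuracy is 96% even though a quarter of the abnormal samples are missed, which is one reason accuracy alone can overstate performance on imbalanced data.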

Frontiers in Big Data, volume 8, article 1678863. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12807953/pdf/
Citations: 0
Time series forecasting for bug resolution using machine learning and deep learning models.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-19 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1745751
Lerina Aversano, Martina Iammarino, Antonella Madau, Fabiano Pecorelli

Predicting bug fix times is a key objective for improving software maintenance and supporting planning in open source projects. In this study, we evaluate the effectiveness of different time series forecasting models applied to real-world data from multiple repositories, comparing local (one model per project) and global (a single model trained across multiple projects) approaches. We considered classical models (Naive, Linear Regression, Random Forest) and neural networks (MLP, LSTM, GRU), with global extensions including Random Forest and LSTM with project embeddings. The results highlight that, at the local level, Random Forest achieves lower errors and better classification metrics than deep learning models in several cases. However, global models show greater robustness and generalizability: in particular, the global Random Forest significantly reduces the mean error and maintains high performance in terms of accuracy and F1 score, while the global LSTM captures temporal dependencies and provides additional insights into cross-project dynamics. The explainable AI techniques adopted (permutation importance, saliency maps, and embedding analysis) allow us to interpret the main drivers of forecasts, confirming the role of process variables and temporal characteristics. Overall, the study demonstrates that an integrated approach, combining classical models and deep learning in a global perspective, offers more reliable and interpretable forecasts to support software maintenance.
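
The abstract lists a Naive model among the classical baselines without defining it. A common reading, assumed here, is the last-value forecast, scored with mean absolute error. The series below is invented and the function names are hypothetical.

```python
def naive_forecast(series):
    """Forecast each step as the previously observed value."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up weekly mean bug-fix times (days) for a single project.
fix_days = [5.0, 7.0, 6.0, 9.0, 8.0]

predicted = naive_forecast(fix_days)          # forecasts for weeks 2..5
mae = mean_absolute_error(fix_days[1:], predicted)
print(mae)  # 1.75
```

A learned model, local or global, is only useful if its error falls below a baseline of this kind.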

Frontiers in Big Data, volume 8, article 1745751. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12757211/pdf/
Citations: 0
Privacy protection method for ADS-B air traffic control data based on convolutional neural network and symmetric encryption.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-18 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1683027
Changsheng Ma, Ruchun Jia, Jing Lou, Mingqian Wang

Introduction: ADS-B (Automatic Dependent Surveillance-Broadcast) is a key surveillance technology in modern air traffic management, which broadcasts real-time aircraft information such as position, speed, and altitude for enhanced flight tracking and safety. However, the open broadcast nature of ADS-B communication raises significant privacy concerns, as sensitive data can be easily intercepted and misused. Research on privacy protection for ADS-B air traffic control data faces significant challenges, making the effective mining and safeguarding of privacy information a critical research focus.

Methods: This study proposes a novel privacy protection method that integrates deep learning with symmetric encryption. Specifically, by analyzing the ADS-B air traffic monitoring architecture, we mine and normalize privacy-related data to develop a Convolutional Neural Network (CNN)-based classification model for accurate identification of sensitive information.

Results: Experimental results demonstrate that the proposed method effectively scrambles the original privacy information, with no instances of data theft or malicious damage. For data volumes of 10 GB, 20 GB, 30 GB, and 40 GB, the encryption times are 20.36 ms, 30.56 ms, 40.35 ms, and 50.36 ms, respectively, showcasing its efficiency.

Discussion: Compared to existing methods, our approach achieves shorter encryption times while maintaining robust privacy protection. Future work could explore integrating advanced encryption technologies with state-of-the-art deep learning algorithms to further enhance the security of privacy protection in ADS-B systems.
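
The four encryption times quoted in the Results can be summarized with an ordinary least-squares line, which makes the scaling with data volume explicit. The sketch uses only the figures reported in the abstract; the linear model itself is an assumption made here for illustration.

```python
# Reported data volumes (GB) and encryption times (ms) from the abstract.
sizes_gb = [10, 20, 30, 40]
times_ms = [20.36, 30.56, 40.35, 50.36]

n = len(sizes_gb)
mean_x = sum(sizes_gb) / n
mean_y = sum(times_ms) / n

# Ordinary least squares for time = intercept + slope * size.
sxx = sum((x - mean_x) ** 2 for x in sizes_gb)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_gb, times_ms))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(slope, 4), round(intercept, 2))  # 0.9979 10.46
```

Under this fit the reported times grow by roughly 1 ms per additional GB on top of a fixed cost of about 10 ms.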

Frontiers in Big Data, volume 8, article 1683027. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756101/pdf/
Citations: 0
Transparent and trustworthy CyberSecurity: an XAI-integrated big data framework for phishing attack detection.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-18 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1688091
Muhammad Nauman, Hafiz Muhammad Usman Akhtar, Huseyn Gorbani, Muhammad Hadi Ul Hassan, Muhammad A B Fayyaz

Introduction: The exponential growth of heterogeneous, high-velocity CyberSecurity data generated by modern digital infrastructures presents both opportunities and challenges for threat detection, especially against increasingly sophisticated cyber-attacks. Traditional security tools struggle to process such data effectively, highlighting the need for scalable Big Data Analytics and advanced Machine Learning (ML) techniques. However, the black-box nature of many ML models limits interpretability, trust, and regulatory compliance in high-stakes environments.

Methods: This study proposes an integrated framework that combines Big Data technologies, ML models, and Explainable Artificial Intelligence (XAI) to enable accurate, transparent, and real-time phishing attack detection. The framework leverages distributed computing and stream processing for efficient handling of large and diverse datasets while incorporating XAI methods to generate human-understandable model explanations.

Results: Experimental evaluation conducted on four publicly available CyberSecurity datasets demonstrates improved phishing detection performance, enhanced interpretability of model decisions, and actionable insights into malicious URL behavior and patterns.

Discussion: The proposed approach advances interpretable and scalable CyberSecurity analytics by addressing the gap between predictive accuracy and decision transparency. By integrating Big Data processing with XAI-driven ML, the framework offers a trustworthy solution for real-time threat detection, supporting informed decision-making and regulatory compliance.

Frontiers in Big Data, volume 8, article 1688091. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12756072/pdf/
Citations: 0
Inferring causal interplay between air pollution and meteorology.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-17 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1710462
Yves Philippe Rybarczyk, Niralkumar Hemantbhai Dave, Tobias Isaac Tapia-Flores, Rasa Zalakeviciute

Introduction: This study investigates the bidirectional causal interplay between PM2.5 and relative humidity (RH) in Quito, Ecuador. Focusing on a high-altitude city with complex terrain, the objective is to understand pollution-climate feedbacks over a two-decade span.

Methods: The study employs Convergent Cross Mapping (CCM), a nonlinear empirical dynamic modeling approach. Hourly data from four districts of Quito were analyzed over two distinct periods: 2004-2005 and 2022-2024. Robustness of causality was confirmed using surrogate testing techniques.

Results: The analysis reveals statistically significant, nonlinear, and time-variant couplings. While RH influenced PM2.5 in the early 2000s, the relationship inverted, with PM2.5 increasingly driving RH by the early 2020s. Partial-derivative analyses indicate shifting interaction signs and strengths. Notably, pollution was found to increasingly suppress RH, particularly in northern districts.

Discussion: The observed suppression of RH by pollution is consistent with urban heat island amplification and radiative effects. These findings underscore the necessity of nonlinear causality frameworks for understanding environmental feedbacks in complex terrains. The study highlights the need for integrated air quality and climate strategies. Future research should expand variables and monitoring sites to further generalize these findings.
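
The core of Convergent Cross Mapping named in the Methods can be sketched as below. This is a simplified illustration under stated assumptions, not the study's pipeline: embedding dimension 2, lag 1, a single library length, and a pair of coupled logistic maps standing in for PM2.5 and RH. The function names are hypothetical. A full CCM analysis would also repeat the estimate at increasing library lengths to check convergence and compare against surrogate series.

```python
import numpy as np


def delay_embed(series, dim, tau):
    """Rows are (x[t], x[t - tau], ..., x[t - (dim - 1) * tau])."""
    start = (dim - 1) * tau
    cols = [series[start - k * tau: len(series) - k * tau] for k in range(dim)]
    return np.column_stack(cols)


def cross_map_skill(source, target, dim=2, tau=1):
    """Estimate `target` from the delay-embedded manifold of `source`.

    In CCM, high skill is read as evidence that `target` influences `source`.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    manifold = delay_embed(source, dim, tau)
    aligned = target[(dim - 1) * tau:]

    # Pairwise Euclidean distances between manifold points.
    diff = manifold[:, None, :] - manifold[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # dim + 1 nearest neighbours, weighted by exp(-d / d_nearest).
    k = dim + 1
    idx = np.argsort(dist, axis=1)[:, :k]
    nearest = np.take_along_axis(dist, idx, axis=1)
    scale = np.maximum(nearest[:, :1], 1e-12)
    weights = np.exp(-nearest / scale)
    weights /= weights.sum(axis=1, keepdims=True)

    estimate = (weights * aligned[idx]).sum(axis=1)
    return float(np.corrcoef(estimate, aligned)[0, 1])


# Toy system (assumption): two weakly coupled logistic maps.
n = 300
x = np.empty(n)
y = np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

skill_x_from_y = cross_map_skill(source=y, target=x)  # tests "x drives y"
skill_y_from_x = cross_map_skill(source=x, target=y)  # tests "y drives x"
self_skill = cross_map_skill(source=x, target=x)      # sanity check
print(skill_x_from_y, skill_y_from_x, self_skill)
```

The direction of inference runs opposite to the direction of mapping: if x can be recovered from the history of y, that is taken as evidence that x leaves a signature in y.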

Frontiers in Big Data, volume 8, article 1710462. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12753360/pdf/
Citations: 0
Posterior averaging with Gaussian naive Bayes and the R package RandomGaussianNB for big-data classification.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-11 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1706417
Patchanok Srisuradetchai

RandomGaussianNB is an open-source R package implementing the posterior-averaging Gaussian naive Bayes (PAV-GNB) algorithm, a scalable ensemble extension of the classical GNB classifier. The method introduces posterior averaging to mitigate correlation bias and enhance stability in high-dimensional settings while maintaining interpretability and computational efficiency. Theoretical results establish the variance of the ensemble posterior, which decreases inversely with ensemble size, and a margin-based generalization bound that connects posterior variance with classification error. Together, these results provide a principled understanding of the bias-variance trade-off in PAV-GNB. The package delivers a fully parallel, reproducible framework for large-scale classification. Simulation studies under big-data conditions (large samples, many features, and multiple classes) show consistent accuracy, low variance, and agreement with theoretical predictions. Scalability experiments demonstrate near-linear runtime improvement with multi-core execution, and a real-world application on the Pima Indians Diabetes dataset validates PAV-GNB's reliability and computational efficiency as an interpretable, statistically grounded approach for ensemble naive Bayes classification.
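
The abstract states that the variance of the ensemble posterior decreases inversely with ensemble size but does not give the expression. As a hedged illustration, the textbook identity for the variance of an average shows where such a rate comes from. The symbols are assumptions introduced here, not the paper's notation: $B$ is the ensemble size, $\hat{p}_b$ is the posterior from base learner $b$, $\sigma^2$ is their common variance, and $\rho$ is their pairwise correlation.

```latex
\bar{p} = \frac{1}{B} \sum_{b=1}^{B} \hat{p}_b ,
\qquad
\operatorname{Var}(\bar{p})
  = \frac{1}{B^{2}} \Bigl[ B\sigma^{2} + B(B-1)\rho\sigma^{2} \Bigr]
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2} .
```

When the base posteriors are uncorrelated ($\rho = 0$) this reduces to $\sigma^{2}/B$, the inverse dependence on ensemble size; any residual correlation leaves a floor of $\rho\sigma^{2}$ that adding more learners does not remove.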

Frontiers in Big Data, volume 8, article 1706417. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12738300/pdf/
引用次数: 0
Detecting anti-forensic deepfakes with identity-aware multi-branch networks.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-10 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1720525
Mingyu Zhu, Jun Long

Deepfake detection systems have achieved impressive accuracy on conventional forged images; however, they remain vulnerable to anti-forensic or adversarial samples deliberately crafted to evade detection. Such samples introduce imperceptible perturbations that conceal forgery artifacts, causing traditional binary classifiers, which are trained solely on real and forged data, to misclassify them as authentic. In this paper, we address this challenge by proposing a multi-channel feature extraction framework combined with a three-class classification strategy. Specifically, one channel focuses on extracting identity-preserving facial representations to capture inconsistencies in personal identity traits, while additional channels extract complementary spatial and frequency domain features to detect subtle forgery traces. These multi-channel features are fused and fed into a three-class detector capable of distinguishing real, forged, and anti-forensic samples. Experimental results on datasets incorporating adversarial deepfakes demonstrate that our method substantially improves robustness against anti-forensic attacks while maintaining high accuracy on conventional deepfake detection tasks.
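
The fuse-then-classify step can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the abstract does not describe the fusion operator or the classifier head, so concatenation followed by a single linear layer and softmax is assumed here, and the feature values, `weights`, and `biases` are hand-picked hypothetical numbers rather than anything learned or reported.

```python
import math

LABELS = ["real", "forged", "anti-forensic"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(identity_feat, spatial_feat, frequency_feat, weights, biases):
    """Concatenate the branch features and score the three classes with a linear layer."""
    fused = identity_feat + spatial_feat + frequency_feat  # list concatenation
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

# Hypothetical one-dimensional branch outputs and hand-set weights.
weights = [[-1.0, -1.0, -1.0],   # real: low response on every branch
           [0.0, 2.0, 0.0],      # forged: strong spatial artifacts
           [1.0, -1.0, 2.0]]     # anti-forensic: identity and frequency cues
biases = [0.0, 0.0, 0.0]
label, probs = fuse_and_classify([0.9], [0.1], [0.8], weights, biases)
```

Giving anti-forensic samples their own output class, rather than folding them into "forged", is what lets the detector learn a separate decision region for samples whose spatial artifacts have been hidden.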

Robust contactless fingerprint authentication using dolphin optimization and SVM hybridization.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1641714
Jenisha Rachel, Ezhilmaran Devarasan

The field of contactless fingerprint (CLFP) recognition is rapidly evolving, driven by its potential to offer enhanced hygiene and user convenience over traditional touch-based systems without compromising security. This study introduces a contactless fingerprint recognition system using the Dolphin Optimization Algorithm (DOA), a nature-inspired technique suited for complex optimization tasks. The Histogram of Oriented Gradients (HOG) method is applied to reduce image features, with DOA optimizing the feature selection process. To boost prediction accuracy, we fused the DOA with a Support Vector Machine (SVM) classifier, creating a hybrid (DOA-SVM) that leverages the global search prowess of DOA alongside the reliable classification strength of SVM. Additionally, two more hybrid models are proposed: one combining Fuzzy C-Means (FCM) with DOA-SVM, and another combining Neutrosophic C-Means (NCM) with DOA-SVM. Experimental validation on 504 contactless fingerprint images from the Hong Kong Polytechnic University dataset demonstrates a clear performance progression: DOA (91.00%), DOA-SVM (94.07%), FCM-DOA-SVM (96.03%), and NCM-DOA-SVM (98.00%). The NCM-DOA-SVM approach achieves superior accuracy through effective uncertainty handling via neutrosophic logic while maintaining competitive processing efficiency. Comparative analysis with other bio-inspired methods shows our approach achieves higher accuracy with reduced computational requirements. These results highlight the effectiveness of combining bio-inspired optimization with traditional classifiers and advanced clustering for biometric recognition.
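
For readers unfamiliar with HOG, the following Python sketch computes the orientation histogram of a single image cell. It is a simplified textbook version, not the authors' pipeline: it assumes unsigned gradients, nine 20-degree bins, central differences on interior pixels, and hard bin assignment, and it omits block normalization. The function name `hog_cell_histogram` is hypothetical.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    height, width = len(cell), len(cell[0])
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180)
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# A 4x4 cell whose intensity rises from left to right: all gradient energy
# falls into the first bin (orientation near 0 degrees).
ramp = [[0, 1, 2, 3] for _ in range(4)]
histogram = hog_cell_histogram(ramp)
```

Concatenating such histograms over all cells yields the HOG descriptor; a feature-selection step like the DOA stage in the abstract would then choose which of these entries to pass to the classifier.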

Unequal access in a digital age: women's digital exclusion and socioeconomic inequalities in Vietnam.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-12-04 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1718366
Chi Thi Lan Pham, Quyen Thi Tu Bui, Anh Ha Le, Long Quynh Khuong

Introduction: Access to information and communication technologies (ICTs) and the skills to use them are essential for inclusive development and digital participation. As Vietnam accelerates its digital transformation, ensuring that women are not left behind is critical to achieving the Sustainable Development Goals (SDGs), particularly SDG 5 (Gender Equality) and SDG 9 (Industry, Innovation, and Infrastructure). This study investigates the extent and socioeconomic patterning of digital exclusion among women in Vietnam.

Methods: We utilized nationally representative data from the 2021 Multiple Indicator Cluster Survey (MICS), which covered 10,770 women aged 15-49. Digital exclusion was defined in terms of (1) no ICT access (no use of computer, internet, or mobile phone in the past 3 months) and (2) no ICT skills (unable to perform any of nine standard digital tasks).
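
The two exclusion indicators defined above can be expressed as a minimal Python sketch. The record layout (`used`, `tasks`) and all function names are hypothetical and do not correspond to actual MICS variable names; the rates computed here are simple unweighted shares, whereas published survey estimates typically apply sampling weights.

```python
def no_ict_access(used_computer, used_internet, used_mobile):
    """True when none of the three technologies was used in the past 3 months."""
    return not (used_computer or used_internet or used_mobile)

def no_ict_skills(tasks_performed):
    """True when the respondent could perform none of the nine digital tasks."""
    if len(tasks_performed) != 9:
        raise ValueError("expected nine task flags")
    return not any(tasks_performed)

def exclusion_rates(respondents):
    """Unweighted share (in %) of respondents lacking access and lacking skills."""
    n = len(respondents)
    lacking_access = sum(no_ict_access(*r["used"]) for r in respondents)
    lacking_skills = sum(no_ict_skills(r["tasks"]) for r in respondents)
    return 100.0 * lacking_access / n, 100.0 * lacking_skills / n

# Four made-up respondents: (computer, internet, mobile) use and nine task flags.
sample = [
    {"used": (False, False, False), "tasks": [False] * 9},
    {"used": (False, True, True), "tasks": [False] * 9},
    {"used": (True, True, True), "tasks": [True] + [False] * 8},
    {"used": (False, False, True), "tasks": [False] * 9},
]
access_rate, skills_rate = exclusion_rates(sample)
```

Note that the two indicators are independent: a respondent who uses a mobile phone counts as having access even if she can perform none of the nine tasks, which is why the skills gap can be far larger than the access gap.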

Results: Overall, 4.28% of women lacked digital access and 72.85% lacked digital skills. Inequalities were stark: lack of access was most common among ethnic minorities (19.55%) and the poorest quintile (17.10%), compared with 1.98% and 0.31% in the ethnic-majority and richest groups, respectively. The digital skills gap was even wider, with 95.51% of the poorest women lacking ICT skills vs. 41.23% of the richest. Multivariable logistic regressions confirmed that ethnicity, wealth, rural residence, and older age were key predictors of exclusion.

Conclusion: These findings underscore the urgent need for inclusive digital policies that extend beyond infrastructure to address gendered and socioeconomic barriers to digital literacy. Without targeted efforts, digital rollouts may widen existing inequalities and undermine SDG progress.
