Frontiers in Big Data: Latest Articles

EnDuSecFed: an ensemble approach for privacy preserving Federated Learning with dual-security framework for sustainable healthcare.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-22 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1659026
Bela Shrimali, Jenil Gajjar, Swapnoneel Roy, Sanjay Patel, Kanu Patel, Ramesh Ram Naik

Recent advances in Artificial Intelligence have highlighted the role of Machine Learning in healthcare decision-making, but centralized data collection raises significant privacy risks. Federated Learning (FL) addresses this by enabling collaborative training across multiple clients without sharing raw data. However, FL remains vulnerable to security threats that can compromise model reliability. This paper proposes a dual-security FL framework that integrates Fernet symmetric encryption, for secure transmission of model updates, with an Intrusion Detection System that detects anomalous client behavior. Experiments on a publicly available healthcare dataset show that the proposed system enhances privacy and robustness compared with traditional FL. Among the tested models, including Logistic Regression, Random Forest, and SVC, the ensemble method achieved the best performance, with 99% accuracy.
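The abstract does not say how the Intrusion Detection System scores clients or how the server aggregates updates. As a rough illustration only, the sketch below assumes plain federated averaging (FedAvg) weighted by client sample counts, plus a hypothetical IDS rule that flags any client whose update norm lies far from the median norm (a median-absolute-deviation test). Encryption is left out: in a Fernet-based setup each client would serialize its update, encrypt it with `cryptography.fernet.Fernet(key).encrypt(...)`, and the server would decrypt it before this step. The names `flag_anomalous`, `fedavg`, and the threshold `k` are illustrative assumptions, not the authors' code.

```python
import math
import statistics


def l2_norm(vector):
    """Euclidean length of a flat parameter-update vector."""
    return math.sqrt(sum(x * x for x in vector))


def flag_anomalous(updates, k=3.0):
    """Hypothetical IDS rule: flag clients whose update norm is more than
    k median absolute deviations (MAD) away from the median norm."""
    norms = [l2_norm(u) for u in updates]
    median = statistics.median(norms)
    mad = statistics.median([abs(n - median) for n in norms]) or 1e-12  # avoid division by zero
    return [abs(n - median) / mad > k for n in norms]


def fedavg(updates, sample_counts, flags):
    """Sample-weighted average of the updates from clients that were not flagged."""
    kept = [(u, c) for u, c, bad in zip(updates, sample_counts, flags) if not bad]
    if not kept:
        raise ValueError("every client was flagged; nothing to aggregate")
    total = sum(c for _, c in kept)
    dim = len(kept[0][0])
    return [sum(u[i] * c for u, c in kept) / total for i in range(dim)]


if __name__ == "__main__":
    # Three benign clients and one client sending an outsized update (toy numbers).
    client_updates = [[0.10, 0.20], [0.12, 0.18], [0.11, 0.21], [5.0, 5.0]]
    client_sizes = [100, 100, 200, 100]
    flags = flag_anomalous(client_updates)
    print(flags)                                        # [False, False, False, True]
    print(fedavg(client_updates, client_sizes, flags))  # approximately [0.11, 0.2]
```

A robust statistic (median and MAD) is used so that a single malicious client cannot shift the reference point it is compared against; a real IDS would likely combine several behavioral signals.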

Frontiers in Big Data, vol. 8, article 1659026. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12878652/pdf/
Citations: 0
Examining the influence of deterrent and enhancement factors on QR-code mobile payment continuance intention: insights from PLS-SEM and IPMA analysis.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-22 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1679897
Ashikur Rahman, Fahmid Al Farid, Mohammad Abul Bashar, Jia Uddin, Arif Mahmud, Hezerul Abdul Karim

Introduction: The rise of contactless payment has made quick response (QR) code mobile payment (QR-MP) platforms increasingly popular among mobile financial service (MFS) users, especially in emerging economies. It has been demonstrated that ongoing use of QR payments can significantly drive the growth of emerging economies. However, despite its importance, continued use of this technology has been unsatisfactory. This study therefore examines MFS users' sustained usage of QR payment through a modified Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) model that adds four constructs: amotivation (AM), alternative attractiveness (AA), QR transaction anxiety (QTA), and transaction convenience (TC).

Methods: Data were collected from 247 MFS users in Bangladesh through an online survey and analyzed using PLS-SEM and a non-linear importance-performance map analysis (IPMA).

Results: Effort expectancy is the most influential factor, and both moderators, QTA and TC, are significant. However, social influence and hedonic motivation were found to be insignificant. Furthermore, our extended research model explains 76.5% of the variance in continuance intention (CINT) without the moderation effects.

Discussion: The IPMA findings identify the best-performing variables and provide practical insights. The study also offers theoretical and managerial implications that enrich the existing information technology literature and indicate how MFS providers in developing countries can retain their existing users.
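For readers unfamiliar with IPMA: in PLS-SEM practice, a construct's importance is usually its total effect on the target construct (here, continuance intention), and its performance is its score rescaled to 0-100. The sketch below shows only that rescaling and a common reading of the map that splits it at the mean importance and mean performance. The construct values and the quadrant labels are hypothetical, not results from this study, and estimating the total effects themselves requires a PLS-SEM package.

```python
import statistics


def rescale_to_100(score, scale_min, scale_max):
    """Map a score on a [scale_min, scale_max] response scale (e.g. 1-5 Likert) to 0-100."""
    return (score - scale_min) / (scale_max - scale_min) * 100


def ipma_quadrants(constructs):
    """constructs: {name: (importance, performance_0_100)}.
    Splits the importance-performance map at the mean of each axis."""
    mean_imp = statistics.mean(imp for imp, _ in constructs.values())
    mean_perf = statistics.mean(perf for _, perf in constructs.values())
    labels = {}
    for name, (imp, perf) in constructs.items():
        if imp >= mean_imp and perf < mean_perf:
            labels[name] = "improve first"      # high importance, low performance
        elif imp >= mean_imp:
            labels[name] = "keep up"            # high importance, high performance
        elif perf < mean_perf:
            labels[name] = "low priority"       # low importance, low performance
        else:
            labels[name] = "possible overkill"  # low importance, high performance
    return labels


if __name__ == "__main__":
    # Hypothetical total effects and performances, for illustration only.
    example = {
        "EE": (0.40, rescale_to_100(3.4, 1, 5)),  # mean score 3.4 on a 1-5 scale
        "PE": (0.10, 80.0),
        "TC": (0.30, 85.0),
        "AA": (0.05, 40.0),
    }
    print(ipma_quadrants(example))
```

Constructs landing in the "improve first" quadrant are the ones an MFS provider would typically target, since improving them promises the largest gain in the target construct.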

Frontiers in Big Data, vol. 8, article 1679897. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873545/pdf/
Citations: 0
Big data approaches to bovine bioacoustics: a FAIR-compliant dataset and scalable ML framework for precision livestock welfare.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-16 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1723155
Mayuri Kate, Suresh Neethirajan

The convergence of IoT sensing, edge computing, and machine learning is revolutionizing precision livestock farming. Yet bioacoustic data streams remain underexploited because of computational-complexity and ecological-validity challenges. We present one of the most comprehensive bovine vocalization datasets to date: 569 expertly curated clips spanning 48 behavioral classes, recorded on three commercial dairy farms with multi-microphone arrays and expanded to 2,900 samples through domain-informed data augmentation. This FAIR-compliant resource addresses key Big Data challenges: volume (90 h of raw recordings, 65.6 GB), variety (multi-farm, multi-zone acoustic environments), velocity (real-time processing requirements), and veracity (noise-robust feature-extraction pipelines). A modular data-processing workflow combines denoising (implemented both in iZotope RX 11 for quality control and in an equivalent open-source Python pipeline using noisereduce), multi-modal synchronization (audio-video alignment), and standardized feature engineering (24 acoustic descriptors via Praat, librosa, and openSMILE) to enable scalable welfare monitoring. Preliminary machine-learning benchmarks reveal distinct class-wise acoustic signatures across estrus detection, distress classification, and maternal-communication recognition. The dataset's ecological realism, embracing authentic barn acoustics rather than controlled conditions, ensures deployment-ready model development. This work establishes the foundation for animal-centered AI, in which bioacoustic streams enable continuous, non-invasive welfare assessment at industrial scale. By releasing a Zenodo-hosted, FAIR-compliant dataset (restricted access) and an open-source preprocessing pipeline on GitHub, together with comprehensive metadata schemas, we advance reproducible research at the intersection of Big Data analytics, sustainable agriculture, and precision livestock management. The framework directly supports UN SDG 9, demonstrating how data science can transform traditional farming into intelligent, welfare-optimized production systems capable of meeting global food demands while maintaining ethical animal-care standards.
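The abstract names the toolkits (Praat, librosa, openSMILE) but not the 24 descriptors. As a hedged, dependency-free illustration of what frame-level acoustic descriptors look like, the sketch below computes two common ones, root-mean-square (RMS) energy and zero-crossing rate (ZCR), over fixed-length frames. The sample rate, frame length, and hop size are arbitrary assumptions, and library implementations differ in details (librosa, for example, can pad and center frames).

```python
import math


def frame_features(signal, frame_len, hop):
    """Return (rms, zcr) for each full frame of a mono signal given as a list of floats."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Count sign changes between consecutive samples (zero treated as positive).
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        features.append((rms, zcr))
    return features


if __name__ == "__main__":
    sr = 8000  # assumed sample rate in Hz
    # Toy "call": 0.1 s of a 400 Hz tone followed by 0.1 s of silence.
    tone = [0.5 * math.sin(2 * math.pi * 400 * n / sr) for n in range(800)]
    clip = tone + [0.0] * 800
    for rms, zcr in frame_features(clip, frame_len=400, hop=400):
        print(f"rms={rms:.3f} zcr={zcr:.3f}")
```

Per-frame descriptors like these are typically summarized (mean, variance, percentiles) per clip before being fed to a classifier.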

Frontiers in Big Data, vol. 8, article 1723155. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12855049/pdf/
Citations: 0
Deep learning-enabled hybrid systems for accurate recognition of text in seal images.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1753871
Keke Zhang, Mingyu Guan, Chao Wu, Yutong Li, Qingguo Lü, Yi Liu, Yi Wang, Wei Wang, Wei Zhang

Chinese seals are widely used across Chinese society as a tool for certifying legal documents. However, recognizing the text on these seals is challenging because of background text, high noise levels, and minimalistic image features. This paper introduces a hybrid model to address these difficulties in Chinese seal text recognition. Our model integrates preprocessing techniques tailored to real seals, a deep learning-based position correction model, a circular text unwrapping model, and OCR text recognition. First, we apply a color-based method to remove the black background text on seals, eliminating redundant information while retaining the features needed for further analysis. Next, we introduce an innovative image denoising algorithm that significantly improves the system's robustness on noisy seal images. Additionally, we develop a deep learning-based angle prediction network and create synthetic datasets that mimic real seal scenes, enabling optimal seal image positioning for better text flattening and recognition and thus boosting overall system performance. Finally, a polar coordinate transformation converts the circular seal into a rectangular image for more efficient text recognition. Experimental results indicate that the proposed methods effectively improve the accuracy of seal text recognition.
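Two of these steps, color-based removal of black text and polar unwrapping, are classical image operations that can be sketched without deep learning. The code below assumes an RGB image stored as nested lists of `(r, g, b)` tuples, a known seal center, and known inner and outer radii; in the paper those would come from the preprocessing and position-correction stages, which are not reproduced here. The dark-pixel threshold is a hypothetical value, and nearest-neighbour sampling is used for brevity where a production system would likely interpolate (for example with OpenCV's `warpPolar`).

```python
import math

WHITE = (255, 255, 255)


def remove_dark_pixels(image, max_channel_threshold=80):
    """Color-based cleanup: replace near-black pixels (every channel below the
    threshold) with white, leaving red seal strokes untouched."""
    return [
        [WHITE if max(px) < max_channel_threshold else px for px in row]
        for row in image
    ]


def polar_unwrap(image, cx, cy, r_inner, r_outer, n_angles):
    """Sample the ring between r_inner and r_outer around (cx, cy) into a
    rectangle: rows run from the outer radius inward, columns sweep the angle."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(r_outer, r_inner - 1, -1):
        row = []
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            x = round(cx + r * math.cos(theta))
            y = round(cy - r * math.sin(theta))  # image y axis points down
            inside = 0 <= x < width and 0 <= y < height
            row.append(image[y][x] if inside else WHITE)  # nearest-neighbour sample
        out.append(row)
    return out


if __name__ == "__main__":
    red = (200, 30, 30)
    seal = [[red] * 21 for _ in range(21)]  # toy 21x21 "seal"
    seal[10][10] = (0, 0, 0)                # a black background-text pixel
    cleaned = remove_dark_pixels(seal)
    strip = polar_unwrap(cleaned, cx=10, cy=10, r_inner=6, r_outer=9, n_angles=36)
    print(len(strip), "rows x", len(strip[0]), "columns")
```

The unwrapped strip turns text written along the seal's circumference into a roughly horizontal line, which is the layout a standard OCR engine expects.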

Frontiers in Big Data, vol. 8, article 1753871. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12847014/pdf/
Citations: 0
Dynamic patterns of healthy lifestyle awareness after COVID-19: a study using Google Trends and joinpoint regression.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-13 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1717592
Zahroh Shaluhiyah, Shabrina Arifia Qatrannada, Roshan Kumar Mahato, Farid Agushybana, Sri Handayani, Dzul Fahmi Afriyanto, Usha Rani, Dewie Sulistyorini

Introduction: The COVID-19 pandemic has significantly influenced public interest in health-related behaviors, as reflected in online search trends. Analyzing these trends provides insight into shifting health concerns and can inform future public health strategies. This study examined Google Trends data to assess changes in public interest in mental health, healthy diet, sleep, screen time, physical activity, and tobacco smoking before, during, and after the COVID-19 pandemic.

Methods: Google Trends data (2019-2023) were analyzed with joinpoint regression to identify statistically significant shifts in relative search volume (RSV) over time. In addition, the Mann-Whitney U test was used to examine differences in mean RSV across time periods.
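Joinpoint regression fits connected line segments and locates the points where a trend changes. The stdlib-only sketch below handles the simplest case, a single joinpoint: it tries each interior time point as the candidate joint, fits the continuous two-segment model y = b0 + b1*t + b2*max(0, t - tau) by least squares, and keeps the fit with the smallest residual sum of squares. This is a deliberate simplification: dedicated tools such as the NCI Joinpoint Regression Program also use statistical tests to decide how many joinpoints to keep and often fit log-linear segments to report percent change, none of which is done here. The series in the test is made up.

```python
def solve_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    v = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        v[c], v[pivot] = v[pivot], v[c]
        for r in range(c + 1, k):
            factor = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= factor * A[c][j]
            v[r] -= factor * v[c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def fit_one_joinpoint(t, y, min_points=2):
    """Fit y = b0 + b1*t + b2*max(0, t - tau) for each candidate tau and keep
    the lowest-SSE fit. Returns (tau, slope_before, slope_after)."""
    t = list(t)
    best = None
    for tau in t[min_points:len(t) - min_points]:  # keep joints away from the ends
        X = [[1.0, ti, max(0.0, ti - tau)] for ti in t]
        b = solve_least_squares(X, y)
        sse = sum((yi - (b[0] + b[1] * ti + b[2] * max(0.0, ti - tau))) ** 2
                  for ti, yi in zip(t, y))
        if best is None or sse < best[0]:
            best = (sse, tau, b)
    _, tau, b = best
    return tau, b[1], b[1] + b[2]
```

For a monthly RSV series, `t` would be the month index and `y` the RSV values; a positive `slope_after` following a pandemic-era joinpoint would correspond to the "sustained increase" pattern the study reports.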

Results: Awareness of mental health, particularly anxiety, and of sleep patterns increased consistently during and after the pandemic. These topics showed significant positive trends in joinpoint regression and higher mean RSVs, with statistically significant differences across time periods (p < 0.05). In contrast, awareness of some behaviors, such as physical activity and screen time, increased only during the pandemic and was not sustained afterward. Meanwhile, searches on dietary behavior and smoking either stagnated or declined, indicating limited or declining public interest despite their relevance to health outcomes.
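The between-period comparison behind these p-values can be illustrated with a from-scratch two-sided Mann-Whitney U test. This sketch makes simplifying assumptions: it assigns average ranks to ties and uses the large-sample normal approximation for the p-value without tie or continuity corrections, so for small samples a tie-aware or exact implementation such as `scipy.stats.mannwhitneyu` is preferable. Any RSV samples passed to it here are invented.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, average ranks for ties).
    Returns (U for x, U for y, p-value)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    combined = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of a tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, u2, p


if __name__ == "__main__":
    pre_pandemic_rsv = [41, 38, 45, 40, 39, 43]  # hypothetical values
    post_pandemic_rsv = [52, 58, 49, 61, 55, 57]  # hypothetical values
    print(mann_whitney_u(pre_pandemic_rsv, post_pandemic_rsv))
```

Because the test compares rank distributions, it is robust to the skewed, bounded (0-100) scale of Google Trends RSV.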

Conclusion: Digital interest in health behaviors varied during and after COVID-19, with only mental health and sleep showing sustained concern. However, spikes in awareness often reflected personally relevant issues, highlighting Google Trends' potential as an early signal for health promotion efforts.

Frontiers in Big Data, vol. 8, article 1717592. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834755/pdf/
Citations: 0
Improving early liver metastasis detection in colorectal cancer using a weighted ensemble of ResNet50 and swin transformer: a KHCC study.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1700292
Ahmad Nasayreh, Hasan Gharaibeh, Rula Al-Qawabah, Azza Gharaibeh, Bayan Altalla, Iyad Sultan

Colorectal cancer is the third most commonly diagnosed malignancy globally, with liver metastasis occurring in approximately 50-60% of patients following initial treatment. Current surveillance strategies based on carcinoembryonic antigen monitoring and interval cross-sectional imaging have significant limitations in detecting early hepatic recurrence, often identifying disease at advanced, unresectable stages. This study addresses a critical research gap in AI-driven surveillance by developing a novel ensemble deep learning model for early liver metastasis prediction in colorectal cancer patients. The methodology used six state-of-the-art architectures, including ResNet50, MobileNetV2, DenseNet121, CNN-LSTM, and Swin Transformer, as feature extractors through transfer learning, followed by a weighted soft-voting ensemble that combines the top-performing models. The framework was evaluated on a comprehensive dataset of 1,628 medical images from colorectal cancer patients, with rigorous statistical validation using Friedman and Wilcoxon signed-rank tests. The ensemble of ResNet50 and Swin Transformer achieved the best performance, with 75.48% accuracy, 79.0% sensitivity, 73.6% specificity, and an AUC of 0.8115, a statistically significant improvement over every individual architecture. The ensemble approach coped with the difficulty of this dataset, on which several state-of-the-art models performed near chance level, demonstrating the value of architectural diversity in medical image analysis. Clinically, better early detection could increase patient eligibility for curative interventions, and the balanced diagnostic performance suits surveillance applications. The framework is computationally efficient, requiring only 0.39 s of inference time per image, which makes integration into existing clinical workflows feasible and could improve outcomes for colorectal cancer patients through earlier identification of hepatic recurrence.
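Weighted soft voting averages the class probabilities of the member models with fixed weights and then thresholds the result. The sketch below assumes each model outputs a per-image probability of metastasis and uses hypothetical weights (the abstract does not report the ones used); it then computes accuracy, sensitivity, and specificity from the confusion counts, the metrics the study reports.

```python
def weighted_soft_vote(model_probs, weights):
    """Weighted mean of per-image positive-class probabilities across models."""
    total = sum(weights)
    n_images = len(model_probs[0])
    return [sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
            for i in range(n_images)]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    resnet_probs = [0.9, 0.4, 0.2, 0.6]  # hypothetical ResNet50 outputs
    swin_probs = [0.7, 0.8, 0.1, 0.3]    # hypothetical Swin Transformer outputs
    fused = weighted_soft_vote([resnet_probs, swin_probs], weights=[0.6, 0.4])
    preds = [1 if p >= 0.5 else 0 for p in fused]  # 0.5 decision threshold (assumed)
    print(preds)                                # [1, 1, 0, 0]
    print(binary_metrics([1, 0, 0, 1], preds))  # (0.5, 0.5, 0.5)
```

Averaging probabilities rather than hard votes lets a confident model outvote an uncertain one, which is one reason soft voting is often preferred for two-member ensembles where hard majority voting would tie.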

Frontiers in Big Data, vol. 8, article 1700292. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832282/pdf/
Citations: 0
Explainable attrition risk scoring for managerial retention decisions in human resource analytics.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1699561
M S Pavithran, S M Vadivel

Introduction: Employee turnover remains a significant challenge for organizations, making it difficult for them to retain employees and maintain efficient operations. With predictive analytics, HR managers can anticipate and reduce potential turnover. Conventional research has focused on the technical effectiveness of predictive models, but few studies have investigated the interpretability and reliability of forecasts used for managerial decisions.

Methods: This research used the Employee Attrition dataset and applied several pre-processing steps, including label encoding, feature scaling, and SMOTE for class balancing. Machine learning models were trained and tuned using grid search with stratified cross-validation. The best-performing model was calibrated with the sigmoid (Platt scaling) method to improve the reliability of its predicted probabilities. LIME provided local interpretability, offering practical insight into the attrition risk of individual employees. Permutation feature importance analysis and SHAP summary plots clarified the model's overall behavior by showing which features contributed to the predicted attrition probability.
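
The SMOTE balancing step can be sketched in standard-library Python. This is a simplified illustration of the core idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the implementation the authors used; the function name, the `k` default and the toy data are assumptions. In practice, oversampling is usually applied only inside the training folds of cross-validation so that synthetic rows do not leak into validation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class rows by SMOTE-style interpolation.

    minority: list of numeric feature vectors from the minority class.
    n_synthetic: number of new rows to generate.
    k: how many nearest minority neighbours to choose a partner from.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance, excluding itself.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(base, m),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the base-partner segment
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy minority-class rows (two scaled features), not data from the study.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(minority, n_synthetic=3, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, it stays inside the region those rows span; note that interpolating label-encoded categorical columns yields non-integer codes, which is one reason categorical-aware variants exist.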

Results: The Random Forest classifier achieved the highest AUC-ROC, 97.37%. Risk distribution visualizations highlight the employees with the highest attrition probability, and calibration is the main driver of the reduction in Brier score from 0.03873 to 0.03480.
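
For reference, the Brier score is the mean squared difference between predicted probabilities and the observed 0/1 outcomes, so lower values indicate more accurate, better-calibrated probabilities. A minimal sketch with made-up example values; it also shows that the reported drop from 0.03873 to 0.03480 is roughly a 10% relative reduction.

```python
def brier_score(y_true, y_prob):
    """Brier score: BS = (1/N) * sum_i (p_i - y_i)^2 for binary outcomes y_i in {0, 1}."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes and two sets of predicted attrition probabilities (not study data).
outcomes = [1, 0, 1, 0]
model_a = [1.0, 0.9, 0.6, 0.1]  # one badly overconfident miss on the second employee
model_b = [0.9, 0.2, 0.6, 0.1]

score_a = brier_score(outcomes, model_a)
score_b = brier_score(outcomes, model_b)

# Relative reduction implied by the reported scores (about 0.10).
relative_reduction = (0.03873 - 0.03480) / 0.03873
```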

Discussion: The study concludes that a calibrated, interpretable, and risk-stratified model can enhance HR decision-making by helping managers prioritize interventions and target retention strategies more accurately. The framework helps HR leaders move from reactive to proactive workforce management by leveraging data-driven insights.

Citations: 0
Decoding deception: state-of-the-art approaches to deep fake detection.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-09 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1670833
Tarak Hussain, B Tirapathi Reddy, Kondaveti Phanindra, Sailaja Terumalasetti, Ghufran Ahmad Khan

Deepfake technology is evolving at an alarming pace, threatening information integrity and social trust. We present a new multimodal deepfake detection framework that exploits cross-domain inconsistencies between the audio and visual streams. Its core is the Synchronization-Aware Feature Fusion (SAFF) architecture combined with Cross-Modal Graph Attention Networks (CM-GAN), both of which explicitly address temporal misalignment to improve detection accuracy. Across eight models and five benchmark datasets totaling 93,750 test samples, the framework achieves 98.76% accuracy and remains robust across multiple compression levels. Statistical analysis shows that synchronized audio-visual inconsistencies are highly discriminative (Cohen's d = 1.87). The paper's contributions center on a cross-modal feature extraction pipeline, a graph-based attention mechanism for inter-modal reasoning, and extensive ablation studies validating the fusion strategy; it also offers statistically grounded insights to guide future work in this area. With a 17.85% generalization advantage over unimodal methods, the framework represents a new state of the art, and it introduces a self-supervised pre-training strategy that requires 65% less labeled data.
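
A Cohen's d of 1.87 means the two group means differ by about 1.87 standard deviations. The sketch below computes one common variant of d, using the pooled sample standard deviation; the abstract does not say which variant was used, and the input lists are hypothetical stand-ins for per-clip inconsistency scores of fake and real videos.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (mean_a - mean_b) / pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical inconsistency scores (not data from the paper).
fake_scores = [2.0, 4.0, 6.0]
real_scores = [1.0, 3.0, 5.0]
d = cohens_d(fake_scores, real_scores)
```

Here both groups have sample variance 4, so the pooled standard deviation is 2 and the one-point difference in means gives d = 0.5.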

Citations: 0
Bias in AI systems: integrating formal and socio-technical approaches.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-08 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1686452
Amar Ahmad, Yvonne Vallès, Youssef Idaghdour

Artificial Intelligence (AI) systems are increasingly embedded in high-stakes decision-making across domains such as healthcare, finance, criminal justice, and employment. A growing body of evidence shows that these systems can reproduce and amplify structural inequities, raising ethical, social, and technical concerns. In this review, formal mathematical definitions of bias are integrated with socio-technical perspectives to examine its origins, manifestations, and impacts. Bias is categorized into four interrelated families: historical/representational, selection/measurement, algorithmic/optimization, and feedback/emergent; case studies in facial recognition, large language models, credit scoring, healthcare, employment, and criminal justice illustrate how each operates. Current mitigation strategies are critically evaluated, including dataset diversification, fairness-aware modeling, post-deployment auditing, regulatory frameworks, and participatory design. An integrated framework is proposed that couples statistical diagnostics with governance mechanisms to enable bias mitigation across the entire AI lifecycle. By bridging technical precision with sociological insight, the review offers guidance for developing AI systems that are equitable, accountable, and responsive to the needs of diverse populations.
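
The abstract does not list which formal bias definitions the review uses, so as an assumption the sketch below shows two widely used ones: the demographic parity difference (the gap in positive-prediction rates between two groups) and the equal opportunity difference (the gap in true positive rates). The function names, group labels and data are hypothetical.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values."""
    if not values:
        raise ValueError("empty group: metric is undefined")
    return sum(values) / len(values)


def fairness_gaps(y_true, y_pred, groups, group_a, group_b):
    """Return (demographic parity difference, equal opportunity difference), group_a minus group_b.

    Demographic parity compares P(y_pred = 1 | group).
    Equal opportunity compares the true positive rate P(y_pred = 1 | y_true = 1, group).
    """
    def preds_for(g, positives_only=False):
        return [
            p for t, p, gr in zip(y_true, y_pred, groups)
            if gr == g and (not positives_only or t == 1)
        ]

    dp_gap = rate(preds_for(group_a)) - rate(preds_for(group_b))
    eo_gap = rate(preds_for(group_a, True)) - rate(preds_for(group_b, True))
    return dp_gap, eo_gap


# Hypothetical decisions for eight applicants in two groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp_gap, eo_gap = fairness_gaps(y_true, y_pred, groups, "a", "b")
```

Group "a" receives positive predictions 75% of the time versus 25% for group "b", and its qualified members are always approved versus half of those in group "b", so both gaps equal 0.5 in this toy example.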

Citations: 0
Hybrid deep learning models for fake news detection: case study on Arabic and English languages.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1683786
Baqer M Merzah, Jafar Razmara, Zolfaghar Salmanian

Introduction: Fake news has become a significant threat to public discourse because online content spreads quickly and fabricated stories are difficult to detect and distinguish from real news. Society's growing dependence on online social networks further amplifies this challenge. Many researchers have developed machine learning and deep learning models to curb the spread of misinformation and identify fake news. However, these studies focused on a single language and achieved low accuracy, especially for Arabic, where detection is hampered by limited resources and linguistic complexity.

Methods: This paper introduces a deep learning technique for fake news detection (FND) in Arabic and English. The proposed model combines a multi-channel Convolutional Neural Network (CNN) with a dual Bidirectional Long Short-Term Memory (BiLSTM), which capture, in parallel, semantic and local textual features from embeddings produced by a pre-trained FastText model. A global max-pooling layer then reduces dimensionality and extracts salient features from the sequential output, and the model finally classifies each news item as fake or real. The model is trained and evaluated on three benchmark datasets: the Arabic datasets AFND and ANS, and the English dataset WELFake.
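
The convolution and global max-pooling part of such an architecture can be sketched in plain Python. Here each "channel" is assumed to be a convolution filter of a different width applied to the same token embeddings, a common design in text CNNs; the BiLSTM branch, FastText embeddings and classification layer are omitted, and all function names, weights and inputs are toy values rather than anything from the paper.

```python
def conv1d_relu(embeddings, kernel, bias=0.0):
    """Valid 1D convolution of one filter over a token sequence, followed by ReLU.

    embeddings: seq_len x dim list of token vectors.
    kernel: width x dim list of weights for a single filter.
    Returns one activation per window position.
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the kernel width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of token vectors.
        score = bias + sum(
            w * x
            for k_vec, token in zip(kernel, window)
            for w, x in zip(k_vec, token)
        )
        activations.append(max(0.0, score))  # ReLU
    return activations


def global_max_pool(activations):
    """Keep only the strongest activation across all positions."""
    return max(activations)


def multi_channel_features(embeddings, kernels):
    """Apply each filter (one per assumed channel) and max-pool it into one feature."""
    return [global_max_pool(conv1d_relu(embeddings, k)) for k in kernels]


# Toy 4-token sequence with 2-dimensional embeddings (hypothetical values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
kernels = [
    [[1.0, -1.0]],              # width-1 filter
    [[1.0, 0.0], [0.0, 1.0]],   # width-2 filter
]
features = multi_channel_features(embeddings, kernels)
```

Global max pooling makes the output length independent of the sequence length: every filter contributes exactly one feature, whatever the number of tokens.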

Results: Experimental results show that the model outperforms state-of-the-art (SOTA) approaches, with accuracies of (94.43 ± 0.19)%, (71.63 ± 1.45)%, and (98.85 ± 0.03)% on AFND, ANS, and WELFake, respectively.

Discussion: This work provides a robust approach to combating misinformation, with practical applications for improving the reliability of information on social networks.

Citations: 0