Recent advances in Artificial Intelligence have highlighted the role of Machine Learning in healthcare decision-making, but centralized data collection raises significant privacy risks. Federated Learning (FL) addresses this by enabling collaborative training across multiple clients without sharing raw data. However, FL remains vulnerable to security threats that can compromise model reliability. This paper proposes a dual-security FL framework that combines Fernet symmetric encryption, which secures the transmission of model updates, with an Intrusion Detection System (IDS) that detects anomalous client behavior. Experiments on a publicly available healthcare dataset show that the proposed system improves privacy and robustness compared with traditional FL. Among the tested models, including Logistic Regression, Random Forest, and a Support Vector Classifier (SVC), the ensemble method performed best, with 99% accuracy.
Title: EnDuSecFed: an ensemble approach for privacy preserving Federated Learning with dual-security framework for sustainable healthcare.
Authors: Bela Shrimali, Jenil Gajjar, Swapnoneel Roy, Sanjay Patel, Kanu Patel, Ramesh Ram Naik
Published: 2026-01-22; DOI: 10.3389/fdata.2025.1659026
Journal: Frontiers in Big Data, vol. 8, article 1659026
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12878652/pdf/
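The abstract describes the two security layers only at a high level. As a rough illustration of where they could meet on the server, the sketch below assumes updates have already been decrypted (in a real deployment they might arrive Fernet-encrypted and be decrypted with the `cryptography` package's `Fernet` class, a step omitted here to stay dependency-free). A robust z-score on update norms stands in for the IDS, and surviving updates are combined with sample-weighted federated averaging (FedAvg). The function names, the norm-based detector, and the 3.5 cut-off (a common rule of thumb for modified z-scores) are assumptions, not the authors' IDS.

```python
import math
from statistics import median

def update_norm(update):
    """Euclidean (L2) norm of a flattened model update."""
    return math.sqrt(sum(w * w for w in update))

def flag_anomalous_clients(updates, threshold=3.5):
    """Hypothetical IDS stand-in: flag clients whose update norm has a
    robust (median/MAD) z-score above `threshold`."""
    norms = [update_norm(u) for u in updates]
    med = median(norms)
    mad = median(abs(n - med) for n in norms)
    if mad == 0:
        return set()  # MAD of zero: robust scores undefined, flag nothing
    # 0.6745 rescales MAD-based scores to be comparable with standard z-scores
    return {i for i, n in enumerate(norms)
            if 0.6745 * abs(n - med) / mad > threshold}

def fedavg(updates, sample_counts, exclude=frozenset()):
    """Sample-weighted average (FedAvg) over the clients not in `exclude`."""
    kept = [i for i in range(len(updates)) if i not in exclude]
    if not kept:
        raise ValueError("no client updates left to aggregate")
    total = sum(sample_counts[i] for i in kept)
    return [sum(updates[i][j] * sample_counts[i] for i in kept) / total
            for j in range(len(updates[0]))]

# Hypothetical round: three ordinary clients and one with an outsized update
updates = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.21], [5.0, 4.0]]
counts = [100, 120, 80, 100]
suspects = flag_anomalous_clients(updates)
global_update = fedavg(updates, counts, exclude=suspects)
```

Screening before averaging keeps a single manipulated client from dominating the global update; a production IDS would likely use richer features than the update norm alone.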
Pub Date: 2026-01-22; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1679897
Ashikur Rahman, Fahmid Al Farid, Mohammad Abul Bashar, Jia Uddin, Arif Mahmud, Hezerul Abdul Karim
Introduction: The rise of contactless payment has made quick response (QR) code mobile payment (QR-MP) platforms increasingly popular among mobile financial service (MFS) users, especially in emerging economies. Prior work has shown that ongoing use of QR payments can significantly drive the growth of emerging economies. Despite this importance, however, continued use of the technology has been unsatisfactory. This study therefore extends the Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) model with four additional constructs: amotivation (AM), alternative attractiveness (AA), QR transaction anxiety (QTA), and transaction convenience (TC). The extended model is used to examine MFS users' sustained use of QR payment.
Methods: Data were collected from 247 MFS users in Bangladesh through an online survey and analyzed using partial least squares structural equation modeling (SEM-PLS) and a non-linear importance-performance map analysis (IPMA).
Results: Effort expectancy was the most influential factor, and both moderators, QTA and TC, were significant. Social influence and hedonic motivation, however, were not significant. Without the moderation effects, the extended research model explains 76.5% of the variance in continuance intention (CINT).
Discussion: The IPMA findings identify the best-performing variables and provide practical insights. The study offers theoretical and managerial implications that enrich the information technology literature and indicate how MFS providers in developing countries can retain their existing users.
Title: Examining the influence of deterrent and enhancement factors on QR-code mobile payment continuance intention: insights from PLS-SEM and IPMA analysis.
Journal: Frontiers in Big Data, vol. 8, article 1679897
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873545/pdf/
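IPMA contrasts each construct's importance (its total effect on the target construct, here CINT) with its performance (its mean latent score rescaled to 0-100 via (x - min) / (max - min) × 100). The stdlib sketch below shows that arithmetic and the usual "high importance, low performance" priority rule. It assumes a 5-point Likert scale and equal indicator weights (PLS-SEM software uses the estimated unstandardized outer weights instead); the constructs A, B, C and all numbers are hypothetical, not the study's data or results.

```python
from statistics import mean

def rescale_0_100(score, scale_min=1, scale_max=5):
    """Rescale a Likert response to the 0-100 range used in IPMA."""
    return (score - scale_min) / (scale_max - scale_min) * 100

def construct_performance(responses, weights=None, scale_min=1, scale_max=5):
    """Mean rescaled construct score across respondents.

    responses: one list of indicator scores per respondent.
    weights: outer weights per indicator; equal weights are an assumption here.
    """
    n_ind = len(responses[0])
    if weights is None:
        weights = [1 / n_ind] * n_ind
    total_w = sum(weights)
    per_person = [
        sum(w * rescale_0_100(x, scale_min, scale_max) for w, x in zip(weights, row)) / total_w
        for row in responses
    ]
    return mean(per_person)

def ipma_priorities(importance, performance):
    """Constructs with above-average importance but below-average performance."""
    imp_avg = mean(list(importance.values()))
    perf_avg = mean(list(performance.values()))
    return sorted(c for c in importance
                  if importance[c] > imp_avg and performance[c] < perf_avg)

# Hypothetical responses: three respondents, two indicators per construct
data = {
    "A": [[4, 5], [5, 5], [4, 4]],
    "B": [[2, 3], [3, 2], [2, 2]],
    "C": [[3, 3], [4, 3], [3, 4]],
}
performance = {c: construct_performance(rows) for c, rows in data.items()}
importance = {"A": 0.20, "B": 0.45, "C": 0.10}  # hypothetical total effects
priorities = ipma_priorities(importance, performance)
```

A construct that lands in this quadrant is where a provider would likely get the most improvement per unit of effort, which is the managerial reading IPMA is designed to support.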
Pub Date: 2026-01-16; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1723155
Mayuri Kate, Suresh Neethirajan
The convergence of IoT sensing, edge computing, and machine learning is revolutionizing precision livestock farming. Yet bioacoustic data streams remain underexploited because of computational-complexity and ecological-validity challenges. We present one of the most comprehensive bovine vocalization datasets to date: 569 expertly curated clips spanning 48 behavioral classes, recorded across three commercial dairy farms using multi-microphone arrays and expanded to 2,900 samples through domain-informed data augmentation. This FAIR-compliant resource addresses key Big Data challenges: volume (90 h of raw recordings, 65.6 GB), variety (multi-farm, multi-zone acoustic environments), velocity (real-time processing requirements), and veracity (noise-robust feature-extraction pipelines). A modular data-processing workflow combines denoising (implemented both in iZotope RX 11 for quality control and in an equivalent open-source Python pipeline using noisereduce), multi-modal synchronization (audio-video alignment), and standardized feature engineering (24 acoustic descriptors computed with Praat, librosa, and openSMILE) to enable scalable welfare monitoring. Preliminary machine-learning benchmarks reveal distinct class-wise acoustic signatures for estrus detection, distress classification, and maternal-communication recognition. Because the dataset embraces authentic barn acoustics rather than controlled conditions, it supports the development of deployment-ready models. This work lays a foundation for animal-centered AI, in which bioacoustic streams enable continuous, non-invasive welfare assessment at industrial scale. By releasing a Zenodo-hosted, FAIR-compliant dataset (restricted access) and an open-source preprocessing pipeline on GitHub, together with comprehensive metadata schemas, we advance reproducible research at the intersection of Big Data analytics, sustainable agriculture, and precision livestock management.
The framework directly supports UN SDG 9, demonstrating how data science can transform traditional farming into intelligent, welfare-optimized production systems capable of meeting global food demands while maintaining ethical animal-care standards.
Title: Big data approaches to bovine bioacoustics: a FAIR-compliant dataset and scalable ML framework for precision livestock welfare.
Journal: Frontiers in Big Data, vol. 8, article 1723155
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12855049/pdf/
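The feature-engineering stage computes 24 descriptors with Praat, librosa, and openSMILE. As a dependency-free illustration of what a frame-level descriptor is, the sketch below computes two standard ones, RMS energy and zero-crossing rate, on overlapping frames. The frame length, hop size, sample rate, and synthetic tone are assumptions chosen for the example; this is not the authors' pipeline.

```python
import math

def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def extract_features(signal, frame_len=1024, hop=512):
    """Per-frame (RMS, ZCR) pairs for a mono signal given as a list of floats."""
    return [(rms(f), zero_crossing_rate(f)) for f in frames(signal, frame_len, hop)]

# Synthetic stand-in for a call: 1 s of a 200 Hz tone at an assumed 8 kHz rate
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
feats = extract_features(tone)
```

For real recordings, librosa's `feature.rms` and `feature.zero_crossing_rate` provide vectorized versions of the same two descriptors.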
Pub Date: 2026-01-14; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1753871
Keke Zhang, Mingyu Guan, Chao Wu, Yutong Li, Qingguo Lü, Yi Liu, Yi Wang, Wei Wang, Wei Zhang
Chinese seals are widely used across Chinese society to certify legal documents. However, recognizing the text on these seals is challenging because of background text, high noise levels, and sparse image features. This paper introduces a hybrid model to address these difficulties in Chinese seal text recognition. The model integrates preprocessing techniques tailored to real seals, a deep learning-based position correction model, a circular text unwrapping model, and OCR text recognition. First, a color-based method removes the black background text on seals, eliminating redundant information while retaining the features needed for further analysis. Next, a new image denoising algorithm improves the system's robustness on noisy seal images. We also develop a deep learning-based angle prediction network and create synthetic datasets that mimic real seal scenes; together these correct the orientation of seal images before text flattening and recognition, improving overall system performance. Finally, a polar coordinate transformation converts the circular seal into a rectangular image for more efficient text recognition. Experimental results indicate that the proposed methods improve the accuracy of seal text recognition.
Title: Deep learning-enabled hybrid systems for accurate recognition of text in seal images.
Journal: Frontiers in Big Data, vol. 8, article 1753871
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12847014/pdf/
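The final step maps the circular seal onto a rectangle with a polar coordinate transformation. The abstract does not give its parameters, so the sketch below assumes the seal center and the inner and outer radii of the text band are already known (for example, after the angle-correction stage), uses nearest-neighbour sampling, and works on plain nested lists rather than an image library. OpenCV's `cv2.warpPolar` offers a vectorized alternative.

```python
import math

def polar_unwrap(image, center, r_inner, r_outer, out_width, out_height, start_angle=0.0):
    """Unwrap the annulus r_inner..r_outer of a grayscale image into a rectangle.

    image is a 2-D list indexed image[y][x] with y pointing down, so the
    angle (one output column per step) increases clockwise on screen.
    The top output row samples r_outer and the bottom row samples r_inner.
    Nearest-neighbour sampling; points outside the image become 0.
    """
    cx, cy = center
    height, width = len(image), len(image[0])
    out = []
    for row in range(out_height):
        frac = row / (out_height - 1) if out_height > 1 else 0.0
        r = r_outer - frac * (r_outer - r_inner)
        line = []
        for col in range(out_width):
            theta = start_angle + 2 * math.pi * col / out_width
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            line.append(image[y][x] if 0 <= x < width and 0 <= y < height else 0)
        out.append(line)
    return out

# Synthetic 21x21 "seal": a bright ring of radius 7-9 around (10, 10)
ring = [[255 if 7 <= math.hypot(x - 10, y - 10) <= 9 else 0 for x in range(21)]
        for y in range(21)]
strip = polar_unwrap(ring, center=(10, 10), r_inner=2, r_outer=8, out_width=36, out_height=4)
```

Putting the outer radius in the top row keeps characters written along the rim upright in the unwrapped strip, since their tops point away from the seal center.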
Introduction: The COVID-19 pandemic significantly influenced public interest in health-related behaviors, as reflected in online search trends. Analyzing these trends can reveal shifting health concerns and inform future public health strategies. This study used Google Trends data to assess changes in public interest in mental health, healthy diet, sleep, screen time, physical activity, and tobacco smoking before, during, and after the COVID-19 pandemic.
Methods: Google Trends data (2019-2023) were analyzed with joinpoint regression to identify statistically significant shifts in relative search volume (RSV) over time. The Mann-Whitney U test was also used to compare RSV across time periods.
Results: Interest in mental health (particularly anxiety) and sleep increased consistently during and after the pandemic. These topics showed significant positive trends in the joinpoint regression and higher mean RSVs, with statistically significant differences across time periods (p < 0.05). In contrast, interest in physical activity and screen time rose only during the pandemic and was not sustained afterward. Interest in dietary behavior and smoking either stagnated or declined, indicating limited or waning public interest despite their relevance to health outcomes.
Conclusion: Digital interest in health behaviors varied during and after COVID-19, with only mental health and sleep showing sustained concern. Spikes in awareness often reflected personally relevant issues, highlighting Google Trends' potential as an early signal for health promotion efforts.
Title: Dynamic patterns of healthy lifestyle awareness after COVID-19: a study using Google Trends and joinpoint regression.
Authors: Zahroh Shaluhiyah, Shabrina Arifia Qatrannada, Roshan Kumar Mahato, Farid Agushybana, Sri Handayani, Dzul Fahmi Afriyanto, Usha Rani, Dewie Sulistyorini
Published: 2026-01-13; DOI: 10.3389/fdata.2025.1717592
Journal: Frontiers in Big Data, vol. 8, article 1717592
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834755/pdf/
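The period comparison relies on the Mann-Whitney U test. The sketch below is a stdlib implementation of the standard two-sided test using the normal approximation, with average ranks for ties and the usual tie-corrected variance, and no continuity correction. The abstract does not state which variant the authors used, and the RSV values below are hypothetical. `scipy.stats.mannwhitneyu` is the usual tested implementation, with exact and asymptotic options.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[values[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    r1 = sum(rank_of[v] for v in x)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all observations identical: no evidence of a difference
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p

# Hypothetical weekly RSV values for one topic in two periods
pre_pandemic = [42, 45, 40, 47, 44]
during_pandemic = [60, 58, 65, 62, 59]
u, p = mann_whitney_u(pre_pandemic, during_pandemic)
```

Because the test works on ranks, it compares the RSV distributions of the two periods without assuming normality, which suits bounded, often skewed search-volume indices.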
Pub Date: 2026-01-12; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1700292
Ahmad Nasayreh, Hasan Gharaibeh, Rula Al-Qawabah, Azza Gharaibeh, Bayan Altalla, Iyad Sultan
Colorectal cancer is the third most commonly diagnosed malignancy worldwide, and liver metastasis occurs in approximately 50-60% of patients following initial treatment. Current surveillance strategies based on carcinoembryonic antigen monitoring and interval cross-sectional imaging have significant limitations in detecting early hepatic recurrence and often identify disease at advanced, unresectable stages. To address the lack of AI-driven surveillance frameworks, this study develops a novel ensemble deep learning model for early liver metastasis prediction in colorectal cancer patients. Six state-of-the-art architectures, including ResNet50, MobileNetV2, DenseNet121, CNN-LSTM, and Swin Transformer, were used as feature extractors through transfer learning, and the top-performing models were then combined by weighted soft voting. The framework was evaluated on a dataset of 1,628 medical images from colorectal cancer patients, with statistical validation using the Friedman and Wilcoxon signed-rank tests. The ensemble of ResNet50 and Swin Transformer performed best, with 75.48% accuracy, 79.0% sensitivity, 73.6% specificity, and an AUC of 0.8115, a statistically significant improvement over every individual architecture. The ensemble remained effective on a challenging dataset on which several state-of-the-art models performed near chance level, demonstrating the value of architectural diversity in medical image analysis. Clinically, improved early detection could increase patient eligibility for curative interventions, and the model's balanced diagnostic performance suits surveillance applications.
Inference takes only 0.39 s per image, making the framework feasible to integrate into existing clinical workflows and potentially improving outcomes for colorectal cancer patients through earlier identification of hepatic recurrence.
Title: Improving early liver metastasis detection in colorectal cancer using a weighted ensemble of ResNet50 and swin transformer: a KHCC study.
Journal: Frontiers in Big Data, vol. 8, article 1700292
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832282/pdf/
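Weighted soft voting averages the models' predicted probabilities (rather than their hard labels) using per-model weights, then thresholds the result. The sketch below shows that step together with the sensitivity and specificity computation. The abstract does not report the ensemble weights or decision threshold, so the 0.4/0.6 weights, the 0.5 threshold, and the probability values standing in for ResNet50 and Swin Transformer outputs are all hypothetical.

```python
def weighted_soft_vote(model_probs, weights):
    """Combine per-model positive-class probabilities by weighted averaging."""
    total = sum(weights)
    n = len(model_probs[0])
    return [sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
            for i in range(n)]

def sensitivity_specificity(y_true, probs, threshold=0.5):
    """Sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical per-image probabilities of metastasis from two models
resnet_probs = [0.80, 0.30, 0.60, 0.20]
swin_probs = [0.70, 0.40, 0.30, 0.10]
labels = [1, 0, 1, 0]
ensemble = weighted_soft_vote([resnet_probs, swin_probs], weights=[0.4, 0.6])
sens, spec = sensitivity_specificity(labels, ensemble)
```

Averaging probabilities lets a confident model outvote an uncertain one, which is one reason soft voting can help when individual models are close to chance on part of the data.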
Pub Date: 2026-01-12; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1699561
M S Pavithran, S M Vadivel
Introduction: Employee turnover remains a significant challenge because it makes it difficult for organizations to retain staff and operate efficiently. Predictive analytics can help HR managers anticipate and reduce turnover. Prior research has focused on the predictive performance of technical models, but few studies have investigated the interpretability and reliability of forecasts used for managerial decisions.
Methods: This research used the Employee Attrition dataset and applied several pre-processing methods, including label encoding, feature scaling, and SMOTE for class balancing. Machine learning models were trained and tuned using grid search with stratified cross-validation. The best-performing model was calibrated with the sigmoid method to improve the reliability of its predicted probabilities. LIME provided local explanations of individual employees' attrition risk, and permutation feature importance and SHAP summary plots showed which features contributed most to the predicted attrition probability.
Results: The Random Forest classifier achieved the highest AUC-ROC, 97.37%. Risk distribution visualizations highlight the employees with the highest attrition probability, and calibration reduced the Brier score from 0.03873 to 0.03480.
Discussion: The study concludes that a calibrated, interpretable, risk-stratified model can enhance HR decision-making by helping managers prioritize interventions and target retention strategies more accurately. The framework helps HR leaders move from reactive to proactive, data-driven workforce management.
Title: Explainable attrition risk scoring for managerial retention decisions in human resource analytics.
Journal: Frontiers in Big Data, vol. 8, article 1699561
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832383/pdf/
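Sigmoid calibration (Platt scaling) fits p = 1 / (1 + exp(-(a·s + b))) to a classifier's raw scores s on held-out data, and the Brier score is the mean squared difference between predicted probabilities and 0/1 outcomes. The stdlib sketch below is a simplified version (plain gradient descent on log-loss, without the label smoothing of Platt's original formulation); the learning rate, epoch count, and validation scores are assumptions. In scikit-learn the same steps are usually done with `CalibratedClassifierCV(method="sigmoid")` and `brier_score_loss`.

```python
import math

def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_sigmoid_calibration(scores, y_true, lr=0.1, epochs=5000):
    """Fit p = sigmoid(a * score + b) by batch gradient descent on log-loss.

    Fit on held-out data, not on the data used to train the classifier.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            err = _sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(scores, a, b):
    """Apply a fitted sigmoid mapping to raw scores."""
    return [_sigmoid(a * s + b) for s in scores]

# Hypothetical held-out classifier scores (log-odds) and attrition labels
val_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
val_labels = [0, 0, 1, 0, 1, 1]
a, b = fit_sigmoid_calibration(val_scores, val_labels)
calibrated = calibrate(val_scores, a, b)
calibrated_brier = brier_score(val_labels, calibrated)
```

Calibration leaves the ranking of employees (and hence AUC-ROC) unchanged when a > 0; what it changes is whether a predicted 30% risk actually corresponds to roughly 30% observed attrition, which is what the Brier score rewards.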
Pub Date: 2026-01-09; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1670833
Tarak Hussain, B Tirapathi Reddy, Kondaveti Phanindra, Sailaja Terumalasetti, Ghufran Ahmad Khan
Deepfake technology is evolving at an alarming pace, threatening information integrity and social trust. We present a new multimodal deepfake detection framework that exploits cross-domain inconsistencies by checking audio-visual consistency. Its core is the Synchronization-Aware Feature Fusion (SAFF) architecture combined with Cross-Modal Graph Attention Networks (CM-GAN), both of which explicitly address temporal misalignment to improve detection accuracy. Across eight models and five benchmark datasets (93,750 test samples), the framework achieves 98.76% accuracy and remains robust across multiple compression levels. Statistical analysis shows that audio-visual synchronization inconsistencies are highly discriminative (Cohen's d = 1.87). The contributions center on a cross-modal feature extraction pipeline, a graph-based attention mechanism for inter-modal reasoning, and extensive ablation studies validating the fusion strategy; the paper also provides statistically sound insights to guide future work in this area. With a 17.85% generalization advantage over unimodal methods, the framework represents a new state of the art, and it introduces a self-supervised pre-training strategy that requires 65% less labeled data.
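The abstract uses Cohen's d to quantify how strongly synchronization inconsistencies separate fake from real clips. Cohen's d is the difference between two group means divided by their pooled standard deviation. The minimal Python sketch below computes that quantity. Two things are assumptions: the pooled-standard-deviation variant (the abstract does not say which variant the authors used) and the score lists, which are hypothetical toy values rather than the paper's data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d with the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical per-clip audio-visual inconsistency scores
fake_scores = [0.9, 0.8, 1.0, 0.7]
real_scores = [0.3, 0.2, 0.4, 0.1]

print(round(cohens_d(fake_scores, real_scores), 2))  # prints 4.65
```

By Cohen's conventional benchmarks, |d| of 0.8 or more counts as a large effect. The reported 1.87 therefore indicates strongly separated score distributions.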
Title: Decoding deception: state-of-the-art approaches to deep fake detection | Frontiers in Big Data, vol. 8, article 1670833 | Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12827133/pdf/
Pub Date: 2026-01-08 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1686452
Amar Ahmad, Yvonne Vallès, Youssef Idaghdour
Artificial Intelligence (AI) systems are increasingly embedded in high-stakes decision-making across domains such as healthcare, finance, criminal justice, and employment. Accumulating evidence shows that these systems can reproduce and amplify structural inequities, raising ethical, social, and technical concerns. This review integrates formal mathematical definitions of bias with socio-technical perspectives to examine its origins, manifestations, and impacts. Bias is grouped into four interrelated families (historical/representational, selection/measurement, algorithmic/optimization, and feedback/emergent), and its operation is illustrated through case studies in facial recognition, large language models, credit scoring, healthcare, employment, and criminal justice. The review critically evaluates current mitigation strategies, including dataset diversification, fairness-aware modeling, post-deployment auditing, regulatory frameworks, and participatory design. It then proposes an integrated framework that couples statistical diagnostics with governance mechanisms to enable bias mitigation across the entire AI lifecycle. By bridging technical precision with sociological insight, it offers guidance for developing AI systems that are equitable, accountable, and responsive to the needs of diverse populations.
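The abstract does not specify which formal definitions the review uses. To illustrate the kind of statistical diagnostic it refers to, the Python sketch below computes two common group-fairness gaps. Demographic parity difference is the gap in positive-prediction rates between groups. Equal opportunity difference is the gap in true-positive rates. The choice of metrics, the helper names, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def selection_rate(y_pred: Sequence[int], group: Sequence[str], g: str) -> float:
    """Fraction of members of group g that receive a positive prediction."""
    preds = [p for p, a in zip(y_pred, group) if a == g]
    return sum(preds) / len(preds)


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int],
                       group: Sequence[str], g: str) -> float:
    """Fraction of truly positive members of group g that are predicted positive."""
    hits = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
    return sum(hits) / len(hits)


def demographic_parity_difference(y_pred, group, g1, g2) -> float:
    # P(Yhat = 1 | A = g1) - P(Yhat = 1 | A = g2); 0 means parity
    return selection_rate(y_pred, group, g1) - selection_rate(y_pred, group, g2)


def equal_opportunity_difference(y_true, y_pred, group, g1, g2) -> float:
    # TPR(g1) - TPR(g2); 0 means equal recall for both groups
    return (true_positive_rate(y_true, y_pred, group, g1)
            - true_positive_rate(y_true, y_pred, group, g2))


# Hypothetical labels, predictions, and protected-attribute values
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(y_pred, group, "a", "b"))         # prints 0.25
print(equal_opportunity_difference(y_true, y_pred, group, "a", "b"))  # prints 0.5
```

In the toy data, the two gaps disagree (0.25 vs. 0.5). This is one reason the choice of which fairness definition to monitor is itself a substantive decision.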
Title: Bias in AI systems: integrating formal and socio-technical approaches | Frontiers in Big Data, vol. 8, article 1686452 | Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823528/pdf/
Pub Date: 2026-01-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1683786
Baqer M Merzah, Jafar Razmara, Zolfaghar Salmanian
Introduction: Fake news has become a significant threat to public discourse because online content spreads quickly and fake news is difficult to detect and distinguish from real news. This challenge is further amplified by society's increasing dependence on online social networks. Many researchers have developed machine learning and deep learning models to combat the spread of misinformation and identify fake news. However, these studies have focused on a single language and achieved low accuracy. This is especially true for Arabic, where resource constraints and linguistic intricacies pose additional challenges.
Methods: This paper introduces an effective deep learning technique for fake news detection (FND) in Arabic and English. The proposed model combines a multi-channel Convolutional Neural Network (CNN) with a dual Bidirectional Long Short-Term Memory (BiLSTM) module. These run in parallel to capture semantic and local textual features from embeddings produced by a pre-trained FastText model. A global max-pooling layer then reduces dimensionality and extracts salient features from the sequential output. Finally, the model classifies each news item as fake or real. The model is trained and evaluated on three benchmark datasets: two Arabic datasets, AFND and ANS, and one English dataset, WELFake.
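The Methods paragraph describes parallel branches whose sequential outputs are reduced by global max pooling before classification. The pure-Python sketch below shows only that pooling-and-fusion step. Each branch output is a list of time steps, and each step is a list of feature values. Global max pooling keeps each feature's largest value over time, so the output length no longer depends on text length. The branch outputs are toy numbers standing in for CNN and BiLSTM activations. Concatenation as the fusion operator is an assumption, since the abstract does not say how the branches are merged.

```python
Matrix = list[list[float]]  # [time_steps][features]


def global_max_pool(seq: Matrix) -> list[float]:
    """Collapse the time axis by keeping each feature's maximum over all steps."""
    return [max(step[f] for step in seq) for f in range(len(seq[0]))]


def fuse_branches(cnn_out: Matrix, bilstm_out: Matrix) -> list[float]:
    # Pool each branch independently, then concatenate (assumed fusion strategy)
    return global_max_pool(cnn_out) + global_max_pool(bilstm_out)


# Toy outputs for a 3-token input: a CNN branch with 2 features and a BiLSTM
# branch with 4 (e.g. 2 forward + 2 backward units); values are illustrative.
cnn_out = [[0.1, 0.7], [0.9, 0.2], [0.4, 0.5]]
bilstm_out = [[0.3, -0.1, 0.8, 0.0],
              [0.6, 0.2, -0.4, 0.1],
              [0.2, 0.5, 0.1, 0.9]]

features = fuse_branches(cnn_out, bilstm_out)
print(features)  # prints [0.9, 0.7, 0.6, 0.5, 0.8, 0.9]
```

In a real model, the fused vector would feed a classification head that outputs fake or real. Deep learning frameworks provide this pooling as a built-in layer.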
Results: Experimental results show that the model outperforms state-of-the-art (SOTA) approaches, achieving 94.43 ± 0.19%, 71.63 ± 1.45%, and 98.85 ± 0.03% accuracy on AFND, ANS, and WELFake, respectively.
Discussion: This work provides a robust approach to combating misinformation, with practical applications for improving the reliability of information on social networks.
Title: Hybrid deep learning models for fake news detection: case study on Arabic and English languages | Frontiers in Big Data, vol. 8, article 1683786 | Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12815712/pdf/