Pub Date: 2025-11-10 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1705587
Yunhong Bu, Tingshan Yao, Shaowu Geng, Renjie Huang
Introduction: Tobacco growers face particular challenges in predicting the risks of tobacco root diseases due to complex pathogenesis, concealed early symptoms, and heterogeneous farm conditions.
Methods: To address this problem, we propose a flexible Probabilistic Hybrid Temporal Fusion Network with Random Period Mask (PHTFNet-RPM), designed to forecast disease incidences and indices over multiple future days. The model incorporates a hybrid input structure that handles configurable static management variables together with time-series data of weather factors and disease metrics, using the RPM to simulate diverse patterns of missing historical observations. Its hierarchically aggregated internal modules learn cross-variable and cross-temporal feature representations to capture complex non-linear relationships. Furthermore, probability-theoretic uncertainty quantification is incorporated to enhance the model's credibility and reliability.
Results: The proposed PHTFNet-RPM was validated on a large-scale time-series dataset of tobacco root diseases compiled from 20 years of meteorological and disease survey records in Chuxiong Prefecture, Yunnan Province. Extensive comparative experiments demonstrated that our model achieves a 4.44%-16.43% lower mean absolute error (MAE) than existing models (including LR, SVR, CNN-LSTM, and LSTM-Attention).
Discussion: The results confirm that the model can reliably forecast disease progression trends under different configurations, even when relying solely on historical weather observations. The integration of uncertainty quantification provides a robust tool for assessing prediction reliability, offering significant practical value for disease management.
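The random-period-masking idea described in the Methods, simulating missing historical observations by zeroing a random contiguous window of a series, can be sketched as follows. The function name, the zero fill value, and the window-length parameter are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def random_period_mask(series, max_len, rng):
    """Zero out one random contiguous period of a 1-D series.

    Returns (masked_series, keep_mask); keep_mask is False where values
    were hidden. Illustrative sketch, not the paper's exact RPM.
    """
    masked = series.copy()
    keep = np.ones_like(series, dtype=bool)
    length = int(rng.integers(1, max_len + 1))          # period length in [1, max_len]
    start = int(rng.integers(0, len(series) - length + 1))
    masked[start:start + length] = 0.0                  # hide the period
    keep[start:start + length] = False
    return masked, keep

rng = np.random.default_rng(0)
obs = np.arange(10, dtype=float)                        # toy daily observations
masked, mask = random_period_mask(obs, max_len=4, rng=rng)
```

During training, such masked copies would be fed to the model so it learns to forecast even when stretches of history are absent.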
Title: PHTFNet-RPM: a probabilistic hybrid network with RPM for tobacco root disease forecasting. (Frontiers in Big Data, 8:1705587)
Pub Date: 2025-11-04 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1659757
Nida Nasir, Hussam Al Hamadi
Introduction: As cyber-physical systems become increasingly virtualized, digital twins have emerged as essential components for real-time monitoring, simulation, and control. However, their growing complexity and exposure to dynamic network environments make them vulnerable to sophisticated cyber threats. Traditional rule-based and machine-learning-based security models often fail to adapt in real time to evolving attack patterns, particularly in decentralized and resource-constrained settings.
Methods: This study introduces the Neuromorphic Cyber-Twin (NCT), a brain-inspired architectural framework that integrates spiking neural networks (SNNs) and event-driven cognition to enhance adaptive cyber defense. The NCT leverages neuromorphic principles such as sparse coding, temporal encoding, and spike-timing-dependent plasticity (STDP) to transform telemetry data from the digital-twin layer into spike-based sensory inputs. A layered cognitive architecture continuously monitors behavioral deviations, infers anomalies, and autonomously adapts its defensive responses in alignment with system dynamics.
Results: Lightweight prototype simulations demonstrate the feasibility of NCT-based event-driven anomaly detection and adaptive defense. The results highlight advantages in low-latency detection, contextual awareness, and energy efficiency compared with conventional machine-learning models.
Discussion: The NCT framework represents a biologically inspired paradigm for scalable, self-evolving cybersecurity in virtualized ecosystems. Potential applications include infrastructure monitoring, autonomous transportation, and industrial control systems. Comprehensive benchmarking and large-scale validation are identified as future research directions.
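As a concrete illustration of the spike-timing-dependent plasticity (STDP) the Methods section invokes, here is a minimal pair-based STDP weight update: a synapse is potentiated when the presynaptic spike precedes the postsynaptic one and depressed otherwise. The learning rates, time constant, and clipping bounds are illustrative assumptions, not values from the paper:

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.05, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP update for one pre/post spike pair (times in ms).

    Causal pairs (pre before post) potentiate; anti-causal pairs depress,
    with exponential decay in the spike-time difference.
    """
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * math.exp(-dt / tau)    # potentiation
    else:
        w -= a_minus * math.exp(dt / tau)    # depression
    return min(max(w, w_min), w_max)         # keep the weight in bounds
```

In an NCT-style pipeline, telemetry encoded as spike trains would drive many such local updates, letting the network adapt without global retraining.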
Title: Towards the neuromorphic Cyber-Twin: an architecture for cognitive defense in digital twin ecosystems. (Frontiers in Big Data, 8:1659757)
Pub Date: 2025-10-31 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1579332
Karl Vachuska
The advent of widely available cell phone mobility data in the United States has rapidly expanded the study of everyday mobility patterns in social science research. A wide range of existing literature finds ambient population (e.g., visitor) estimates of an area to be predictive of crime. Much of that research, however, frames neighborhood visitor flows in predictive terms without indicating or implying a causal effect. This brief research report addresses that gap by formulating the effect of visitors in explicitly counterfactual terms and estimating it with two causal inference approaches: conventional two-way fixed effects and a novel instrumental variable design based on street closures. Using high-resolution mobility and crime data from New York City for 2019, I estimate the additive effect of visitors on multiple measures of criminal activity. While two-way fixed effects models show a significant effect of visitors on a wide array of crime forms, instrumental variable estimates indicate no statistically significant causal impact, with large standard errors reflecting substantial uncertainty in visitors' effect on crime rates.
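The instrumental-variable logic can be sketched with a toy two-stage least squares (2SLS) estimate. The simulated data and helper function below are hypothetical; they only illustrate why an instrument removes the confounding that biases a naive regression of crime on visitors:

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z (toy helper)."""
    Z = np.column_stack([np.ones_like(z), z])
    pi = np.linalg.lstsq(Z, x, rcond=None)[0]            # stage 1: project x on z
    x_hat = Z @ pi
    Xh = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(Xh, y, rcond=None)[0][1]      # stage 2: slope on fitted x

# simulated data: u confounds both "visitors" (x) and "crime" (y); z is exogenous
rng = np.random.default_rng(0)
n = 20000
z = rng.standard_normal(n)                               # instrument, e.g. closures
u = rng.standard_normal(n)                               # unobserved confounder
x = z + u                                                # visitor flow
y = 2.0 * x + 3.0 * u + rng.standard_normal(n)           # true causal effect is 2
beta_iv = two_stage_least_squares(y, x, z)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
```

Here the naive OLS slope is pulled toward 3.5 by the confounder, while the 2SLS estimate recovers the true effect of 2, mirroring the gap between the fixed-effects and IV results described above.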
Title: Urban mobility and crime: causal inference using street closures as an instrumental variable. (Frontiers in Big Data, 8:1579332)
Pub Date: 2025-10-24 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1604887
Alexander Grote, Anuja Hariharan, Christof Weinhardt
Introduction: The analysis of discrete sequential data, such as event logs and customer clickstreams, is often challenged by the vast number of possible sequential patterns. This complexity makes it difficult to identify meaningful sequences and derive actionable insights.
Methods: We propose a novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised machine learning. Unlike existing interpretable machine learning methods, we determine important sequential patterns during the mining process, eliminating the need for post-hoc classification to assess their relevance. In contrast to existing interestingness measures, we introduce a local, class-specific interestingness measure that is inherently interpretable.
Results: We evaluated the algorithm on three diverse datasets - churn prediction, malware sequence analysis, and a synthetic dataset - covering different sizes, application domains, and feature complexities. Our method achieved classification performance comparable to established feature selection algorithms while maintaining interpretability and reducing computational costs.
Discussion: This study demonstrates a practical and efficient approach for uncovering important sequential patterns in classification tasks. By combining interpretability with competitive predictive performance, our algorithm provides practitioners with an interpretable and efficient alternative to existing methods, paving the way for new advances in sequential data analysis.
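A local, class-specific interestingness measure can be sketched as a class-conditional lift: the support of a sequential pattern within one class divided by its support overall. This toy measure and all names below are illustrative assumptions, not the measure defined in the paper:

```python
def contains(seq, pattern):
    """True if pattern occurs as an order-preserving (possibly gapped) subsequence."""
    it = iter(seq)
    return all(p in it for p in pattern)   # each `in` advances the iterator

def class_interestingness(sequences, labels, pattern, target):
    """Toy local measure: pattern support in the target class over support overall."""
    in_class = [s for s, l in zip(sequences, labels) if l == target]
    sup_class = sum(contains(s, pattern) for s in in_class) / len(in_class)
    sup_all = sum(contains(s, pattern) for s in sequences) / len(sequences)
    return sup_class / sup_all if sup_all else 0.0

# toy churn-style clickstreams: label 1 = churned
seqs = [["login", "browse", "cancel"], ["login", "cancel"],
        ["login", "buy"], ["browse", "buy"]]
labels = [1, 1, 0, 0]
score = class_interestingness(seqs, labels, ["login", "cancel"], target=1)
```

A score above 1 means the pattern is over-represented in the target class; here login followed (possibly with gaps) by cancel occurs in every churned stream but only half of all streams, giving a lift of 2.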
Title: Finding the needle in the haystack: an interpretable sequential pattern mining method for classification problems. (Frontiers in Big Data, 8:1604887)
Pub Date: 2025-10-20 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1623883
Qian Xie, Junsheng Yan, Zhenhua Dai, Wengang Du, Xuefei Wu
The rapid advancement of artificial intelligence (AI) and machine learning (ML) technologies has opened up novel avenues for predicting coal and gas outbursts in coal mines. This study proposes a novel prediction framework that integrates advanced AI methodologies through a multi-model fusion strategy based on ensemble learning and model stacking. The proposed model leverages the diverse data interpretation capabilities and distinct training mechanisms of various algorithms, thereby capitalizing on the complementary strengths of each constituent learner. Specifically, a stacking-based ensemble model is constructed, incorporating Support Vector Machines (SVM), Random Forests (RF), and k-Nearest Neighbors (KNN) as base learners. An attention mechanism is then employed to adaptively weight the outputs of these base learners. The meta-learner, primarily built upon the XGBoost algorithm, integrates these weighted outputs to generate the final prediction. The model's performance is rigorously evaluated using real-world coal and gas outburst data collected from a mine in Pingdingshan, China, with evaluation metrics including the F1-score and other standard classification indicators. The results reveal that individual models, such as XGBoost, SVM, and RF, can effectively quantify the contribution of input feature importance using their inherent mechanisms. Furthermore, the ensemble model significantly outperforms single-model approaches, particularly when the base learners are both strong and mutually uncorrelated.
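The attention-based weighting of base-learner outputs can be sketched as a softmax over per-learner skill scores. This toy fusion stands in for the learned attention and XGBoost meta-learner described above; the probabilities and scores are invented for illustration:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))          # shift for numerical stability
    return e / e.sum()

def attention_fuse(base_probs, skill_scores):
    """Fuse per-sample outburst probabilities from base learners using
    softmax attention weights over (hypothetical) skill scores."""
    w = softmax(np.asarray(skill_scores, dtype=float))   # weights sum to 1
    return np.asarray(base_probs, dtype=float).T @ w     # fused prob per sample

# invented per-sample outburst probabilities from the three base learners
base_probs = [[0.2, 0.9],   # SVM
              [0.4, 0.8],   # RF
              [0.1, 0.6]]   # KNN
fused = attention_fuse(base_probs, skill_scores=[0.70, 0.80, 0.55])
```

Because the weights are a convex combination, each fused probability stays between the smallest and largest base-learner prediction for that sample, with the stronger learners pulling harder.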
Title: Study on coal and gas outburst prediction technology based on multi-model fusion. (Frontiers in Big Data, 8:1623883)
Pub Date: 2025-10-20 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1600540
Jianxin Li, Ruchun Jia, Ning Xiang, Yizhun Tian
Introduction: Traditional operation and maintenance decision algorithms often ignore the analysis of data source security, making them highly susceptible to noise, time-consuming in execution, and lacking in rationality.
Methods: In this study, we design an automated operation and maintenance decision algorithm based on data source security analysis. A multi-angle learning algorithm is adopted to establish a noise data model, introduce relaxation variables, and compare sharing factors with noise data characteristics to determine whether the data source is secure. Taking the ideal power shortage and minimum maintenance cost as the objective function, we construct a classical particle swarm optimization model and derive the expressions for particle search velocity and position. To address the problem of local optima, a niche mechanism is incorporated: the obtained automated data is treated as the population, a reasonable number of iterations is determined, individual fitness is stored, and the optimal state is obtained through a continuous iterative update strategy.
Results: Experimental results show that the proposed strategy can shorten operation and maintenance time, enhance the rationality of decision-making, improve algorithm convergence, and avoid falling into local optima.
Discussion: In addition, fault-tolerant analysis is performed on data source security, effectively eliminating bad data, preventing interference from malicious data, and further improving convergence performance.
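The particle swarm update with a niche-style diversity step can be sketched as below. The velocity and position updates follow the classical formulation mentioned in the Methods; the re-scatter rule is a simplified stand-in for the paper's niche mechanism, and all constants are illustrative:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=300, niche_radius=0.05, seed=0):
    """Classical PSO with a naive niche step that re-scatters particles
    crowding the current global best (illustrative, not the paper's scheme)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # velocity: inertia + cognitive pull to pbest + social pull to gbest
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = pbest[pbest_f.argmin()].copy()
        # niche step: re-scatter particles crowding the global best
        dist = np.linalg.norm(x - g, axis=1)
        crowded = (dist < niche_radius) & (dist > 0)
        x[crowded] = rng.uniform(-5.0, 5.0, (int(crowded.sum()), dim))
    return g, float(f(g))

best, best_f = pso_minimize(lambda p: float((p ** 2).sum()), dim=2)
```

On a toy sphere objective the swarm converges near the origin while the scatter step keeps particles from collapsing onto a single point, which is the intuition behind avoiding local optima.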
Title: Research on fault-tolerant decision algorithm for data security automation. (Frontiers in Big Data, 8:1600540)
Pub Date: 2025-10-17 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1615788
Hikmat Ullah Khan, Anam Naz, Fawaz Khaled Alarfaj, Naif Almusallam
The mental health of students plays an important role in their overall wellbeing and academic performance. Growing pressure from academics, co-curricular activities such as sports, and personal challenges highlights the need for modern methods of monitoring mental health. Traditional approaches, such as self-reported surveys and psychological evaluations, can be time-consuming and subject to bias. With advancements in artificial intelligence (AI), particularly in natural language processing (NLP), sentiment analysis has emerged as an effective technique for identifying mental health patterns in textual data. However, analyzing students' mental health remains challenging due to the intensity of emotional expressions, linguistic variations, and context-dependent sentiments. In this study, our primary objective was to investigate the mental health of students by conducting sentiment analysis with advanced deep learning models. To accomplish this task, state-of-the-art Large Language Model (LLM) approaches, such as RoBERTa (a robustly optimized BERT approach), RoBERTa-Large, and ELECTRA, were used for empirical analysis. RoBERTa-Large, an expanded architecture derived from Google's BERT, captures complex patterns and performs more effectively on various NLP tasks. Among the applied algorithms, RoBERTa-Large achieved the highest accuracy of 97%, while ELECTRA yielded 91% accuracy on a multi-classification task with seven diverse mental health status labels. These results demonstrate the potential of LLM-based approaches for predicting students' mental health, particularly in relation to the effects of academic and physical activities.
Title: Analyzing student mental health with RoBERTa-Large: a sentiment analysis and data analytics approach. (Frontiers in Big Data, 8:1615788)
With the rapid development of the global digital economy, cross-border e-commerce has emerged rapidly and become a crucial bridge connecting global markets. This research focuses on the cross-border e-commerce sector for outdoor sports products. In response to common problems in the field, such as "information overload" and "insufficient recommendation accuracy," a personalized recommendation optimization framework integrating customer value segmentation and collaborative filtering is proposed. Building on the classic RFM model, a purchase quantity indicator (Quantity) is introduced to construct the RFMQ model, thereby characterizing user behavior more comprehensively. Customer value stratification is then achieved using an indicator segmentation method and the K-means clustering algorithm, and a differentiated collaborative filtering recommendation mechanism is designed for the segmented groups. A five-fold cross-validation experiment shows that the proposed method significantly outperforms the traditional collaborative filtering model on the TOPN recommendation task. Specifically, when the number of recommended products is between 3 and 7, the RFMQ recommendation model based on indicator segmentation performs best in terms of F1 score (for example, when TOPN = 5, the F1 value increases from 0.1709 to 0.3093), and the K-means clustering variant also shows a stable improvement (reaching an F1 value of 0.267 at the same setting). The results indicate that the indicator segmentation method has a significant advantage in smaller recommendation quantity scenarios.
This study verifies the effectiveness of the RFMQ model for customer segmentation and recommendation performance optimization, providing an operational solution for e-commerce platforms to implement precision marketing and enhance user stickiness and commercial competitiveness; it is particularly suitable for the low-cost, high-efficiency personalized recommendation scenarios of small and medium-sized enterprises.
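The RFMQ scoring step can be sketched as quantile-rank scoring over the four dimensions, with recency reversed so that more recent buyers score higher. The bin count, ranking rule, and toy data are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def rfmq_scores(recency, frequency, monetary, quantity, bins=5):
    """Score each customer 1..bins per RFMQ dimension by quantile rank.

    Recency is reversed: a smaller value (more recent purchase) earns a
    higher score. Illustrative sketch of indicator segmentation.
    """
    def rank(values, reverse=False):
        v = np.asarray(values, dtype=float)
        order = np.argsort(v)
        score = np.empty(len(v), dtype=int)
        score[order] = np.arange(len(v)) * bins // len(v) + 1   # 1..bins
        return (bins + 1 - score) if reverse else score

    return np.column_stack([
        rank(recency, reverse=True),   # R: recent -> high
        rank(frequency),               # F
        rank(monetary),                # M
        rank(quantity),                # Q (the added indicator)
    ])

# toy customers: customer 0 is the most recent, most active buyer
recency = [1, 10, 5, 20, 3]            # days since last purchase
frequency = [5, 1, 3, 2, 4]
monetary = [500.0, 40.0, 120.0, 60.0, 300.0]
quantity = [12, 2, 6, 3, 9]
scores = rfmq_scores(recency, frequency, monetary, quantity)
```

The resulting score matrix could then feed either direct indicator segmentation or K-means clustering to stratify customers before collaborative filtering.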
{"title":"Research on optimization of personalized recommendation method based on RFMQ model- taking outdoor sports products in cross-border e-commerce as an example.","authors":"Qianlan Chen, Chupeng Chen, Zubai Jiang, Chaoling Li, Yangxizi Tan, Niannian Li, Bolin Zhou, Bingxian Yang","doi":"10.3389/fdata.2025.1680669","DOIUrl":"10.3389/fdata.2025.1680669","url":null,"abstract":"<p><p>With the rapid development of the global digital economy, cross-border e-commerce has rapidly emerged and developed at a high speed, and has become a crucial bridge connecting global markets. This research focuses on the cross-border e-commerce sector of outdoor sports products, in response to the common problems in the cross-border e-commerce field, such as \"information overload\" and \"insufficient recommendation accuracy,\" a personalized recommendation optimization framework integrating customer value segmentation and collaborative filtering is proposed. Based on the classic RFM model, the purchase quantity indicator (Quantity) is introduced to construct the RFMQ model, thereby more comprehensively characterizing user behavior characteristics. Further, the customer value stratification is achieved by using the indicator segmentation method and the K-means clustering algorithm, and a differentiated collaborative filtering recommendation mechanism is designed based on the segmented groups. Through a five-fold cross-validation experiment, it is shown that the proposed method significantly outperforms the traditional collaborative filtering model in the TOPN recommendation task. Specifically, when the number of recommended products is between 3 and 7, the RFMQ recommendation model based on indicator segmentation performs best in terms of F1 score (for example, when TOPN = 5, the F1 value increases from 0.1709 to 0.3093), and the method based on K-means clustering also shows a stable improvement (with the F1 value reaching 0.267 at the same time). 
The results indicate that the indicator segmentation method has a significant advantage in smaller recommendation quantity scenarios. This study verifies the effectiveness of the RFMQ model in customer segmentation and recommendation performance optimization, providing an operational solution for e-commerce platforms to implement precise marketing, enhance user stickiness and commercial competitiveness, and is particularly suitable for low-cost and high-efficiency personalized recommendation scenarios of small and medium-sized enterprises.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1680669"},"PeriodicalIF":2.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12558725/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145402935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-13eCollection Date: 2025-01-01DOI: 10.3389/fdata.2025.1532397
Gabriella Waters, Phillip Honenberger
The understanding of bias in AI is currently undergoing a revolution. Often assumed to be errors or flaws, biases are increasingly recognized as integral to AI systems and sometimes preferable to less biased alternatives. In this paper we review the reasons for this changed understanding and provide new guidance on three questions. First, how should we think about and measure biases in AI systems, consistent with the new understanding? Second, what kinds of bias in an AI system should we accept or even amplify, and why? Third, what kinds should we attempt to minimize or eliminate, and why? In answer to the first question, we argue that biases are "violations of a symmetry standard" (following Kelly). By this definition, many biases in AI systems are benign, which raises the question of how to identify biases that are problematic or undesirable when they occur. To address this question, we distinguish three main ways that asymmetries in AI systems can be problematic or undesirable (erroneous representation, unfair treatment, and violation of process ideals) and highlight places in the pipeline of AI development and application where bias of these types can occur.
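One way to operationalize "bias as a violation of a symmetry standard" is to measure the size of an asymmetry directly. The sketch below computes a statistical-parity gap between two groups' positive-outcome rates; the data are purely illustrative, and the paper itself does not prescribe this particular metric.

```python
# Hypothetical (group, outcome) records; outcome 1 is the favorable decision.
records = [
    ("a", 1), ("a", 1), ("a", 0), ("a", 1),
    ("b", 1), ("b", 0), ("b", 0), ("b", 0),
]

def positive_rate(recs, group):
    """Fraction of favorable outcomes within one group."""
    outcomes = [y for g, y in recs if g == group]
    return sum(outcomes) / len(outcomes)

def parity_gap(recs, g1, g2):
    """Size of the asymmetry between two groups' positive-outcome rates.

    A gap of 0 means the symmetry standard holds; larger values mean a
    larger asymmetry, which may or may not be problematic in context.
    """
    return abs(positive_rate(recs, g1) - positive_rate(recs, g2))

gap = parity_gap(records, "a", "b")
```

Whether a nonzero gap counts as erroneous representation, unfair treatment, or a benign asymmetry is exactly the judgment the paper's framework is meant to guide.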
{"title":"AI biases as asymmetries: a review to guide practice.","authors":"Gabriella Waters, Phillip Honenberger","doi":"10.3389/fdata.2025.1532397","DOIUrl":"10.3389/fdata.2025.1532397","url":null,"abstract":"<p><p>The understanding of bias in AI is currently undergoing a revolution. Often assumed to be errors or flaws, biases are increasingly recognized as integral to AI systems and sometimes preferable to less biased alternatives. In this paper we review the reasons for this changed understanding and provide new guidance on three questions: First, how should we think about and measure biases in AI systems, consistent with the new understanding? Second, what kinds of bias in an AI system should we accept or even amplify, and why? And, third, what kinds should we attempt to minimize or eliminate, and why? In answer to the first question, we argue that biases are \"violations of a symmetry standard\" (following Kelly). Per this definition, many biases in AI systems are benign. This raises the question of how to identify biases that <i>are</i> problematic or undesirable when they occur. 
To address this question, we distinguish three main ways that asymmetries in AI systems can be problematic or undesirable-erroneous representation, unfair treatment, and violation of process ideals-and highlight places in the pipeline of AI development and application where bias of these types can occur.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1532397"},"PeriodicalIF":2.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554557/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145394968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-06eCollection Date: 2025-01-01DOI: 10.3389/fdata.2025.1605258
Jie Su, Yuechao Tang, Yanan Wang, Chao Chen, Biao Song
Objective: Lower limb deep vein thrombosis (DVT) is a serious health problem, causing local discomfort and hindering walking. It can lead to severe complications, including pulmonary embolism, chronic post-thrombotic syndrome, and limb amputation, posing risks of death or severe disability. This study aims to develop a diagnostic model for DVT using routine blood analysis and evaluate its effectiveness in early diagnosis.
Methods: This study retrospectively analyzed patient medical records from January 2022 to June 2023, including 658 DVT patients (case group) and 1,418 healthy subjects (control group). SHAP (SHapley Additive exPlanations) analysis was employed for feature selection to identify key blood indices significantly impacting DVT risk prediction. Based on the selected features, six machine learning models were constructed: k-Nearest Neighbors (kNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed using the area under the curve (AUC).
Results: SHAP analysis identified ten key routine blood indices. The six models constructed using these indices demonstrated strong predictive performance, with AUC values exceeding 0.8, accuracy above 70%, and sensitivity and specificity over 70%. Notably, the RF model exhibited superior performance in assessing the risk of DVT.
Conclusions: Our study successfully developed machine learning models for predicting DVT risk using routine blood tests. These models achieved high predictive performance, suggesting their potential for early DVT diagnosis without additional medical burden on patients. Future research will focus on further validation and refinement of these models to enhance their clinical applicability.
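The AUC metric used to assess the models above can be computed without any library via the Mann-Whitney formulation. The risk scores and labels below are hypothetical stand-ins, not the study's data.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case gets a higher score than a randomly
    chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted DVT risk for 4 cases (label 1) and 4 controls (0).
y = [1, 1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
value = auc(y, s)
```

This rank-based view explains why AUC is insensitive to the classification threshold, which is convenient when comparing models such as kNN, LR, DT, RF, SVM, and ANN on the same test set.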
{"title":"Predicting deep vein thrombosis using machine learning and blood routine analysis.","authors":"Jie Su, Yuechao Tang, Yanan Wang, Chao Chen, Biao Song","doi":"10.3389/fdata.2025.1605258","DOIUrl":"10.3389/fdata.2025.1605258","url":null,"abstract":"<p><strong>Objective: </strong>Lower limb deep vein thrombosis (DVT) is a serious health problem, causing local discomfort and hindering walking. It can lead to severe complications, including pulmonary embolism, chronic post-thrombotic syndrome, and limb amputation, posing risks of death or severe disability. This study aims to develop a diagnostic model for DVT using routine blood analysis and evaluate its effectiveness in early diagnosis.</p><p><strong>Methods: </strong>This study retrospectively analyzed patient medical records from January 2022 to June 2023, including 658 DVT patients (case group) and 1,418 healthy subjects (control group). SHAP (SHapley Additive exPlanations) analysis was employed for feature selection to identify key blood indices significantly impacting DVT risk prediction. Based on the selected features, six machine learning models were constructed: k-Nearest Neighbors (kNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed using the area under the curve (AUC).</p><p><strong>Results: </strong>SHAP analysis identified ten key blood routine indices. The six models constructed using these indices demonstrated strong predictive performance, with AUC values exceeding 0.8, accuracy above 70%, and sensitivity and specificity over 70%. Notably, the RF model exhibited superior performance in assessing the risk of DVT.</p><p><strong>Conclusions: </strong>Our study successfully developed machine learning models for predicting DVT risk using routine blood tests. 
These models achieved high predictive performance, suggesting their potential for early DVT diagnosis without additional medical burden on patients. Future research will focus on further validation and refinement of these models to enhance their clinical applicability.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1605258"},"PeriodicalIF":2.4,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535902/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}