
Frontiers in Big Data: Latest Articles

Achieving health equity in immune disease: leveraging big data and artificial intelligence in an evolving health system landscape.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1621526
Stan Kachnowski, Asif H Khan, Shadé Floquet, Kendal K Whitlock, Juan Pablo Wisnivesky, Daniel B Neill, Irene Dankwa-Mullan, Gezzer Ortega, Moataz Daoud, Raza Zaheer, Maia Hightower, Paul Rowe

Prevalence of immune diseases is rising, imposing burdens on patients, healthcare providers, and society. Addressing the future impact of immune diseases requires "big data" on global distribution/prevalence, patient demographics, risk factors, biomarkers, and prognosis to inform prevention, diagnosis, and treatment strategies. Big data offer promise by integrating diverse real-world data sources with artificial intelligence (AI) and big data analytics (BDA), yet cautious implementation is vital because these tools can perpetuate and exacerbate biases. In this review, we outline some of the key challenges associated with achieving health equity through the use of big data, AI, and BDA in immune diseases and present potential solutions. For example, political/institutional will and stakeholder engagement are essential, requiring evidence of return on investment, a clear definition of success (including key metrics), and improved communication of unmet needs, disparities in treatments and outcomes, and the benefits of AI and BDA in achieving health equity. Broad representation and engagement are required to foster trust and inclusivity, involving patients and community organizations in study design, data collection, and decision-making processes. Enhancing technical capabilities and accountability with AI and BDA is also crucial to address data quality and diversity issues, ensuring datasets are of sufficient quality and representative of minoritized populations. Lastly, mitigating biases in AI and BDA is imperative, necessitating robust and iterative fairness assessments, continuous evaluation, and strong governance. Collaborative efforts to overcome these challenges are needed to leverage AI and BDA effectively, including an infrastructure for sharing harmonized big data, to advance health equity in immune diseases through transparent, fair, and impactful data-driven solutions.
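
One concrete form a fairness assessment of the kind mentioned above can take is a comparison of error rates across patient groups. The sketch below is only an illustration under stated assumptions: the metric (the gap in true positive rate between groups), the function names, and the toy labels are all hypothetical and are not taken from the review.

```python
def true_positive_rate(y_true, y_pred):
    """Share of truly positive cases that the model flagged as positive."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)  # assumes at least one positive case


def tpr_gap(y_true, y_pred, groups):
    """Per-group true positive rates and the largest gap between any two groups."""
    rates = {}
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        rates[g] = true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return rates, max(rates.values()) - min(rates.values())


# Hypothetical toy labels (1 = disease present / predicted), not real patient data.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

rates, gap = tpr_gap(y_true, y_pred, groups)
print(rates, gap)
```

A large gap would suggest that the model misses true cases more often in one group than in another; an iterative assessment would recompute it after each data or model change.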

Citations: 0
LLM-supported collaborative ontology design for data and knowledge management platforms.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1676477
Janis Kampars, Guntis Mosans, Tushar Jogi, Franz Roters, Napat Vajragupta

The management of vast, heterogeneous, and multidisciplinary data presents a critical challenge across scientific domains, hindering interoperability and slowing scientific progress. This paper addresses this challenge by presenting a pragmatic extension to the NeOn iterative ontology engineering framework, a well-established methodology for collaborative ontology design, which integrates Large Language Models (LLMs) to accelerate key tasks while retaining domain expert-in-the-loop validation. The methodology was applied within the HyWay project, an EU-funded research initiative on hydrogen-materials interactions, to develop the Hydrogen-Material Interaction Ontology (HMIO), a domain-specific ontology covering 29 experimental methods and 14 simulation types for assessing interactions between hydrogen and advanced metallic materials. A key result is the successful integration of the HMIO into a Data and Knowledge Management Platform (DKMP), where it drives the automated generation of data entry forms, ensuring that all captured data is Findable, Accessible, Interoperable, and Reusable (FAIR) and HMIO compliant by design. The validation of this approach demonstrates that this hybrid human-machine workflow for ontology engineering and further integration with the DKMP is an effective and efficient strategy for creating and operationalising complex scientific ontologies, thereby providing a scalable solution to advance data-driven research in materials science and other complex scientific domains.
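
The idea of an ontology driving the generation of data entry forms can be sketched roughly as follows. This is a minimal illustration, assuming the ontology has already been reduced to classes with typed properties; the class name `TensileTest`, the property names, the units, and the widget mapping are hypothetical and do not come from the HMIO or the DKMP.

```python
from dataclasses import dataclass


@dataclass
class Property:
    name: str
    datatype: str          # e.g. "float" or "string"
    required: bool = True
    unit: str = ""


# Hypothetical ontology fragment: one class and its properties.
ONTOLOGY = {
    "TensileTest": [
        Property("temperature", "float", unit="K"),
        Property("hydrogen_pressure", "float", unit="MPa"),
        Property("operator", "string", required=False),
    ],
}

# Assumed mapping from ontology datatypes to form widget types.
WIDGETS = {"float": "number", "string": "text"}


def form_fields(class_name):
    """Derive one form field description per property of an ontology class."""
    fields = []
    for prop in ONTOLOGY[class_name]:
        label = prop.name.replace("_", " ")
        if prop.unit:
            label += f" ({prop.unit})"
        fields.append({
            "name": prop.name,
            "label": label,
            "widget": WIDGETS[prop.datatype],
            "required": prop.required,
        })
    return fields


fields = form_fields("TensileTest")
for field in fields:
    print(field)
```

Because the form is derived from the ontology rather than written by hand, every submitted record carries the ontology's property names and datatypes, which is one way data can be compliant "by design".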

Citations: 0
Application and comparison of ARIMA, LSTM, and ARIMA-LSTM models for predicting foodborne diseases in Liaoning Province.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1666962
Xiaoxiao Du, Haomiao Yu, Hao Zhang, Xiangyun Liu, Xinling Yu, Tao Xie, Wenli Diao

Objective: To compare the application of the ARIMA model, the Long Short-Term Memory (LSTM) model and the ARIMA-LSTM model in forecasting foodborne disease incidence.

Methods: Monthly case data of foodborne diseases in Liaoning Province from January 2015 to December 2023 were used to construct ARIMA, LSTM, and ARIMA-LSTM models. These three models were then applied to forecast the monthly incidence of foodborne diseases in 2024, and their predictions were compared with those of a baseline model. Model performance was evaluated by comparing the predicted and observed values using root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), allowing identification of the optimal model. The best-performing model was subsequently employed to predict the monthly incidence for 2025.
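
For reference, the three error measures named in the Methods follow their standard definitions and can be computed as in the sketch below. The observed and predicted values are hypothetical toy numbers, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (undefined if an observed value is 0)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical monthly case counts, for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the observed level, which matters for a strongly seasonal series.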

Results: The ARIMA-LSTM model was identified as the optimal model. Specifically, the ARIMA(2,0,0)(0,1,1)₁₂ model produced RMSE = 300.03, MAE = 187.11, and MAPE = 16.38%, while the LSTM model yielded RMSE = 408.71, MAE = 226.03, and MAPE = 17.21%. In contrast, the ARIMA-LSTM model achieved RMSE = 0.44, MAE = 0.44, and MAPE = 0.08%, representing a dramatic improvement over the baseline model (RMSE = 204.17, MAE = 146.75, MAPE = 15.62%), with reductions of 99.5%, 99.7%, and 99.4% in RMSE, MAE, and MAPE, respectively. Based on the ARIMA-LSTM model, the predicted monthly cases of foodborne diseases for 2025 are: 214.62 (Jan), 260.84 (Feb), 462.92 (Mar), 590.92 (Apr), 800.88 (May), 965.11 (Jun), 2410.36 (Jul), 2651.36 (Aug), 1711.15 (Sep), 941.22 (Oct), 628.21 (Nov), and 465.05 (Dec).
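
The relative reductions quoted in the Results are of the form (baseline - model) / baseline. The short sketch below recomputes them from the rounded error values printed in the abstract; because those inputs are rounded, the recomputed percentages need not match the reported ones to the last digit, and the function name is an illustrative choice.

```python
def percent_reduction(baseline, model):
    """Relative reduction of an error measure versus a baseline, in percent."""
    return 100.0 * (baseline - model) / baseline


# Rounded values as printed in the abstract.
baseline = {"RMSE": 204.17, "MAE": 146.75, "MAPE": 15.62}
hybrid = {"RMSE": 0.44, "MAE": 0.44, "MAPE": 0.08}

reductions = {
    name: percent_reduction(baseline[name], hybrid[name]) for name in baseline
}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% lower than baseline")
```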

Conclusion: The ARIMA-LSTM model is considered the optimal model for predicting foodborne disease incidence in Liaoning Province in 2025.

Citations: 0
PHTFNet-RPM: a probabilistic hybrid network with RPM for tobacco root disease forecasting.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-10 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1705587
Yunhong Bu, Tingshan Yao, Shaowu Geng, Renjie Huang

Introduction: Tobacco growers usually face particular challenges in predicting the risks of tobacco root diseases due to complex pathogenesis, concealed early symptoms, and heterogeneous farm conditions.

Methods: To address this problem, we propose a flexible Probabilistic Hybrid Temporal Fusion Network with Random Period Mask (PHTFNet-RPM). The model is designed to forecast disease incidences and indices several days ahead. Its hybrid input structure handles configurable static management variables together with time-series data of weather factors and disease metrics, and the RPM simulates varied gaps in the historical observations. The model's hierarchically aggregated internal modules learn cross-variable and cross-temporal feature representations to capture the complex non-linear relationships. Furthermore, uncertainty quantification based on probability theory is included to enhance the model's credibility and reliability.
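
The abstract does not spell out how the Random Period Mask works, so the following is only one plausible reading, sketched under that assumption: during training, a contiguous stretch of the historical series is hidden at a random position so that the model learns to cope with missing observations. The function name, the fill value, and the toy series are hypothetical.

```python
import random


def random_period_mask(series, max_len, rng, fill=0.0):
    """Hide one random contiguous period of a series.

    Returns (masked_series, mask), where mask[i] is 1 for an observed
    value and 0 for a hidden one.
    """
    n = len(series)
    length = rng.randint(0, min(max_len, n))  # 0 means nothing is hidden
    start = rng.randint(0, n - length)        # both bounds are inclusive
    mask = [1] * n
    for i in range(start, start + length):
        mask[i] = 0
    masked = [x if m else fill for x, m in zip(series, mask)]
    return masked, mask


rng = random.Random(0)  # fixed seed so the sketch is repeatable
history = [0.12, 0.15, 0.20, 0.31, 0.45, 0.52]  # hypothetical disease index
masked, mask = random_period_mask(history, max_len=4, rng=rng)
print(masked, mask)
```

Passing the mask alongside the masked values lets a model distinguish a hidden observation from a genuine zero.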

Results: The proposed PHTFNet-RPM was validated using a large-scale time-series dataset of tobacco root diseases, organized from 20-year meteorological and disease survey records in Chuxiong Prefecture, Yunnan Province. Extensive comparative experiments demonstrated that our model achieves a 4.44%-16.43% lower mean absolute error (MAE) than existing models (including LR, SVR, CNN-LSTM, and LSTM-Attention).

Discussion: The results confirm that the model can reliably forecast disease progression trends under different configurations, even when relying solely on historical weather observations. The integration of uncertainty quantification provides a robust tool for assessing prediction reliability, offering significant practical value for disease management.

Citations: 0
Towards the neuromorphic Cyber-Twin: an architecture for cognitive defense in digital twin ecosystems.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-04 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1659757
Nida Nasir, Hussam Al Hamadi

Introduction: As cyber-physical systems become increasingly virtualized, digital twins have emerged as essential components for real-time monitoring, simulation, and control. However, their growing complexity and exposure to dynamic network environments make them vulnerable to sophisticated cyber threats. Traditional rule-based and machine-learning-based security models often fail to adapt in real time to evolving attack patterns, particularly in decentralized and resource-constrained settings.

Methods: This study introduces the Neuromorphic Cyber-Twin (NCT), a brain-inspired architectural framework that integrates spiking neural networks (SNNs) and event-driven cognition to enhance adaptive cyber defense. The NCT leverages neuromorphic principles such as sparse coding, temporal encoding, and spike-timing-dependent plasticity (STDP) to transform telemetry data from the digital-twin layer into spike-based sensory inputs. A layered cognitive architecture continuously monitors behavioral deviations, infers anomalies, and autonomously adapts its defensive responses in alignment with system dynamics.
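
Two of the neuromorphic principles named above can be illustrated with textbook formulations. The sketch below is not the NCT implementation: it assumes a simple delta (threshold-crossing) scheme for turning telemetry into spikes and the common pair-based exponential STDP rule, and all parameter values and the toy telemetry are hypothetical.

```python
import math


def delta_encode(samples, threshold):
    """Emit a spike (time index) whenever the signal has moved by at least
    `threshold` since the last spike; quiet periods produce no events."""
    spikes = []
    reference = samples[0]
    for t, value in enumerate(samples[1:], start=1):
        if abs(value - reference) >= threshold:
            spikes.append(t)
            reference = value
    return spikes


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: strengthen the synapse when the presynaptic spike
    precedes the postsynaptic one, weaken it when the order is reversed."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        return -a_minus * math.exp(dt / tau)
    return 0.0


# Hypothetical telemetry with a sudden jump and a sudden drop.
telemetry = [1.0, 1.1, 1.0, 3.0, 3.1, 1.0]
spikes = delta_encode(telemetry, threshold=1.0)
print(spikes)
print(stdp_delta_w(t_pre=10, t_post=15), stdp_delta_w(t_pre=15, t_post=10))
```

The encoding is sparse by construction: steady telemetry generates no spikes, so downstream computation happens only when behavior changes.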

Results: Lightweight prototype simulations demonstrate the feasibility of NCT-based event-driven anomaly detection and adaptive defense. The results highlight advantages in low-latency detection, contextual awareness, and energy efficiency compared with conventional machine-learning models.

Discussion: The NCT framework represents a biologically inspired paradigm for scalable, self-evolving cybersecurity in virtualized ecosystems. Potential applications include infrastructure monitoring, autonomous transportation, and industrial control systems. Comprehensive benchmarking and large-scale validation are identified as future research directions.

Citations: 0
Urban mobility and crime: causal inference using street closures as an instrumental variable.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-31 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1579332
Karl Vachuska

The advent of widely available cell phone mobility data in the United States has rapidly expanded the study of everyday mobility patterns in social science research. A wide range of existing literature finds ambient population (e.g., visitors) estimates of an area to be predictive of crime. Much of the past research frames neighborhood visitor flows in predictive terms without necessarily indicating or implying a causal effect. This brief research report addresses that gap: using two causal inference approaches (conventional two-way fixed effects and a novel instrumental variable approach), it formulates the causal effect of visitors in counterfactual terms and explicitly estimates the effect of visitor flows on crime rates. Using high-resolution mobility and crime data from New York City for the year 2019, I estimate the additive effect of visitors on multiple measures of criminal activity. While two-way fixed effects models show a significant effect of visitors on a wide array of crime forms, instrumental variable estimates indicate no statistically significant causal impact, with large standard errors indicating substantial uncertainty in visitors' effect on crime rates.
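
Why a regression estimate and an instrumental variable estimate can disagree is easiest to see on a toy example. The sketch below is a deliberate simplification: it uses one regressor and one binary instrument, omits fixed effects and standard errors, and runs on synthetic numbers constructed so that an unobserved confounder inflates the naive slope. None of the values come from the study.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Naive regression slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


# Synthetic data: z is the instrument (e.g. a closure indicator), u is an
# unobserved confounder that is uncorrelated with z by construction.
z = [0, 0, 1, 1, 0, 0, 1, 1]
u = [0, 1, 0, 1, 0, 1, 0, 1]
x = [zi + ui for zi, ui in zip(z, u)]              # "visitors"
y = [0.5 * xi + 2.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 0.5

print(ols_slope(x, y))    # biased upward by the confounder
print(iv_slope(z, x, y))  # recovers the true effect in this construction
```

The instrument helps only if it shifts the regressor and is unrelated to the confounder, which is why the choice of instrument carries the argument; in real data the IV estimate is also typically much noisier than the regression estimate.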

Citations: 0
Finding the needle in the haystack: An interpretable sequential pattern mining method for classification problems.
IF 2.4 | Q3 | COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-10-24 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1604887
Alexander Grote, Anuja Hariharan, Christof Weinhardt

Introduction: The analysis of discrete sequential data, such as event logs and customer clickstreams, is often challenged by the vast number of possible sequential patterns. This complexity makes it difficult to identify meaningful sequences and derive actionable insights.

Methods: We propose a novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised machine learning. Unlike existing interpretable machine learning methods, we determine important sequential patterns during the mining process, eliminating the need for post-hoc classification to assess their relevance. In contrast to existing interestingness measures, ours is local, class-specific, and inherently interpretable.
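
The abstract does not define the interestingness measure, so the sketch below substitutes a simple stand-in to show what "class-specific" can mean: the share of sequences in the target class that contain a pattern, minus the share in all other classes. The measure, the function names, and the clickstream data are hypothetical and are not the authors' algorithm.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def class_interestingness(data, pattern, target):
    """Support within the target class minus support in all other classes."""
    in_class = [s for s, label in data if label == target]
    others = [s for s, label in data if label != target]
    return support(in_class, pattern) - support(others, pattern)


# Hypothetical labeled clickstreams.
data = [
    (["login", "view", "cancel"], "churn"),
    (["login", "cancel"], "churn"),
    (["login", "view", "buy"], "stay"),
    (["view", "login", "buy"], "stay"),
]

score = class_interestingness(data, ["login", "cancel"], "churn")
print(score)
```

A score near 1 marks a pattern that is typical of the target class and rare elsewhere, a score near 0 marks a pattern that does not discriminate, and the value can be read directly as a difference in proportions.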

Results: We evaluated the algorithm on three diverse datasets - churn prediction, malware sequence analysis, and a synthetic dataset - covering different sizes, application domains, and feature complexities. Our method achieved classification performance comparable to established feature selection algorithms while maintaining interpretability and reducing computational costs.

Discussion: This study demonstrates a practical and efficient approach for uncovering important sequential patterns in classification tasks. By combining interpretability with competitive predictive performance, our algorithm provides practitioners with an interpretable and efficient alternative to existing methods, paving the way for new advances in sequential data analysis.
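The idea of scoring patterns per class while mining can be illustrated with a small sketch. The abstract does not define the authors' interestingness measure, so the code below substitutes a plain support difference (share of sequences in the target class that contain a pattern, minus the share in all other classes). The function names `contiguous_patterns` and `class_support_gap`, the restriction to contiguous patterns, and the toy clickstream data are all assumptions made for illustration.

```python
from collections import Counter


def contiguous_patterns(seq, max_len=2):
    """Return the set of contiguous sub-sequences of length 1..max_len."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(seq) - n + 1):
            found.add(tuple(seq[i:i + n]))
    return found


def class_support_gap(sequences, labels, target, max_len=2):
    """Score each pattern by support in `target` minus support elsewhere.

    This is a stand-in measure, not the one proposed in the paper.
    """
    in_counts, out_counts = Counter(), Counter()
    n_in = sum(1 for y in labels if y == target)
    n_out = len(labels) - n_in
    for seq, y in zip(sequences, labels):
        counts = in_counts if y == target else out_counts
        # Each pattern is counted at most once per sequence.
        counts.update(contiguous_patterns(seq, max_len))
    scores = {}
    for pattern in set(in_counts) | set(out_counts):
        s_in = in_counts[pattern] / n_in if n_in else 0.0
        s_out = out_counts[pattern] / n_out if n_out else 0.0
        scores[pattern] = s_in - s_out
    return scores


# Toy clickstream data, invented for illustration only.
sequences = [
    ["login", "browse", "cancel"],
    ["login", "support", "cancel"],
    ["login", "browse", "buy"],
    ["login", "buy"],
]
labels = ["churn", "churn", "stay", "stay"]

scores = class_support_gap(sequences, labels, target="churn")
top = max(scores, key=scores.get)
print(top, scores[top])
```

A score near 1 marks a pattern that is typical of the target class and rare elsewhere, a score near 0 marks a pattern that does not discriminate, and a negative score marks a pattern typical of the other classes. Because the score is a difference of two frequencies, it can be read directly without a downstream classifier.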

Volume 8, Article 1604887. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604564/pdf/
Citations: 0
Study on coal and gas outburst prediction technology based on multi-model fusion.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-10-20 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1623883
Qian Xie, Junsheng Yan, Zhenhua Dai, Wengang Du, Xuefei Wu

The rapid advancement of artificial intelligence (AI) and machine learning (ML) technologies has opened up novel avenues for predicting coal and gas outbursts in coal mines. This study proposes a prediction framework that integrates AI methodologies through a multi-model fusion strategy based on ensemble learning and model stacking. The proposed model leverages the diverse data interpretation capabilities and distinct training mechanisms of various algorithms, capitalizing on the complementary strengths of each constituent learner. Specifically, a stacking-based ensemble model is constructed, incorporating Support Vector Machines (SVM), Random Forests (RF), and k-Nearest Neighbors (KNN) as base learners. An attention mechanism is then employed to adaptively weight the outputs of these base learners. The meta-learner, primarily built upon the XGBoost algorithm, integrates these weighted outputs to generate the final prediction. The model's performance is evaluated using real-world coal and gas outburst data collected from a mine in Pingdingshan, China, with evaluation metrics including the F1-score and other standard classification indicators. The results reveal that individual models, such as XGBoost, SVM, and RF, can quantify the importance of each input feature using their inherent mechanisms. Furthermore, the ensemble model significantly outperforms single-model approaches, particularly when the base learners are both strong and mutually uncorrelated. The proposed ensemble framework achieves a markedly higher F1-score than the individual models, demonstrating its robustness and effectiveness in the complex task of coal and gas outburst prediction.
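The fusion step described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the attention scores are fixed by hand rather than learned, the XGBoost meta-learner is replaced by a simple 0.5 threshold on the weighted sum, and the base-learner probabilities are invented numbers standing in for SVM, RF, and KNN outputs. The names `softmax`, `fuse`, and `f1_score` are hypothetical helpers.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(base_probs, attention_scores):
    """Attention-weighted sum of the base learners' outburst probabilities."""
    weights = softmax(attention_scores)
    return sum(w * p for w, p in zip(weights, base_probs))


def f1_score(y_true, y_pred):
    """Binary F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented probabilities from three base learners (SVM, RF, KNN) per sample.
samples = [
    [0.9, 0.8, 0.6],
    [0.2, 0.4, 0.7],
    [0.7, 0.6, 0.3],
    [0.1, 0.2, 0.4],
]
y_true = [1, 0, 1, 0]

# Hand-picked attention scores (assumption): trust SVM most, KNN least.
attention_scores = [2.0, 1.0, 0.0]

weights = softmax(attention_scores)
fused = [fuse(probs, attention_scores) for probs in samples]
# Stand-in for the meta-learner: threshold the fused probability at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in fused]
f1 = f1_score(y_true, y_pred)
print(weights, y_pred, f1)
```

In a full stacking setup the weighted outputs would be passed as features to a trained meta-learner, and the base-learner predictions used for training it would come from out-of-fold data to avoid leakage.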

Volume 8, Article 1623883. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12580147/pdf/
Citations: 0
Research on fault-tolerant decision algorithm for data security automation.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-10-20 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1600540
Jianxin Li, Ruchun Jia, Ning Xiang, Yizhun Tian

Introduction: Traditional operation and maintenance decision algorithms often ignore the analysis of data source security, making them highly susceptible to noise, time-consuming in execution, and lacking in rationality.

Methods: In this study, we design an automated operation and maintenance decision algorithm based on data source security analysis. A multi-angle learning algorithm is adopted to establish a noise data model, introduce relaxation variables, and compare sharing factors with noise data characteristics to determine whether the data source is secure. Taking the ideal power shortage and minimum maintenance cost as the objective function, we construct a classical particle swarm optimization model and derive the expressions for particle search velocity and position. To address the problem of local optima, a niche mechanism is incorporated: the obtained automated data is treated as the population, a reasonable number of iterations is determined, individual fitness is stored, and the optimal state is obtained through a continuous iterative update strategy.

Results: Experimental results show that the proposed strategy can shorten operation and maintenance time, enhance the rationality of decision-making, improve algorithm convergence, and avoid falling into local optima.

Discussion: In addition, fault-tolerant analysis is performed on data source security, effectively eliminating bad data, preventing interference from malicious data, and further improving convergence performance.
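The particle velocity and position updates mentioned in the Methods can be sketched with the textbook particle swarm optimization rule. The code below is an illustration under stated assumptions, not the authors' algorithm: it omits the niche mechanism and the data source security check, uses a sphere function as a stand-in for the power shortage and maintenance cost objective, and the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are common defaults rather than values taken from the paper.

```python
import random


def pso(objective, dim, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen per particle
    pbest_val = [objective(p) for p in pos]     # stored individual fitness
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position: move by the new velocity, clamped to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

A global-best swarm like this one can stall in a local optimum on multimodal objectives, which is the problem the abstract's niche mechanism is meant to address; niching variants typically maintain several sub-populations so that the search does not collapse onto a single attractor.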

Volume 8, Article 1600540. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12580102/pdf/
Citations: 0
Analyzing student mental health with RoBERTa-Large: a sentiment analysis and data analytics approach.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-10-17 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1615788
Hikmat Ullah Khan, Anam Naz, Fawaz Khaled Alarfaj, Naif Almusallam

The mental health of students plays an important role in their overall wellbeing and academic performance. Growing pressure from academics, co-curricular activities such as sports, and personal challenges highlights the need for modern methods of monitoring mental health. Traditional approaches, such as self-reported surveys and psychological evaluations, can be time-consuming and subject to bias. With advances in artificial intelligence (AI), particularly in natural language processing (NLP), sentiment analysis has emerged as an effective technique for identifying mental health patterns in textual data. However, analyzing students' mental health remains a challenging task due to the intensity of emotional expressions, linguistic variations, and context-dependent sentiments. In this study, our primary objective was to investigate the mental health of students by conducting sentiment analysis using advanced deep learning models. To accomplish this task, state-of-the-art Large Language Model (LLM) approaches, such as RoBERTa (a robustly optimized BERT approach), RoBERTa-Large, and ELECTRA, were used for empirical analysis. RoBERTa-Large, an expanded architecture derived from Google's BERT, captures complex patterns and performs more effectively on various NLP tasks. Among the applied algorithms, RoBERTa-Large achieved the highest accuracy of 97%, while ELECTRA yielded 91% accuracy on a multi-class classification task with seven diverse mental health status labels. These results demonstrate the potential of LLM-based approaches for predicting students' mental health, particularly in relation to the effects of academic and physical activities.

Volume 8, Article 1615788. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575187/pdf/
Citations: 0