Pub Date: 2025-06-06; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1603106
Jiandong Si, Chang Liu, Jingxian Ye, Jianfeng Wu, Jianguo Wang, Kairui Hu, Chunhua Ju, Qianwen Cao
Introduction: Decisions about the supply of emergency equipment for power emergencies must be timely, efficient, and accurate. The multi-agent supply relationship graph, based on complex data fusion, enables the comprehensive exploration of interconnections among key entities in power emergency supplies.
Methods: This approach enhances decision-making efficiency and quality by uncovering multiple relationships between the main bodies involved. The present study focuses on the decision-making process for power emergency equipment supply and aims to enhance its professionalization. To achieve this goal, multi-modal data on power emergency equipment supply are collected from both internal and external power enterprises. Subsequently, a decision support knowledge base is established, along with a four-dimensional relationship graph that integrates events, time, equipment, and suppliers based on the knowledge graph. This enables the mining of multidimensional relationships pertaining to the main body. Finally, supported by the graph, the platform can offer intelligent decision-making assistance, supplier recommendation, and optimized scheduling of emergency equipment for electric power supply, and can provide effective information and guidance for decision-making in electric power emergency equipment supply.
Results: In a comparative analysis, the knowledge-graph-based decision support system proposed in this study demonstrates superior effectiveness and precision. By integrating the four-dimensional relationship graph with data mining algorithms, precise decision support can be provided for power emergency response. In case-study verification, the model developed in this study was used to recommend suppliers of power emergency equipment, and its recommendations aligned more closely with actual procurement outcomes.
Conclusion and recommendation: The system proposed in this study delivers multidimensional knowledge guidance and optimized decision pathways for emergency supply management.
Title: Conceptual design of a decision knowledge service model integrating a multi-agent supply relationship diagram for electric power emergency equipment. Journal: Frontiers in Big Data, volume 8, article 1603106. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179217/pdf/
Pub Date: 2025-06-04; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1600267
K Jyothi Upadhya, Ronan Lobo, Mini Shail Chhabra, Aman Paleja, B Dinesh Rao, Geetha M, Prachi Sisodia, Bolusani Akshita Reddy
Periodic pattern mining, a branch of data mining, is expanding to provide insight into the occurrence behavior of large volumes of data. Recently, a variety of fields, including fraud detection, telecommunications, retail marketing, research, and medicine, have found applications for rare association rule mining, which uncovers unusual or unexpected combinations. Only a limited amount of literature has demonstrated how essential periodicity is in mining low-support rare patterns. In addition, attention must be paid to temporal datasets, which carry crucial information about the timing of pattern occurrences, and to stream datasets, which require handling high-speed streaming data. Several algorithms have been developed that effectively track the cyclic behavior of patterns and identify the patterns that display complete or partial periodic behavior in temporal datasets. Numerous frameworks have been created to examine the periodic behavior of streaming data. Nevertheless, a method that focuses on the temporal information in the data stream and extracts rare partial periodic patterns has yet to be proposed. With a focus on identifying rare partial periodic patterns from temporal data streams, this paper proposes two novel sliding window-based single scan approaches called R3PStreamSW-Growth and R3PStreamSW-BitVectorMiner. The findings showed that on the dense dataset Accidents, across different threshold variations, R3P-StreamSWBitVectorMiner outperformed R3PStreamSW-Growth by about 93%. Similarly, on the sparse dataset T10I4D100K, R3P-StreamSWBitVectorMiner exhibits a 90% boost in performance. This demonstrates that on a range of synthetic, real-world, sparse, and dense datasets for different thresholds, R3P-StreamSWBitVectorMiner is significantly faster than R3PStreamSW-Growth.
Title: Sliding window based rare partial periodic pattern mining algorithms over temporal data streams. Journal: Frontiers in Big Data, volume 8, article 1600267. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12175007/pdf/
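The abstract above does not describe the internals of either algorithm. As a rough illustration of the general idea only (a sliding window over a timestamped stream, with a bit vector marking which window positions contain a pattern), here is a minimal sketch. The class name, method names, and the `max_period` parameter are hypothetical, and this is not the authors' R3PStreamSW-BitVectorMiner.

```python
from collections import deque


class SlidingWindowBitVectors:
    """Hypothetical sketch: keeps the most recent `window_size` transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()  # entries are (timestamp, frozenset_of_items)

    def add(self, timestamp, items):
        # Append the new transaction and drop the oldest once the window is full.
        self.window.append((timestamp, frozenset(items)))
        if len(self.window) > self.window_size:
            self.window.popleft()

    def bit_vector(self, pattern):
        # Bit i is set when the i-th transaction in the window contains the pattern.
        pattern = frozenset(pattern)
        bits = 0
        for i, (_, items) in enumerate(self.window):
            if pattern <= items:
                bits |= 1 << i
        return bits

    def support(self, pattern):
        # Support is the number of set bits in the pattern's bit vector.
        return bin(self.bit_vector(pattern)).count("1")

    def periodic_occurrences(self, pattern, max_period):
        # Count consecutive occurrences whose time gap is at most `max_period`
        # (an assumed, simplified notion of a "periodic" recurrence).
        pattern = frozenset(pattern)
        times = [t for t, items in self.window if pattern <= items]
        return sum(1 for a, b in zip(times, times[1:]) if b - a <= max_period)
```

Under these assumptions, a pattern could be flagged as rare and partially periodic when its support falls below a chosen rarity threshold while `periodic_occurrences` stays above a chosen minimum; the actual thresholds and definitions used in the paper are not given in the abstract.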
Pub Date: 2025-05-22; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1605788
Paolo Parigi, Kinga Makovi
Title: Editorial: Applied computational social sciences. Journal: Frontiers in Big Data, volume 8, article 1605788. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12137231/pdf/
Pub Date: 2025-05-19; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1569623
Markus Hadler, Alexander Ertl, Beate Klösch, Markus Reiter-Haas, Elisabeth Lex
Recent climate-related protests by social movements such as Extinction Rebellion, Just Stop Oil, and others have included actions like defacing artwork and gluing oneself to objects and streets. Using sentiment analysis and frame detection models, we analyze a corpus of all available English-language news articles in LexisNexis, with the first recorded instance of a gluing protest appearing in 1986. Our study traces the development of this protest tactic over time and addresses three central questions from social movement literature: the use of glue in protests, the geographical spread of this tactic, and the framing of these actions. We find that gluing protests were initially associated with a range of issues (including abortion, criminal justice, and environmental concerns) but in recent years have become more strongly linked to climate activism. Media coverage of these protests is predominantly negative, although public media tends to be comparatively less so. Moreover, protesters' prognostic frames (suggestions for what should be done) are relatively rare, with discourse more often centering on policy and security concerns. From a data science perspective, we explore the use of various Natural Language Processing (NLP) methods. The discussion and conclusion section highlights challenges encountered when working with our corpus and NLP models, and suggests ways to address them in future research. We also consider how recent advancements in large language models (LLMs) could refine or extend these analyses while acknowledging important concerns related to their use.
Title: The climate gluing protests: analyzing their development and framing in media since 1986 using sentiment analyses and frame detection models. Journal: Frontiers in Big Data, volume 8, article 1569623. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127360/pdf/
Pub Date: 2025-05-14; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1520574
Liliana Ortega-Diaz, Julian Jaramillo-Ibarra, German Osma-Pinto
Air conditioning energy consumption in buildings represents a considerable percentage of total energy consumption, which underlines the importance of implementing measures contributing to its reduction. Predicting energy consumption is critical to making informed decisions and identifying factors influencing power consumption. Machine learning is the most widely used approach for prediction due to its speed, accuracy, and non-linear modeling. In this study, three machine learning models were used to predict the air conditioning energy demand in a classroom of an educational building in a hot tropical climate. The models selected, chosen for their wide use in the literature, are SVR (Support Vector Regressor), DT (Decision Tree), and RFR (Random Forest Regressor); the goal is to establish which one offers the best performance for this case study based on a comparative analysis using performance metrics. Cross-validation was used to perform robust training. Twenty-two input variables were considered: climatological, operational, and temporal. Occupancy is the variable with the highest correlation with air conditioning consumption, with a positive correlation of 0.65. Monitoring was carried out for 72 days, including weekends. Six study scenarios were considered, in which the monitoring period varied, influencing the number of samples. In addition, two sensitivity analyses were performed by modifying the time interval of the data (1, 5, 10, 20, 30, and 60 min) and the data split (50:50, 60:40, 70:30, 80:20, and 90:10). The models were evaluated using the RMSE, MAE, and R² metrics, which capture different characteristics and approaches to error measurement. During the training phase, the RFR model achieved a coefficient of determination (R²) of 0.97, while the SVR obtained an R² of 0.78 in the test phase. Finally, it is concluded that using shorter time intervals (every 1 min) in the data improves the performance of the predictive models. Splitting the data into 80:20 and 90:10 ratios resulted in the lowest RMSE values for the three models evaluated. Training the models with a larger amount of data allows for capturing more representative patterns, which improves their generalization ability and performance on new data.
Title: Estimation of the air conditioning energy consumption of a classroom using machine learning in a tropical climate. Journal: Frontiers in Big Data, volume 8, article 1520574. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116678/pdf/
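The RMSE, MAE, and R² metrics named in the abstract above follow standard definitions, and the train/test ratios (for example 80:20) can be applied with a simple cut. The sketch below shows both in plain Python; the function names are illustrative, and the ordered (non-shuffled) split is an assumption, since the abstract does not say how the authors partitioned their data.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more heavily.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def ordered_split(samples, train_fraction):
    # Assumed split: the first `train_fraction` of samples train, the rest test.
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]
```

For instance, `ordered_split(samples, 0.8)` corresponds to the 80:20 ratio mentioned in the abstract; a shuffled or cross-validated split would need a different helper.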
Pub Date: 2025-05-14; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1526480
Mohamed Abd Elaziz, Ibrahim A Fares, Abdelghani Dahou, Mansour Shrahili
Intrusion detection has been of prime concern in the Internet of Things (IoT) environment due to the rapid increase in cyber threats. The majority of traditional intrusion detection systems (IDSs) rely on centralized models, raising significant privacy concerns. Federated learning (FL) offers a decentralized alternative; however, many existing FL-based IDS frameworks suffer from poor performance due to suboptimal model architectures and ineffective hyperparameter selection. To address these challenges, this paper introduces a novel trust-centric FL framework based on the tab transformer (TTF) model for IDS. We enhance the Tab model through an optimization process, utilizing a hyperparameter tuning algorithm inspired by the nature-based electric eel foraging optimization (EEFO) algorithm. The goal of the developed framework is to improve intrusion detection without centralizing data, thereby preserving privacy. It also enhances the capability to process and detect threats in the huge amounts of data generated by IoT devices. Our framework is tested on three IoT datasets (N-BaIoT, UNSW-NB15, and CICIoT2023) to verify the model's performance. Experimental results show that the proposed framework significantly exceeds traditional methods in terms of accuracy, precision, and recall. The results presented in this study confirm the effectiveness and superior performance of the proposed FL-based IDS framework.
Title: Federated learning framework for IoT intrusion detection using tab transformer and nature-inspired hyperparameter optimization. Journal: Frontiers in Big Data, volume 8, article 1526480. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116512/pdf/
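To make concrete how a federated setup can train without pooling raw data, the sketch below shows federated averaging, one common aggregation rule in which a server combines locally trained parameters weighted by each client's sample count. The abstract does not state which aggregation rule the authors use, so this is an assumption for illustration; `federated_average` and its arguments are hypothetical names.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (federated averaging).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    Only parameters and sample counts reach the server; raw records stay local.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

With two clients holding 1 and 3 samples and parameters `[1.0, 2.0]` and `[3.0, 4.0]`, the larger client contributes three quarters of the average.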
Pub Date: 2025-05-06; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1611364
Yves Philippe Rybarczyk
Title: Editorial: Air quality and biosphere-atmosphere interactions. Journal: Frontiers in Big Data, volume 8, article 1611364. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089036/pdf/
Pub Date: 2025-04-23; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1542483
Hossein Hassani, Mohammad Reza Entezarian, Sara Zaeimzadeh, Leila Marvian, Nadejda Komendantova
Effective record linkage in big data, particularly in imbalanced datasets, is a critical yet highly challenging task due to the inherent complexity involved. This article uses an oversampling-undersampling strategy to address linkage imbalances, enabling more accurate and efficient record linkage within large-scale datasets. The strategy increases the number of minority-class instances and reduces the dominance of the majority classes, aiming for a more balanced dataset that can be used for training and testing. Sensitivity testing was carried out by varying the training-test ratio and the degree of imbalance.
Title: An oversampling-undersampling strategy for large-scale data linkage. Journal: Frontiers in Big Data, volume 8, article 1542483. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055850/pdf/
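The oversampling-undersampling idea described above can be sketched with plain random resampling: classes with too few labeled record pairs are sampled with replacement, and classes with too many are sampled without replacement, until each reaches a target size. The abstract does not say which resampling technique the authors use, so random resampling, the `rebalance` function, and the `target_per_class` parameter are assumptions for illustration.

```python
import random


def rebalance(records, labels, target_per_class, seed=0):
    """Return records and labels with exactly `target_per_class` items per class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group the records by their class label.
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        if len(recs) >= target_per_class:
            # Undersample: draw without replacement from the larger class.
            chosen = rng.sample(recs, target_per_class)
        else:
            # Oversample: keep every original and add random duplicates.
            extra = [rng.choice(recs) for _ in range(target_per_class - len(recs))]
            chosen = recs + extra
        out_records.extend(chosen)
        out_labels.extend([lab] * target_per_class)
    return out_records, out_labels
```

In a linkage setting, `records` would typically be candidate record pairs and `labels` their match or non-match status, with true matches forming the minority class.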
Pub Date: 2025-04-16; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1556157
Suresh Neethirajan
The rapid digital transformation of dairy and poultry farming through big data analytics and Internet of Things (IoT) innovations has significantly advanced precision management of feeding, animal health, and environmental conditions. However, this digitization has simultaneously escalated cybersecurity vulnerabilities, presenting serious threats to economic stability, animal welfare, and food safety. This paper provides an in-depth analysis of the evolving cyber threat landscape confronting digital livestock farming, examining ransomware incidents, hacktivist interference, and state-sponsored cyber intrusions. It critically assesses how compromised digital systems disrupt critical farm operations, including milking routines, feed formulations, and climate control, profoundly impacting animal health, productivity, and consumer trust. Responding to these challenges, we present a comprehensive cybersecurity roadmap that integrates established IT security practices with agriculture-specific requirements. The roadmap emphasizes advanced solutions, such as AI-driven anomaly detection, blockchain-based traceability, and integrated cybersecurity-biosecurity frameworks, tailored explicitly to safeguard livestock farming. Additionally, we highlight human-centric elements such as targeted workforce education, rural cybersecurity capacity building, and robust cross-sector collaboration as indispensable components of a resilient cybersecurity ecosystem. By synthesizing technical advancements, regulatory perspectives, and socio-economic insights, the paper proposes a proactive strategy to enhance data integrity, secure animal welfare, and reinforce food supply chains. Ultimately, we underscore that effective cybersecurity is not merely a technical consideration but foundational to ensuring the sustainable, ethical, and trustworthy advancement of livestock agriculture in a data-driven world.
Title: Safeguarding digital livestock farming - a comprehensive cybersecurity roadmap for dairy and poultry industries. Journal: Frontiers in Big Data, volume 8, article 1556157. Impact factor: 2.4. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12040926/pdf/
Pub Date: 2025-04-09; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1557600
Jorge Raul Navarro-Cabrera, Miguel Angel Valles-Coral, María Elena Farro-Roque, Nelly Reátegui-Lozano, Lolita Arévalo-Fasanando
Introduction: Iron deficiency anemia (IDA) is a global health issue that significantly affects quality of life. Non-invasive methods, such as image analysis using artificial vision, offer accessible alternatives for diagnosis. This study proposes a DenseNet169-based model to detect anemia from nail images and compares its performance with that of the Rad-67 hemoglobin meter.
Methods: A cross-sectional study was conducted with 909 nail images collected from university students aged 18-25 years at the Universidad Nacional de San Martín, Peru. A Samsung Galaxy A73 5G smartphone was used to capture the images under controlled conditions, and clinical data were complemented with hemoglobin readings from the Rad-67 device. The images were pre-processed using segmentation and data augmentation techniques to standardize the dataset. Three models (DenseNet169, InceptionV3, and Xception) were trained and evaluated using metrics such as accuracy, recall, and AUC.
Results: DenseNet169 demonstrated the best performance, achieving an accuracy of 0.6983, recall of 0.6477, F1-Score of 0.6525, and AUC of 0.7409. Despite the presence of false negatives, the results showed a positive correlation with Rad-67 readings.
Conclusion: The DenseNet169-based model proved to be a promising tool for non-invasive detection of iron deficiency anemia, with potential for application in clinical and educational settings. Future improvements in preprocessing and dataset diversification could enhance performance and applicability.
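The evaluation metrics named in the abstract above (accuracy, recall, F1-score, and AUC) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the toy values are not data from the study.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Accuracy, recall, F1 and AUC for a binary classifier."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical toy data: 1 = anemic, 0 = non-anemic; scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
toy_metrics = classification_metrics(toy_labels, toy_scores)
print(toy_metrics)
```

Recall is the metric most directly affected by the false negatives the abstract mentions: each anemic case that the model scores below the threshold lowers recall without changing AUC, which depends only on how the scores rank positives against negatives.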
{"title":"Machine vision model using nail images for non-invasive detection of iron deficiency anemia in university students.","authors":"Jorge Raul Navarro-Cabrera, Miguel Angel Valles-Coral, María Elena Farro-Roque, Nelly Reátegui-Lozano, Lolita Arévalo-Fasanando","doi":"10.3389/fdata.2025.1557600","DOIUrl":"https://doi.org/10.3389/fdata.2025.1557600","url":null,"PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1557600"},"PeriodicalIF":2.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144040422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}