Introduction: Vaccination is critical for reducing childhood mortality, yet completion rates for the third dose of the pentavalent vaccine (Penta 3) in East Africa remain inadequate. This study aims to predict Penta 3 vaccination dropout using a stacked ensemble machine learning model trained on Demographic and Health Survey (DHS) data, with the objective of identifying predictors of dropout and informing intervention strategies.
Methods: The study used seven base machine learning algorithms to build a stacked ensemble model with three meta-learners: Random Forest (RF), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGBoost). The H2O package was used to develop the base learners and stack them into super learners. Feature selection (FS) was performed with the LASSO and Boruta algorithms, and the two methods were compared. The selected features were one-hot encoded, with ordinal encoding applied where appropriate. Hyperparameter optimization (HPO) was performed with grid search and random search, and the two approaches were compared. Model performance was assessed using five key metrics, including accuracy and the area under the curve (AUC). SHAP (Shapley Additive Explanations) values were used to interpret the model outputs and identify influential predictors. An experimental design was used to structure and present the results.
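The stacking procedure described above can be sketched without H2O. In stacking, the out-of-fold predictions of the base learners become the inputs to a meta-learner. The snippet below is a minimal, library-free illustration that rests on several assumptions:

- The two base learners (`fit_centroid`, `fit_knn`) are hypothetical stand-ins, not the study's seven H2O algorithms.
- The five-fold split is an illustrative choice.
- The gradient-descent logistic meta-learner stands in for the study's RF/GLM/XGBoost meta-learners.
- The data are synthetic.

```python
import math
import random


def fit_centroid(X, y):
    """Hypothetical base learner 1: score grows as x lies closer to the class-1 centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        d0, d1 = math.dist(x, centroids[0]), math.dist(x, centroids[1])
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return predict


def fit_knn(X, y, k=3):
    """Hypothetical base learner 2: fraction of class-1 labels among the k nearest training rows."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / k

    return predict


def out_of_fold(X, y, learners, n_folds=5):
    """Level-one data: each row is scored only by base models that never saw it in training."""
    meta = [[0.0] * len(learners) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        for m, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta[i][m] = model(X[i])
    return meta


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression (a GLM) fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + w[-1]) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wj - lr * g / len(Z) for wj, g in zip(w, grad)]
    return lambda z: sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + w[-1])


def fit_stack(X, y, learners=(fit_centroid, fit_knn)):
    """Train the meta-learner on out-of-fold scores, then refit the base learners on all rows."""
    meta_model = fit_logistic(out_of_fold(X, y, learners), y)
    bases = [fit(X, y) for fit in learners]
    return lambda x: meta_model([base(x) for base in bases])


# Toy, synthetic two-class data (not DHS data).
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(2.0 * t, 1.0)] for t in y]
predict = fit_stack(X, y)
scores = [predict(x) for x in X]
```

The key design point is that the meta-learner is trained only on out-of-fold scores. If it were trained on in-sample base predictions, it would over-trust base learners that simply memorise the training rows.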
Results: Four experiments were conducted to evaluate the feature selection and HPO methods. All stacked ensemble models outperformed the individual learners, and the XGBoost meta-learner optimized with grid search and LASSO FS achieved the highest performance: 93.9% accuracy and an AUC of 99.4%. The RF and GLM meta-learners were also evaluated but were outperformed by the XGBoost meta-learner. SHAP analysis identified key features influencing Penta 3 dropout, including place of delivery, decision-making autonomy, the mother's earnings, and healthcare access. Home delivery increased the risk of dropout, while postnatal care by midwives and health insurance coverage lowered its likelihood.
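SHAP values are Shapley values from cooperative game theory, applied to model features. As a hedged illustration of what these attributions mean, the sketch below computes them exactly by enumerating every coalition. It assumes that a feature absent from a coalition is replaced by a single baseline value.

- The `risk` function and its weights are made up.
- The abstract does not say which SHAP explainer the study used.
- Practical explainers approximate or accelerate this computation rather than enumerating all 2^n coalitions.

```python
import itertools
import math


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    A feature absent from a coalition is set to its baseline value. The loop
    enumerates every coalition, so the cost grows as 2^n: use only for a few features.
    """
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk score with one interaction term (not the study's model).
def risk(z):
    return 0.8 * z[0] - 0.5 * z[1] + 0.3 * z[0] * z[2]


phi = shapley_values(risk, x=[1, 1, 1], baseline=[0, 0, 0])
```

The attributions satisfy the efficiency property: they sum to f(x) minus f(baseline). The 0.3 interaction term is split equally between features 0 and 2.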
Conclusion and recommendation: This study provides insights into the factors influencing Penta 3 vaccination dropout in East Africa. To reduce dropout rates, interventions should focus on enhancing maternal livelihood opportunities, improving healthcare access in rural areas, and promoting institutional deliveries.
Title: A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa.
Authors: Meron Asmamaw Alemayehu, Shimels Derso Kebede, Agmasie Damtew Walle, Daniel Niguse Mamo, Ermias Bekele Enyew, Jibril Bashir Adem
Frontiers in Big Data, vol. 8, article 1522578; published 2025-04-07; DOI: 10.3389/fdata.2025.1522578
Pub Date: 2025-04-04; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1532362
Johnson Masinde, Franklin Mugambi, Daniel Wambiri Muthee
The present study examined the relationship between big data and personal information privacy in Kenya, a developing nation that has seen a significant rise in data use in the recent past. The study sought to assess the effectiveness of current data protection laws and policies, highlight the challenges individuals and organizations face in securing their data, and propose mechanisms to strengthen data protection frameworks and raise public awareness of data privacy issues. The study employed a mixed-methods approach comprising a survey of 500 participants, 20 interviews with key stakeholders, and an examination of 50 pertinent documents. The findings show that regulatory and legal frameworks, though present, are not enforced, revealing a gap between legislation and implementation. Furthermore, the public lacks an understanding of the risks posed by sharing personal information, and more public education and awareness activities are required. The findings also demonstrate that while people are prepared to trade their personal information for concrete benefits, they are concerned about how their data is used and by whom. The study proposes the establishment of a National Data Literacy Training and Capacity Building Framework (NADACA). The framework should mandate the training of government officials in best practices for data governance and enforcement mechanisms, educate the public on personal data privacy and relevant laws, and integrate data literacy into the curriculum, alongside regular resources and workshops on data literacy. The study has significant implications for policymakers, industry representatives, and civil society organizations in Kenya and globally.
Title: Big data and personal information privacy in developing countries: insights from Kenya.
Frontiers in Big Data, vol. 8, article 1532362; published 2025-04-04; DOI: 10.3389/fdata.2025.1532362
Introduction: Rapid advancements in artificial intelligence and generative artificial intelligence have enabled the creation of fake images and videos that appear highly realistic. According to a report published in 2022, approximately 71% of people rely on fake videos and become victims of blackmail. Moreover, these fake videos and images are used to tarnish the reputations of popular public figures. This has increased the demand for deepfake detection techniques. However, the accuracy of the techniques proposed so far varies as fake content generation techniques change, and these techniques are computationally intensive. Existing deepfake detection techniques are based on convolutional neural networks, Linformer models, or transformer models, each with its own advantages and disadvantages.
Methods: This manuscript proposes a hybrid architecture that combines transformer and Linformer models for deepfake detection. The architecture splits an image into patches and applies positional encoding to retain the spatial relationships between patches. Its encoder captures contextual information from the input patches, and the Gaussian Error Linear Unit (GELU) activation mitigates the vanishing gradient problem.
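A minimal sketch of the patch-and-position step follows. It assumes:

- single-channel images stored as lists of rows;
- non-overlapping square patches;
- the fixed sinusoidal encoding from the original Transformer paper;
- the exact (erf-based) form of GELU.

The abstract does not specify which positional encoding, projection, or GELU variant the authors used. `to_patches`, `sinusoidal_encoding`, and `embed_patches` are hypothetical helper names.

```python
import math


def to_patches(image, p):
    """Split an H x W single-channel image (list of rows) into flattened p x p patches, row-major."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    return [
        [image[r][c] for r in range(top, top + p) for c in range(left, left + p)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def sinusoidal_encoding(pos, d):
    """Fixed sine/cosine position code of length d (original Transformer formulation)."""
    return [
        math.sin(pos / 10000 ** (i / d)) if i % 2 == 0 else math.cos(pos / 10000 ** ((i - 1) / d))
        for i in range(d)
    ]


def gelu(v):
    """Exact GELU: v * Phi(v), where Phi is the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def embed_patches(image, p):
    """Patch tokens plus position codes (a learned linear projection would normally come first)."""
    return [
        [value + code for value, code in zip(patch, sinusoidal_encoding(pos, p * p))]
        for pos, patch in enumerate(to_patches(image, p))
    ]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = embed_patches(image, 2)
```

In a full model, each flattened patch would first pass through a learned linear layer that maps it to the embedding width, and only then would the position code be added. That layer is omitted here to keep the example dependency-free.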
Results: The Linformer component reduces the size of the attention matrix, which halves the execution time without compromising accuracy. Combining the complementary strengths of the transformer and Linformer models also improves the robustness and generalization of deepfake detection. The low computational requirement and high accuracy of 98.9% make the model suitable for real-time use, helping to prevent blackmail and other harms to the public.
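The saving comes from projecting the keys and values along the sequence axis. For n tokens and a projection length k, the score matrix shrinks from n x n to n x k.

Below is a minimal single-head sketch. It assumes random stand-ins for the learned projections `e_proj` and `f_proj` and uses tiny sizes (n = 16, d = 8, k = 4). It illustrates the change in shape only and makes no claim about the reported speed-up.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linformer_attention(q, k, v, e_proj, f_proj):
    """Single-head Linformer-style attention.

    q, k, v: n x d token matrices. e_proj, f_proj: k_len x n projections that
    compress the sequence axis of K and V, so the score matrix is n x k_len
    rather than n x n.
    """
    d = len(q[0])
    k_small = matmul(e_proj, k)  # k_len x d
    v_small = matmul(f_proj, v)  # k_len x d
    scores = matmul(q, transpose(k_small))  # n x k_len
    weights = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    return matmul(weights, v_small), weights  # output is n x d


def random_matrix(rows, cols, rng, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, k_len = 16, 8, 4
q, k, v = (random_matrix(n, d, rng) for _ in range(3))
# In Linformer these projections are learned; random matrices stand in here.
e_proj = random_matrix(k_len, n, rng, scale=1 / math.sqrt(n))
f_proj = random_matrix(k_len, n, rng, scale=1 / math.sqrt(n))
out, weights = linformer_attention(q, k, v, e_proj, f_proj)
```

Here the attention weights hold 16 x 4 = 64 entries instead of 16 x 16 = 256. For a fixed k, the full n x n matrix grows quadratically with sequence length, while the n x k matrix grows only linearly.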
Discussion: The proposed hybrid model combines the transformer model's strength in capturing complex patterns in data with the efficient self-attention of the Linformer model, reducing computation time without compromising accuracy. The models were implemented with patch sizes of 6 and 11, and the results show that increasing the patch size improves the model's performance. This allows the model to capture fine-grained features and learn more effectively from the same set of videos. The larger patch size also enables the model to better preserve spatial details, which contributes to improved feature extraction.
Title: Lightweight and hybrid transformer-based solution for quick and reliable deepfake detection.
Authors: Geeta Rani, Atharv Kothekar, Shawn George Philip, Vijaypal Singh Dhaka, Ester Zumpano, Eugenio Vocaturo
Frontiers in Big Data, vol. 8, article 1521653; published 2025-04-01; DOI: 10.3389/fdata.2025.1521653
Pub Date: 2025-03-25; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1573072
Alfred Krzywicki, Michael Bain, Wayne Wobcke
Title: Editorial: Natural language processing for recommender systems.
Frontiers in Big Data, vol. 8, article 1573072; published 2025-03-25; DOI: 10.3389/fdata.2025.1573072
As environmental awareness has grown in response to rising greenhouse gas emissions, green travel modes such as cycling and walking have gradually become popular choices. However, the current traffic environment has many hidden hazards that endanger the personal safety of traffic participants and hinder the development of green travel. Traditional methods, such as identifying risky locations after traffic accidents have occurred, suffer from delayed response and a lack of foresight. Against this background, we proposed a mobile edge crowdsensing framework to dynamically assess urban green travel safety risks. Specifically, a large number of mobile devices were used to sense the road environment, and a semantic detection framework detected the high-risk behaviors of traffic participants from the sensed data. Multi-source, heterogeneous urban crowdsensing data were then used to model travel safety risk, achieving a comprehensive, real-time assessment of urban green travel safety. We evaluated our method on real-world datasets collected from Xiamen Island. Our framework detected high-risk traffic behaviors with an average F1-score of 86.5% and assessed travel safety risk with an R² of 0.85, outperforming various baseline methods.
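The two reported metrics can be computed as below. This sketch makes two assumptions: "average F1" is read as the unweighted (macro) mean of per-class F1 scores, and R² is the usual coefficient of determination. The behaviour labels in the example are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """One-vs-rest F1 score for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, c) for c in labels) / len(labels)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


behaviours = ["safe", "safe", "risky", "risky"]
predicted = ["safe", "risky", "risky", "risky"]
score = macro_f1(behaviours, predicted)
```

R² is not bounded below by zero: predictions that are worse than always predicting the mean give negative values.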
Title: CrowdRadar: a mobile crowdsensing framework for urban traffic green travel safety risk assessment.
Authors: Yigao Wang, Qingxian Tang, Wenxuan Wei, Chenhui Yang, Dingqi Yang, Cheng Wang, Liang Xu, Longbiao Chen
Frontiers in Big Data, vol. 8, article 1440816; published 2025-03-21; DOI: 10.3389/fdata.2025.1440816
Pub Date: 2025-03-14; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1546223
Yves Rybarczyk, Rasa Zalakeviciute, Marija Ereminaite, Ivana Costa-Stolz
The planet is experiencing global warming, with an increasing number of heat waves worldwide. Cities are particularly affected by high temperatures because of the urban heat island (UHI) effect. This phenomenon is mostly explained by land cover changes, reduced green spaces, and the concentration of infrastructure in urban settings. However, the causes of the UHI are complex and involve multiple factors that remain understudied, air pollution among them. This work investigates the link between particulate matter ≤2.5 μm (PM2.5) and air temperature using convergent cross-mapping (CCM), a statistical method for inferring causation in dynamic non-linear systems. A positive correlation is observed between the concentration of fine particulate matter and urban temperature. The causal relationship between PM2.5 and temperature is confirmed in the most urbanized areas of the study site (Quito, Ecuador). The results show that (i) the UHI is present even in the most elevated capital city of the world, and (ii) air quality is an important contributor to the higher temperatures in urban areas compared with outlying areas. This study supports the hypothesis of a non-linear threshold effect of pollution concentration on urban temperature.
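Convergent cross-mapping can be sketched in a few functions. This is a simplified, library-free version that assumes:

- embedding dimension E = 2 and lag tau = 1;
- simplex-style exponential weights over the E + 1 nearest neighbours;
- a library taken as a prefix of the series.

The coupled logistic maps are a toy system often used to illustrate CCM, not PM2.5 or temperature data, and the parameter values are illustrative.

```python
import math


def shadow_manifold(series, E, tau):
    """Time-delay embedding: point t is (s[t], s[t - tau], ..., s[t - (E - 1) * tau])."""
    start = (E - 1) * tau
    return [[series[t - j * tau] for j in range(E)] for t in range(start, len(series))], start


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(source, target, E=2, tau=1, library_size=None):
    """Correlation between target and its estimate from the shadow manifold of source.

    In CCM, high and converging skill when estimating target from the manifold
    of source is read as evidence that target causally influences source.
    """
    points, start = shadow_manifold(source, E, tau)
    points = points[:library_size]
    actual, estimate = [], []
    for i, p in enumerate(points):
        # E + 1 nearest neighbours in the library, excluding the point itself.
        nbrs = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[: E + 1]
        d1 = max(nbrs[0][0], 1e-12)
        weights = [math.exp(-dist / d1) for dist, _ in nbrs]
        total = sum(weights)
        estimate.append(sum(w * target[j + start] for w, (_, j) in zip(weights, nbrs)) / total)
        actual.append(target[i + start])
    return pearson(actual, estimate)


# Coupled logistic maps in which x drives y more strongly than y drives x (illustrative values).
x, y = [0.4], [0.2]
for _ in range(299):
    xt, yt = x[-1], y[-1]  # update both series from the previous step's values
    x.append(xt * (3.8 - 3.8 * xt - 0.02 * yt))
    y.append(yt * (3.5 - 3.5 * yt - 0.1 * xt))

skill_by_library = {L: cross_map_skill(y, x, library_size=L) for L in (25, 100, 298)}
```

CCM reads causation from convergence. If x drives y, the skill of estimating x from the manifold of y (`cross_map_skill(y, x, ...)`) should rise and level off as `library_size` grows. `skill_by_library` lets you inspect this.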
Title: Causal effect of PM2.5 on the urban heat island.
Frontiers in Big Data, vol. 8, article 1546223; published 2025-03-14; DOI: 10.3389/fdata.2025.1546223
Pub Date: 2025-03-13; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1455442
Waleed Albattah, Rehan Ullah Khan
The exponential growth of image and video data motivates the need for practical, real-time content-based search algorithms. Features play a vital role in identifying objects within images. However, feature-based classification is challenged by uneven distributions of class instances. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance, but real-world scenarios often exhibit class imbalance. This article therefore explores a classification framework based on image features, analyzing both balanced and imbalanced class distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The comprehensive evaluation shows that all models perform better on balanced than on imbalanced data, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially for improving Random Forest (RF) and SVM models. The deep learning experiments show a similar improvement.
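The abstract does not define D-GA or D-PO, so they are not reproduced here. As a generic point of comparison, the sketch below shows plain random oversampling, one of the simplest balancing techniques: minority-class rows are duplicated at random until every class matches the majority count. Function and variable names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    out_features, out_labels = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out_features.extend(rows + extra)
        out_labels.extend([label] * target)
    return out_features, out_labels


# Toy imbalanced set: 8 "common" rows and 2 "rare" rows of made-up feature vectors.
features = [[float(i), float(i % 3)] for i in range(10)]
labels = ["common"] * 8 + ["rare"] * 2
balanced_x, balanced_y = random_oversample(features, labels)
class_counts = Counter(balanced_y)
```

Balancing should be applied to the training split only. Oversampling before the train/test split copies rows into both sides and inflates test scores.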
Title: Impact of imbalanced features on large datasets.
Frontiers in Big Data, vol. 8, article 1455442; published 2025-03-13; DOI: 10.3389/fdata.2025.1455442
Pub Date: 2025-03-12; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1485493
Sandio Maciel Dos Santos, Marcelino Silva da Silva, Fábio Manoel França Lobato, Carlos Renato Lisboa Francês
This study examines the impact of the COVID-19 pandemic on academic performance and student participation in the National High School Exam (ENEM) in the state of Pará, Brazil, focusing on the interaction between socioeconomic factors, access to technology, and regional disparities. The research employed a mixed-methods approach, analyzing quantitative data from ENEM results (2020-2022) and qualitative interviews with educators and students. The findings indicate that the pandemic exacerbated pre-existing educational inequalities, particularly affecting low-income students and those enrolled in public schools. The highest dropout rates were recorded among students with a family income of up to one minimum wage, highlighting the barriers posed by limited access to technology and infrastructure for remote learning. A statistical analysis revealed a 20% increase in scores among students with access to computers and the Internet, particularly in private schools. The study also found significant regional differences across Pará's mesoregions, with Marajó and Southeast Pará facing more persistent challenges in reducing dropout rates compared to the Metropolitan Region of Belém. These results underscore the urgent need for region-specific public policies that address disparities in educational resources, including targeted investments in digital infrastructure and teacher training for remote education. The study concludes that comprehensive support programs, including psychological assistance for students, are essential for building a more resilient and equitable educational system capable of withstanding future crises.
Title: Use of Bayesian networks in Brazil high school educational database: analysis of the impact of COVID-19 on ENEM in Pará between 2019 and 2022.
Frontiers in Big Data, vol. 8, article 1485493; published 2025-03-12; DOI: 10.3389/fdata.2025.1485493
Pub Date: 2025-03-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1529848
Asma'a Mohammad Al-Mnayyis, Hasan Gharaibeh, Mohammad Amin, Duha Anakreh, Hanan Fawaz Akhdar, Eman Hussein Alshdaifat, Khalid M O Nahar, Ahmad Nasayreh, Mohammad Gharaibeh, Neda'a Alsalman, Alaa Alomar, Maha Gharaibeh, Hamad Yahia Abu Mhanna
Classifying benign and malignant patterns in digital mammography is a critical step in breast cancer diagnosis, enabling early detection and potentially saving many lives. Variations in breast tissue architecture can obscure abnormalities, so classifying suspicious regions (benign and malignant patterns) in digital mammograms is a significant challenge for radiologists. Even for specialists, the earliest visual signs are subtle and irregular, which complicates identification. Radiologists therefore need an advanced classifier to help identify breast cancer and categorize regions of concern. This study presents an improved technique for classifying breast cancer in mammography images. The dataset comprises real-world data from King Abdullah University Hospital (KAUH) at Jordan University of Science and Technology: 7,205 images from 5,000 patients aged 18-75. After being labeled as benign or malignant, the images were preprocessed by rescaling, normalization, and augmentation. Multi-fusion approaches, such as high-boost filtering and contrast-limited adaptive histogram equalization (CLAHE), were used to improve image quality. We developed a novel Residual Depth-wise Network (RDN) to improve the precision of breast cancer detection. We compared it with several prominent models, including MobileNetV2, VGG16, VGG19, ResNet50, InceptionV3, Xception, and DenseNet121. The RDN model outperformed these baselines, achieving an accuracy of 97.82%, precision of 96.55%, recall of 99.19%, specificity of 96.45%, F1 score of 97.85%, and validation accuracy of 96.20%. These findings indicate that the proposed RDN model is an effective tool for early diagnosis from mammography images. Combined with multi-fusion and efficient preprocessing, it significantly improves breast cancer detection.
Title: (KAUH-BCMD) dataset: advancing mammographic breast cancer classification with multi-fusion preprocessing and residual depth-wise network.
Frontiers in Big Data, vol. 8, article 1529848; published 2025-03-06. DOI: 10.3389/fdata.2025.1529848. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922913/pdf/
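The figures reported for the RDN model are standard binary-classification metrics. The sketch below shows how they relate to one another. It is a hedged illustration, not the authors' evaluation code. It computes the metrics from confusion-matrix counts and assumes that "malignant" is the positive class, which the abstract does not state. The counts in the demo are made up. `ConfusionCounts`, `classification_metrics`, and `f1_from` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; 'positive' is ASSUMED to mean malignant."""
    tp: int  # malignant images predicted malignant
    fp: int  # benign images predicted malignant
    tn: int  # benign images predicted benign
    fn: int  # malignant images predicted benign


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero (e.g. no positives).
    return num / den if den else 0.0


def f1_from(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Accuracy, precision, recall (sensitivity), specificity and F1 from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    specificity = _safe_div(c.tn, c.tn + c.fp)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are NOT the KAUH results.
    print(classification_metrics(ConfusionCounts(tp=90, fp=10, tn=80, fn=20)))
    # Consistency check: F1 implied by the reported precision and recall.
    print(round(100 * f1_from(0.9655, 0.9919), 2))
```

Run as a script, the last line prints 97.85. The harmonic mean of the reported precision (96.55%) and recall (99.19%) therefore reproduces the reported F1 score, a quick internal-consistency check on the abstract's numbers. Validation accuracy (96.20%) is presumably computed on a different data split, so it cannot be derived from the same counts.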
Pub Date: 2025-03-04 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdata.2025.1582619
[This corrects the article DOI: 10.3389/fdata.2025.1546850.].
Title: Erratum: Edge-level multi-constraint graph pattern matching with lung cancer knowledge graph.
Frontiers in Big Data, vol. 8, article 1582619; published 2025-03-04. DOI: 10.3389/fdata.2025.1582619. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11915023/pdf/