
Frontiers in Big Data: Latest Articles

Deep learning for accurate classification of conifer pollen grains: enhancing species identification in palynology.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-02-14 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1507036
Masoud A Rostami, LeMaur Kydd, Behnaz Balmaki, Lee A Dyer, Julie M Allen

Accurate identification of pollen grains from Abies (fir), Picea (spruce), and Pinus (pine) is an important method for reconstructing historical environments and past landscapes and for understanding human-environment interactions. However, distinguishing between pollen grains of conifer genera poses challenges in palynology due to their morphological similarities. To address this identification challenge, this study leverages advanced deep learning techniques, specifically transfer learning models, which are effective in identifying similarities among detailed features. We evaluated nine different transfer learning architectures: DenseNet201, EfficientNetV2S, InceptionV3, MobileNetV2, ResNet101, ResNet50, VGG16, VGG19, and Xception. Each model was trained and validated on a dataset of images of pollen grains collected from museum specimens, mounted and imaged for training purposes. The models were assessed on various performance metrics, including accuracy, precision, recall, and F1-score, across training, validation, and testing phases. Our results indicate that ResNet101 outperformed the other models, achieving a test accuracy of 99%, with equally high precision, recall, and F1-score. This study underscores the efficacy of transfer learning for producing models that can aid in identifying difficult species. These models may aid conifer species classification and enhance pollen grain analysis, which is critical for ecological research and for monitoring environmental changes.
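The four metrics named in this abstract can all be derived from a confusion matrix. The sketch below is illustrative only: the confusion matrix values and the function name `per_class_metrics` are hypothetical and are not taken from the paper, and the paper's own evaluation code is not shown in the source.

```python
# Minimal sketch: accuracy, precision, recall, and F1 from a confusion matrix.
# Rows are true classes, columns are predicted classes.
# The counts below are HYPOTHETICAL, not results from the study.

def per_class_metrics(confusion, labels):
    """Return (overall accuracy, {label: (precision, recall, f1)})."""
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp  # predicted i, truly other
        fn = sum(confusion[i]) - tp                       # truly i, predicted other
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return correct / total, metrics


labels = ["Abies", "Picea", "Pinus"]
confusion = [
    [98, 1, 1],   # true Abies
    [2, 97, 1],   # true Picea
    [0, 1, 99],   # true Pinus
]

accuracy, metrics = per_class_metrics(confusion, labels)
print(f"accuracy = {accuracy:.3f}")
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Reporting per-class values alongside overall accuracy matters for morphologically similar genera, because a high overall accuracy can hide a weak class.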

Citations: 0
Editorial: Machine learning and immersive technologies for user-centered digital healthcare innovation.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-02-14 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1567941
Federico Colecchia, Daniele Giunchi, Rui Qin, Eleonora Ceccaldi, Fang Wang
Citations: 0
Training and onboarding initiatives in high energy physics experiments.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-02-10 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1497622
Allison Reinsvold Hall, Nicole Skidmore, Gabriele Benelli, Ben Carlson, Claire David, Jonathan Davies, Wouter Deconinck, David DeMuth, Peter Elmer, Rocky Bala Garg, Stephan Hageböck, Killian Lieret, Valeriia Lukashenko, Sudhir Malik, Andy Morris, Heidi Schellman, Graeme A Stewart, Jason Veatch, Michel Hernandez Villanueva

In this article we document the current analysis software training and onboarding activities in several High Energy Physics (HEP) experiments: ATLAS, CMS, LHCb, Belle II and DUNE. Fast and efficient onboarding of new collaboration members is increasingly important for HEP experiments. With rapidly increasing data volumes and larger collaborations, the analyses, and consequently the related software, become ever more complex. This necessitates structured onboarding and training. Recognizing this, the HEP Software Foundation (HSF) held a meeting series in 2022 for experiments to showcase their initiatives. Here we document and analyze these initiatives in an attempt to determine a set of key considerations for future HEP experiments.

Citations: 0
Big data analytics and AI as success factors for online video streaming platforms.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-02-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1513027
Muhammad Arshad, Choo Wou Onn, Ashfaq Ahmad, Goabaone Mogwe

As mobile device use grows rapidly, online video streaming has risen to the top of the entertainment industry. These platforms have expanded radically due to the incorporation of Big Data Analytics and Artificial Intelligence, which are critical for improving the user interface, improving platform functioning, and customizing recommended content. This paper examines how Big Data Analytics makes it possible to obtain large amounts of data about users: how they view, what they like, and how they behave. AI uses this data, and customers benefit by receiving more suitable material, better recommendations, and more efficient content delivery. The study also points to the importance and relevance of such technologies for promoting business development and user interaction and for maintaining competitiveness in the online video streaming market, with examples of their effective application. This work presents a comprehensive investigation of the combined role of Big Data and AI and presents the findings needed to determine their efficacy as success factors of existing and future video streaming services.

Citations: 0
Editorial: Visualizing big culture and history data.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-02-04 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1563730
Florian Windhager, Steffen Koch, Sander Münster, Eva Mayr
Citations: 0
On explaining recommendations with Large Language Models: a review.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-01-27 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1505284
Alan Said

The rise of Large Language Models (LLMs), such as LLaMA and ChatGPT, has opened new opportunities for enhancing recommender systems through improved explainability. This paper provides a systematic literature review focused on leveraging LLMs to generate explanations for recommendations, a critical aspect of fostering transparency and user trust. We conducted a comprehensive search within the ACM Guide to Computing Literature, covering publications from the launch of ChatGPT (November 2022) to the present (November 2024). Our search yielded 232 articles, but after applying inclusion criteria, only six were identified as directly addressing the use of LLMs in explaining recommendations. This scarcity highlights that, despite the rise of LLMs, their application in explainable recommender systems is still at an early stage. We analyze these selected studies to understand current methodologies, identify challenges, and suggest directions for future research. Our findings underscore the potential of LLMs to improve explanations of recommender systems and encourage the development of more transparent and user-centric recommendation explanation solutions.

Citations: 0
Enhancing smart home environments: a novel pattern recognition approach to ambient acoustic event detection and localization.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-01-23 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1419562
Ahsan Shabbir, Abdul Haleem Butt, Taha Khan, Lorenzo Chiari, Ahmad Almadhor, Vincent Karovic

Introduction: Ambient acoustic detection and localization play a vital role in identifying events and their origins from acoustic data. This study aimed to establish a comprehensive framework for classifying activities in home environments to detect emergency events and transmit emergency signals. Localization enhances the detection of the acoustic event's location, thereby improving the effectiveness of emergency services, situational awareness, and response times.

Methods: Acoustic data were collected from a home environment using six strategically placed microphones in a bedroom, kitchen, restroom, and corridor. A total of 512 audio samples were recorded from 11 activities. Background noise was eliminated using a filtering technique. State-of-the-art features were extracted from the time domain, frequency domain, time-frequency domain, and cepstral domain to develop efficient detection and localization frameworks. Random forest and linear discriminant analysis classifiers were employed for event detection, while the estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithm was used for sound source localization.

Results: The study achieved high detection accuracy, with random forest and linear discriminant analysis classifiers attaining 95% and 87%, respectively, for event detection. For sound source localization, the proposed framework demonstrated significant performance, with an error rate of 3.61, a mean squared error (MSE) of 14.98, and a root mean squared error (RMSE) of 3.87.

Discussion: The integration of detection and localization models facilitated the identification of emergency activities and the transmission of notifications via electronic mail. The results highlight the potential of the proposed methodology to develop a real-time emergency alert system for domestic environments.
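The localization figures in the Results can be cross-checked with simple arithmetic, since RMSE is by definition the square root of MSE. The short check below uses only the two values quoted in the abstract; the variable names are illustrative.

```python
import math

# Values quoted in the abstract's Results paragraph.
reported_mse = 14.98
reported_rmse = 3.87

# RMSE is defined as the square root of MSE.
derived_rmse = math.sqrt(reported_mse)

print(f"sqrt({reported_mse}) = {derived_rmse:.2f}")  # prints 3.87
print("consistent:", abs(derived_rmse - reported_rmse) < 0.005)
```

The two reported values agree to the stated precision. The abstract does not give the units of the error measures, so none are assumed here.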

Citations: 0
Balancing act: Europeans' privacy calculus and security concerns in online CSAM detection.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-01-22 eCollection Date: 2025-01-01 DOI: 10.3389/fdata.2025.1477911
Răzvan Rughiniş, Simona-Nicoleta Vulpe, Dinu Ţurcanu, Daniel Rosner

This study examines privacy calculus in online child sexual abuse material (CSAM) detection across Europe, using Flash Eurobarometer 532 data. Drawing on theories of structuration and risk society, we analyze how individual agency and institutional frameworks interact in shaping privacy attitudes in high-stakes digital scenarios. Multinomial regression reveals age as a significant individual-level predictor, with younger individuals prioritizing privacy more. Country-level analysis shows that Central and Eastern European nations have higher privacy concerns, reflecting distinct institutional and cultural contexts. Notably, the Digital Economy and Society Index (DESI) shows a positive association with privacy concerns in regression models when controlling for Augmented Human Development Index (AHDI) components, in contrast to its negative bivariate correlation. Life expectancy emerges as the strongest country-level predictor, negatively associated with privacy concerns, suggesting that deep institutional mechanisms shape privacy attitudes beyond individual factors. This dual approach reveals that both individual factors and national contexts shape privacy calculus in CSAM detection. The study contributes to a better understanding of privacy calculus in high-stakes scenarios, with implications for policy development in online child protection.

Citations: 0
A scalable tool for analyzing genomic variants of humans using knowledge graphs and graph machine learning.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-01-21 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1466391
Shivika Prasanna, Ajay Kumar, Deepthi Rao, Eduardo J Simoes, Praveen Rao

Advances in high-throughput genome sequencing have enabled large-scale genome sequencing in clinical practice and research studies. By analyzing genomic variants of humans, scientists can gain a better understanding of the risk factors of complex diseases such as cancer and COVID-19. To model and analyze the rich genomic data, knowledge graphs (KGs) and graph machine learning (GML) can be regarded as enabling technologies. In this article, we present a scalable tool called VariantKG for analyzing genomic variants of humans modeled using KGs and GML. Specifically, we used publicly available genome sequencing data from patients with COVID-19. VariantKG extracts variant-level genetic information output by a variant calling pipeline, annotates the variant data with additional metadata, and converts the annotated variant information into a KG represented using the Resource Description Framework (RDF). The resulting KG is further enhanced with patient metadata and stored in a scalable graph database that enables efficient RDF indexing and query processing. VariantKG employs the Deep Graph Library (DGL) to perform GML tasks such as node classification. A user can extract a subset of the KG and perform inference tasks using DGL. The user can monitor the training and testing performance and hardware utilization. We tested VariantKG for KG construction by using 1,508 genome sequences, leading to 4 billion RDF statements. We evaluated GML tasks using VariantKG by selecting a subset of 500 sequences from the KG and performing node classification using well-known GML techniques such as GraphSAGE, Graph Convolutional Network (GCN), and Graph Transformer. VariantKG has intuitive user interfaces and features enabling a low barrier to entry for KG construction, model inference, and model interpretation on genomic variants of humans.

Citations: 0
Artificial intelligence for the detection of acute myeloid leukemia from microscopic blood images; a systematic review and meta-analysis.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2025-01-17 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1402926
Feras Al-Obeidat, Wael Hafez, Asrar Rashid, Mahir Khalil Jallo, Munier Gador, Ivan Cherrez-Ojeda, Daniel Simancas-Racines

Background: Leukemia is the 11th most prevalent type of cancer worldwide, with acute myeloid leukemia (AML) being the most frequent blood malignancy in adults. Microscopic blood tests are the most common methods for identifying leukemia subtypes. Automated optical image-processing systems using artificial intelligence (AI) have recently been applied to facilitate clinical decision-making.

Aim: To evaluate the performance of all AI-based approaches for the detection and diagnosis of acute myeloid leukemia (AML).

Methods: Medical databases including PubMed, Web of Science, and Scopus were searched until December 2023. We used the "metafor" and "metagen" libraries in R to analyze the different models used in the studies. Accuracy and sensitivity were the primary outcome measures.

Results: Ten studies, conducted between 2016 and 2023, were included in our review and meta-analysis. Most of the models used were deep-learning models, including convolutional neural networks (CNNs). The common- and random-effects models had accuracies of 1.0000 [0.9999; 1.0001] and 0.9557 [0.9312; 0.9802], respectively. The common- and random-effects models had high sensitivity values of 1.0000 and 0.8581, respectively, indicating that the machine learning models in these studies can accurately detect true-positive leukemia cases. The studies varied substantially in accuracy and sensitivity, as indicated by the Q values and I² statistics.

Conclusion: Our systematic review and meta-analysis found an overall high accuracy and sensitivity of AI models in correctly identifying true-positive AML cases. Future research should focus on unifying reporting methods and performance assessment metrics of AI-based diagnostics.

Systematic review registration: https://www.crd.york.ac.uk/prospero/#recordDetails, CRD42024501980.
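
The abstract reports pooled estimates from common- and random-effects models together with Q and I² heterogeneity statistics. As a rough illustration of how such quantities are typically computed, the sketch below implements generic inverse-variance pooling with the DerSimonian-Laird estimator of between-study variance. It is not the authors' analysis (they report using R), the function name `pool` is hypothetical, and the input numbers are made up for demonstration only.

```python
import math


def pool(effects, variances):
    """Inverse-variance meta-analysis (illustrative sketch, not the paper's code).

    effects   -- per-study estimates (e.g. accuracies); hypothetical inputs
    variances -- per-study sampling variances of those estimates
    """
    k = len(effects)
    # Common-effect (fixed-effect) model: weight each study by 1 / variance.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    common = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q and the I^2 statistic (percent of variation due to heterogeneity).
    q = sum(wi * (yi - common) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects model: add tau^2 to each study's variance before weighting.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re

    return {
        "common": common,
        "common_se": math.sqrt(1.0 / sw),
        "random": random_est,
        "random_se": math.sqrt(1.0 / sw_re),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Made-up example: three studies with different accuracies and precisions.
result = pool([0.99, 0.95, 0.90], [0.0001, 0.0004, 0.0009])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under the common-effect model the most precise studies dominate the pooled value, whereas the random-effects model spreads weight more evenly when heterogeneity is high. This is one general reason the two pooled estimates can differ, as they do in the abstract.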

Citations: 0