Exploring code portability solutions for HEP with a particle tracking test code.
Pub Date: 2024-10-23 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1485344
Hammad Ather, Sophie Berkman, Giuseppe Cerati, Matti J Kortelainen, Ka Hei Martin Kwok, Steven Lantz, Seyong Lee, Boyana Norris, Michael Reid, Allison Reinsvold Hall, Daniel Riley, Alexei Strelchenko, Cong Wang
Traditionally, high energy physics (HEP) experiments have relied on x86 CPUs for the majority of their significant computing needs. As the field looks ahead to the next generation of experiments such as DUNE and the High-Luminosity LHC, the computing demands are expected to increase dramatically. To cope with this increase, it will be necessary to take advantage of all available computing resources, including GPUs from different vendors. A broad landscape of code portability tools, including compiler pragma-based approaches, abstraction libraries, and other tools, allows the same source code to run efficiently on multiple architectures. In this paper, we use a test code taken from a HEP tracking algorithm to compare the performance of different portability solutions and the experience of implementing them. While in several cases portable implementations perform close to the reference code version, we find that the performance varies significantly depending on the details of the implementation. Achieving optimal performance is not easy, even for relatively simple applications such as the test codes considered in this work. Several factors can affect the performance, such as the choice of memory layout, the memory pinning strategy, and the compiler used. The compilers and tools are still being actively developed, so future developments may be critical for their deployment in HEP experiments.
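The portability tools the paper evaluates are C++-based (pragma directives, abstraction libraries), but the core idea of writing one kernel that targets several backends can be sketched compactly. Below is a minimal, hypothetical Python analog using NumPy with an optional CuPy GPU backend; the track-propagation kernel and its data layout are invented for illustration, not taken from the paper's test code.

```python
# Illustrative sketch (not the paper's code): the "single source, multiple
# backends" idea behind portability layers, here via NumPy/CuPy duck typing.
import numpy as np

try:
    import cupy as cp  # optional GPU backend; fall back to CPU if absent
    xp = cp
except ImportError:
    xp = np

def propagate_tracks(params, steps=10, dt=0.1):
    """Toy straight-line track propagation written once against the
    array API, so the same source runs on CPU (NumPy) or GPU (CuPy)."""
    pos, vel = params[:, :3], params[:, 3:]
    for _ in range(steps):
        pos = pos + dt * vel  # memory layout (AoS vs. SoA) matters here
    return pos

tracks = xp.asarray(np.random.rand(1024, 6))  # 1024 tracks: x,y,z,vx,vy,vz
print(propagate_tracks(tracks)[:2])
```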
{"title":"Exploring code portability solutions for HEP with a particle tracking test code.","authors":"Hammad Ather, Sophie Berkman, Giuseppe Cerati, Matti J Kortelainen, Ka Hei Martin Kwok, Steven Lantz, Seyong Lee, Boyana Norris, Michael Reid, Allison Reinsvold Hall, Daniel Riley, Alexei Strelchenko, Cong Wang","doi":"10.3389/fdata.2024.1485344","DOIUrl":"10.3389/fdata.2024.1485344","url":null,"abstract":"<p><p>Traditionally, high energy physics (HEP) experiments have relied on x86 CPUs for the majority of their significant computing needs. As the field looks ahead to the next generation of experiments such as DUNE and the High-Luminosity LHC, the computing demands are expected to increase dramatically. To cope with this increase, it will be necessary to take advantage of all available computing resources, including GPUs from different vendors. A broad landscape of code portability tools-including compiler pragma-based approaches, abstraction libraries, and other tools-allow the same source code to run efficiently on multiple architectures. In this paper, we use a test code taken from a HEP tracking algorithm to compare the performance and experience of implementing different portability solutions. While in several cases portable implementations perform close to the reference code version, we find that the performance varies significantly depending on the details of the implementation. Achieving optimal performance is not easy, even for relatively simple applications such as the test codes considered in this work. Several factors can affect the performance, such as the choice of the memory layout, the memory pinning strategy, and the compiler used. The compilers and tools are being actively developed, so future developments may be critical for their deployment in HEP experiments.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11537910/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial: Utilizing big data and deep learning to improve healthcare intelligence and biomedical service delivery.
Pub Date: 2024-10-22 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1502398
V E Sathishkumar
{"title":"Editorial: Utilizing big data and deep learning to improve healthcare intelligence and biomedical service delivery.","authors":"V E Sathishkumar","doi":"10.3389/fdata.2024.1502398","DOIUrl":"https://doi.org/10.3389/fdata.2024.1502398","url":null,"abstract":"","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534799/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big data and AI for gender equality in health: bias is a big challenge.
Pub Date: 2024-10-16 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1436019
Anagha Joshi
Artificial intelligence and machine learning are rapidly evolving fields that have the potential to transform women's health by improving diagnostic accuracy, personalizing treatment plans, and building predictive models of disease progression that enable preventive care. Three categories of women's health issues are discussed where machine learning can facilitate accessible, affordable, personalized, and evidence-based healthcare. In this perspective, the promise of big data and machine learning applications in the context of women's health is first elaborated. Despite these promises, machine learning applications are not widely adopted in clinical care due to many issues, including ethical concerns, patient privacy, informed consent, algorithmic biases, data quality and availability, and the education and training of healthcare professionals. In the medical field, discrimination against women has a long history. Machine learning implicitly carries the biases present in its training data. Thus, although machine learning has the potential to improve some aspects of women's health, it can also reinforce sex and gender biases. Blindly integrating advanced machine learning tools without properly understanding and correcting for socio-cultural sex- and gender-biased practices and policies is therefore unlikely to result in sex and gender equality in health.
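As a concrete illustration of the bias concern raised here, the short sketch below fits a model on synthetic data and compares its accuracy across sex subgroups; the data, features, and subgroup encoding are assumptions invented for this example, not from the article.

```python
# Minimal sketch of the kind of audit the perspective calls for: comparing a
# model's error rate across sex subgroups to surface a performance gap.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)            # 0 = male, 1 = female (synthetic)
x = rng.normal(size=(n, 5))
y = (x[:, 0] + 0.5 * sex * x[:, 1] > 0).astype(int)  # sex-dependent signal

model = LogisticRegression().fit(x, y)  # sex omitted: bias can still leak in
pred = model.predict(x)
for g in (0, 1):
    mask = sex == g
    print(f"group {g}: accuracy {accuracy_score(y[mask], pred[mask]):.3f}")
```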
{"title":"Big data and AI for gender equality in health: bias is a big challenge.","authors":"Anagha Joshi","doi":"10.3389/fdata.2024.1436019","DOIUrl":"10.3389/fdata.2024.1436019","url":null,"abstract":"<p><p>Artificial intelligence and machine learning are rapidly evolving fields that have the potential to transform women's health by improving diagnostic accuracy, personalizing treatment plans, and building predictive models of disease progression leading to preventive care. Three categories of women's health issues are discussed where machine learning can facilitate accessible, affordable, personalized, and evidence-based healthcare. In this perspective, firstly the promise of big data and machine learning applications in the context of women's health is elaborated. Despite these promises, machine learning applications are not widely adapted in clinical care due to many issues including ethical concerns, patient privacy, informed consent, algorithmic biases, data quality and availability, and education and training of health care professionals. In the medical field, discrimination against women has a long history. Machine learning implicitly carries biases in the data. Thus, despite the fact that machine learning has the potential to improve some aspects of women's health, it can also reinforce sex and gender biases. Advanced machine learning tools blindly integrated without properly understanding and correcting for socio-cultural sex and gender biased practices and policies is therefore unlikely to result in sex and gender equality in health.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11521869/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142548903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating longitudinal mental health data into a staging database: harnessing DDI-Lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.
Pub Date: 2024-10-11 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1435510
Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield
Background: Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.
Methods: The "INSPIRE" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.
Results: Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.
Conclusion: The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.
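For readers unfamiliar with the target model, a minimal sketch of the kind of ETL transformation described in the Methods is shown below. It maps one wave of a hypothetical longitudinal instrument into simplified OMOP OBSERVATION-style rows; the record fields and concept id are placeholders, and the real INSPIRE scripts are more elaborate.

```python
# A minimal, hypothetical sketch of an ETL step: mapping one wave of
# longitudinal survey data into OMOP CDM-style rows. The source record
# layout and the concept id below are invented placeholders.
from dataclasses import dataclass
from datetime import date

@dataclass
class ObservationRow:           # simplified OMOP OBSERVATION table row
    person_id: int
    observation_concept_id: int
    observation_date: date
    value_as_number: float

PHQ9_TOTAL_CONCEPT_ID = 44804478  # hypothetical placeholder concept id

def etl_wave(source_records):
    """Transform raw wave records into staging rows for OMOP loading."""
    rows = []
    for rec in source_records:
        rows.append(ObservationRow(
            person_id=rec["participant_id"],
            observation_concept_id=PHQ9_TOTAL_CONCEPT_ID,
            observation_date=rec["visit_date"],
            value_as_number=float(rec["phq9_total"]),
        ))
    return rows

wave1 = [{"participant_id": 101, "visit_date": date(2021, 3, 2), "phq9_total": 12}]
print(etl_wave(wave1))
```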
{"title":"Integrating longitudinal mental health data into a staging database: harnessing DDI-lifecycle and OMOP vocabularies within the INSPIRE Network Datahub.","authors":"Bylhah Mugotitsa, Tathagata Bhattacharjee, Michael Ochola, Dorothy Mailosi, David Amadi, Pauline Andeso, Joseph Kuria, Reinpeter Momanyi, Evans Omondi, Dan Kajungu, Jim Todd, Agnes Kiragga, Jay Greenfield","doi":"10.3389/fdata.2024.1435510","DOIUrl":"10.3389/fdata.2024.1435510","url":null,"abstract":"<p><strong>Background: </strong>Longitudinal studies are essential for understanding the progression of mental health disorders over time, but combining data collected through different methods to assess conditions like depression, anxiety, and psychosis presents significant challenges. This study presents a mapping technique allowing for the conversion of diverse longitudinal data into a standardized staging database, leveraging the Data Documentation Initiative (DDI) Lifecycle and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standards to ensure consistency and compatibility across datasets.</p><p><strong>Methods: </strong>The \"INSPIRE\" project integrates longitudinal data from African studies into a staging database using metadata documentation standards structured with a snowflake schema. This facilitates the development of Extraction, Transformation, and Loading (ETL) scripts for integrating data into OMOP CDM. The staging database schema is designed to capture the dynamic nature of longitudinal studies, including changes in research protocols and the use of different instruments across data collection waves.</p><p><strong>Results: </strong>Utilizing this mapping method, we streamlined the data migration process to the staging database, enabling subsequent integration into the OMOP CDM. Adherence to metadata standards ensures data quality, promotes interoperability, and expands opportunities for data sharing in mental health research.</p><p><strong>Conclusion: </strong>The staging database serves as an innovative tool in managing longitudinal mental health data, going beyond simple data hosting to act as a comprehensive study descriptor. It provides detailed insights into each study stage and establishes a data science foundation for standardizing and integrating the data into OMOP CDM.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI security and cyber risk in IoT systems.
Pub Date: 2024-10-10 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1402745
Petar Radanliev, David De Roure, Carsten Maple, Jason R C Nurse, Razvan Nicolescu, Uchenna Ani
Internet-of-Things (IoT) refers to low-memory connected devices used in various new technologies, including drones, autonomous machines, and robotics. The article aims to better understand cyber risks in low-memory devices and the challenges in IoT risk management. The article includes a critical reflection on current risk methods and their level of appropriateness for IoT. We present a dependency model tailored to current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment, and for generic risk impact assessment. The model is developed for cyber risk insurance for new technologies (e.g., drones, robots); still, practitioners can apply it to estimate and assess cyber risks in organizations and enterprises. Furthermore, this paper critically discusses why risk assessment and management are crucial in this domain and which open questions on IoT risk assessment and risk management remain areas for further research. The paper then presents a more holistic understanding of cyber risks in the IoT. We explain how the industry can use new risk assessment and management approaches to deal with the challenges posed by emerging IoT cyber risks, and how these approaches influence policy on cyber risk and data strategy. We also present a new approach for cyber risk assessment that incorporates IoT risks through dependency modeling, and describe why this approach is well suited to estimating IoT risks.
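To make the dependency-modeling idea concrete, here is a toy sketch (not the article's model): cyber risk propagates along a device-dependency graph, and a node's exposure combines its own risk with upstream risks under an independence assumption. The graph, devices, and risk values are invented.

```python
# Toy dependency model for cyber risk estimation: a node's exposure combines
# its own base risk with the risk of everything it depends on.
deps = {                      # edges: device -> devices it depends on
    "drone": ["gps", "controller"],
    "controller": ["cloud_api"],
    "gps": [],
    "cloud_api": [],
}
base_risk = {"drone": 0.2, "gps": 0.5, "controller": 0.3, "cloud_api": 0.6}

def exposure(node, seen=None):
    """Independent-failure combination: 1 - prod(1 - r) over the node's own
    risk and its (deduplicated) upstream risks; `seen` also guards cycles."""
    seen = set() if seen is None else seen
    if node in seen:
        return 0.0            # already counted (shared dependency or cycle)
    seen.add(node)
    risks = [base_risk[node]] + [exposure(d, seen) for d in deps[node]]
    p_safe = 1.0
    for r in risks:
        p_safe *= (1.0 - r)
    return 1.0 - p_safe

print({n: round(exposure(n), 3) for n in deps})
```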
{"title":"AI security and cyber risk in IoT systems.","authors":"Petar Radanliev, David De Roure, Carsten Maple, Jason R C Nurse, Razvan Nicolescu, Uchenna Ani","doi":"10.3389/fdata.2024.1402745","DOIUrl":"https://doi.org/10.3389/fdata.2024.1402745","url":null,"abstract":"<p><p>Internet-of-Things (IoT) refers to low-memory connected devices used in various new technologies, including drones, autonomous machines, and robotics. The article aims to understand better cyber risks in low-memory devices and the challenges in IoT risk management. The article includes a critical reflection on current risk methods and their level of appropriateness for IoT. We present a dependency model tailored in context toward current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment and generic risk impact assessment. The model is developed for cyber risk insurance for new technologies (e.g., drones, robots). Still, practitioners can apply it to estimate and assess cyber risks in organizations and enterprises. Furthermore, this paper critically discusses why risk assessment and management are crucial in this domain and what open questions on IoT risk assessment and risk management remain areas for further research. The paper then presents a more holistic understanding of cyber risks in the IoT. We explain how the industry can use new risk assessment, and management approaches to deal with the challenges posed by emerging IoT cyber risks. We explain how these approaches influence policy on cyber risk and data strategy. We also present a new approach for cyber risk assessment that incorporates IoT risks through dependency modeling. The paper describes why this approach is well suited to estimate IoT risks.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499169/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology extension by online clustering with large language model agents.
Pub Date: 2024-10-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1463543
Guanchen Wu, Chen Ling, Ilana Graetz, Liang Zhao
An ontology is a structured framework that categorizes entities, concepts, and relationships within a domain to facilitate shared understanding, and it is important in computational linguistics and knowledge representation. In this paper, we propose a novel framework to automatically extend an existing ontology from streaming data in a zero-shot manner. Specifically, the zero-shot ontology extension framework uses online and hierarchical clustering to integrate new knowledge into existing ontologies without substantial annotated data or domain-specific expertise. Focusing on the medical field, this approach leverages Large Language Models (LLMs) for two key tasks: Symptom Typing and Symptom Taxonomy among breast and bladder cancer survivors. Symptom Typing involves identifying and classifying medical symptoms from unstructured online patient forum data, while Symptom Taxonomy organizes and integrates these symptoms into an existing ontology. The combined use of online and hierarchical clustering enables real-time and structured categorization and integration of symptoms. The dual-phase model employs multiple LLMs to ensure accurate classification and seamless integration of new symptoms with minimal human oversight. The paper details the framework's development, experiments, quantitative analyses, and data visualizations, demonstrating its effectiveness in enhancing medical ontologies and advancing knowledge-based systems in healthcare.
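The online-clustering step can be sketched briefly. In the following hypothetical example, each incoming symptom mention is embedded and either attached to the nearest existing ontology node or opened as a new candidate node; the toy hash-based embedding stands in for the LLM-based typing the paper uses, and the similarity threshold is an assumption.

```python
# Minimal sketch of online ontology extension: attach each new symptom
# mention to its nearest node, or open a new candidate node if no node is
# similar enough. The embedding and threshold are toy stand-ins.
import numpy as np

THRESHOLD = 0.75  # cosine similarity needed to join an existing node

def embed(text):
    """Toy deterministic embedding; in practice an LLM/encoder would be used."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=32)
    return v / np.linalg.norm(v)

ontology = {"fatigue": embed("fatigue"), "nausea": embed("nausea")}

def extend(mention):
    v = embed(mention)
    best = max(ontology, key=lambda k: float(ontology[k] @ v))
    if float(ontology[best] @ v) >= THRESHOLD:
        return f"'{mention}' -> existing node '{best}'"
    ontology[mention] = v          # open a new candidate node
    return f"'{mention}' -> new node (flagged for light human review)"

for m in ["tiredness", "hot flashes"]:
    print(extend(m))
```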
{"title":"Ontology extension by online clustering with large language model agents.","authors":"Guanchen Wu, Chen Ling, Ilana Graetz, Liang Zhao","doi":"10.3389/fdata.2024.1463543","DOIUrl":"10.3389/fdata.2024.1463543","url":null,"abstract":"<p><p>An ontology is a structured framework that categorizes entities, concepts, and relationships within a domain to facilitate shared understanding, and it is important in computational linguistics and knowledge representation. In this paper, we propose a novel framework to automatically extend an existing ontology from streaming data in a zero-shot manner. Specifically, the zero-shot ontology extension framework uses online and hierarchical clustering to integrate new knowledge into existing ontologies without substantial annotated data or domain-specific expertise. Focusing on the medical field, this approach leverages Large Language Models (LLMs) for two key tasks: Symptom Typing and Symptom Taxonomy among breast and bladder cancer survivors. Symptom Typing involves identifying and classifying medical symptoms from unstructured online patient forum data, while Symptom Taxonomy organizes and integrates these symptoms into an existing ontology. The combined use of online and hierarchical clustering enables real-time and structured categorization and integration of symptoms. The dual-phase model employs multiple LLMs to ensure accurate classification and seamless integration of new symptoms with minimal human oversight. The paper details the framework's development, experiments, quantitative analyses, and data visualizations, demonstrating its effectiveness in enhancing medical ontologies and advancing knowledge-based systems in healthcare.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491333/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning-based remission prediction in rheumatoid arthritis patients treated with biologic disease-modifying anti-rheumatic drugs: findings from the Kuwait rheumatic disease registry.
Pub Date: 2024-10-03 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1406365
Ahmad R Alsaber, Adeeba Al-Herz, Balqees Alawadhi, Iyad Abu Doush, Parul Setiya, Ahmad T Al-Sultan, Khulood Saleh, Adel Al-Awadhi, Eman Hasan, Waleed Al-Kandari, Khalid Mokaddem, Aqeel A Ghanem, Yousef Attia, Mohammed Hussain, Naser AlHadhood, Yaser Ali, Hoda Tarakmeh, Ghaydaa Aldabie, Amjad AlKadi, Hebah Alhajeri
Background: Rheumatoid arthritis (RA) is a common condition treated with biologic disease-modifying anti-rheumatic drugs (bDMARDs). However, many patients exhibit resistance, motivating the use of machine learning models to predict remission in patients treated with bDMARDs, thereby reducing healthcare costs and minimizing negative effects.
Objective: The study aims to develop machine learning models using data from the Kuwait Registry for Rheumatic Diseases (KRRD) to identify clinical characteristics predictive of remission in RA patients treated with biologics.
Methods: The study collected follow-up data from 1,968 patients treated with bDMARDs from four public hospitals in Kuwait from 2013 to 2022. Machine learning techniques like lasso, ridge, support vector machine, random forest, XGBoost, and Shapley additive explanation were used to predict remission at a 1-year follow-up.
Results: The study used the Shapley plot in explainable Artificial Intelligence (XAI) to analyze the effects of predictors on remission prognosis across different types of bDMARDs. Top clinical features were identified for patients treated with bDMARDs, each associated with specific mean SHAP values. The findings highlight the importance of clinical assessments and specific treatments in shaping treatment outcomes.
Conclusion: The proposed machine learning model system effectively identifies clinical features predicting remission in bDMARDs, potentially improving treatment efficacy in rheumatoid arthritis patients.
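A minimal sketch of the modeling recipe in the abstract, fitting a gradient-boosted classifier and ranking features by mean absolute SHAP value, is given below; the feature names and data are synthetic placeholders rather than KRRD fields.

```python
# Sketch: predict 1-year remission with XGBoost, then rank clinical features
# by mean |SHAP| value. Data and feature names are synthetic placeholders.
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                     # e.g., DAS28, CRP, age, dose
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)       # mean |SHAP| per feature
for name, v in zip(["das28", "crp", "age", "dose"], mean_abs):
    print(f"{name}: {v:.3f}")
```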
{"title":"Machine learning-based remission prediction in rheumatoid arthritis patients treated with biologic disease-modifying anti-rheumatic drugs: findings from the Kuwait rheumatic disease registry.","authors":"Ahmad R Alsaber, Adeeba Al-Herz, Balqees Alawadhi, Iyad Abu Doush, Parul Setiya, Ahmad T Al-Sultan, Khulood Saleh, Adel Al-Awadhi, Eman Hasan, Waleed Al-Kandari, Khalid Mokaddem, Aqeel A Ghanem, Yousef Attia, Mohammed Hussain, Naser AlHadhood, Yaser Ali, Hoda Tarakmeh, Ghaydaa Aldabie, Amjad AlKadi, Hebah Alhajeri","doi":"10.3389/fdata.2024.1406365","DOIUrl":"https://doi.org/10.3389/fdata.2024.1406365","url":null,"abstract":"<p><strong>Background: </strong>Rheumatoid arthritis (RA) is a common condition treated with biological disease-modifying anti-rheumatic medicines (bDMARDs). However, many patients exhibit resistance, necessitating the use of machine learning models to predict remissions in patients treated with bDMARDs, thereby reducing healthcare costs and minimizing negative effects.</p><p><strong>Objective: </strong>The study aims to develop machine learning models using data from the Kuwait Registry for Rheumatic Diseases (KRRD) to identify clinical characteristics predictive of remission in RA patients treated with biologics.</p><p><strong>Methods: </strong>The study collected follow-up data from 1,968 patients treated with bDMARDs from four public hospitals in Kuwait from 2013 to 2022. Machine learning techniques like lasso, ridge, support vector machine, random forest, XGBoost, and Shapley additive explanation were used to predict remission at a 1-year follow-up.</p><p><strong>Results: </strong>The study used the Shapley plot in explainable Artificial Intelligence (XAI) to analyze the effects of predictors on remission prognosis across different types of bDMARDs. Top clinical features were identified for patients treated with bDMARDs, each associated with specific mean SHAP values. The findings highlight the importance of clinical assessments and specific treatments in shaping treatment outcomes.</p><p><strong>Conclusion: </strong>The proposed machine learning model system effectively identifies clinical features predicting remission in bDMARDs, potentially improving treatment efficacy in rheumatoid arthritis patients.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484091/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised machine learning model for detecting anomalous volumetric modulated arc therapy plans for lung cancer patients.
Peng Huang, Jiawen Shang, Yuhan Fan, Zhihui Hu, Jianrong Dai, Zhiqiang Liu, Hui Yan
Pub Date: 2024-10-03 | DOI: 10.3389/fdata.2024.1462745
Purpose: Volumetric modulated arc therapy (VMAT) is a new treatment modality in modern radiotherapy. To ensure the quality of the radiotherapy plan, a physics plan review is routinely conducted by senior clinicians; however, this manual process is inefficient and can be inaccurate. In this study, a multi-task AutoEncoder (AE) is proposed to automate anomaly detection of VMAT plans for lung cancer patients.
Methods: The feature maps are first extracted from a VMAT plan. Then, a multi-task AE is trained that takes a feature map as input and reconstructs two targets (beam aperture and prescribed dose). Based on the distribution of reconstruction errors on the training set, a detection threshold value is obtained. For a testing sample, its reconstruction error is calculated using the AE model and compared with the threshold value to determine its class (anomaly or regular). The proposed multi-task AE model is compared to other existing AE models, including Vanilla AE, Contractive AE, and Variational AE. The area under the receiver operating characteristic curve (AUC) and other statistics are used to evaluate the performance of these models.
Results: Among the four tested AE models, the proposed multi-task AE model achieves the highest values in AUC (0.964), accuracy (0.821), precision (0.471), and F1 score (0.632), and the lowest value in FPR (0.206).
Conclusion: The proposed multi-task AE model using two-dimensional (2D) feature maps can effectively detect anomalies in radiotherapy plans for lung cancer patients. Compared to the other existing AE models, the multi-task AE is more accurate and efficient. The proposed model provides a feasible way to carry out automated anomaly detection of VMAT plans in radiotherapy.
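The multi-task architecture can be sketched as one encoder with two decoder heads, one per target, with the anomaly threshold taken from the training-set reconstruction errors. The PyTorch sketch below is a hypothetical illustration: the layer sizes, input shape, and 95th-percentile rule are assumptions, not the paper's configuration.

```python
# Sketch of a multi-task autoencoder: shared encoder, two decoder heads
# (beam aperture and prescribed dose), threshold from reconstruction errors.
import torch
import torch.nn as nn

class MultiTaskAE(nn.Module):
    def __init__(self, d_in=256, d_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                     nn.Linear(64, d_latent))
        self.head_aperture = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                           nn.Linear(64, d_in))
        self.head_dose = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                       nn.Linear(64, d_in))

    def forward(self, x):
        z = self.encoder(x)
        return self.head_aperture(z), self.head_dose(z)

def recon_error(model, x, aperture, dose):
    a_hat, d_hat = model(x)
    return ((a_hat - aperture) ** 2).mean(dim=1) + ((d_hat - dose) ** 2).mean(dim=1)

model = MultiTaskAE()
x = torch.randn(128, 256)                 # flattened 2D feature maps (toy)
errors = recon_error(model, x, x, x)      # untrained; illustrates the flow
threshold = torch.quantile(errors, 0.95)  # assumed 95th-percentile rule
print((errors > threshold).sum().item(), "plans flagged as anomalous")
```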
{"title":"Unsupervised machine learning model for detecting anomalous volumetric modulated arc therapy plans for lung cancer patients.","authors":"Peng Huang, Jiawen Shang, Yuhan Fan, Zhihui Hu, Jianrong Dai, Zhiqiang Liu, Hui Yan","doi":"10.3389/fdata.2024.1462745","DOIUrl":"https://doi.org/10.3389/fdata.2024.1462745","url":null,"abstract":"<p><strong>Purpose: </strong>Volumetric modulated arc therapy (VMAT) is a new treatment modality in modern radiotherapy. To ensure the quality of the radiotherapy plan, a physics plan review is routinely conducted by senior clinicians; however, this process is less efficient and less accurate. In this study, a multi-task AutoEncoder (AE) is proposed to automate anomaly detection of VMAT plans for lung cancer patients.</p><p><strong>Methods: </strong>The feature maps are first extracted from a VMAT plan. Then, a multi-task AE is trained based on the input of a feature map, and its output is the two targets (beam aperture and prescribed dose). Based on the distribution of reconstruction errors on the training set, a detection threshold value is obtained. For a testing sample, its reconstruction error is calculated using the AE model and compared with the threshold value to determine its classes (anomaly or regular). The proposed multi-task AE model is compared to the other existing AE models, including Vanilla AE, Contractive AE, and Variational AE. The area under the receiver operating characteristic curve (AUC) and the other statistics are used to evaluate the performance of these models.</p><p><strong>Results: </strong>Among the four tested AE models, the proposed multi-task AE model achieves the highest values in AUC (0.964), accuracy (0.821), precision (0.471), and <i>F</i>1 score (0.632), and the lowest value in FPR (0.206).</p><p><strong>Conclusion: </strong>The proposed multi-task AE model using two-dimensional (2D) feature maps can effectively detect anomalies in radiotherapy plans for lung cancer patients. Compared to the other existing AE models, the multi-task AE is more accurate and efficient. The proposed model provides a feasible way to carry out automated anomaly detection of VMAT plans in radiotherapy.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484413/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prediction and classification of obesity risk based on a hybrid metaheuristic machine learning approach.
Pub Date: 2024-09-30 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1469981
Zarindokht Helforoush, Hossein Sayyad
Introduction: As the global prevalence of obesity continues to rise, it has become a major public health concern requiring more accurate prediction methods. Traditional regression models often fail to capture the complex interactions between genetic, environmental, and behavioral factors contributing to obesity.
Methods: This study explores the potential of machine-learning techniques to improve obesity risk prediction. Various supervised learning algorithms, including the novel ANN-PSO hybrid model, were applied following comprehensive data preprocessing and evaluation.
Results: The proposed ANN-PSO model achieved a remarkable accuracy rate of 92%, outperforming traditional regression methods. SHAP (SHapley Additive exPlanations) was employed to analyze feature importance, offering deeper insights into the influence of various factors on obesity risk.
Discussion: The findings highlight the transformative role of advanced machine-learning models in public health research, offering a pathway for personalized healthcare interventions. By providing detailed obesity risk profiles, these models enable healthcare providers to tailor prevention and treatment strategies to individual needs. The results underscore the need to integrate innovative machine-learning approaches into global public health efforts to combat the growing obesity epidemic.
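To illustrate the hybrid idea, the sketch below uses particle swarm optimization to search the weight vector of a small neural network on synthetic data; the swarm size, inertia and acceleration coefficients, and network shape are assumptions, and the paper's ANN-PSO model is more involved.

```python
# Sketch of an ANN-PSO hybrid: PSO searches the flattened weight vector of a
# tiny neural network instead of gradient descent. All settings are toy.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float)

def forward(w, X):
    W1 = w[:32].reshape(4, 8); b1 = w[32:40]; W2 = w[40:48]; b2 = w[48]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def loss(w):
    p = np.clip(forward(w, X), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

dim, n_particles = 49, 30
pos = rng.normal(size=(n_particles, dim)); vel = np.zeros_like(pos)
pbest = pos.copy(); pbest_val = np.array([loss(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(100):
    r1, r2 = rng.random((2, n_particles, dim))
    # standard PSO update: inertia + pull toward personal and global bests
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

acc = ((forward(gbest, X) > 0.5) == y).mean()
print(f"PSO-trained ANN training accuracy: {acc:.2f}")
```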
{"title":"Prediction and classification of obesity risk based on a hybrid metaheuristic machine learning approach.","authors":"Zarindokht Helforoush, Hossein Sayyad","doi":"10.3389/fdata.2024.1469981","DOIUrl":"https://doi.org/10.3389/fdata.2024.1469981","url":null,"abstract":"<p><strong>Introduction: </strong>As the global prevalence of obesity continues to rise, it has become a major public health concern requiring more accurate prediction methods. Traditional regression models often fail to capture the complex interactions between genetic, environmental, and behavioral factors contributing to obesity.</p><p><strong>Methods: </strong>This study explores the potential of machine-learning techniques to improve obesity risk prediction. Various supervised learning algorithms, including the novel ANN-PSO hybrid model, were applied following comprehensive data preprocessing and evaluation.</p><p><strong>Results: </strong>The proposed ANN-PSO model achieved a remarkable accuracy rate of 92%, outperforming traditional regression methods. SHAP was employed to analyze feature importance, offering deeper insights into the influence of various factors on obesity risk.</p><p><strong>Discussion: </strong>The findings highlight the transformative role of advanced machine-learning models in public health research, offering a pathway for personalized healthcare interventions. By providing detailed obesity risk profiles, these models enable healthcare providers to tailor prevention and treatment strategies to individual needs. The results underscore the need to integrate innovative machine-learning approaches into global public health efforts to combat the growing obesity epidemic.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471553/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142480537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making the most of big qualitative datasets: a living systematic review of analysis methods.
Pub Date: 2024-09-25 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1455399
Abinaya Chandrasekar, Sigrún Eyrúnardóttir Clark, Sam Martin, Samantha Vanderslott, Elaine C Flores, David Aceituno, Phoebe Barnett, Cecilia Vindrola-Padros, Norha Vera San Juan
Introduction: Qualitative data provides deep insights into an individual's behaviors and beliefs, and the contextual factors that may shape these. Big qualitative data analysis is an emerging field that aims to identify trends and patterns in large qualitative datasets. The purpose of this review was to identify the methods used to analyse large bodies of qualitative data, their cited strengths and limitations, and comparisons between manual and digital analysis approaches.
Methods: A multifaceted approach has been taken to develop the review relying on academic, gray and media-based literature, using approaches such as iterative analysis, frequency analysis, text network analysis and team discussion.
Results: The review identified 520 articles that detailed analysis approaches of big qualitative data. From these publications, a diverse range of methods and software used for analysis was identified, with thematic analysis and basic software being the most common. Studies were most commonly conducted in high-income countries, and the most common data sources were open-ended survey responses, interview transcripts, and first-person narratives.
Discussion: We identified an emerging trend to expand the sources of qualitative data (e.g., using social media data, images, or videos), and develop new methods and software for analysis. As the qualitative analysis field continues to change, it will be necessary to conduct further research to compare the utility of different big qualitative analysis methods and to develop standardized guidelines to raise awareness and support researchers in the use of more novel approaches for big qualitative analysis.
{"title":"Making the most of big qualitative datasets: a living systematic review of analysis methods.","authors":"Abinaya Chandrasekar, Sigrún Eyrúnardóttir Clark, Sam Martin, Samantha Vanderslott, Elaine C Flores, David Aceituno, Phoebe Barnett, Cecilia Vindrola-Padros, Norha Vera San Juan","doi":"10.3389/fdata.2024.1455399","DOIUrl":"10.3389/fdata.2024.1455399","url":null,"abstract":"<p><strong>Introduction: </strong>Qualitative data provides deep insights into an individual's behaviors and beliefs, and the contextual factors that may shape these. Big qualitative data analysis is an emerging field that aims to identify trends and patterns in large qualitative datasets. The purpose of this review was to identify the methods used to analyse large bodies of qualitative data, their cited strengths and limitations and comparisons between manual and digital analysis approaches.</p><p><strong>Methods: </strong>A multifaceted approach has been taken to develop the review relying on academic, gray and media-based literature, using approaches such as iterative analysis, frequency analysis, text network analysis and team discussion.</p><p><strong>Results: </strong>The review identified 520 articles that detailed analysis approaches of big qualitative data. From these publications a diverse range of methods and software used for analysis were identified, with thematic analysis and basic software being most common. Studies were most commonly conducted in high-income countries, and the most common data sources were open-ended survey responses, interview transcripts, and first-person narratives.</p><p><strong>Discussion: </strong>We identified an emerging trend to expand the sources of qualitative data (e.g., using social media data, images, or videos), and develop new methods and software for analysis. As the qualitative analysis field may continue to change, it will be necessary to conduct further research to compare the utility of different big qualitative analysis methods and to develop standardized guidelines to raise awareness and support researchers in the use of more novel approaches for big qualitative analysis.</p><p><strong>Systematic review registration: </strong>https://osf.io/hbvsy/?view_only=.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461344/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}