
Frontiers in Big Data: Latest Publications

Sparse and Expandable Network for Google's Pathways.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-29 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1348030
Charles X Ling, Ganyu Wang, Boyu Wang

Introduction: Recently, Google introduced Pathways as its next-generation AI architecture. Pathways must address three critical challenges: learning one general model for several continuous tasks, ensuring tasks can leverage each other without forgetting old tasks, and learning from multi-modal data such as images and audio. Additionally, Pathways must maintain sparsity in both learning and deployment. Current lifelong multi-task learning approaches are inadequate in addressing these challenges.

Methods: To address these challenges, we propose SEN, a Sparse and Expandable Network. SEN is designed to handle multiple tasks concurrently by maintaining sparsity and enabling expansion when new tasks are introduced. The network leverages multi-modal data, integrating information from different sources while preventing interference between tasks.

Results: The proposed SEN model demonstrates significant improvements in multi-task learning, successfully managing task interference and forgetting. It effectively integrates data from various modalities and maintains efficiency through sparsity during both the learning and deployment phases.

Discussion: SEN offers a straightforward yet effective solution to the limitations of current lifelong multi-task learning methods. By addressing the challenges identified in the Pathways architecture, SEN provides a promising approach for developing AI systems capable of learning and adapting over time without sacrificing performance or efficiency.
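To make the sparse-and-expandable idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a frozen shared trunk provides common features, and each new task gets its own randomly sparsified head, so weights for earlier tasks are never overwritten. The class name, sparsity level, and architecture are all illustrative assumptions.

```python
import numpy as np

class ExpandableNet:
    """Toy sparse, expandable network: a frozen shared trunk plus one
    sparse linear head per task, added on demand (illustrative only)."""

    def __init__(self, in_dim, hidden_dim, sparsity=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.trunk = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)
        self.sparsity = sparsity
        self.heads = {}          # task name -> sparse head weights
        self.rng = rng

    def add_task(self, name, out_dim):
        # New tasks get their own sparse head; existing heads are untouched,
        # so earlier tasks are not forgotten when the network expands.
        h = self.trunk.shape[1]
        w = self.rng.standard_normal((h, out_dim)) / np.sqrt(h)
        mask = self.rng.random((h, out_dim)) > self.sparsity  # keep ~10%
        self.heads[name] = mask * w

    def forward(self, x, task):
        hidden = np.maximum(x @ self.trunk, 0.0)   # shared ReLU features
        return hidden @ self.heads[task]
```

Adding a task only appends parameters, which is one simple way to reconcile expansion with non-forgetting; the sparsity mask keeps both learning and deployment cheap.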

Citations: 0
Efficient use of binned data for imputing univariate time series data.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-21 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1422650
Jay Darji, Nupur Biswas, Vijay Padul, Jaya Gill, Santosh Kesari, Shashaanka Ashili

Time series data are recorded across many sectors, producing large volumes of data. However, the continuity of these data is often interrupted, leaving periods of missing values. Several algorithms are used to impute the missing data, and their performance varies widely. Beyond the choice of algorithm, effective imputation depends on the nature of the missing and available data. We conducted extensive studies using different types of time series data, specifically heart rate data and power consumption data. We generated missing data over different time spans and imputed them using different algorithms applied to binned data of different sizes. Performance was evaluated using the root mean square error (RMSE) metric. We observed a reduction in RMSE when using binned data compared to the entire dataset, particularly for the expectation-maximization (EM) algorithm. RMSE was reduced when using binned data for 1-, 5-, and 15-min missing spans, with the greatest reduction observed for 15-min spans. We also examined the effect of data fluctuation. We conclude that the usefulness of binned data depends on the span of the missing data, the sampling frequency, and the fluctuation within the data. Depending on the inherent characteristics, quality, and quantity of the missing and available data, binned data can be used to impute a wide variety of series, including biological heart rate data from Internet of Things (IoT) smartwatch devices and non-biological data such as household power consumption.
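As a toy illustration of why a local bin can beat the whole series for mean-based imputation on trending data (the synthetic series, gap position, and bin size here are invented, not the paper's protocol):

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic "heart-rate-like" series with a slow oscillating trend
# (illustrative data, not the paper's datasets).
rng = np.random.default_rng(1)
t = np.arange(1000)
series = 70 + 10 * np.sin(2 * np.pi * t / 300) + rng.normal(0, 1, t.size)

gap = slice(400, 415)                 # a 15-sample missing span
truth = series[gap].copy()
observed = series.copy()
observed[gap] = np.nan

# Whole-series imputation: fill with the global mean of observed points.
global_fill = np.nanmean(observed)

# Binned imputation: fill with the mean of a small bin around the gap.
bin_ = np.concatenate([observed[350:400], observed[415:465]])
local_fill = np.nanmean(bin_)

rmse_global = rmse(truth, np.full(truth.shape, global_fill))
rmse_local = rmse(truth, np.full(truth.shape, local_fill))
```

Because the local bin tracks the trend near the gap while the global mean averages it away, the binned fill lands much closer to the truth; this mirrors the abstract's observation that bin usefulness depends on the gap span and the fluctuation of the data.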

Citations: 0
Equitable differential privacy.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-16 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1420344
Vasundhara Kaul, Tamalika Mukherjee

Differential privacy (DP) has been in the public spotlight since the announcement of its use in the 2020 U.S. Census. While DP algorithms have substantially improved the confidentiality protections provided to Census respondents, concerns have been raised about the accuracy of the DP-protected Census data. The extent to which the use of DP distorts the ability to draw inferences that drive policy about small populations, especially marginalized communities, has been of particular concern to researchers and policy makers. After all, inaccurate information about marginalized populations can often engender policies that exacerbate rather than ameliorate social inequities. Consequently, computer science experts have focused on developing mechanisms that help achieve equitable privacy, i.e., mechanisms that mitigate the data distortions introduced by privacy protections to ensure equitable outcomes and benefits for all groups, particularly marginalized groups. Our paper extends the conversation on equitable privacy by highlighting the importance of inclusive communication in ensuring equitable outcomes for all social groups through all the stages of deploying a differentially private system. We conceptualize Equitable DP as the design, communication, and implementation of DP algorithms that ensure equitable outcomes. Thus, in addition to adopting computer scientists' recommendations of incorporating equity parameters within DP algorithms, we suggest that it is critical for an organization to also facilitate inclusive communication throughout the design, development, and implementation stages of a DP algorithm to ensure it has an equitable impact on social groups and does not hinder the redressal of social inequities. To demonstrate the importance of communication for Equitable DP, we undertake a case study of the process through which DP was adopted as the newest disclosure avoidance system for the 2020 U.S. Census.
Drawing on the Inclusive Science Communication (ISC) framework, we examine the extent to which the Census Bureau's communication strategies encouraged engagement across the diverse groups of users that employ the decennial Census data for research and policy making. Our analysis provides lessons that can be used by other government organizations interested in incorporating the Equitable DP approach in their data collection practices.
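The equity concern has a simple numerical core: the Laplace mechanism adds noise of the same scale to every count, so the relative distortion is far larger for small groups. A hedged sketch of that effect (the epsilon value and group sizes are arbitrary choices for illustration, not Census parameters, and this is the textbook Laplace mechanism, not the Census TopDown algorithm):

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """Release a count under epsilon-DP via the Laplace mechanism
    (sensitivity 1 for a counting query)."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(42)
epsilon = 0.5
trials = 20000

# Same noise scale, very different relative impact on large vs. small groups.
big, small = 10_000, 20
err_big = np.mean([abs(laplace_count(big, epsilon, rng) - big) / big
                   for _ in range(trials)])
err_small = np.mean([abs(laplace_count(small, epsilon, rng) - small) / small
                     for _ in range(trials)])
```

The expected absolute noise is 1/epsilon = 2 in both cases, so the relative error for the 20-person group is roughly 500 times that for the 10,000-person group; this is the mechanism behind the "small populations" concern the abstract raises.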

Citations: 0
Data science's cultural construction: qualitative ideas for quantitative work.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1287442
Philipp Brandt

Introduction: "Data scientists" quickly became ubiquitous, often infamously so, but they have struggled with the ambiguity of their novel role. This article studies data science's collective definition on Twitter.

Methods: The analysis responds to the challenges of studying an emergent case with unclear boundaries and substance through a cultural perspective and complementary datasets ranging from 1,025 to 752,815 tweets. It brings together relations between accounts that tweeted about data science, the hashtags they used, indicating purposes, and the topics they discussed.

Results: The first results reproduce familiar commercial and technical motives. Additional results reveal concerns with new practical and ethical standards as a distinctive motive for constructing data science.

Discussion: The article provides a sensibility for local meaning in usually abstract datasets and a heuristic for navigating increasingly abundant datasets toward surprising insights. For data scientists, it offers a guide for positioning themselves vis-à-vis others to navigate their professional future.

Citations: 0
The development and application of a novel E-commerce recommendation system used in electric power B2B sector.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-31 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1374980
Wenjun Meng, Lili Chen, Zhaomin Dong

The advent of the digital era has transformed E-commerce platforms into critical tools for industry, yet traditional recommendation systems often fall short in the specialized context of the electric power industry. These systems typically struggle with the industry's unique challenges, such as infrequent and high-stakes transactions, prolonged decision-making processes, and sparse data. This research has developed a novel recommendation engine tailored to these specific conditions, designed in particular to handle the low-frequency, long-cycle nature of Business-to-Business (B2B) transactions. The approach includes algorithmic enhancements to better process and interpret the limited available data, and data pre-processing techniques designed to enrich the sparse datasets characteristic of this industry. This research also introduces a methodological innovation that integrates multi-dimensional data, combining user E-commerce activities, product specifics, and essential non-tendering information. The proposed engine employs advanced machine learning techniques to provide more accurate and relevant recommendations. The results demonstrate a marked improvement over traditional models, offering a more robust and effective tool for facilitating B2B transactions in the electric power industry. This research not only addresses the sector's unique challenges but also provides a blueprint for adapting recommendation systems to other industries with similar B2B characteristics.
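The abstract does not disclose the engine's internals, but the sparse-interaction setting it describes can be illustrated with a minimal item-based collaborative-filtering sketch; the interaction matrix, function names, and the cosine-similarity choice below are all hypothetical stand-ins, not the paper's model:

```python
import numpy as np

# Toy user-item interaction matrix for a B2B catalog: mostly zeros,
# reflecting the infrequent transactions the paper highlights (invented data).
interactions = np.array([
    [5, 0, 3, 0, 0],
    [4, 0, 0, 2, 1],
    [0, 3, 0, 5, 0],
    [5, 0, 4, 0, 0],
], dtype=float)

def item_similarity(m):
    """Cosine similarity between item columns, guarding all-zero norms."""
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0
    unit = m / norms
    return unit.T @ unit

def recommend(m, user, top_n=2):
    """Score unseen items by similarity to the user's consumed items."""
    sim = item_similarity(m)
    scores = sim @ m[user]             # similarity-weighted aggregation
    scores[m[user] > 0] = -np.inf      # exclude already-consumed items
    return np.argsort(scores)[::-1][:top_n]
```

With data this sparse, plain similarity scoring degrades quickly, which is why the paper turns to enriched pre-processing and multi-dimensional features rather than interactions alone.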

Citations: 0
Efficient enhancement of low-rank tensor completion via thin QR decomposition.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-02 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1382144
Yan Wu, Yunzhi Jin

Low-rank tensor completion (LRTC), which aims to fill in the missing entries of partially observed tensors by exploiting their low-rank structure, has been widely used in various real-world problems. The core tensor nuclear norm minimization (CTNM) method based on Tucker decomposition is one of the most common LRTC approaches. However, CTNM methods based on Tucker decomposition often have a high computing cost because the general factor-matrix solving technique performs multiple singular value decompositions (SVDs) in each loop. To address this problem, this article enhances the method and proposes an effective CTNM method based on thin QR decomposition (CTNM-QR) with lower computing complexity. The proposed method extends CTNM by introducing tensor versions of the auxiliary variables instead of matrices, and uses thin QR decomposition rather than the SVD to solve for the factor matrices, which reduces the computational complexity and improves tensor completion accuracy. In addition, the convergence and complexity of the CTNM-QR method are analyzed. Numerous experiments on synthetic data, real color images, and brain MRI data at different missing rates demonstrate that the proposed method not only achieves better completion accuracy and visual quality but also runs more efficiently than most state-of-the-art LRTC methods.
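The core computational trade-off can be seen directly in NumPy: a thin (reduced) QR factorization yields an orthonormal factor spanning the same column space as the left singular vectors, without computing any singular values. This is a generic illustration of substituting QR for SVD on a matrix unfolding, not the CTNM-QR algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)
# A tall matrix standing in for one unfolding of a partially observed tensor
# (illustrative; real unfoldings come from the tensor being completed).
A = rng.standard_normal((500, 40))

# Thin QR: Q is 500 x 40 with orthonormal columns; no singular values needed.
Q, R = np.linalg.qr(A, mode="reduced")

# Full-column SVD of the same matrix, for comparison.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Both Q and U span col(A), so the projectors onto that subspace agree,
# even though QR is substantially cheaper than SVD for tall matrices.
proj_qr = Q @ Q.T
proj_svd = U @ U.T
```

When only an orthonormal basis of the factor subspace is needed per iteration, this is exactly the saving the paper exploits: the thin QR costs O(mn^2) with a much smaller constant than the SVD and avoids the iterative singular-value computation entirely.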

Citations: 0
Random kernel k-nearest neighbors regression.
IF 2.4 | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | eCollection Date: 2024-01-01 | DOI: 10.3389/fdata.2024.1402384
Patchanok Srisuradetchai, Korn Suksrikran

The k-nearest neighbors (KNN) regression method, known for its nonparametric nature, is highly valued for its simplicity and its effectiveness in handling complex structured data, particularly in big data contexts. However, the method is susceptible to overfitting and fit discontinuity, which present significant challenges. This paper introduces random kernel k-nearest neighbors (RK-KNN) regression as a novel approach that is well suited to big data applications. It integrates kernel smoothing with bootstrap sampling to enhance prediction accuracy and the robustness of the model. The method aggregates multiple predictions using random sampling from the training dataset and selects subsets of input variables for kernel KNN (K-KNN). A comprehensive evaluation of RK-KNN on 15 diverse datasets, employing various kernel functions including Gaussian and Epanechnikov, demonstrates its superior performance. Compared to standard KNN and random KNN (R-KNN) models, it significantly reduces the root mean square error (RMSE) and mean absolute error and improves R-squared values. The RK-KNN variant that employs the kernel function yielding the lowest RMSE is benchmarked against state-of-the-art methods, including support vector regression, artificial neural networks, and random forests.
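A compact sketch of the RK-KNN recipe as the abstract describes it: Gaussian-kernel-weighted KNN base learners, each trained on a bootstrap sample and a random feature subset, with predictions averaged. The ensemble size, bandwidth, subset size, and exact kernel weighting here are guesses for illustration, not the paper's settings:

```python
import numpy as np

def kernel_knn_predict(X_tr, y_tr, X_te, k=5, bandwidth=1.0):
    """Gaussian-kernel-weighted k-NN regression for one feature subset."""
    preds = np.empty(len(X_te))
    for i, x in enumerate(X_te):
        d = np.linalg.norm(X_tr - x, axis=1)
        idx = np.argsort(d)[:k]
        w = np.exp(-(d[idx] / bandwidth) ** 2)   # Gaussian kernel weights
        preds[i] = np.sum(w * y_tr[idx]) / np.sum(w)
    return preds

def rk_knn_predict(X_tr, y_tr, X_te, n_models=25, k=5, seed=0):
    """Sketch of RK-KNN: average kernel-KNN predictions over bootstrap
    samples and random feature subsets (hypothetical hyperparameters)."""
    rng = np.random.default_rng(seed)
    n, p = X_tr.shape
    all_preds = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                   # bootstrap sample
        cols = rng.choice(p, size=max(1, p // 2), replace=False)
        all_preds.append(kernel_knn_predict(
            X_tr[np.ix_(rows, cols)], y_tr[rows], X_te[:, cols], k=k))
    return np.mean(all_preds, axis=0)
```

The kernel weights smooth the discontinuities of plain KNN, while bootstrap and feature randomization decorrelate the base learners so averaging reduces overfitting, mirroring the two weaknesses the abstract identifies.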

Citations: 0
Global explanation supervision for Graph Neural Networks.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-07-01 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1410424
Negar Etemadyrad, Yuyang Gao, Sai Manoj Pudukotai Dinakarrao, Liang Zhao

With the increasing popularity of Graph Neural Networks (GNNs) for predictive tasks on graph-structured data, research on their explainability is becoming more critical and achieving significant progress. Although many methods have been proposed to explain the predictions of GNNs, their focus is mainly on "how to generate explanations." However, other important research questions, such as "whether the GNN explanations are inaccurate," "what if the explanations are inaccurate," and "how to adjust the model to generate more accurate explanations," have received little attention. Our previous GNN Explanation Supervision (GNES) framework demonstrated effectiveness in improving the reasonability of local explanations while maintaining, or even improving, the performance of the backbone GNN model. In many applications, we need global explanations that are reasonable and faithful to the domain data, rather than per-sample explanations. Simply learning to explain GNNs locally is not an optimal route to a global understanding of the model. To improve the explanatory power of the GNES framework, we propose the Global GNN Explanation Supervision (GGNES) technique, which uses a basic trained GNN and a global extension of the loss function used in the GNES framework. This GNN produces local explanations, which are fed to a Global Logic-based GNN Explainer, an existing technique that can learn a global explanation in the form of a logic formula. The two frameworks are then trained iteratively to generate reasonable global explanations. Extensive experiments demonstrate the effectiveness of the proposed model in improving global explanations while keeping performance similar or even increasing the model's predictive power.
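The core idea of explanation supervision — augmenting the task loss with a penalty that pulls the model's importance scores toward reference explanations — can be illustrated with a toy composite objective. This is a deliberately simplified sketch: the function name, the squared-error form of both terms, and the single trade-off weight `lam` are assumptions, and the actual GNES/GGNES losses differ in detail.

```python
import numpy as np

def explanation_supervision_loss(y_pred, y_true, expl_pred, expl_true, lam=0.5):
    """Toy composite objective: prediction error plus a penalty that aligns
    the model's node/edge importance scores with reference explanations.
    (Illustrative only; the papers' actual loss terms differ.)"""
    task = np.mean((y_pred - y_true) ** 2)        # task (prediction) loss
    expl = np.mean((expl_pred - expl_true) ** 2)  # explanation mismatch
    return task + lam * expl
```

Minimizing such an objective trades prediction accuracy against explanation fidelity, which matches the abstract's claim that supervision can improve explanations while keeping (or improving) backbone performance.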

Citations: 0
YOLOv8's advancements in tuberculosis identification from chest images.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-06-27 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1401981
Mohamudha Parveen Rahamathulla, W R Sam Emmanuel, A Bindhu, Mohamed Mustaq Ahmed

Tuberculosis (TB) is a chronic and pathogenic disease that can lead to life-threatening outcomes, including death. Many people have been affected by TB owing to inaccurate and late diagnosis and deficient treatment. Early detection of TB is important to protect people from the severity of the disease and its threatening consequences. Traditionally, manual methods such as chest X-rays and CT scans have been used for TB prediction. Nevertheless, these approaches are time-consuming and ineffective for achieving optimal results. To resolve this problem, several researchers have focused on automated TB prediction; however, existing efforts still suffer from limited accuracy, data overfitting, and slow speed. To improve TB prediction, the proposed research employs the Selection Focal Fusion (SFF) block in the You Only Look Once v8 (YOLOv8, Ultralytics, Los Angeles, United States) object detection model with an attention mechanism, using the Kaggle TBX-11k dataset. YOLOv8 is used for its ability to detect multiple objects in a single pass. However, it struggles with small objects and cannot perform fine-grained classification. To address this problem, the proposed research incorporates the SFF technique to improve detection performance and decrease the missed-detection rate for small objects. Correspondingly, the efficacy of the proposed mechanism is assessed using various performance metrics, such as recall, precision, F1-score, and mean Average Precision (mAP), to estimate the performance of the proposed framework. Furthermore, comparison with existing models reveals the efficiency of the proposed research. The present research is envisioned to contribute to the medical field and assist radiologists in identifying tuberculosis using the YOLOv8 model to obtain an optimal outcome.
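The evaluation metrics the abstract names are built from two standard ingredients: an intersection-over-union (IoU) overlap test that decides whether a detection counts as a true positive, and precision/recall/F1 computed from the resulting counts (mAP additionally averages precision over recall and IoU thresholds). A minimal sketch of both pieces, with hypothetical function names:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes — the overlap
    criterion used to match detections to ground-truth annotations."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold (0.5 is a common choice); the counts then feed `detection_metrics`.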

Citations: 0
MedT5SQL: a transformers-based large language model for text-to-SQL conversion in the healthcare domain.
IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-06-26 eCollection Date: 2024-01-01 DOI: 10.3389/fdata.2024.1371680
Alaa Marshan, Anwar Nais Almutairi, Athina Ioannou, David Bell, Asmat Monaghan, Mahir Arzoky

Introduction: In response to the increasing prevalence of electronic medical records (EMRs) stored in databases, healthcare staff are encountering difficulties retrieving these records due to their limited technical expertise in database operations. As these records are crucial for delivering appropriate medical care, there is a need for an accessible method for healthcare staff to access EMRs.

Methods: To address this, natural language processing (NLP) for Text-to-SQL has emerged as a solution, enabling non-technical users to generate SQL queries using natural language text. This research assesses existing work on Text-to-SQL conversion and proposes the MedT5SQL model specifically designed for EMR retrieval. The proposed model utilizes the Text-to-Text Transfer Transformer (T5) model, a Large Language Model (LLM) commonly used in various text-based NLP tasks. The model is fine-tuned on the MIMICSQL dataset, the first Text-to-SQL dataset for the healthcare domain. Performance evaluation involves benchmarking the MedT5SQL model on two optimizers, varying numbers of training epochs, and using two datasets, MIMICSQL and WikiSQL.

Results: For the MIMICSQL dataset, the model demonstrates considerable effectiveness in generating question-SQL pairs, achieving accuracies of 80.63%, 98.937%, and 90% for exact-match accuracy, approximate string-matching, and manual evaluation, respectively. When testing the model on the WikiSQL dataset, it generates SQL queries efficiently, with an accuracy of 44.2% on WikiSQL and 94.26% for approximate string-matching.

Discussion: Results indicate improved performance with increased training epochs. This work highlights the potential of a fine-tuned T5 model to convert medical questions written in natural language into Structured Query Language (SQL) in the healthcare domain, providing a foundation for future research in this area.
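The two automatic metrics reported above — exact-match accuracy and approximate string-matching — can be sketched with the standard library. This is an assumed reconstruction: the paper's exact normalization and similarity definition may differ; here exact match is case-insensitive after whitespace stripping, and the approximate score is `difflib`'s character-level similarity ratio.

```python
from difflib import SequenceMatcher

def exact_match_accuracy(preds, golds):
    """Fraction of generated SQL strings identical to the reference
    (case-insensitive, whitespace-stripped)."""
    return sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(preds, golds)) / len(golds)

def approx_match_score(preds, golds):
    """Mean character-level similarity ratio between generated and
    reference SQL (a stand-in for approximate string-matching)."""
    return sum(SequenceMatcher(None, p.lower(), g.lower()).ratio()
               for p, g in zip(preds, golds)) / len(golds)
```

The gap between the two metrics (44.2% vs. 94.26% on WikiSQL) is typical: a query can be nearly correct character-by-character yet fail the all-or-nothing exact-match test.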

Citations: 0