Pub Date: 2024-08-09 DOI: 10.3390/biomedinformatics4030102
Colm Brandon, S. Boßelmann, Amandeep Singh, Stephen Ryan, Alexander Schieweck, É. Fennell, Bernhard Steffen, Tiziana Margaria
Background: In biomedical imaging research, experimental biologists generate vast amounts of data that require advanced computational analysis. Breakthroughs in experimental techniques, such as multiplex immunofluorescence tissue imaging, enable detailed proteomic analysis, but most biomedical researchers lack the programming and Artificial Intelligence (AI) expertise to leverage these innovations effectively. Methods: Cinco de Bio (CdB) is a web-based, collaborative low-code/no-code modelling and execution platform designed to address this challenge. It is designed along Model-Driven Development (MDD) and Service-Orientated Architecture (SOA) to enable modularity and scalability, and it is underpinned by formal methods to ensure correctness. The pre-processing of immunofluorescence images illustrates the ease of use and ease of modelling with CdB in comparison with the current, mostly manual, approaches. Results: CdB simplifies the deployment of data processing services that may use heterogeneous technologies. User-designed models support both a collaborative and user-centred design for biologists. Domain-Specific Languages for the Application domain (A-DSLs) are supported through data and process ontologies/taxonomies. They allow biologists to effectively model workflows in the terminology of their field. Conclusions: Comparative analysis of similar platforms in the literature illustrates the superiority of CdB along a number of comparison dimensions. We are expanding the platform’s capabilities and applying it to other domains of biomedical research.
Title: Cinco de Bio: A Low-Code Platform for Domain-Specific Workflows for Biomedical Imaging Research
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030102 Paper ID: 141922204 (Semanticscholar).
Pub Date: 2024-08-09 DOI: 10.3390/biomedinformatics4030103
Jonas Bambi, Hanieh Sadri, Ken Moselle, Ernie Chang, Yudi Santoso, Joseph Howie, Abraham Rudnick, Lloyd T. Elliott, Alex Kuo
Background: As patients interact with a healthcare service system, patterns of service utilization (PSUs) emerge. These PSUs are embedded in the sparse high-dimensional space of longitudinal cross-continuum health service encounter data. Once extracted, PSUs can provide quality assurance/quality improvement (QA/QI) efforts with the information required to optimize service system structures and functions. This may improve outcomes for complex patients with chronic diseases. Method: Working with longitudinal cross-continuum encounter data from a regional health service system, various pattern detection analyses were conducted, employing (1) graph community detection algorithms, (2) natural language processing (NLP) clustering, and (3) a hybrid NLP–graph method. Result: These approaches produced similar PSUs, as determined from a clinical perspective by clinical subject matter experts and service system operations experts. Conclusions: The similarity in the results provides validation for the methodologies. Moreover, the results stress the need to engage with clinical or service system operations experts, both in providing the taxonomies and ontologies of the service system and the cohort definitions, and in determining the level of granularity that produces the most clinically meaningful results. Finally, each approach offers distinct analytical capabilities, which will be further explored in our future research.
Title: Approaches to Extracting Patterns of Service Utilization for Patients with Complex Conditions: Graph Community Detection vs. Natural Language Processing Clustering
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030103 Paper ID: 141923369 (Semanticscholar).
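As a rough illustration of the graph-based route described in the abstract above, the sketch below builds a service co-utilization graph from per-patient encounter lists and groups services that are frequently used together. Everything in it is an assumption made for illustration: the toy encounter data, the service names, the `min_shared` threshold, and the grouping rule itself. It uses connected components over thresholded edges, which is a much simpler stand-in for the community detection algorithms the study refers to (the abstract does not name them).

```python
from collections import defaultdict
from itertools import combinations


def co_utilization_groups(encounters, min_shared=2):
    """Group services that co-occur for at least `min_shared` patients.

    `encounters` maps a patient ID to the list of services that patient used.
    Returns a sorted list of service groups (each group is a sorted list).
    """
    # Count, for every pair of services, how many patients used both.
    pair_counts = defaultdict(int)
    services = set()
    for used in encounters.values():
        unique = sorted(set(used))
        services.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    # Keep only edges supported by enough patients.
    neighbours = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components via an iterative depth-first search.
    groups, seen = [], set()
    for start in sorted(services):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return sorted(groups)


# Hypothetical toy data: four patients and the services they touched.
toy_encounters = {
    "p1": ["emergency", "acute_psychiatry", "community_mental_health"],
    "p2": ["emergency", "acute_psychiatry"],
    "p3": ["home_care", "primary_care"],
    "p4": ["home_care", "primary_care", "emergency"],
}
groups = co_utilization_groups(toy_encounters, min_shared=2)
```

On real encounter data the threshold and the grouping algorithm would need to be chosen with the clinical and operations experts the abstract mentions, since they determine the granularity of the resulting patterns.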
Pub Date: 2024-07-24 DOI: 10.3390/biomedinformatics4030097
Edward J. Pavlik, Dharani D. Ramaiah, Taylor A. Rives, Allison L. Swiecki-Sikora, Jamie M. Land
When women receive a diagnosis of a gynecologic malignancy, they can have questions about their diagnosis or treatment that can result in voice queries to virtual assistants for more information. Recent advances in artificial intelligence (AI) have transformed the landscape of medical information accessibility. The Google virtual assistant (VA) outperformed Siri, Alexa and Cortana in voice queries presented prior to the explosive implementation of AI in early 2023. The efforts presented here focus on determining if advances in AI in the last 12 months have improved the accuracy of Google VA responses related to gynecologic oncology. Previous questions were utilized to form a common basis for queries prior to 2023 and responses in 2024. Correct answers were obtained from the UpToDate medical resource. Responses related to gynecologic oncology were obtained using Google VA, as well as the generative AI chatbots Google Bard/Gemini and Microsoft Bing-Copilot. The AI narrative responses varied in length and in the positioning of answers within the response. Google Bard/Gemini achieved an 87.5% accuracy rate, while Microsoft Bing-Copilot reached 83.3%. In contrast, the Google VA’s accuracy in audible responses improved from 18% prior to 2023 to 63% in 2024. While the accuracy of the Google VA has improved in the last year, it underperformed Google Bard/Gemini and Microsoft Bing-Copilot, leaving considerable room for further improvement.
Title: Replies to Queries in Gynecologic Oncology by Bard, Bing and the Google Assistant
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030097 Paper ID: 141806907 (Semanticscholar).
Pub Date: 2024-07-24 DOI: 10.3390/biomedinformatics4030096
Kokiladevi Alagarswamy, Wenjie Shi, Aishwarya Boini, Nouredin Messaoudi, Vincent Grasso, Thomas Cattabiani, Bruce Turner, Roland S Croner, U. D. Kahlert, Andrew Gumbs
In this scoping review, we delve into the transformative potential of artificial intelligence (AI) in addressing challenges inherent in whole-genome sequencing (WGS) analysis, with a specific focus on its implications in oncology. Unveiling the limitations of existing sequencing technologies, the review illuminates how AI-powered methods emerge as innovative solutions to surmount these obstacles. The evolution of DNA sequencing technologies, progressing from Sanger sequencing to next-generation sequencing, sets the backdrop for AI’s emergence as a potent ally in processing and analyzing the voluminous genomic data generated. Particularly, deep learning methods play a pivotal role in extracting knowledge and discerning patterns from the vast landscape of genomic information. In the context of oncology, AI-powered methods exhibit considerable potential across diverse facets of WGS analysis, including variant calling, structural variation identification, and pharmacogenomic analysis. This review underscores the significance of multimodal approaches in diagnoses and therapies, highlighting the importance of ongoing research and development in AI-powered WGS techniques. Integrating AI into the analytical framework empowers scientists and clinicians to unravel the intricate interplay of genomics within the realm of multi-omics research, paving the way for more successful personalized and targeted treatments.
Title: Should AI-Powered Whole-Genome Sequencing Be Used Routinely for Personalized Decision Support in Surgical Oncology—A Scoping Review
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030096 Paper ID: 141807854 (Semanticscholar).
Pub Date: 2024-07-22 DOI: 10.3390/biomedinformatics4030095
Amarnath Amarnath, Ali Al Bataineh, Jeremy A. Hansen
Background: Intracranial neoplasm, often referred to as a brain tumor, is an abnormal growth or mass of tissues in the brain. The complexity of the brain and the associated diagnostic delays cause significant stress for patients. This study aims to enhance the efficiency of MRI analysis for brain tumors using deep transfer learning. Methods: We developed and evaluated the performance of five pre-trained deep learning models—ResNet50, Xception, EfficientNetV2-S, ResNet152V2, and VGG16—using a publicly available MRI scan dataset to classify images as glioma, meningioma, pituitary, or no tumor. Various classification metrics were used for evaluation. Results: Our findings indicate that these models can improve the accuracy of MRI analysis for brain tumor classification, with the Xception model achieving the highest performance with a test F1 score of 0.9817, followed by EfficientNetV2-S with a test F1 score of 0.9629. Conclusions: Implementing pre-trained deep learning models can enhance MRI accuracy for detecting brain tumors.
Title: Transfer-Learning Approach for Enhanced Brain Tumor Classification in MRI Imaging
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030095 Paper ID: 141817588 (Semanticscholar).
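The brain tumor study above ranks its models by test F1 score over four classes. The abstract does not say how per-class scores were averaged, so the sketch below assumes macro averaging (the unweighted mean of per-class F1). The label strings and the toy predictions are invented for illustration and are not taken from the study's dataset.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); a class with no support scores 0 here.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels mirroring the four classes named in the abstract.
LABELS = ["glioma", "meningioma", "pituitary", "no_tumor"]

# Toy ground truth and predictions: one glioma is mistaken for a meningioma.
y_true = ["glioma", "glioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "no_tumor"]
y_pred = ["glioma", "meningioma", "meningioma", "meningioma",
          "pituitary", "pituitary", "no_tumor", "no_tumor"]

score = macro_f1(y_true, y_pred, LABELS)
```

Macro averaging treats all four classes equally regardless of how many scans each contains; a support-weighted average would give a different number on an imbalanced test set.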
Pub Date: 2024-07-19 DOI: 10.3390/biomedinformatics4030094
Teresa Angela Trunfio, G. Improta
Background: Malignant breast cancer is the most common cancer affecting women worldwide. The COVID-19 pandemic appears to have slowed the diagnostic process, leading to an enhanced use of invasive approaches such as mastectomy. The increased use of a surgical procedure pushes towards an objective analysis of patient flow with measurable quality indicators such as length of stay (LOS) in order to optimize it. Methods: In this work, different regression and classification models were implemented to analyze the total LOS as a function of a set of independent variables (age, gender, pre-op LOS, discharge ward, year of discharge, type of procedure, presence of hypertension, diabetes, cardiovascular disease, respiratory disease, secondary tumors, and surgery with complications) extracted from the discharge records of patients undergoing mastectomy at the ‘San Giovanni di Dio e Ruggi d’Aragona’ University Hospital of Salerno (Italy) in the years 2011–2021. In addition, the impact of COVID-19 was assessed by statistically comparing data from patients discharged in 2018–2019 with those discharged in 2020–2021. Results: The results obtained generally show the good performance of the regression models in characterizing the particular case studies. Among the models, the best at predicting the LOS from the set of variables described above was polynomial regression, with an R² value above 0.689. The classification algorithms that operated on a LOS divided into 3 arbitrary classes also proved to be good tools, reaching 79% accuracy with the voting classifier. Among the independent variables, both implemented models showed that the ward of discharge, year of discharge, type of procedure and complications during surgery had the greatest impact on LOS. The final analysis, which assessed the impact of COVID-19, showed a statistically significant increase in surgical complications. 
Conclusion: Through this study, it was possible to validate the use of regression and classification models to characterize the total LOS of mastectomy patients. LOS proves to be an excellent indicator of performance, and through its analysis with advanced methods, such as machine learning algorithms, it is possible to understand which of the demographic and organizational variables collected have a significant impact and thus build simple predictors to support healthcare management.
Title: Flow Analysis of Mastectomy Patients Using Length of Stay: A Single-Center Study
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030094 Paper ID: 141822135 (Semanticscholar).
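The two headline figures in the length-of-stay study above are a coefficient of determination for the regression models and an accuracy for a voting classifier. The sketch below shows how each quantity is typically computed. It is only an illustration under stated assumptions: the abstract does not say whether voting was hard or soft (hard majority voting is assumed here), and the LOS values, class names and model outputs are made up.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def hard_vote(predictions_per_model):
    """Majority vote per sample; ties go to the label encountered first."""
    voted = []
    for sample_votes in zip(*predictions_per_model):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Toy regression example: observed vs. predicted LOS in days (hypothetical).
observed_los = [3, 5, 4, 8]
predicted_los = [3.5, 4.5, 4.0, 7.0]
r2 = r_squared(observed_los, predicted_los)

# Toy classification example: three hypothetical models, three LOS classes.
model_a = ["short", "medium", "long"]
model_b = ["short", "long", "long"]
model_c = ["medium", "medium", "long"]
ensemble = hard_vote([model_a, model_b, model_c])
```

A soft-voting variant would average predicted class probabilities instead of counting labels, which requires each model to expose calibrated probabilities.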
Pub Date: 2024-07-18 DOI: 10.3390/biomedinformatics4030093
Katerina Kadena, E. Ouzounoglou
Background: Amyotrophic Lateral Sclerosis (ALS) is a devastating neurological disorder with increasing prevalence rates. Currently, only 8 FDA-approved drugs and 44 clinical trials exist for ALS treatment, underscoring the gap in disease-specific treatment. Drug repurposing, an alternative approach, is gaining importance. This study aims to identify potential repurposable compounds using gene expression analysis and structural similarity approaches. Methods: GSE833 and GSE3307 were analysed to retrieve Differentially Expressed Genes (DEGs), which were used to identify compounds from LINCS that reverse the gene signatures. SMILES of ALS-specific FDA-approved and clinical trial compounds were used to retrieve structurally similar drugs from DrugBank. A Drug-Target-Network (DTN) was constructed for the identified compounds to retrieve drug targets, which were further subjected to functional enrichment analysis. Results: GSE833 yielded 13 upregulated and 5 downregulated significant DEGs, whereas GSE3307 yielded 280 and 430, respectively. Gene expression similarity identified 213 approved drugs. Structural similarity analysis of 44 compounds resulted in 411 approved and investigational compounds. The DTN was constructed for 266 compounds to identify drug targets. Functional enrichment analysis highlighted neuroinflammatory response, cAMP signaling, PI3K-AKT signaling, and oxidative stress pathways. A preliminary relevance check found that 105 compounds had previously been associated with ALS research, validating the approach, and identified 172 potential repurposable compounds.
Title: Drug Repurposing for Amyotrophic Lateral Sclerosis Based on Gene Expression Similarity and Structural Similarity: A Cheminformatics, Genomic and Network-Based Analysis
Journal: BioMedInformatics. Publication type: Journal Article. URL: https://doi.org/10.3390/biomedinformatics4030093 Paper ID: 141824439 (Semanticscholar).
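The structural-similarity step in the ALS study above retrieves DrugBank compounds that resemble known ALS drugs. The abstract does not state which similarity measure or fingerprint was used; the sketch below assumes the Tanimoto coefficient over sets of "on" fingerprint bits, a common choice in cheminformatics. The bit sets, compound names and the 0.5 threshold are hypothetical. In practice the fingerprints would be derived from the SMILES strings with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not fp_a and not fp_b:
        return 0.0  # Convention chosen here for two empty fingerprints.
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_similar(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs at or above `threshold`, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    # Sort by descending similarity, then by name for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical fingerprints: each set holds the indices of bits that are set.
query = {1, 2, 3, 4}
library = {
    "compound_A": {1, 2, 3, 5},
    "compound_B": {1, 2, 3, 4, 6},
    "compound_C": {7, 8},
}
ranked = rank_similar(query, library, threshold=0.5)
```

The threshold controls how many candidates pass to the later drug-target network step; a stricter cutoff yields fewer but more closely related compounds.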
Pub Date: 2024-07-08 DOI: 10.3390/biomedinformatics4030091
Nur Hasanah Ali, Abdul Rahim Abdullah, N. Saad, A. Muda, Ervina Efzan Mhd Noor
Background: Ischemic stroke poses significant challenges in diagnosis and treatment, necessitating efficient and accurate methods for assessing collateral circulation, a critical determinant of patient prognosis. Manual classification of collateral circulation in ischemic stroke using traditional imaging techniques is labor-intensive and prone to subjectivity. This study presents the automated classification of collateral circulation patterns in cone-beam CT (CBCT) images using the VGG11 architecture. Methods: The study used a dataset of CBCT images from ischemic stroke patients, accurately labeled with their respective collateral circulation status. To ensure uniformity and comparability, image normalization was applied during the preprocessing phase to standardize pixel values to a consistent scale or range. The VGG11 model was then trained on an augmented dataset to classify collateral circulation patterns. Results: Performance evaluation of the proposed approach demonstrates promising results, with the model achieving an accuracy of 58.32%, a sensitivity of 75.50%, a specificity of 44.10%, a precision of 52.70%, and an F1 score of 62.10% in classifying collateral circulation patterns. Conclusions: This approach automates classification, potentially reducing diagnostic delays and improving patient outcomes. It also lays the groundwork for future research in using deep learning for better stroke diagnosis and management. This study is a significant advancement toward developing practical tools to assist doctors in making informed decisions for ischemic stroke patients.
Title: Automated Classification of Collateral Circulation for Ischemic Stroke in Cone-Beam CT Images Using VGG11: A Deep Learning Approach. Journal: BioMedInformatics. DOI URL: https://doi.org/10.3390/biomedinformatics4030091
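The evaluation metrics named in the abstract above are all derived from a confusion matrix. The following is a minimal sketch, assuming a binary classification task (the abstract does not state the number of classes); the function names and the example counts are hypothetical and are not taken from the paper. It also shows that the reported F1 score is consistent, within rounding, with the reported precision and sensitivity.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from binary confusion-matrix counts (hypothetical helper)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def f1_score(precision: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of precision and sensitivity."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Illustrative counts only; these are NOT the study's data.
example = binary_metrics(tp=8, fp=2, tn=7, fn=3)

# Consistency check using the precision and sensitivity reported in the abstract.
reported_f1 = f1_score(precision=0.5270, sensitivity=0.7550)
```

A high sensitivity paired with a low specificity, as reported in the abstract, indicates a classifier that tends to over-predict the positive class.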
Pub Date : 2024-07-03 DOI: 10.3390/biomedinformatics4030090
Richard Fechner, Jens Dörpinghaus, R. Rockenfeller, Jennifer Faber
Background: Biomedical data are usually collections of longitudinal data assessed at certain points in time. Clinical observations assess the presence and severity of symptoms, which are the basis for the description and modeling of disease progression. Deciphering potential underlying unknowns from the distinct observations would substantially improve the understanding of pathological cascades. Hidden Markov Models (HMMs) have been successfully applied to the processing of possibly noisy continuous signals. We apply ensembles of HMMs to categorically distributed multivariate time series data, leaving space for expert domain knowledge in the prediction process. Methods: We use an ensemble of HMMs to predict the loss of free walking ability as one major clinical deterioration in the most common autosomal dominantly inherited ataxia disorder worldwide. Results: We present a prediction pipeline that processes data paired with a configuration file, enabling us to train, validate and query an ensemble of HMMs. In particular, we provide a theoretical and practical framework for multivariate time-series inference based on HMMs that includes constructing multiple HMMs, each predicting a particular observable variable. Our analysis is conducted on pseudo-data as well as on biomedical data based on spinocerebellar ataxia type 3. Conclusions: We find that the model shows promising results for the data we tested. The strength of this approach is that HMMs are well-understood, probabilistic and interpretable models, which sets it apart from most Deep Learning approaches. We publish all code and evaluation pseudo-data in an open-source repository.
Title: Ensemble of HMMs for Sequence Prediction on Multivariate Biomedical Data. Journal: BioMedInformatics. DOI URL: https://doi.org/10.3390/biomedinformatics4030090
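The idea of one HMM per observable variable, as described in the abstract above, can be sketched as follows. This is a minimal illustration using the standard forward (filtering) recursion for a categorical-emission HMM; it is not the authors' pipeline, and the function names, the model parameters, and the variable name `walking_aid` are all hypothetical.

```python
def _normalize(weights):
    total = sum(weights)
    if total == 0:
        raise ValueError("observation sequence has zero probability under the model")
    return [w / total for w in weights]


def predict_next(start, trans, emit, observations):
    """Return P(next symbol | observations) for one categorical HMM.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n_states = len(start)
    belief = list(start)  # predicted distribution over the current hidden state
    for symbol in observations:
        # Condition the belief on the observed symbol (filtering step).
        filtered = _normalize([belief[i] * emit[i][symbol] for i in range(n_states)])
        # Propagate one time step through the transition matrix.
        belief = [sum(filtered[i] * trans[i][j] for i in range(n_states))
                  for j in range(n_states)]
    n_symbols = len(emit[0])
    return [sum(belief[i] * emit[i][k] for i in range(n_states))
            for k in range(n_symbols)]


def ensemble_predict(models, histories):
    """Query each variable's HMM with that variable's own observation history."""
    return {name: predict_next(*models[name], histories[name]) for name in models}


# Toy two-state model: state 1 is absorbing and always emits symbol 1.
toy = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
prediction = ensemble_predict({"walking_aid": toy}, {"walking_aid": [0, 0]})
```

Because each model is a small set of probability tables, the per-variable predictions can be inspected directly, which is the interpretability advantage the abstract refers to.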
Pub Date : 2024-07-02 DOI: 10.3390/biomedinformatics4030089
J. Arslan, Kurt Benke
Background: Several studies have investigated features and models to understand the growth and progression of the ocular disease geographic atrophy (GA). Commonly assessed features include age, sex, smoking, alcohol consumption, sedentary lifestyle, hypertension, and diabetes. Findings on which features correlate with GA progression have been inconsistent. A chief concern is whether the investigated features are readily available for analysis across ophthalmic institutions. Methods: In this study, we focused on the association between fundus autofluorescence (FAF) imaging features and GA progression. Our method comprised feature extraction using radiomic processes and feature ranking with the machine learning algorithm XGBoost to determine the best-ranked features. This led to the development of an image-based linear mixed-effects model, which was designed to account for slope change based on within-subject variability and inter-eye correlation. Metrics used to assess the linear mixed-effects model included marginal and conditional R², Pearson’s correlation coefficient (r), root mean square error (RMSE), mean error (ME), mean absolute error (MAE), mean absolute deviation (MAD), the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and log-likelihood. Results: We developed a linear mixed-effects model with 15 image-based features. The model results were as follows: R² = 0.96, r = 0.981, RMSE = 1.32, ME = −7.3 × 10⁻¹⁵, MAE = 0.94, MAD = 0.999, AIC = 2084.93, BIC = 2169.97, and log-likelihood = −1022.46. Conclusions: The advantage of our method is that it relies on the inherent properties of the image itself rather than on the availability of clinical or demographic data. Thus, the image features identified in this study are universally and readily available.
Title: Machine Learning for Extraction of Image Features Associated with Progression of Geographic Atrophy. Journal: BioMedInformatics. DOI URL: https://doi.org/10.3390/biomedinformatics4030089
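Several of the fit metrics listed in the abstract above follow from simple formulas. The sketch below uses the textbook definitions of ME, MAE, RMSE and AIC; the function names and the toy values are hypothetical, and the sign convention for ME (predicted minus observed) is an assumption. The parameter count `k = 20` is not stated in the abstract; it is back-calculated here only to show that the reported AIC and log-likelihood agree with AIC = 2k − 2 ln L.

```python
import math


def error_metrics(observed, predicted):
    """Mean error, mean absolute error, and root mean square error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    return {
        "ME": sum(residuals) / n,                        # signed bias
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(sum(r * r for r in residuals) / n),
    }


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


# Toy values only; these are NOT the study's data.
toy_metrics = error_metrics(observed=[1.0, 2.0, 3.0], predicted=[2.0, 2.0, 2.0])

# With the reported log-likelihood, an assumed k = 20 gives about 2084.92,
# which matches the reported AIC of 2084.93 to within rounding.
aic_check = aic(log_likelihood=-1022.46, k=20)
```

A mean error near zero, such as the reported −7.3 × 10⁻¹⁵, is expected for a model with an intercept fitted by least squares or maximum likelihood, so it says little about accuracy on its own; RMSE and MAE are the more informative error measures.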