Pub Date: 2024-04-01 DOI: 10.3390/biomedinformatics4020050
Joshua Chuah, Pingkun Yan, Ge Wang, Juergen Hahn
Background: Machine learning (ML) and artificial intelligence (AI)-based classifiers can be used to diagnose diseases from medical imaging data. However, few of the classifiers proposed in the literature translate to clinical use because of robustness concerns. Materials and methods: This study investigates how to improve the robustness of AI/ML imaging classifiers by simultaneously applying perturbations of common effects (Gaussian noise, contrast, blur, rotation, and tilt) to different amounts of training and test images. Furthermore, a comparison with classifiers trained with adversarial noise is also presented. This procedure is illustrated using two publicly available datasets, the PneumoniaMNIST dataset and the Breast Ultrasound Images dataset (BUSI dataset). Results: Classifiers trained with small amounts of perturbed training images showed similar performance on unperturbed test images compared to the classifier trained with no perturbations. Additionally, classifiers trained with perturbed data performed significantly better on test data both perturbed by a single perturbation (p-values: noise = 0.0186; contrast = 0.0420; rotation, tilt, and blur = 0.000977) and multiple perturbations (p-values: PneumoniaMNIST = 0.000977; BUSI = 0.00684) than the classifier trained with unperturbed data. Conclusions: Classifiers trained with perturbed data were found to be more robust to perturbed test data than the unperturbed classifier without exhibiting a performance decrease on unperturbed test images, indicating benefits to training with data that include some perturbed images and no significant downsides.
Title: Towards the Generation of Medical Imaging Classifiers Robust to Common Perturbations
Journal: BioMedInformatics
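The training scheme described in this abstract, in which only a fraction of the training images is perturbed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the noise level `sigma`, the contrast factor, the 3 x 3 box blur, and the 20% fraction are all hypothetical choices, and rotation and tilt are omitted for brevity.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def adjust_contrast(img, factor):
    """Scale pixel deviations from the image mean by `factor`."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)

def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def perturb_fraction(images, fraction, rng, sigma=0.05, contrast=0.8):
    """Return a copy of `images` with a random `fraction` of them perturbed.

    Each selected image receives blur, contrast change, and noise together
    (hypothetical settings; the paper's exact parameters are not given here).
    """
    images = images.astype(float).copy()
    n_perturbed = int(round(fraction * len(images)))
    idx = rng.choice(len(images), size=n_perturbed, replace=False)
    for i in idx:
        img = box_blur(images[i])
        img = adjust_contrast(img, contrast)
        images[i] = add_gaussian_noise(img, sigma, rng)
    return images, np.sort(idx)

# Toy stand-in for a training set: 10 grayscale 28 x 28 images in [0, 1).
rng = np.random.default_rng(0)
train = rng.random((10, 28, 28))
perturbed, idx = perturb_fraction(train, 0.2, rng)
```

Returning the perturbed indices alongside the images makes it easy to check that the remaining images are untouched, which matters when comparing performance on unperturbed data.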
Pub Date: 2024-04-01 DOI: 10.3390/biomedinformatics4020051
Marek Żyliński, Amir Nassibi, Edoardo Occhipinti, Adil Malik, Matteo Bermond, H. Davies, Danilo P. Mandic
Background: Ambulatory heart rate (HR) monitors that acquire electrocardiogram (ECG) and/or photoplethysmogram (PPG) signals from the torso, wrists, or ears are notably less accurate in tasks associated with high levels of movement compared to clinical measurements. However, a reliable estimation of HR can be obtained through data fusion from different sensors. These methods are especially suitable for multimodal hearable devices, where heart rate can be tracked from different modalities, including electrical ECG, optical PPG, and sounds (heart tones). Combined information from different modalities can compensate for single-source limitations. Methods: In this paper, we evaluate the possible application of data fusion methods in hearables. We assess data fusion for heart rate estimation from simultaneous in-ear ECG and in-ear PPG, recorded on ten subjects while performing 5-min sitting and walking tasks. Results: Our findings show that data fusion methods provide a similar level of mean absolute error as the best single-source heart rate estimation but with much lower intra-subject variability, especially during walking activities. Conclusion: We conclude that data fusion methods provide more robust HR estimation than a single cardiovascular signal. These methods can enhance the performance of wearable devices, especially multimodal hearables, in heart rate tracking during physical activity.
Title: Hearables: In-Ear Multimodal Data Fusion for Robust Heart Rate Estimation
Journal: BioMedInformatics
Pub Date: 2024-04-01 DOI: 10.3390/biomedinformatics4020054
Rezaul Haque, Abdullah Al Sakib, Md Forhad Hossain, Fahadul Islam, Ferdaus Ibne Aziz, Md Redwan Ahmed, Somasundar Kannan, Ali Rohan, Md Junayed Hasan
Disease recognition has been revolutionized by autonomous systems in the rapidly developing field of medical technology. A crucial aspect of diagnosis involves the visual assessment and enumeration of white blood cells in microscopic peripheral blood smears. This practice yields invaluable insights into a patient’s health, enabling the identification of blood malignancies such as leukemia. Early identification of leukemia subtypes is paramount for tailoring appropriate therapeutic interventions and enhancing patient survival rates. However, traditional diagnostic techniques, which depend on visual assessment, are subjective, laborious, and prone to errors. The advent of machine learning (ML) technologies offers a promising avenue for more accurate and efficient leukemia classification. In this study, we introduced a novel approach to leukemia classification by integrating advanced image processing, diverse dataset utilization, and sophisticated feature extraction techniques, coupled with the development of transfer learning (TL) models. Aiming to improve on the accuracy of previous studies, our approach utilized Kaggle datasets for binary and multiclass classification. Extensive image processing involved a novel LoGMH method, complemented by diverse augmentation techniques. Feature extraction employed a deep convolutional neural network (DCNN), and the extracted features were then used to train various ML and TL models. Rigorous evaluation using traditional metrics revealed Inception-ResNet’s superior performance, surpassing other models with F1 scores of 96.07% and 95.89% for binary and multiclass classification, respectively. Our results notably surpass previous research, particularly in cases involving a higher number of classes. These findings promise to influence clinical decision support systems, guide future research, and potentially revolutionize cancer diagnostics beyond leukemia, impacting broader medical imaging and oncology domains.
Title: Advancing Early Leukemia Diagnostics: A Comprehensive Study Incorporating Image Processing and Transfer Learning
Journal: BioMedInformatics
Pub Date: 2024-03-25 DOI: 10.3390/biomedinformatics4020049
Maurizio Cè, Vittoria Chiarpenello, Alessandra Bubba, P. Felisaz, G. Oliva, Giovanni Irmici, M. Cellina
Introduction: Oncological patients face numerous challenges throughout their cancer journey while navigating complex medical information. The advent of AI-based conversational models like ChatGPT (OpenAI, San Francisco) represents an innovation in oncological patient management. Methods: We conducted a comprehensive review of the literature on the use of ChatGPT in providing tailored information and support to patients with various types of cancer, including head and neck, liver, prostate, breast, lung, pancreas, colon, and cervical cancer. Results and Discussion: Our findings indicate that, in most instances, ChatGPT responses were accurate, dependable, and aligned with the expertise of oncology professionals, especially for certain cancer types such as head and neck and prostate cancers. Furthermore, the system demonstrated a remarkable ability to comprehend patients’ emotional responses and offer proactive solutions and advice. Nevertheless, these models have also shown notable limitations and cannot serve as a substitute for the role of a physician under any circumstances. Conclusions: Conversational models like ChatGPT can significantly enhance the overall well-being and empowerment of oncological patients. Both patients and healthcare providers must become well-versed in the advantages and limitations of these emerging technologies.
Title: Exploring the Role of ChatGPT in Oncology: Providing Information and Support for Cancer Patients
Journal: BioMedInformatics
Pub Date: 2024-03-19 DOI: 10.3390/biomedinformatics4010048
Bujar Raufi, Luca Longo
Background: Creating models to differentiate self-reported mental workload perceptions is challenging and requires machine learning to identify features from EEG signals. EEG band ratios quantify human activity, but research on their use for mental workload assessment is limited. This study evaluates the use of theta-to-alpha and alpha-to-theta EEG band ratio features to distinguish human self-reported perceptions of mental workload. Methods: In this study, EEG data from 48 participants, recorded while they were engaged in resting and task-intensive activities, were analyzed. Multiple mental workload indices were developed using different EEG channel clusters and band ratios. ANOVA’s F-score and PowerSHAP were used to extract the statistical features. At the same time, models were built and tested using techniques such as Logistic Regression, Gradient Boosting, and Random Forest. These models were then explained using Shapley Additive Explanations. Results: Based on the results, using PowerSHAP to select features led to improved model performance, exhibiting an accuracy exceeding 90% across three mental workload indices. In contrast, statistical techniques for model building indicated poorer results across all mental workload indices. Moreover, using Shapley values to evaluate feature contributions to the model output, it was noted that features rated low in importance by both ANOVA F-score and PowerSHAP measures played the most substantial role in determining the model output. Conclusions: Using models with Shapley values can reduce data complexity and improve the training of better discriminative models for perceived human mental workload. However, the outcomes can sometimes be unclear due to variations in the significance of features during the selection process and their actual impact on the model output.
Title: Comparing ANOVA and PowerShap Feature Selection Methods via Shapley Additive Explanations of Models of Mental Workload Built with the Theta and Alpha EEG Band Ratios
Journal: BioMedInformatics
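The theta-to-alpha and alpha-to-theta band ratio features named in this abstract can be illustrated with a short sketch. It assumes conventional band edges (theta 4 to 8 Hz, alpha 8 to 13 Hz) and a plain periodogram on a single channel; the abstract states neither, and the function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Total periodogram power of `signal` in the band [low, high) Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs < high)
    return power[mask].sum()

def theta_alpha_ratios(signal, fs, theta=(4.0, 8.0), alpha=(8.0, 13.0)):
    """Return (theta-to-alpha, alpha-to-theta) power ratios.

    The band edges are assumed defaults, not values taken from the paper.
    """
    theta_p = band_power(signal, fs, *theta)
    alpha_p = band_power(signal, fs, *alpha)
    return theta_p / alpha_p, alpha_p / theta_p

# Synthetic single-channel signal: a 6 Hz (theta) component with twice the
# amplitude of a 10 Hz (alpha) component, sampled at 128 Hz for 4 seconds.
fs = 128
t = np.arange(0, 4, 1 / fs)
signal = 2.0 * np.sin(2 * np.pi * 6 * t) + 1.0 * np.sin(2 * np.pi * 10 * t)
tar, atr = theta_alpha_ratios(signal, fs)
```

Because power scales with amplitude squared, the theta component here carries about four times the alpha power, so the two ratios are reciprocals near 4 and 0.25. Real EEG would normally need windowing and averaging (for example Welch's method) before the ratio is stable.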
Pub Date: 2024-03-14 DOI: 10.3390/biomedinformatics4010047
J. Chow, Valerie Wong, Kay Li
This review explores the transformative integration of artificial intelligence (AI) and healthcare through conversational AI leveraging Natural Language Processing (NLP). Focusing on Large Language Models (LLMs), this paper navigates through various sections, commencing with an overview of AI’s significance in healthcare and the role of conversational AI. It delves into fundamental NLP techniques, emphasizing their facilitation of seamless healthcare conversations. Examining the evolution of LLMs within NLP frameworks, the paper discusses key models used in healthcare, exploring their advantages and implementation challenges. Practical applications in healthcare conversations, from patient-centric utilities like diagnosis and treatment suggestions to healthcare provider support systems, are detailed. Ethical and legal considerations, including patient privacy, ethical implications, and regulatory compliance, are addressed. The review concludes by spotlighting current challenges, envisaging future trends, and highlighting the transformative potential of LLMs and NLP in reshaping healthcare interactions.
Title: Generative Pre-Trained Transformer-Empowered Healthcare Conversations: Current Trends, Challenges, and Future Directions in Large Language Model-Enabled Medical Chatbots
Journal: BioMedInformatics
Pub Date: 2024-03-13 DOI: 10.3390/biomedinformatics4010046
Kleanthis Marios Papadopoulos, P. Barmpoutis, Tania Stathaki, V. Kepenekian, Peggy Dartigues, S. Valmary-Degano, Claire Illac-Vauquelin, G. Avérous, A. Chevallier, M. Lavérriere, L. Villeneuve, Olivier Glehen, Sylvie Isaac, J. Hommell-Fontaine, Francois Ng Kee Kwong, N. Benzerdjeb
Background: The advent of Deep Learning initiated a new era in which neural networks relying solely on Whole-Slide Images can estimate the survival time of cancer patients. Remarkably, despite deep learning’s potential in this domain, no prior research has been conducted on image-based survival analysis specifically for peritoneal mesothelioma. Prior studies performed statistical analysis to identify disease factors impacting patients’ survival time. Methods: Therefore, we introduce MPeMSupervisedSurv, a Convolutional Neural Network designed to predict the survival time of patients diagnosed with this disease. We subsequently perform patient stratification based on factors such as their Peritoneal Cancer Index and on whether patients received chemotherapy treatment. Results: MPeMSupervisedSurv demonstrates improvements over comparable methods. Using our proposed model, we performed patient stratification to assess the impact of clinical variables on survival time. Notably, the inclusion of information regarding adjuvant chemotherapy significantly enhances the model’s predictive prowess. Conversely, repeating the process for other factors did not yield significant performance improvements. Conclusions: Overall, MPeMSupervisedSurv is an effective neural network which can predict the survival time of peritoneal mesothelioma patients. Our findings also indicate that treatment by adjuvant chemotherapy could be a factor affecting survival time.
Title: Overall Survival Time Estimation for Epithelioid Peritoneal Mesothelioma Patients from Whole-Slide Images
Journal: BioMedInformatics
Pub Date: 2024-03-06 DOI: 10.3390/biomedinformatics4010043
Zain Jabbar, Peter Washington
Electronic Health Records (EHR) provide a vast amount of patient data that are relevant to predicting clinical outcomes. The inherent presence of missing values poses challenges to building performant machine learning models. This paper aims to investigate the effect of various imputation methods on the National Institutes of Health’s All of Us dataset, a dataset containing a high degree of data missingness. We apply several imputation techniques, such as mean substitution, constant filling, and multiple imputation, on the same dataset for the task of diabetes prediction. We find that imputing values causes heteroskedastic performance for machine learning models with increased data missingness. That is, the more missing values a patient has for their tests, the higher the variance in a diabetes model’s AUROC, F1, precision, recall, and accuracy scores. This highlights a critical challenge in using EHR data for predictive modeling and the need for future research to develop methodologies to mitigate the effects of missing data and heteroskedasticity in EHR-based predictive models.
Title: The Effect of Data Missingness on Machine Learning Predictions of Uncontrolled Diabetes Using All of Us Data
Journal: BioMedInformatics
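Two of the imputation techniques named in this abstract, mean substitution and constant filling, can be sketched as below. This is an illustrative example only: the toy lab-value matrix, the fill value of -1, and the function names are hypothetical, missing entries are assumed to be encoded as NaN, and multiple imputation is not shown.

```python
import numpy as np

def impute_mean(X):
    """Replace NaNs in each column with that column's observed mean."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def impute_constant(X, fill_value=0.0):
    """Replace every NaN with a single constant."""
    X = X.astype(float).copy()
    X[np.isnan(X)] = fill_value
    return X

# Toy data: rows are patients, columns are two lab tests (made-up values).
labs = np.array([
    [5.0, np.nan],
    [7.0, 100.0],
    [np.nan, 140.0],
])

mean_filled = impute_mean(labs)
const_filled = impute_constant(labs, fill_value=-1.0)

# Per-patient count of missing tests, the quantity the abstract relates
# to the variance of model performance.
missing_per_patient = np.isnan(labs).sum(axis=1)
```

Keeping `missing_per_patient` alongside the imputed matrix allows evaluation metrics to be grouped by degree of missingness, which is one way to look for the heteroskedastic behavior the abstract reports.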
Pub Date : 2024-03-04 DOI: 10.3390/biomedinformatics4010042
Stella C. Christopoulou
Background: Over the past few years, clinical studies have utilized machine learning in telehealth and smart care for disease management, self-management, and managing health issues like pulmonary diseases, heart failure, diabetes screening, and intraoperative risks. However, a systematic review of machine learning’s use in evidence-based telehealth and smart care is lacking, as evidence-based practice aims to eliminate biases and subjective opinions. Methods: The author conducted a mixed methods review to explore machine learning applications in evidence-based telehealth and smart care. A systematic search of the literature was performed from 16 June 2023 to 27 June 2023 in Google Scholar, PubMed, and the clinical registry platform ClinicalTrials.gov. The author included articles in the review if they were implemented by evidence-based health informatics and concerned with telehealth and smart care technologies. Results: The author identifies 18 key studies (17 clinical trials) from 175 citations found in internet databases and categorizes them using problem-specific groupings, medical/health domains, machine learning models, algorithms, and techniques. Conclusions: Machine learning combined with the application of evidence-based practices in healthcare can enhance telehealth and smart care strategies by improving the quality of personalized care, early detection of health-related problems, patient quality of life, patient-physician communication, resource efficiency, and cost-effectiveness. However, this requires interdisciplinary expertise and collaboration among stakeholders, including clinicians, informaticians, and policymakers. Therefore, further research using clinical studies, systematic reviews, analyses, and meta-analyses is required to fully exploit the potential of machine learning in this area.
{"title":"Machine Learning Models and Technologies for Evidence-Based Telehealth and Smart Care: A Review","authors":"Stella C. Christopoulou","doi":"10.3390/biomedinformatics4010042","DOIUrl":"https://doi.org/10.3390/biomedinformatics4010042","url":null,"abstract":"Background: Over the past few years, clinical studies have utilized machine learning in telehealth and smart care for disease management, self-management, and managing health issues like pulmonary diseases, heart failure, diabetes screening, and intraoperative risks. However, a systematic review of machine learning’s use in evidence-based telehealth and smart care is lacking, as evidence-based practice aims to eliminate biases and subjective opinions. Methods: The author conducted a mixed methods review to explore machine learning applications in evidence-based telehealth and smart care. A systematic search of the literature was performed during 16 June 2023–27 June 2023 in Google Scholar, PubMed, and the clinical registry platform ClinicalTrials.gov. The author included articles in the review if they were implemented by evidence-based health informatics and concerned with telehealth and smart care technologies. Results: The author identifies 18 key studies (17 clinical trials) from 175 citations found in internet databases and categorizes them using problem-specific groupings, medical/health domains, machine learning models, algorithms, and techniques. Conclusions: Machine learning combined with the application of evidence-based practices in healthcare can enhance telehealth and smart care strategies by improving quality of personalized care, early detection of health-related problems, patient quality of life, patient-physician communication, resource efficiency and cost-effectiveness. However, this requires interdisciplinary expertise and collaboration among stakeholders, including clinicians, informaticians, and policymakers. 
Therefore, further research using clinical studies, systematic reviews, analyses, and meta-analyses is required to fully exploit the potential of machine learning in this area.","PeriodicalId":72394,"journal":{"name":"BioMedInformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140079600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-02 DOI: 10.3390/biomedinformatics4010041
Sergio Sánchez-Herrero, Abtin Tondar, E. Pérez-Bernabeu, Laura Calvet, Angel A. Juan
Background: Antibiotics can play a pivotal role in the treatment of colorectal cancer (CRC) at various stages of the disease, both directly and indirectly. Identifying novel patterns of antibiotic effects or responses in CRC within extensive medical data poses a significant challenge that can be addressed through algorithmic approaches. Machine Learning (ML) emerges as a promising solution for predicting clinical outcomes using clinical and heterogeneous cancer data. In the pursuit of our objective, we employed ML techniques for predicting CRC mortality and antibiotic influence. Methods: We utilized a dataset to examine the accuracy of death prediction in metastatic colorectal cancer. In addition, we analyzed the association between antibiotic exposure and mortality in metastatic colorectal cancer. The dataset comprised 147 patients, nineteen independent variables, and one dependent variable. Our analysis involved testing different supervised ML classification models, including an oversampling pool for classification models, Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machine, Random Forest, XGBoost Classifier, a consensus of all models, and a consensus of top models (meta models). Results: The consensus of the top models’ classifier exhibited the highest accuracy among the algorithms tested (93%). This model met the standards for good accuracy, surpassing the 90% threshold considered useful in ML applications. Consistent with the accuracy results, other metrics were also good, including precision (0.96), recall (0.93), F-Beta (0.94), and AUC (0.93). Hazard ratio analysis suggests that there is no discernible difference between patients who received antibiotics and those who did not. Conclusions: Our modelling approach provides an alternative for analyzing and predicting the relationship between antibiotics and mortality in metastatic colorectal cancer patients treated with bevacizumab, complementing classic statistical methods.
This methodology lays the groundwork for future use of datasets in cancer treatment research and highlights the advantages of meta models.
{"title":"Forecasting Survival Rates in Metastatic Colorectal Cancer Patients Undergoing Bevacizumab-Based Chemotherapy: A Machine Learning Approach","authors":"Sergio Sánchez-Herrero, Abtin Tondar, E. Pérez-Bernabeu, Laura Calvet, Angel A. Juan","doi":"10.3390/biomedinformatics4010041","DOIUrl":"https://doi.org/10.3390/biomedinformatics4010041","url":null,"abstract":"Background: Antibiotics can play a pivotal role in the treatment of colorectal cancer (CRC) at various stages of the disease, both directly and indirectly. Identifying novel patterns of antibiotic effects or responses in CRC within extensive medical data poses a significant challenge that can be addressed through algorithmic approaches. Machine Learning (ML) emerges as a promising solution for predicting clinical outcomes using clinical and heterogeneous cancer data. In the pursuit of our objective, we employed ML techniques for predicting CRC mortality and antibiotic influence. Methods: We utilized a dataset to examine the accuracy of death prediction in metastatic colorectal cancer. In addition, we analyzed the association between antibiotic exposure and mortality in metastatic colorectal cancer. The dataset comprised 147 patients, nineteen independent variables, and one dependent variable. Our analysis involved testing different classification-supervised ML, including an oversampling pool for classification models, Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machine, Random Forest, XGBoost Classifier, a consensus of all models, and a consensus of top models (meta models). Results: The consensus of the top models’ classifier exhibited the highest accuracy among the algorithms tested (93%). This model met the standards for good accuracy, surpassing the 90% threshold considered useful in ML applications. Consistent with the accuracy results, other metrics are also good, including precision (0.96), recall (0.93), F-Beta (0.94), and AUC (0.93). 
Hazard ratio analysis suggests that there is no discernible difference between patients who received antibiotics and those who did not. Conclusions: Our modelling approach provides an alternative for analyzing and predicting the relationship between antibiotics and mortality in metastatic colorectal cancer patients treated with bevacizumab, complementing classic statistical methods. This methodology lays the groundwork for future use of datasets in cancer treatment research and highlights the advantages of meta models.","PeriodicalId":72394,"journal":{"name":"BioMedInformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140081693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
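The abstract above reports that a consensus of top models gave the highest accuracy but does not say how the consensus is formed. One common reading is per-patient majority voting over binary predictions; the sketch below assumes that reading, and the function names, the tie-breaking rule, and the toy predictions are all hypothetical rather than the authors' method or data.

```python
def consensus_predict(model_predictions, tie_label=1):
    """Majority vote over binary (0/1) predictions, one list per model.

    Assumption: ties are resolved in favour of `tie_label`.
    """
    n_samples = len(model_predictions[0])
    consensus = []
    for i in range(n_samples):
        votes = [preds[i] for preds in model_predictions]
        positives = sum(votes)
        negatives = len(votes) - positives
        if positives > negatives:
            consensus.append(1)
        elif negatives > positives:
            consensus.append(0)
        else:
            consensus.append(tie_label)
    return consensus


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions from three models on five patients (1 = death).
preds = [
    [1, 0, 1, 0, 0],  # model A
    [1, 1, 0, 1, 0],  # model B
    [0, 0, 1, 1, 1],  # model C
]
y_true = [1, 0, 1, 1, 0]

combined = consensus_predict(preds)
individual = [accuracy(y_true, p) for p in preds]
```

In this toy case each model errs on different patients, so the vote corrects every individual mistake; with correlated errors the consensus would gain much less.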