Interpretable machine learning model to predict survival days of malignant brain tumor patients

IF 6.3 | Zone 2, Physics and Astrophysics | Q1, Computer Science, Artificial Intelligence | Machine Learning Science and Technology | Pub Date: 2023-05-15 | DOI: 10.1088/2632-2153/acd5a9

Snehal Rajput, Rupal A. Kapdi, M. Raval, Mohendra Roy


Citations: 1

Abstract



Interpretable machine learning model to predict survival days of malignant brain tumor patients

The performance of an artificial intelligence (AI) model is strongly influenced by its input features, so finding the optimal feature set is vital. This is especially true for survival prediction in glioblastoma multiforme (GBM), a type of brain tumor. In this study, we identify the best feature set for predicting the survival days (SD) of GBM patients, one that outperforms current state-of-the-art methods. The proposed approach is an end-to-end AI model: it first segments tumors from healthy brain tissue in patients’ MRI images, extracts features from the segmentation results, performs feature selection, and predicts SD from the selected features. The extracted features are primarily shape-based, location-based, and radiomics-based; patient metadata is also included as a feature. The selection methods include recursive feature elimination, permutation importance (PI), and correlation analysis between features. We examine feature behavior at the local (single-sample) and global (all-samples) levels. Of the 1265 extracted features, only 29 dominant features play a crucial role in predicting SD. Among these 29, one is metadata (patient age), three are location-based, and the rest are radiomics features. We explain these features with post-hoc interpretability methods to validate the robustness of the model’s predictions and to understand its decisions. Finally, we analyze the behavioral impact of the top six features on survival prediction; the findings drawn from the explanations are consistent with medical domain knowledge. After the age of 50, a patient’s likelihood of survival deteriorates, and survival beyond 80 is rare. For the location-based features, SD is shorter when the tumor lies in the central or posterior part of the brain. All these trends derived from the model agree with medically established facts. The results show an overall 33% improvement in the accuracy of SD prediction over the top-performing methods of the BraTS-2020 challenge.
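To make two of the selection steps named in the abstract concrete, here is a minimal sketch of correlation filtering followed by permutation importance. It is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the feature names (`age`, `centroid_z`, `texture`, `texture_copy`, `noise`) and the coefficients used to generate `survival_days` are hypothetical, the 0.95 correlation threshold is an arbitrary choice, and a plain least-squares model stands in for the regressor, which the abstract does not name. Recursive feature elimination is not shown.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.95):
    """Keep a feature only if |Pearson r| with every kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in model, an assumption)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def permutation_importance(coef, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single column is shuffled, per feature."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(coef, X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((predict(coef, Xp) - y) ** 2) - base
    return scores / n_repeats


# Synthetic data: every name and coefficient below is hypothetical.
rng = np.random.default_rng(42)
n = 200
age = rng.uniform(30, 85, n)
centroid_z = rng.uniform(0, 1, n)                # made-up location feature
texture = rng.normal(0, 1, n)                    # made-up radiomics feature
texture_copy = texture + rng.normal(0, 0.01, n)  # near-duplicate of texture
noise = rng.normal(0, 1, n)                      # irrelevant feature
survival_days = 900 - 8 * age - 150 * centroid_z + rng.normal(0, 20, n)

X_all = np.column_stack([age, centroid_z, texture, texture_copy, noise])
names = ["age", "centroid_z", "texture", "texture_copy", "noise"]

# Step 1: remove one feature of each highly correlated pair.
X, kept_names = drop_correlated(X_all, names)

# Step 2: fit on a training split, score importance on held-out samples.
X_train, X_test = X[:150], X[150:]
y_train, y_test = survival_days[:150], survival_days[150:]
coef = fit_linear(X_train, y_train)
scores = permutation_importance(coef, X_test, y_test)

ranked = [kept_names[j] for j in np.argsort(scores)[::-1]]
print("kept after correlation filter:", kept_names)
print("ranked by permutation importance:", ranked)
```

In this synthetic setup the near-duplicate column should be filtered out and `age` should rank first, but only because the data were generated that way; the ranking says nothing about the paper's actual features or results.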


Source journal

Machine Learning Science and Technology (Computer Science: Artificial Intelligence)

CiteScore: 9.10

Self-citation rate: 4.40%

Review time: 5 weeks

About the journal: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory motivated by physical insights. Specifically, articles must fall into one of the following categories: they advance the state of machine learning-driven applications in the sciences, or they make conceptual, methodological, or theoretical advances in machine learning with applications to, inspiration from, or motivation by scientific problems.