Evaluating Explainable Machine Learning Models for Clinicians

Cognitive Computation (IF 4.3; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-05-31. DOI: 10.1007/s12559-024-10297-x
Noemi Scarpato, Aria Nourbakhsh, Patrizia Ferroni, Silvia Riondino, Mario Roselli, Francesca Fallucchi, Piero Barbanti, Fiorella Guadagni, Fabio Massimo Zanzotto

Abstract

Gaining clinicians' trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present an evaluation methodology for XAI methods based on forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. We then apply FSS to our XAI approach defined over ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To compare FSS values before and after the explanation phase, we test the methodology on three clinical datasets: breast cancer, venous thromboembolism (VTE), and migraine. ML-RO is a good model on which to test our FSS-based evaluation strategy: it outperforms two other base models, a decision tree (DT) and a plain SVM, on all three datasets, and it allows different XAI models to be defined, namely TOPK, MIGF, and F4G. The FSS results suggest that F4G is the most effective explanation method for ML-RO on two of the three datasets, and they show the limits of the learned model for one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods and thereby to enhance the adoption of AI systems in clinical practice.
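
The abstract does not give the formula for the FSS, so the following is only an illustrative sketch of the general forward-simulatability idea, not the authors' definition. It assumes that simulatability is measured as the fraction of cases in which a person's guess matches the model's actual prediction, and that the effect of an explanation is the difference between that fraction after and before the explanation is shown. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def simulatability(human_guesses: Sequence[int],
                   model_predictions: Sequence[int]) -> float:
    """Fraction of cases where the human's guess equals the model's output.

    Assumption: this agreement rate stands in for a simulatability score;
    the paper's exact FSS definition is not stated in the abstract.
    """
    if len(human_guesses) != len(model_predictions):
        raise ValueError("inputs must have the same length")
    if len(model_predictions) == 0:
        raise ValueError("inputs must be non-empty")
    matches = sum(g == p for g, p in zip(human_guesses, model_predictions))
    return matches / len(model_predictions)


def simulatability_gain(pre_guesses: Sequence[int],
                        post_guesses: Sequence[int],
                        model_predictions: Sequence[int]) -> float:
    """Change in agreement after the explanation phase (post minus pre)."""
    return (simulatability(post_guesses, model_predictions)
            - simulatability(pre_guesses, model_predictions))


# Hypothetical toy data: binary predictions for eight patients.
model_predictions = [1, 0, 1, 1, 0, 0, 1, 0]
pre_guesses = [1, 1, 0, 1, 0, 1, 1, 0]    # guesses made without explanations
post_guesses = [1, 0, 1, 1, 0, 1, 1, 0]   # guesses made after seeing explanations

pre_score = simulatability(pre_guesses, model_predictions)
post_score = simulatability(post_guesses, model_predictions)
gain = simulatability_gain(pre_guesses, post_guesses, model_predictions)
print(f"pre={pre_score:.3f} post={post_score:.3f} gain={gain:.3f}")
```

Under these assumptions, a positive gain would indicate that the explanation helped the person anticipate the model, while a gain near zero or below would suggest the explanation did not help on that dataset.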

Source journal: Cognitive Computation
Categories: Computer Science, Artificial Intelligence; Neurosciences
CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks
Journal description: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.