Benchmarking the most popular XAI used for explaining clinical predictive models: Untrustworthy but could be useful.
Aida Brankovic, David Cook, Jessica Rahman, Sankalp Khanna, Wenjie Huang
Health Informatics Journal, 30(4), 14604582241304730. Published 2024-10-01. DOI: 10.1177/14604582241304730
Citations: 0
Abstract
Objective: This study aimed to assess the practicality and trustworthiness of explainable artificial intelligence (XAI) methods used for explaining clinical predictive models.
Methods: Two popular XAI methods used for explaining clinical predictive models were evaluated on their ability to generate domain-appropriate representations, their impact on clinical workflow, and their consistency. Explanations were benchmarked against true clinical deterioration triggers recorded in the data system, and agreement was quantified. The evaluation was conducted using two Electronic Medical Records datasets from major hospitals in Australia. Results were examined and commented on by a senior clinician.
Results: Findings demonstrate a violation of the consistency criterion and only moderate concordance (0.47-0.8) with true triggers, undermining reliability and actionability, which are criteria for clinicians' trust in XAI.
Conclusion: The explanations are not trustworthy enough to guide clinical interventions, though they may offer useful insights and help with model troubleshooting. Clinician-informed XAI development and presentation, clear disclaimers about limitations, and critical clinical judgment can promote informed decisions and prevent over-reliance.
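As a rough illustration of the benchmarking described in the Methods, the Python sketch below quantifies agreement between the top-ranked features of a per-case explanation and the deterioration triggers recorded in the data system. This is not the authors' code: the abstract does not name the XAI methods or the agreement metric, so the SHAP-style attributions, feature names, case data, and top-k cutoff are all illustrative assumptions.

# Minimal sketch (assumptions, not the study's implementation): agreement between
# the top-k attributed features of each explanation and the recorded triggers.

def top_k_features(attributions, k=3):
    """Return the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return set(ranked[:k])

def concordance(explanations, triggers, k=3):
    """Fraction of cases where at least one recorded trigger appears
    among the top-k attributed features of that case's explanation."""
    hits = 0
    for case_id, attributions in explanations.items():
        if top_k_features(attributions, k) & triggers.get(case_id, set()):
            hits += 1
    return hits / len(explanations) if explanations else 0.0

# Hypothetical per-case attributions (e.g., SHAP-style values) and the
# deterioration triggers recorded in the data system for the same cases.
explanations = {
    "case_1": {"heart_rate": 0.42, "resp_rate": 0.31, "age": 0.05, "spo2": -0.22},
    "case_2": {"heart_rate": 0.10, "creatinine": 0.55, "spo2": -0.40, "age": 0.02},
}
triggers = {
    "case_1": {"resp_rate"},
    "case_2": {"gcs"},  # trigger absent from top attributions -> no hit
}

print(f"Concordance: {concordance(explanations, triggers, k=3):.2f}")  # 0.50

A hit-rate of this kind is only one way to quantify agreement; the study reports concordance in the 0.47-0.8 range without the abstract specifying the exact metric.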
Journal description:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.