Proceedings of the ACM Conference on Health, Inference, and Learning: Latest Papers

Explaining a machine learning decision to physicians via counterfactuals
Pub Date : 2023-06-10 DOI: 10.48550/arXiv.2306.06325
Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, M. A. Shah, Alexei Wagner
Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. How can the decision of an ML model be explained to a physician? The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulted in the opposite outcome. Specifically, time-series CFs are investigated, inspired by the way physicians converse and reason out decisions: "I would have given the patient a vasopressor if their blood pressure was lower and falling." Key properties of CFs that are particularly meaningful in clinical settings are outlined: physiological plausibility, relevance to the task, and sparse perturbations. Past work on CF generation does not satisfy these properties, specifically plausibility, in that realistic time-series CFs are not generated. A variational autoencoder (VAE)-based approach is proposed that captures these desired properties. The method produces CFs that improve on prior approaches quantitatively (more plausible CFs as evaluated by their likelihood w.r.t. the original data distribution, and 100× faster at generating CFs) and qualitatively (2× more plausible and relevant), as evaluated by three physicians.
Citations: 1
Rare Life Event Detection via Mobile Sensing Using Multi-Task Learning
Pub Date : 2023-05-31 DOI: 10.48550/arXiv.2305.20056
Arvind Pillai, Subigya Nepal, Andrew T. Campbell
Rare life events significantly impact mental health, and their detection in behavioral studies is a crucial step towards health-based interventions. We envision that mobile sensing data can be used to detect these anomalies. However, the human-centered nature of the problem, combined with the infrequency and uniqueness of these events, makes it challenging for unsupervised machine learning methods. In this paper, we first investigate Granger causality between life events and human behavior using sensing data. Next, we propose a multi-task framework with an unsupervised autoencoder to capture irregular behavior, and an auxiliary sequence predictor that identifies transitions in workplace performance to contextualize events. We perform experiments using data from a mobile sensing study comprising N=126 information workers from multiple industries, spanning 10,106 days with 198 rare events (<2%). Through personalized inference, we detect the exact day of a rare event with an F1 of 0.34, demonstrating that our method outperforms several baselines. Finally, we discuss the implications of our work in the context of real-world deployment.
Citations: 0
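To put the figures in the rare life event entry above in perspective, the short Python sketch below recomputes the event rate from the two counts quoted in the abstract and compares the reported F1 of 0.34 with a trivial detector that flags every day. The trivial baseline is an illustration added here, not a baseline from the paper, and it assumes each rare event occupies exactly one day.

```python
# Counts quoted in the abstract above.
total_days = 10106
rare_events = 198

event_rate = rare_events / total_days
print(f"rare-event rate: {event_rate:.2%}")  # 1.96%, consistent with "<2%"


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical baseline (an assumption, not from the paper): flag every day.
# Recall is then 1.0 and precision equals the event rate.
always_positive_f1 = f1_score(precision=event_rate, recall=1.0)
print(f"always-positive F1: {always_positive_f1:.3f}")
```

Under these assumptions the flag-everything detector scores an F1 below 0.04, which gives some context for why an F1 of 0.34 on exact-day detection can still beat several baselines.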
PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis
Pub Date : 2023-05-20 DOI: 10.48550/arXiv.2305.14376
Yi Yang, Hejie Cui, Carl Yang
The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways. Recent advances in neuroscience and neuroimaging analysis reflect a growing interest in the interactions between brain regions of interest (ROIs) and their impact on neural development and disorder diagnosis. As a powerful deep model for analyzing graph-structured data, Graph Neural Networks (GNNs) have been applied to brain network analysis. However, training deep models requires large amounts of labeled data, which is often scarce in brain network datasets due to the complexities of data acquisition and sharing restrictions. To make the most of the available training data, we propose PTGB, a GNN pre-training framework that captures intrinsic brain network structures, regardless of clinical outcomes, and is easily adaptable to various downstream tasks. PTGB comprises two key components: (1) an unsupervised pre-training technique designed specifically for brain networks, which enables learning from large-scale datasets without task-specific labels; (2) a data-driven parcellation atlas mapping pipeline that facilitates knowledge transfer across datasets with different ROI systems. Extensive evaluations using various GNN models have demonstrated the robust and superior performance of PTGB compared to baseline methods.
Citations: 5
Large-Scale Study of Temporal Shift in Health Insurance Claims
Pub Date : 2023-05-08 DOI: 10.48550/arXiv.2305.05087
Christina X. Ji, A. Alaa, D. Sontag
Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task (that is, an outcome to be predicted at a particular time point) to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform what is, to our knowledge, the first comprehensive evaluation of temporal shift in healthcare. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. Of these tasks, 9.7% show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.
Citations: 2
Token Imbalance Adaptation for Radiology Report Generation
Pub Date : 2023-04-18 DOI: 10.48550/arXiv.2304.09185
Yuexin Wu, I. Huang, Xiaolei Huang
Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but reflect more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology report generation. However, no prior study has proposed methods to adapt infrequent tokens for text generators feeding with medical images. To solve the challenge, we propose the Token Imbalance Adapter (TIMER), aiming to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance by an unlikelihood loss and dynamically optimizes generation processes to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness overall and on infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting token imbalance for radiology report generation.
Citations: 0
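The token imbalance entry above mentions an unlikelihood loss without giving its form. As a rough sketch, the Python below implements the generic unlikelihood penalty from the text-generation literature, -Σ log(1 - p(c)) summed over a set of negative candidate tokens c. How TIMER selects those candidates is not stated in the abstract, so `frequent_token_ids`, the toy logits, and the four-token vocabulary are all hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def unlikelihood_penalty(probs: np.ndarray, negative_ids) -> float:
    """Return -sum(log(1 - p(c))) over the negative candidate tokens c."""
    p_neg = probs[np.asarray(negative_ids)]
    # Clip so the logarithm never receives exactly zero.
    return float(-np.sum(np.log(np.clip(1.0 - p_neg, 1e-12, 1.0))))


# Toy next-token distribution over a 4-token vocabulary (hypothetical values).
logits = np.array([4.0, 1.0, 0.5, -1.0])
probs = softmax(logits)

# Hypothetical choice: treat the over-predicted frequent token 0 as a negative.
frequent_token_ids = [0]
penalty = unlikelihood_penalty(probs, frequent_token_ids)
print(f"penalty: {penalty:.3f}")
```

The penalty grows as the model puts more probability on the negative candidates, so adding it to the usual likelihood objective pushes probability mass away from them and toward other tokens.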
Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records
Pub Date : 2023-03-15 DOI: 10.48550/arXiv.2303.08290
Eunbyeol Cho, Min Jae Lee, Kyunghoon Hur, Jiyoun Kim, Jinsung Yoon, E. Choi
Making the most of the abundant information in electronic health records (EHR) is rapidly becoming an important topic in the medical domain. Recent work presented a promising framework that embeds all features in raw EHR data regardless of its form and medical code standards. The framework, however, only focuses on encoding EHR with minimal preprocessing and fails to consider how to learn efficient EHR representation in terms of computation and memory usage. In this paper, we search for a versatile encoder that not only reduces the large data to a manageable size but also preserves the core patient information needed to perform diverse clinical tasks. We found that a hierarchically structured Convolutional Neural Network (CNN) often outperforms the state-of-the-art model on diverse tasks such as reconstruction, prediction, and generation, even with fewer parameters and less training time. Moreover, it turns out that making use of the inherent hierarchy of EHR data can boost performance for any kind of backbone model and clinical task. Through extensive experiments, we present concrete evidence to generalize our research findings into real-world practice. We give a clear guideline on building the encoder based on the research findings captured while exploring numerous settings.
Citations: 0
Semantic match: Debugging feature attribution methods in XAI for healthcare
Pub Date : 2023-01-05 DOI: 10.48550/arXiv.2301.02080
G. Ciná, Tabea E. Rober, R. Goedhart, Ş. İlker Birbil
The recent spike in certified Artificial Intelligence (AI) tools for healthcare has renewed the debate around adoption of this technology. One thread of such debate concerns Explainable AI (XAI) and its promise to render AI devices more transparent and trustworthy. A few voices active in the medical AI space have expressed concerns about the reliability of Explainable AI techniques, and especially feature attribution methods, questioning their use and inclusion in guidelines and standards. Despite valid concerns, we argue that existing criticism of the viability of post-hoc local explainability methods throws the baby out with the bathwater by generalizing a problem that is specific to image data. We begin by characterizing the problem as a lack of semantic match between explanations and human understanding. To understand when feature importance can be used reliably, we introduce a distinction between feature importance of low- and high-level features. We argue that for data types where low-level features come endowed with clear semantics, such as tabular data like Electronic Health Records (EHRs), semantic match can be obtained, and thus feature attribution methods can still be employed in a meaningful and useful way. Finally, we sketch a procedure to test whether semantic match has been achieved.
Citations: 1
Towards the Practical Utility of Federated Learning in the Medical Domain
Pub Date : 2022-07-07 DOI: 10.48550/arXiv.2207.03075
Seongjun Yang, Hyeonji Hwang, Daeyoung Kim, Radhika Dua, Jong-Yeup Kim, Eunho Yang, E. Choi
Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not provide a practical guide to applying FL in the medical domain. We propose empirical benchmarks and experimental settings for three representative medical datasets with different modalities: longitudinal electronic health records, skin cancer images, and electrocardiogram signals. The likely users of FL, such as medical institutions and IT companies, can take these benchmarks as guides for adopting FL and minimize their trial and error. For each dataset, each client's data comes from a different source to preserve real-world heterogeneity. We evaluate six FL algorithms designed for addressing data heterogeneity among clients, and a hybrid algorithm combining the strengths of two representative FL algorithms. Based on experiment results from three modalities, we discover that simple FL algorithms tend to outperform more sophisticated ones, while the hybrid algorithm consistently shows good, if not the best, performance. We also find that a frequent global model update leads to better performance under a fixed training iteration budget. As the number of participating clients increases, higher cost is incurred due to the increased number of IT administrators and GPUs, but the performance consistently increases. We expect future users will refer to these empirical benchmarks to design FL experiments in the medical domain that account for their clinical tasks, and obtain stronger performance at lower cost.
Citations: 4
Self-supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets
Pub Date : 2022-05-26 DOI: 10.48550/arXiv.2205.13607
Michael Merrill, Tim Althoff
Detailed mobile sensing data from phones, watches, and fitness trackers offer an unparalleled opportunity to quantify and act upon previously unmeasurable behavioral changes in order to improve individual health and accelerate responses to emerging diseases. Unlike in natural language processing and computer vision, deep representation learning has yet to broadly impact this domain, in which the vast majority of research and clinical applications still rely on manually defined features and boosted tree models or even forgo predictive modeling altogether due to insufficient accuracy. This is due to unique challenges in the behavioral health domain, including very small datasets (~10^1 participants) that frequently contain missing data, consist of long time series with critical long-range dependencies (length > 10^4), and exhibit extreme class imbalances (> 10^3:1). Here, we introduce a neural architecture for multivariate time series classification designed to address these unique domain challenges. Our proposed behavioral representation learning approach combines novel tasks for self-supervised pretraining and transfer learning to address data scarcity, and captures long-range dependencies across long-history time series through transformer self-attention following convolutional neural network-based dimensionality reduction. We propose an evaluation framework aimed at reflecting expected real-world performance in plausible deployment scenarios. Concretely, we demonstrate (1) performance improvements over baselines of up to 0.15 ROC AUC across five prediction tasks, (2) transfer learning-induced performance improvements of 16% PR AUC in small data scenarios, and (3) the potential of transfer learning in novel disease scenarios through an exploratory case study of zero-shot COVID-19 prediction in an independent data set. Finally, we discuss potential implications for medical surveillance testing.
Citations: 8
Disability prediction in multiple sclerosis using performance outcome measures and demographic data
Pub Date : 2022-04-08 DOI: 10.48550/arXiv.2204.03969
Subhrajit Roy, Diana Mincu, Lev Proleev, Negar Rostamzadeh, Chintan Ghate, Natalie Harris, Christina Chen, J. Schrouff, Nenad Tomašev, F. Hartsell, K. Heller
Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data such as magnetic resonance imaging and clinical laboratory tests for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is high, leading to scarce evaluations. In this work, we used multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POM) in conjunction with demographic data to predict multiple sclerosis disease progression. We performed a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results are the first to show that it is possible to predict disease progression using POMs and demographic data in the context of both clinical trials and smartphone-based studies by using two datasets. Moreover, we investigate our models to understand the impact of different POMs and demographics on model performance through feature ablation studies. We also show that model performance is similar across different demographic subgroups (based on age and sex). To enable this work, we developed an end-to-end reusable pre-processing and machine learning framework which allows quicker experimentation over disparate MS datasets.
Citations: 5