Proceedings of the ACM Conference on Health, Inference, and Learning: Latest Articles, Page 3

Contextualization and individualization for just-in-time adaptive interventions to reduce sedentary behavior

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-08 DOI: 10.1145/3450439.3451874

Matthew Saponaro, Ajith Vemuri, G. Dominick, Keith S. Decker

Wearable technology opens opportunities to reduce sedentary behavior; however, commercially available devices do not provide tailored coaching strategies. Just-In-Time Adaptive Interventions (JITAI) provide such a framework; however, most JITAI are conceptual to date. We conduct a study to evaluate just-in-time nudges in free-living conditions in terms of receptiveness and nudge impact. We first quantify baseline behavioral patterns in context using features such as location and step count, and assess differences in individual responses. We show there is a strong inverse relationship between average daily step counts and time spent being sedentary, indicating that steps are taken steadily throughout the day rather than in large bursts. Interestingly, the effect of nudges delivered at the workplace is larger in terms of step count than that of nudges delivered at home. We develop Random Forest models to learn nudge receptiveness using both individualized and contextualized data. We show that step count is the least important identifier in nudge receptiveness, while location is the most important. Furthermore, we compare the developed models with a commercially available smart coach using post-hoc analysis. The results show that using contextualized and individualized information significantly outperforms non-JITAI approaches in determining nudge receptiveness.

Citations: 8

Trustworthy machine learning for health care: scalable data valuation with the shapley value

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-08 DOI: 10.1145/3450439.3451861

Konstantin D. Pandl, Fabian Feiland, Scott Thiebes, A. Sunyaev

Collecting data from many sources is an essential approach to generate large data sets required for the training of machine learning models. Trustworthy machine learning requires incentives, guarantees of data quality, and information privacy. Applying recent advancements in data valuation methods for machine learning can help to enable these. In this work, we analyze the suitability of three different data valuation methods for medical image classification tasks, specifically pleural effusion, on an extensive data set of chest X-ray scans. Our results reveal that a heuristic for calculating the Shapley valuation scheme based on a k-nearest neighbor classifier can successfully value large quantities of data instances. We also demonstrate possible applications for incentivizing data sharing, the efficient detection of mislabeled data, and summarizing data sets to exclude private information. Thereby, this work contributes to developing modern data infrastructures for trustworthy machine learning in health care.
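
The abstract does not spell out the k-nearest-neighbor heuristic, so the following is only a minimal sketch of one closed-form recursion known from the KNN-Shapley literature, assuming an unweighted k-NN utility (the fraction of the k nearest training labels that match the test label). The function name `knn_shapley`, the Euclidean distance, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from math import dist


def knn_shapley(train_x, train_y, test_x, test_y, k):
    """Shapley value of each training point for ONE test point, assuming an
    unweighted k-NN utility (hypothetical helper, not the paper's code)."""
    n = len(train_x)
    # Training indices sorted from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: dist(train_x[i], test_x))
    # 1.0 where the training label matches the test label, else 0.0.
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]
    values = [0.0] * n
    # Start at the farthest point, then recurse towards the nearest one.
    values[order[-1]] = match[-1] / n
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of the point at this position
        values[order[pos]] = (
            values[order[pos + 1]]
            + (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        )
    return values


# Toy data: three 1-D training points, one test point labelled 1.
toy_values = knn_shapley([(0.0,), (1.0,), (2.0,)], [1, 0, 1], (0.0,), 1, k=2)
```

Averaging these per-test-point values over a validation set gives one value per training instance; strongly negative values are natural candidates for a mislabeled-data check, which is one of the applications the abstract mentions.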

Citations: 10

Learning to safely approve updates to machine learning algorithms

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-08 DOI: 10.1145/3450439.3451864

Jean Feng

Machine learning algorithms in healthcare have the potential to continually learn from real-world data generated during healthcare delivery and adapt to dataset shifts. As such, regulatory bodies like the US FDA have begun discussions on how to autonomously approve modifications to algorithms. Current proposals evaluate algorithmic modifications via hypothesis testing and control a definition of online approval error that only applies if the data is stationary over time, which is unlikely in practice. To this end, we investigate designing approval policies for modifications to ML algorithms in the presence of distributional shifts. Our key observation is that the approval policy most efficient at identifying and approving beneficial modifications varies across problem settings. So, rather than selecting a fixed approval policy a priori, we propose learning the best approval policy by searching over a family of approval strategies. We define a family of strategies that range in their level of optimism when approving modifications. To protect against settings where no version of the ML algorithm performs well, this family includes a pessimistic strategy that rescinds approval. We use the exponentially weighted averaging forecaster (EWAF) to learn the most appropriate strategy and derive tighter regret bounds assuming the distributional shifts are bounded. In simulation studies and empirical analyses, we find that wrapping approval strategies within EWAF is a simple yet effective approach to protect against distributional shifts without significantly slowing down approval of beneficial modifications.
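
As a rough illustration of the exponentially weighted averaging forecaster named in the abstract, the sketch below keeps one weight per approval strategy, proportional to exp(-eta * cumulative loss). The strategy ordering, the loss values, and the learning rate `eta` are made-up assumptions; the paper's actual loss definition and strategy family are not reproduced here.

```python
import math


def ewaf_weights(cumulative_losses, eta):
    """Normalized weights proportional to exp(-eta * cumulative loss)."""
    # Subtracting the minimum loss avoids underflow and leaves the
    # normalized weights unchanged.
    base = min(cumulative_losses)
    raw = [math.exp(-eta * (loss - base)) for loss in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def run_ewaf(loss_rounds, eta=0.5):
    """Replay per-round losses (one list per round, one entry per strategy).

    Returns the weight vector used before each round, followed by the
    final weights after the last round.
    """
    cumulative = [0.0] * len(loss_rounds[0])
    history = []
    for losses in loss_rounds:
        # Weights are fixed before this round's losses are observed.
        history.append(ewaf_weights(cumulative, eta))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    history.append(ewaf_weights(cumulative, eta))
    return history


# Hypothetical losses for three strategies: optimistic, cautious, rescind.
history = run_ewaf([[0.9, 0.2, 0.5], [0.8, 0.1, 0.5], [0.7, 0.3, 0.5]])
```

The forecaster starts from uniform weights and shifts mass towards whichever strategy has accumulated the least loss, which is how a wrapper of this kind can fall back on a pessimistic strategy when the optimistic ones perform poorly.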

Citations: 3

Privacy-preserving and bandwidth-efficient federated learning: an application to in-hospital mortality prediction

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-08 DOI: 10.1145/3450439.3451859

Raouf Kerkouche, G. Ács, C. Castelluccia, P. Genevès

Machine Learning, and in particular Federated Machine Learning, opens new perspectives in terms of medical research and patient care. Although Federated Machine Learning improves over centralized Machine Learning in terms of privacy, it does not provide provable privacy guarantees. Furthermore, Federated Machine Learning is quite expensive in terms of bandwidth consumption, as it requires participant nodes to regularly exchange large updates. This paper proposes a bandwidth-efficient, privacy-preserving Federated Learning approach that provides theoretical privacy guarantees based on Differential Privacy. We experimentally evaluate our proposal for in-hospital mortality prediction using a real dataset containing Electronic Health Records of about one million patients. Our results suggest that strong and provable patient-level privacy can be enforced at the expense of only a moderate loss of prediction accuracy.
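
The abstract does not describe the mechanism, so the sketch below shows only the generic pattern commonly used to combine federated averaging with Differential Privacy: clip each update to a fixed L2 norm, sum, and add Gaussian noise scaled to that norm. The function names, the per-update clipping unit, and the parameters `clip_norm` and `noise_multiplier` are assumptions for illustration; no privacy accounting or bandwidth compression is included, and the paper's patient-level scheme may differ.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def private_average(updates, clip_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum.

    The noise standard deviation is noise_multiplier * clip_norm, i.e. it is
    calibrated to the largest contribution any single update can make.
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm
    dim = len(updates[0])
    noisy_sum = [
        sum(u[j] for u in clipped) + rng.gauss(0.0, sigma)
        for j in range(dim)
    ]
    return [s / len(updates) for s in noisy_sum]


# Two hypothetical model updates; the first exceeds the clipping norm.
rng = random.Random(0)
noisy = private_average([[3.0, 4.0], [0.0, 0.5]], clip_norm=1.0,
                        noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any one contribution can move the average, and the noise scale is tied to that bound; a larger `noise_multiplier` gives stronger privacy at the cost of accuracy, which is the trade-off the abstract reports as moderate.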

Citations: 21

Influenza-like symptom recognition using mobile sensing and graph neural networks

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-08 DOI: 10.1145/3450439.3451880

Guimin Dong, Lihua Cai, Debajyoti Datta, Shashwat Kumar, Laura E. Barnes, M. Boukhechba

Early detection of influenza-like symptoms can prevent widespread flu viruses and enable timely treatments, particularly in the post-pandemic era. Mobile sensing leverages an increasingly diverse set of embedded sensors to capture fine-grained information about human behaviors and ambient contexts, and can serve as a promising solution for influenza-like symptom recognition. Traditionally, handcrafted and high-level features of mobile sensing data are extracted by manual feature engineering and convolutional/recurrent neural networks, respectively. In this work, we apply graph representation to encode the dynamics of state transitions and internal dependencies in human behaviors, leverage graph embeddings to automatically extract the topological and spatial features from graph inputs, and propose an end-to-end graph neural network (GNN) model with multi-channel mobile sensing input for influenza-like symptom recognition based on people's daily mobility, social interactions, and physical activities. Using data generated from 448 participants, we show that a GNN with GraphSAGE convolutional layers significantly outperforms baseline models with handcrafted features. Furthermore, we use a GNN interpretability method to generate insights (e.g., important nodes and graph structures) about the importance of mobile sensing for recognizing influenza-like symptoms. To the best of our knowledge, this is the first work that applies graph representation and graph neural networks to mobile sensing data for graph-based human behavior modeling and health symptom prediction.
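
To make the GraphSAGE convolution mentioned above concrete, here is a minimal pure-Python sketch of a single mean-aggregator layer, h_v' = ReLU(W_self h_v + W_neigh mean(h_u for u in N(v))). The node names, feature vectors, and identity weight matrices are invented for illustration; how the authors build graphs from mobility, social, and activity data is not reproduced here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator and ReLU."""
    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Element-wise mean of the neighbours' feature vectors.
            mean = [
                sum(features[u][d] for u in nbrs) / len(nbrs)
                for d in range(len(h))
            ]
        else:
            mean = [0.0] * len(h)  # isolated node: no neighbour signal
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[node] = [max(0.0, x) for x in z]  # ReLU
    return out


# Hypothetical behaviour graph: places as nodes, transitions as edges.
features = {"home": [1.0, 0.0], "work": [0.0, 1.0], "gym": [1.0, 1.0]}
neighbors = {"home": ["work", "gym"], "work": ["home"], "gym": ["home"]}
identity = [[1.0, 0.0], [0.0, 1.0]]
embedded = sage_mean_layer(features, neighbors, identity, identity)
```

Stacking such layers lets each node's embedding absorb information from progressively larger neighborhoods; a graph-level readout (for example, averaging node embeddings) would then feed a symptom classifier.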

Citations: 13

Enabling Counterfactual Survival Analysis with Balanced Representations.

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-04-01 Epub Date: 2021-04-08 DOI: 10.1145/3450439.3451875

Paidamoyo Chapfuwa, Serge Assaad, Shuxi Zeng, Michael J Pencina, Lawrence Carin, Ricardo Henao

Balanced representation learning methods have been applied successfully to counterfactual inference from observational data. However, approaches that account for survival outcomes are relatively limited. Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials, and such data are also relevant in fields like manufacturing (e.g., for equipment monitoring). When the outcome of interest is a time-to-event, special precautions for handling censored events need to be taken, as ignoring censored outcomes may lead to biased estimates. We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes. Further, we formulate a nonparametric hazard ratio metric for evaluating average and individualized treatment effects. Experimental results on real-world and semi-synthetic datasets, the latter of which we introduce, demonstrate that the proposed approach significantly outperforms competitive alternatives in both survival-outcome prediction and treatment-effect estimation.
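
The remark that ignoring censored outcomes can bias estimates can be illustrated with the standard Kaplan-Meier estimator. This is not the paper's counterfactual framework, only a small sketch on made-up data; the function name `kaplan_meier` and the toy times are assumptions.

```python
def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time.

    observed[i] is True if the event was seen at times[i] and False if the
    record was censored at that time.
    """
    records = sorted(zip(times, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group all records that share the same time.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored records leave the risk set without counting as events.
        at_risk -= removed
    return curve


# Four hypothetical subjects; the one at time 2 is censored.
with_censoring = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
# Naive alternative: drop the censored subject entirely.
dropped = kaplan_meier([1, 3, 4], [True, True, True])
```

On this toy data the estimate that keeps the censored subject in the risk set gives S(3) = 0.375, while dropping the subject gives S(3) of about 0.333, a small instance of the downward bias that censoring-aware methods are designed to avoid.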

Pages: 133-145. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423760/pdf/

Citations: 0

An empirical framework for domain generalization in clinical settings

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-03-20 DOI: 10.1145/3450439.3451878

Haoran Zhang, Natalie Dullerud, L. Seyyed-Kalantari, Q. Morris, Shalmali Joshi, M. Ghassemi

Clinical machine learning models experience significantly degraded performance in datasets not seen during training, e.g., new hospitals or populations. Recent developments in domain generalization offer a promising solution to this problem by creating models that learn invariances across environments. In this work, we benchmark the performance of eight domain generalization methods on multi-site clinical time series and medical imaging data. We introduce a framework to induce synthetic but realistic domain shifts and sampling bias to stress-test these methods over existing non-healthcare benchmarks. We find that current domain generalization methods do not achieve significant gains in out-of-distribution performance over empirical risk minimization on real-world medical imaging data, in line with prior work on general imaging datasets. However, a subset of realistic induced-shift scenarios in clinical time series data exhibit limited performance gains. We characterize these scenarios in detail, and recommend best practices for domain generalization in the clinical setting.

Citations: 39

Modeling longitudinal dynamics of comorbidities

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-03-14 DOI: 10.1145/3450439.3451871

B. Maag, S. Feuerriegel, Mathias Kraus, M. Saar-Tsechansky, Thomas Züger

In medicine, comorbidities refer to the presence of multiple, co-occurring diseases. Due to their co-occurring nature, the course of one comorbidity is often highly dependent on the course of the other disease and, hence, treatments can have significant spill-over effects. Despite the prevalence of comorbidities among patients, a comprehensive statistical framework for modeling the longitudinal dynamics of comorbidities is missing. In this paper, we propose a probabilistic model for analyzing comorbidity dynamics over time in patients. Specifically, we develop a coupled hidden Markov model with a personalized, non-homogeneous transition mechanism, named Comorbidity-HMM. The specification of our Comorbidity-HMM is informed by clinical research: (1) It accounts for different disease states (i.e., acute, stable) in the disease progression by introducing latent states that are of clinical meaning. (2) It models a coupling among the trajectories from comorbidities to capture co-evolution dynamics. (3) It considers between-patient heterogeneity (e.g., risk factors, treatments) in the transition mechanism. Based on our model, we define a spill-over effect that measures the indirect effect of treatments on patient trajectories through coupling (i.e., through comorbidity co-evolution). We evaluated our proposed Comorbidity-HMM based on 675 health trajectories where we investigate the joint progression of diabetes mellitus and chronic liver disease. Compared to alternative models without coupling, we find that our Comorbidity-HMM achieves a superior fit. Further, we quantify the spill-over effect, that is, to what extent diabetes treatments are associated with a change in the chronic liver disease from an acute to a stable disease state. To this end, our model is of direct relevance for both treatment planning and clinical research in the context of comorbidities.

Citations: 7

RNA alternative splicing prediction with discrete compositional energy network

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-03-07 DOI: 10.1145/3450439.3451857

Alvin Chan, A. Korsakova, Y. Ong, F. Winnerdy, K. W. Lim, A. Phan

A single gene can encode different protein versions through a process called alternative splicing. Since proteins play major roles in cellular functions, aberrant splicing profiles can result in a variety of diseases, including cancers. Alternative splicing is determined by the gene's primary sequence and other regulatory factors such as RNA-binding protein levels. With these as input, we formulate the prediction of RNA splicing as a regression task and build a new training dataset (CAPD) to benchmark learned models. We propose the discrete compositional energy network (DCEN), which leverages the hierarchical relationships between splice sites, junctions and transcripts to approach this task. In the case of alternative splicing prediction, DCEN models mRNA transcript probabilities through the energy values of each transcript's constituent splice junctions. These transcript probabilities are subsequently mapped to relative abundance values of key nucleotides and trained with ground-truth experimental measurements. Through our experiments on CAPD, we show that DCEN outperforms baselines and ablation variants.
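
The idea of deriving transcript probabilities from junction energies can be sketched as a Boltzmann-style normalization. Everything specific here is an assumption made for illustration: that a transcript's energy is the sum of its junction energies, that probabilities are proportional to exp(-energy), and that a junction's relative usage is the summed probability of the transcripts containing it. The junction names and energy values are invented, and in DCEN the energies come from a learned network rather than a lookup table.

```python
import math


def transcript_probabilities(transcripts, junction_energy):
    """Map each transcript to a probability proportional to exp(-energy),
    where energy is assumed to be the sum of its junction energies."""
    energies = {
        name: sum(junction_energy[j] for j in junctions)
        for name, junctions in transcripts.items()
    }
    # Shift by the lowest energy for numerical stability.
    low = min(energies.values())
    weights = {name: math.exp(-(e - low)) for name, e in energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def junction_usage(transcripts, probabilities, junction):
    """Summed probability of all transcripts that contain the junction."""
    return sum(
        p for name, p in probabilities.items() if junction in transcripts[name]
    )


# Hypothetical gene with an exon-inclusion (T1) and an exon-skipping (T2) form.
transcripts = {"T1": ["e1-e2", "e2-e3"], "T2": ["e1-e3"]}
junction_energy = {"e1-e2": 0.5, "e2-e3": 0.5, "e1-e3": 1.0}
probs = transcript_probabilities(transcripts, junction_energy)
skip_usage = junction_usage(transcripts, probs, "e1-e3")
```

Lowering the energy of a junction raises the probability of every transcript that uses it, so junction-level quantities can be compared against experimental measurements while remaining consistent at the transcript level.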

Citations: 1

Learning to predict with supporting evidence: applications to clinical risk prediction

Proceedings of the ACM Conference on Health, Inference, and Learning

Pub Date : 2021-03-04 DOI: 10.1145/3450439.3451869

Aniruddh Raghu, J. Guttag, K. Young, E. Pomerantsev, Adrian V. Dalca, Collin M. Stultz

The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models. In this paper, we present a method to provide individuals with clinical expertise with domain-relevant evidence about why a prediction should be trusted. We first design a probabilistic model that relates meaningful latent concepts to prediction targets and observed data. Inference of latent variables in this model corresponds to both making a prediction and providing supporting evidence for that prediction. We present a two-step process to efficiently approximate inference: (i) estimating model parameters using variational learning, and (ii) approximating maximum a posteriori estimation of latent variables in the model using a neural network, trained with an objective derived from the probabilistic model. We demonstrate the method on the task of predicting mortality risk for patients with cardiovascular disease. Specifically, using electrocardiogram and tabular data as input, we show that our approach provides appropriate domain-relevant supporting evidence for accurate predictions.



Citations: 8