From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records

Jiayu Zhou, Fei Wang, Jianying Hu, Jieping Ye
{"title":"From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records","authors":"Jiayu Zhou, Fei Wang, Jianying Hu, Jieping Ye","doi":"10.1145/2623330.2623711","DOIUrl":null,"url":null,"abstract":"Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"33 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"123","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2623330.2623711","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 123

Abstract

Inferring phenotypic patterns from population-scale clinical data is a core computational task in the development of personalized medicine. One important source of data on which to conduct this type of research is patient Electronic Medical Records (EMR). However, the patient EMRs are typically sparse and noisy, which creates significant challenges if we use them directly to represent patient phenotypes. In this paper, we propose a data driven phenotyping framework called Pacifier (PAtient reCord densIFIER), where we interpret the longitudinal EMR data of each patient as a sparse matrix with a feature dimension and a time dimension, and derive more robust patient phenotypes by exploring the latent structure of those matrices. Specifically, we assume that each derived phenotype is composed of a subset of the medical features contained in original patient EMR, whose value evolves smoothly over time. We propose two formulations to achieve such goal. One is Individual Basis Approach (IBA), which assumes the phenotypes are different for every patient. The other is Shared Basis Approach (SBA), which assumes the patient population shares a common set of phenotypes. We develop an efficient optimization algorithm that is capable of resolving both problems efficiently. Finally we validate Pacifier on two real world EMR cohorts for the tasks of early prediction of Congestive Heart Failure (CHF) and End Stage Renal Disease (ESRD). Our results show that the predictive performance in both tasks can be improved significantly by the proposed algorithms (average AUC score improved from 0.689 to 0.816 on CHF, and from 0.756 to 0.838 on ESRD respectively, on diagnosis group granularity). We also illustrate some interesting phenotypes derived from our data.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从微观到宏观:纵向电子病历致密化的数据驱动表型
从人群规模的临床数据推断表型模式是个体化医疗发展的核心计算任务。进行此类研究的一个重要数据来源是患者电子医疗记录(EMR)。然而,患者的电子病历通常是稀疏和嘈杂的,如果我们直接使用它们来表示患者的表型,这将带来重大挑战。在本文中,我们提出了一个名为Pacifier (PAtient reCord densIFIER)的数据驱动表型框架,其中我们将每个患者的纵向EMR数据解释为具有特征维度和时间维度的稀疏矩阵,并通过探索这些矩阵的潜在结构来获得更健壮的患者表型。具体来说,我们假设每个衍生表型由原始患者EMR中包含的医学特征的子集组成,其价值随着时间的推移而平稳发展。我们提出两种方案来实现这一目标。一种是个体基础方法(IBA),它假设每个患者的表型是不同的。另一种是共享基础方法(SBA),它假设患者群体共享一组共同的表型。我们开发了一种高效的优化算法,能够有效地解决这两个问题。最后,我们在两个真实世界的EMR队列中验证了安抚奶嘴对充血性心力衰竭(CHF)和终末期肾脏疾病(ESRD)的早期预测任务。我们的结果表明,所提出的算法可以显著提高这两个任务的预测性能(在诊断组粒度上,CHF的平均AUC得分分别从0.689提高到0.816,ESRD的平均AUC得分分别从0.756提高到0.838)。我们还说明了从我们的数据中得出的一些有趣的表型。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022 KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021 Mutually Beneficial Collaborations to Broaden Participation of Hispanics in Data Science Bringing Inclusive Diversity to Data Science: Opportunities and Challenges A Causal Look at Statistical Definitions of Discrimination
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1