Distributed Tensor Decomposition for Large Scale Health Analytics.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference Pub Date : 2019-05-01 DOI:10.1145/3308558.3313548

Huan He, Jette Henderson, Joyce C Ho

Citations: 16

Abstract

In the past few decades, there has been rapid growth in the quantity and variety of healthcare data. These large data sets are usually high dimensional (e.g., patients, their diagnoses, and the medications used to treat those diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate such high-dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is computationally expensive, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors; (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition; and (3) Flexible constraints: we show that our approach can encompass various kinds of constraints, including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets of patients (i.e., phenotypes) from electronic health records. Through these case studies, we show that SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
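As a rough illustration of what "tensor factorization fit through stochastic gradient descent" can look like, the sketch below fits a CP (CANDECOMP/PARAFAC) model to the observed entries of a small 3-way tensor using plain SGD with an l2 penalty. It is not SGranite: it assumes a CP-style model (the abstract does not name the model), runs on a single machine, and omits block partitioning, sparsity handling, and the l1 and logistic options. All names (`cp_sgd`, `reconstruct`, `mse`) and hyperparameter values are hypothetical, chosen only for this example.

```python
import numpy as np


def cp_sgd(coords, values, shape, rank=2, lr=0.05, l2=1e-3, epochs=200, seed=0):
    """Fit a rank-`rank` CP model to observed tensor entries with plain SGD.

    coords: (n, 3) integer array of (i, j, k) indices of observed entries.
    values: (n,) array with the observed value at each index.
    """
    rng = np.random.default_rng(seed)
    # One factor matrix per mode, initialised with small positive values.
    factors = [rng.uniform(0.1, 1.0, size=(dim, rank)) for dim in shape]
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for idx in rng.permutation(len(values)):
            i, j, k = coords[idx]
            a = factors[0][i].copy()
            b = factors[1][j].copy()
            c = factors[2][k].copy()
            # Residual of the CP prediction sum_r a_r * b_r * c_r.
            err = np.sum(a * b * c) - values[idx]
            # Gradient step on squared error plus an l2 penalty (illustrative).
            factors[0][i] -= lr * (err * b * c + l2 * a)
            factors[1][j] -= lr * (err * a * c + l2 * b)
            factors[2][k] -= lr * (err * a * b + l2 * c)
    return factors


def reconstruct(factors):
    """Rebuild the dense 3-way tensor from its three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", *factors)


def mse(factors, coords, values):
    """Mean squared error of the model on the observed entries."""
    pred = reconstruct(factors)[tuple(coords.T)]
    return float(np.mean((pred - values) ** 2))


if __name__ == "__main__":
    # Toy rank-1 tensor standing in for a patient x diagnosis x medication cube.
    tensor = np.einsum(
        "i,j,k->ijk",
        np.array([0.5, 1.0, 0.8, 0.3]),
        np.array([1.0, 0.6, 0.9]),
        np.array([0.7, 1.0]),
    )
    coords = np.argwhere(tensor > 0)
    values = tensor[tuple(coords.T)]
    before = mse(cp_sgd(coords, values, tensor.shape, epochs=0), coords, values)
    after = mse(cp_sgd(coords, values, tensor.shape), coords, values)
    print(f"MSE before training: {before:.4f}, after: {after:.4f}")
```

Because each update touches only one row per factor matrix, entries whose indices do not overlap can be processed independently. That property is what makes a block-partitioned, parallel design such as the one the abstract mentions plausible, although this sketch makes no attempt to reproduce it.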
