Accelerated SGD for Tensor Decomposition of Sparse Count Data

Huan He, Yuanzhe Xi, Joyce C. Ho
Published in: 2020 International Conference on Data Mining Workshops (ICDMW), 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00047 (https://doi.org/10.1109/ICDMW51313.2020.00047)

Abstract

The rapid growth in the collection of high-dimensional data has led to the emergence of tensor decomposition, a powerful method for exploring multidimensional data. Because tensor decomposition can extract hidden structures and capture underlying relationships between variables, it has been used successfully across a broad range of applications. However, tensor decomposition is computationally expensive, and existing methods for decomposing large sparse tensors of count data are not efficient enough when run with limited computing resources. Therefore, we propose AS-CP, a novel algorithm that accelerates the convergence of the stochastic gradient descent (SGD) based CANDECOMP/PARAFAC (CP) decomposition model through an extrapolation method. The proposed framework can be easily parallelized in an asynchronous way. Our empirical results on three real-world datasets demonstrate that AS-CP decreases the total computation time and scales readily to large datasets without requiring a high-performance computing platform or environment. The advantage of AS-CP over several state-of-the-art methods is also shown through a machine learning task, as the factors computed by AS-CP can help identify better clinical characteristics from electronic health record (EHR) data.
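To make the ingredients named in the abstract concrete, the following is a minimal sketch of SGD-based CP decomposition of sparse count data with an extrapolation step. It is not the authors' AS-CP: the abstract does not specify the loss, the update rule, or the form of the extrapolation, so the Poisson loss, the nonnegativity clipping, the momentum-style extrapolation, and all function names (`cp_value`, `poisson_loss`, `sgd_step`, `extrapolate`) are assumptions made for illustration only.

```python
import math


def cp_value(factors, index):
    """Rank-R CP model value at one coordinate: sum_r prod_n A_n[i_n][r]."""
    rank = len(factors[0][0])
    return sum(
        math.prod(A[i][r] for A, i in zip(factors, index))
        for r in range(rank)
    )


def poisson_loss(factors, entries, eps=1e-10):
    """Poisson negative log-likelihood (up to a constant) over sampled entries.

    entries: list of (index_tuple, count) pairs. The Poisson loss is an
    assumption here; it is a common choice for count data.
    """
    total = 0.0
    for index, count in entries:
        m = cp_value(factors, index)
        total += m - count * math.log(m + eps)
    return total


def sgd_step(factors, entries, lr, eps=1e-10):
    """One gradient step on a mini-batch of entries; returns new factors."""
    rank = len(factors[0][0])
    grads = [[[0.0] * rank for _ in A] for A in factors]
    for index, count in entries:
        m = cp_value(factors, index)
        w = 1.0 - count / (m + eps)  # derivative of the loss w.r.t. m
        for n, i in enumerate(index):
            for r in range(rank):
                # Product over all modes except mode n.
                other = math.prod(
                    B[j][r]
                    for k, (B, j) in enumerate(zip(factors, index))
                    if k != n
                )
                grads[n][i][r] += w * other
    # Gradient step, then clip so the factors stay positive.
    return [
        [[max(a - lr * g, eps) for a, g in zip(row, grow)]
         for row, grow in zip(A, G)]
        for A, G in zip(factors, grads)
    ]


def extrapolate(old, new, beta, eps=1e-10):
    """Momentum-style extrapolation (assumed form): new + beta * (new - old)."""
    return [
        [[max(b + beta * (b - a), eps) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(A, B)]
        for A, B in zip(old, new)
    ]
```

Under these assumptions, a training loop would repeatedly sample a mini-batch of tensor entries, call `sgd_step`, and then call `extrapolate` on the previous and updated factors before the next iteration. The asynchronous parallelization mentioned in the abstract is not shown.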