模型驱动的高阶张量稀疏CP分解

Jiajia Li, Jee W. Choi, Ioakeim Perros, Jimeng Sun, R. Vuduc
{"title":"模型驱动的高阶张量稀疏CP分解","authors":"Jiajia Li, Jee W. Choi, Ioakeim Perros, Jimeng Sun, R. Vuduc","doi":"10.1109/IPDPS.2017.80","DOIUrl":null,"url":null,"abstract":"Given an input tensor, its CANDECOMP/PARAFAC decomposition (or CPD) is a low-rank representation. CPDs are of particular interest in data analysis and mining, especially when the data tensor is sparse and of higher order (dimension). This paper focuses on the central bottleneck of a CPD algorithm, which is evaluating a sequence of matricized tensor times Khatri-Rao products (MTTKRPs). To speed up the MTTKRP sequence, we propose a novel, adaptive tensor memoization algorithm, AdaTM. Besides removing redundant computations within the MTTKRP sequence, which potentially reduces its overall asymptotic complexity, our technique also allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework. Our method improves as the tensor order grows, making its performance more scalable for higher-order data problems. We show speedups of up to 8× and 820× on real sparse data tensors with orders as high as 85 over the SPLATT package and Tensor Toolbox library respectively; and on a full CPD algorithm (CP-ALS), AdaTM can be up to 8× faster than state-of-the-art method implemented in SPLATT.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":"{\"title\":\"Model-Driven Sparse CP Decomposition for Higher-Order Tensors\",\"authors\":\"Jiajia Li, Jee W. Choi, Ioakeim Perros, Jimeng Sun, R. Vuduc\",\"doi\":\"10.1109/IPDPS.2017.80\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given an input tensor, its CANDECOMP/PARAFAC decomposition (or CPD) is a low-rank representation. CPDs are of particular interest in data analysis and mining, especially when the data tensor is sparse and of higher order (dimension). This paper focuses on the central bottleneck of a CPD algorithm, which is evaluating a sequence of matricized tensor times Khatri-Rao products (MTTKRPs). To speed up the MTTKRP sequence, we propose a novel, adaptive tensor memoization algorithm, AdaTM. Besides removing redundant computations within the MTTKRP sequence, which potentially reduces its overall asymptotic complexity, our technique also allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework. Our method improves as the tensor order grows, making its performance more scalable for higher-order data problems. We show speedups of up to 8× and 820× on real sparse data tensors with orders as high as 85 over the SPLATT package and Tensor Toolbox library respectively; and on a full CPD algorithm (CP-ALS), AdaTM can be up to 8× faster than state-of-the-art method implemented in SPLATT.\",\"PeriodicalId\":209524,\"journal\":{\"name\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"53\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2017.80\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.80","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 53

摘要

给定一个输入张量,它的CANDECOMP/PARAFAC分解(或CPD)是一个低秩表示。cpd在数据分析和挖掘中特别有趣,特别是当数据张量是稀疏的和高阶(维)的时候。本文重点研究了一种CPD算法的中心瓶颈,即矩阵张量乘Khatri-Rao积(MTTKRPs)序列的求值。为了加快MTTKRP序列的速度,我们提出了一种新的自适应张量记忆算法AdaTM。除了消除MTTKRP序列中的冗余计算(这可能会降低其总体渐近复杂性)之外,我们的技术还允许用户通过使用模型驱动框架自动调整算法和机器参数来进行时空权衡。我们的方法随着张量阶的增长而改进,使其性能在高阶数据问题上更具可扩展性。我们分别在SPLATT包和Tensor Toolbox库上显示了高达85阶的真实稀疏数据张量的加速高达8倍和820倍;在全CPD算法(CP-ALS)上,AdaTM可以比SPLATT中实现的最先进方法快8倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Model-Driven Sparse CP Decomposition for Higher-Order Tensors
Given an input tensor, its CANDECOMP/PARAFAC decomposition (or CPD) is a low-rank representation. CPDs are of particular interest in data analysis and mining, especially when the data tensor is sparse and of higher order (dimension). This paper focuses on the central bottleneck of a CPD algorithm, which is evaluating a sequence of matricized tensor times Khatri-Rao products (MTTKRPs). To speed up the MTTKRP sequence, we propose a novel, adaptive tensor memoization algorithm, AdaTM. Besides removing redundant computations within the MTTKRP sequence, which potentially reduces its overall asymptotic complexity, our technique also allows a user to make a space-time tradeoff by automatically tuning algorithmic and machine parameters using a model-driven framework. Our method improves as the tensor order grows, making its performance more scalable for higher-order data problems. We show speedups of up to 8× and 820× on real sparse data tensors with orders as high as 85 over the SPLATT package and Tensor Toolbox library respectively; and on a full CPD algorithm (CP-ALS), AdaTM can be up to 8× faster than state-of-the-art method implemented in SPLATT.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL Toucan — A Translator for Communication Tolerant MPI Applications Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs Dynamic Memory-Aware Task-Tree Scheduling
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1