Succinct Representations in Collaborative Filtering: A Case Study using Wavelet Tree on 1,000 Cores

Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang
Published in: 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
DOI: 10.1109/PDCAT46702.2019.00083 (https://doi.org/10.1109/PDCAT46702.2019.00083)
Publication date (as listed): 2019-03-12

Abstract

The User-Item (U-I) matrix has been the dominant data infrastructure of Collaborative Filtering (CF). To reduce the runtime and storage space consumed as a result of data sparsity and the growing need to accommodate side information in CF design, one needs to go beyond the U-I matrix. In this paper, we present a case study of succinct representations in Collaborative Filtering as an alternative to the U-I matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure for CF. To this end, we implemented a user-based K-Nearest-Neighbor CF prototype on top of a Wavelet Tree. We first designed Accessible Compressed Documents (ACD), which compress U-I data in a Wavelet Tree and are efficient in both storage and runtime. We then showed that, by exploiting ACD's characteristics, an efficient intersection algorithm can be built that operates without decompression. We evaluated our design on 1,000 cores of the Tianhe-II supercomputer with ml-20m, one of the largest public data sets. The results show that our prototype delivers its results in 3.7 minutes on average.
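
The abstract does not spell out the ACD layout or the intersection algorithm, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each user's rated item IDs are concatenated into one integer sequence, that a hypothetical `offsets` list marks where each user's items begin and end, and that a simple pointer-based wavelet tree (with plain prefix-count lists rather than truly succinct bitvectors) is enough to show how `access` and `rank` can answer "which items did two users both rate?" without rebuilding either user's item list as a separate set. The names `WaveletTree` and `common_items` are invented for this example.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a non-empty list of non-negative ints.

    Illustrative only: each node keeps a prefix-count list of 1-bits
    instead of a compressed bitvector with o(n)-space rank support.
    """

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        self.ones = [0]  # ones[i] = number of 1-bits among the first i symbols
        if self.lo == self.hi or not seq:
            return  # leaf (single symbol) or empty subtree
        mid = (self.lo + self.hi) // 2
        for x in seq:
            self.ones.append(self.ones[-1] + (x > mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def access(self, i):
        """Return the symbol stored at position i (0-based)."""
        node = self
        while node.left is not None:
            if node.ones[i + 1] - node.ones[i]:  # bit at position i is 1
                i = node.ones[i]
                node = node.right
            else:
                i = i - node.ones[i]
                node = node.left
        return node.lo

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        if c < self.lo or c > self.hi:
            return 0
        node = self
        while node.left is not None:
            mid = (node.lo + node.hi) // 2
            if c > mid:
                i = node.ones[i]
                node = node.right
            else:
                i = i - node.ones[i]
                node = node.left
        return i


def common_items(wt, offsets, a, b):
    """Items rated by both user a and user b, using only access/rank queries."""
    start_b, end_b = offsets[b], offsets[b + 1]
    result = []
    for pos in range(offsets[a], offsets[a + 1]):
        item = wt.access(pos)
        # The item occurs in b's range iff its rank grows across that range.
        if wt.rank(item, end_b) - wt.rank(item, start_b) > 0:
            result.append(item)
    return result


# Toy data (made up for illustration): user -> rated item IDs.
ratings = {0: [1, 3, 4], 1: [3, 4, 6], 2: [2, 6]}
sequence, offsets = [], [0]
for user in sorted(ratings):
    sequence.extend(ratings[user])
    offsets.append(len(sequence))

tree = WaveletTree(sequence)
print(common_items(tree, offsets, 0, 1))  # items shared by users 0 and 1
```

In a user-based K-Nearest-Neighbor setting, the size of such an intersection (or a similarity computed over the co-rated items) could serve as the neighbor score; how the paper actually does this is not stated in the abstract.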