Succinct Representations in Collaborative Filtering: A Case Study using Wavelet Tree on 1,000 Cores

2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) Pub Date : 2019-03-12 DOI:10.1109/PDCAT46702.2019.00083

Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang



Abstract

The User-Item (U-I) matrix has been the dominant data infrastructure for Collaborative Filtering (CF). To reduce space consumption at runtime and in storage, which is driven by data sparsity and the growing need to accommodate side information in CF design, one needs to go beyond the U-I matrix. In this paper, we present a case study of succinct representations in Collaborative Filtering as an alternative to the U-I matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure for CF. To this end, we implemented a user-based K-Nearest-Neighbor CF prototype on a Wavelet Tree. We first designed Accessible Compressed Documents (ACD) to compress U-I data in a Wavelet Tree, which is efficient in both storage and runtime. We then showed that ACD's characteristics can be exploited to develop an efficient intersection algorithm that works without decompression. We evaluated our design on 1,000 cores of the Tianhe-II supercomputer with ml-20m, one of the largest public data sets. The results showed that our prototype delivers its results in 3.7 minutes on average.
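For readers unfamiliar with the underlying structure, the following is a minimal Python sketch of a generic wavelet tree over a sequence of integer symbols (for example, item IDs). It is not the ACD layout or the intersection algorithm from the paper, neither of which is specified in the abstract. The class name `WaveletTree` and its methods `access` and `rank` are illustrative assumptions, and the per-node prefix counts are stored in plain Python lists for readability, whereas a genuinely succinct implementation would use compressed bitvectors with rank support.

```python
class WaveletTree:
    """Illustrative pointer-based wavelet tree over non-negative integers.

    Assumption: the top-level sequence is non-empty. Prefix counts are kept
    in plain lists, so this sketch shows the query logic, not the space bound.
    """

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        self.prefix = None
        if self.lo == self.hi or not seq:
            return  # leaf (single symbol) or empty subtree
        mid = (self.lo + self.hi) // 2
        # prefix[i] = how many of the first i symbols are routed to the left child
        self.prefix = [0]
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], self.lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, self.hi)

    def access(self, i):
        """Return the symbol at position i (0-based) without storing seq itself."""
        node = self
        while node.lo != node.hi:
            left_before = node.prefix[i]
            if node.prefix[i + 1] > left_before:  # symbol i was routed left
                node, i = node.left, left_before
            else:
                node, i = node.right, i - left_before
        return node.lo

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        node = self
        while node.lo != node.hi:
            if i == 0 or c < node.lo or c > node.hi:
                return 0
            mid = (node.lo + node.hi) // 2
            left_count = node.prefix[i]
            if c <= mid:
                node, i = node.left, left_count
            else:
                node, i = node.right, i - left_count
        return i if c == node.lo else 0


# Hypothetical toy sequence of item IDs, used only for illustration.
items = [3, 1, 4, 1, 5, 2, 1, 3]
wt = WaveletTree(items)
recovered = [wt.access(i) for i in range(len(items))]
count_of_item_1 = wt.rank(1, len(items))
```

Both queries walk one root-to-leaf path, so their cost grows with the logarithm of the symbol range rather than with the sequence length. That property is what makes it plausible to answer membership and counting queries directly on the compressed form.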


