Low energy sketching engines on many-core platform for big data acceleration

2016 International Great Lakes Symposium on VLSI (GLSVLSI). Pub Date: 2016-05-18. DOI: 10.1145/2902961.2902984

A. Kulkarni, Tahmid Abtahi, E. Smith, T. Mohsenin


Citations: 14



Low energy sketching engines on many-core platform for big data acceleration

Almost 90% of the data available today was created within the last couple of years, so processing Big Data sets is of utmost importance. Many solutions have been investigated to increase processing speed and memory capacity; however, the I/O bottleneck remains a critical issue. To tackle this issue, we adopt a sketching technique to reduce data communication. The sketched matrix is reconstructed using Orthogonal Matching Pursuit (OMP). Additionally, we propose a Gradient Descent OMP (GD-OMP) algorithm to reduce hardware complexity. Real-time big data processing imposes rigid constraints on the sketching kernel; hence, to further reduce hardware overhead, both algorithms are implemented on a low-power, domain-specific many-core platform called Power Efficient Nano Clusters (PENC). The GD-OMP algorithm is evaluated for image reconstruction accuracy and on the PENC many-core architecture. Implementation results show that, for large matrix sizes, the GD-OMP algorithm is 1.3× faster and consumes 1.4× less energy than the OMP algorithm implementations. Compared to GPU and quad-core CPU implementations, the PENC many-core reconstructs 5.4× and 9.8× faster, respectively, for large signal sizes with higher sparsity.
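The sketch-then-reconstruct flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `sketch` and `omp`, the unit-norm sensing matrix `phi`, and the parameters `solver` and `gd_steps` are all assumptions made for illustration. In particular, the `solver="gd"` branch simply replaces the exact least-squares refit of textbook OMP with plain gradient descent; the abstract does not specify how GD-OMP works, so this is only one plausible reading.

```python
import numpy as np


def sketch(phi, x):
    """Compress an n-dimensional signal x into m < n measurements y = phi @ x."""
    return phi @ x


def omp(phi, y, k, solver="lstsq", gd_steps=200):
    """Recover an (at most) k-sparse x from y = phi @ x by Orthogonal Matching Pursuit.

    solver="lstsq": refit the selected coefficients with an exact least-squares solve.
    solver="gd":    refit them with plain gradient descent instead. This is an
                    ASSUMED, illustrative stand-in, not the paper's GD-OMP.
    """
    n = phi.shape[1]
    support = []                      # indices of the columns selected so far
    coef = np.zeros(0)                # coefficients on the selected columns
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(k):
        # Select the column most correlated with the current residual.
        corr = np.abs(phi.T @ residual)
        if support:
            corr[support] = 0.0       # never select the same column twice
        support.append(int(np.argmax(corr)))
        sub = phi[:, support]
        if solver == "lstsq":
            coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        else:
            # Gradient descent on 0.5 * ||sub @ c - y||^2, warm-started
            # from the previous coefficients (new entry starts at 0).
            c = np.append(coef, 0.0)
            step = 1.0 / np.linalg.norm(sub, 2) ** 2  # 1 / (largest singular value)^2
            for _ in range(gd_steps):
                c = c - step * (sub.T @ (sub @ c - y))
            coef = c
        residual = y - sub @ coef
        if np.linalg.norm(residual) < 1e-10:
            break                     # measurements already explained
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat, support


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, k = 32, 128, 4              # hypothetical sizes, chosen for the demo
    phi = rng.standard_normal((m, n))
    phi /= np.linalg.norm(phi, axis=0)            # unit-norm columns
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    y = sketch(phi, x)                # only m values need to be communicated
    for solver in ("lstsq", "gd"):
        x_hat, support = omp(phi, y, k, solver=solver)
        print(solver, sorted(support), float(np.linalg.norm(x - x_hat)))
```

Only the `m` sketched values cross the I/O boundary instead of the full `n`-dimensional signal, which is the communication saving the abstract refers to. The gradient-descent branch avoids a matrix factorization or inversion at the cost of extra iterations, which is the kind of trade-off that could reduce hardware complexity.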


