MSIM: A Highly Parallel Near-Memory Accelerator for MinHash Sketch

Aman Sinha, Jhih-Yong Mai, B. Lai
Published in: 2022 IEEE 35th International System-on-Chip Conference (SOCC)
Publication date: 2022-09-05
DOI: 10.1109/SOCC56010.2022.9908115
Citation count: 0

Abstract

Genome assembly is an important Big Data analytics workload that involves massive computation for similarity searches over sequence databases. As a major component of the runtime, similarity searches require careful design for scalable performance. MinHash sketching is a data structure used extensively in long-read genome assembly pipelines; it involves generating, randomizing, and minimizing a set of hashes for all the k-mers in genome sequences. Compute-hungry MinHash sketch processing on commercially available multi-threaded CPUs suffers from the limited bandwidth of the L1 cache, which causes the CPUs to stall. Near-Data Processing (NDP) is an emerging trend in data-bound Big Data analytics that harnesses the low-latency, high-bandwidth access available within Dual In-line Memory Modules (DIMMs). While NDP architectures have generally been used for memory-bound computations, MinHash sketching is a potential application that can gain massive throughput by exploiting memory banks as a higher-bandwidth L1 cache. In this work, we propose MSIM, a distributed, highly parallel, and efficient hardware-software co-design for accelerating MinHash sketch processing on lightweight components placed in the DRAM hierarchy. Multiple ASIC-based Processing Engines (PEs) placed at the bank-group level in MSIM provide high parallelism for low-latency computation. The PEs sequentially access data from all banks within their bank group with the help of a dedicated address calculator, which uses an optimal data-mapping scheme. The PEs are controlled by a custom arbiter that the host CPU activates directly using standard DDR commands, without requiring any modification to the memory controller or the DIMM standard buses. MSIM incurs limited area and power overheads while delivering up to 384.9x speedup and 1088.4x energy reduction over the baseline multi-threaded software solution in our experiments. MSIM achieves 4.26x speedup over a high-end GPU while consuming 26.4x less energy. Moreover, the MSIM design is highly scalable and extensible.
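The generate-randomize-minimize pipeline the abstract describes can be illustrated with a minimal software sketch. The version below is a generic MinHash over k-mers, not the paper's hardware design: the function names and the salted `blake2b` hashing scheme (standing in for a family of random hash functions) are illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Yield every k-length substring (k-mer) of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minhash_sketch(seq, k=16, num_hashes=8):
    """Compute a MinHash sketch of a sequence.

    For each of `num_hashes` hash functions (simulated here by
    prepending a per-function salt to a single blake2b hash), keep
    the minimum hash value observed over all k-mers -- the
    generate/randomize/minimize steps described in the abstract.
    """
    sketch = []
    for h in range(num_hashes):
        salt = h.to_bytes(4, "little")  # distinguishes the hash functions
        min_val = min(
            int.from_bytes(
                hashlib.blake2b(salt + kmer.encode(), digest_size=8).digest(),
                "little",
            )
            for kmer in kmers(seq, k)
        )
        sketch.append(min_val)
    return sketch


def jaccard_estimate(sketch_a, sketch_b):
    """The fraction of matching sketch slots estimates the Jaccard
    similarity of the two underlying k-mer sets."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)
```

Because each slot of the sketch is an independent minimum over all k-mers, the inner loop is exactly the kind of streaming, bandwidth-bound computation that MSIM's per-bank-group PEs parallelize.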