Zhuowen Zou, Hanning Chen, Prathyush Poduval, Yeseong Kim, Mahdi Imani, Elaheh Sadredini, Rosario Cammarota, M. Imani
BioHD: an efficient genome sequence search platform using HyperDimensional memorization
Proceedings of the 49th Annual International Symposium on Computer Architecture, published 2022-06-18. DOI: 10.1145/3470496.3527422 (https://doi.org/10.1145/3470496.3527422)
Citations: 25
Abstract
In this paper, we propose BioHD, a novel genomic sequence search platform based on Hyper-Dimensional Computing (HDC) for hardware-friendly computation. BioHD transforms the inherently sequential processes of genome matching into highly parallelizable computation tasks. We exploit HDC memorization to encode and represent genome sequences as high-dimensional vectors. BioHD then combines the encoded sequences to generate an HDC reference library. During sequence search, BioHD performs an exact or approximate similarity check of an encoded query against the HDC reference library. Our framework simplifies the required sequence matching operations while introducing a statistical model to control the alignment quality. To take full advantage of BioHD's inherent robustness and parallelism, we design a processing in-memory (PIM) architecture that offers massive parallelism and is compatible with existing crossbar memory. Our PIM architecture supports all essential BioHD operations natively in memory with minimal modification to the array. We evaluate BioHD accuracy and efficiency on a wide range of genomics data, including COVID-19 databases. Our results indicate that PIM provides 102.8× speedup and 116.1× higher energy efficiency than the state-of-the-art pattern matching algorithm running on a GeForce RTX 3060 Ti GPU, and 9.3× speedup and 13.2× higher energy efficiency than the state-of-the-art PIM accelerator.
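The encode, combine, and similarity-check flow described in the abstract can be illustrated with a minimal sketch. It assumes a generic HDC scheme (random bipolar base hypervectors, cyclic rotation to mark position, elementwise multiplication for binding, elementwise summation for bundling) and covers exact k-mer matching only. The dimensionality, k-mer length, threshold, example sequences, and all function names are illustrative assumptions; this is not the BioHD encoding, its statistical model, or its PIM implementation.

```python
import random

D = 10000   # hypervector dimensionality (assumed for illustration)
K = 8       # k-mer length (assumed for illustration)

rng = random.Random(0)
# One random bipolar (+1/-1) hypervector per nucleotide.
BASE_HV = {b: [rng.choice((-1, 1)) for _ in range(D)] for b in "ACGT"}

def rotate(hv, shift):
    """Cyclically rotate a hypervector; the shift marks a position in the k-mer."""
    shift %= len(hv)
    return hv[-shift:] + hv[:-shift] if shift else list(hv)

def encode_kmer(kmer):
    """Bind position-rotated base hypervectors by elementwise multiplication."""
    hv = [1] * D
    for pos, base in enumerate(kmer):
        rotated = rotate(BASE_HV[base], pos)
        hv = [x * y for x, y in zip(hv, rotated)]
    return hv

def build_library(reference):
    """Bundle (elementwise sum) the hypervectors of all distinct k-mers."""
    library = [0] * D
    kmers = {reference[i:i + K] for i in range(len(reference) - K + 1)}
    for kmer in kmers:
        library = [x + y for x, y in zip(library, encode_kmer(kmer))]
    return library

def similarity(library, query):
    """Normalized dot product: near 1 if the k-mer was memorized, near 0 if not."""
    q = encode_kmer(query)
    return sum(x * y for x, y in zip(library, q)) / D

def contains(library, query, threshold=0.5):
    """Decide membership by thresholding the similarity (threshold is assumed)."""
    return similarity(library, query) > threshold

reference = "ACGTTGCAAGCTTAGGCTAACG"   # toy reference sequence
library = build_library(reference)
```

For example, `contains(library, "TGCAAGCT")` checks a k-mer that occurs in the toy reference, while `contains(library, "GGGGGGGG")` checks one that does not. Because distinct k-mers map to nearly orthogonal hypervectors, a single dot product against the bundled library stands in for a scan over every stored k-mer, which is the kind of operation that parallelizes well in hardware. The noise in the similarity grows with the number of bundled k-mers, so a real library would need to bound how many sequences share one hypervector.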