Digital in-memory stochastic computing architecture for vector-matrix multiplication

Frontiers in Nanotechnology (IF 3.8, JCR Q2, Materials Science, Multidisciplinary) | Pub Date: 2023-07-24 | DOI: 10.3389/fnano.2023.1147396

Shady O. Agwa, T. Prodromakis

Citations: 1

Abstract

Applications of artificial intelligence currently dominate the technology landscape. Meanwhile, conventional Von Neumann architectures struggle with the data-movement bottleneck as they try to meet the ever-increasing performance demands of these data-centric applications. Moreover, the cost of vector-matrix multiplication in the binary domain is a major computational bottleneck for these applications. This paper introduces a novel digital in-memory stochastic computing architecture that leverages the simplicity of stochastic computing for in-memory vector-matrix multiplication. The proposed architecture incorporates several new approaches: a new stochastic number generator with ideal binary-to-stochastic mapping, a best-seeding approach that keeps low stochastic bit-precisions accurate enough, a hybrid stochastic-binary accumulation approach for vector-matrix multiplication, and the conversion of conventional memory read operations into on-the-fly stochastic multiplication operations with negligible overhead. Thanks to the combination of these approaches, the accuracy analysis of the vector-matrix multiplication benchmark shows that scaling the stochastic bit-precision down from 16-bit to 4-bit yields nearly the same average error (less than 3%). The analytical model derived for the proposed in-memory stochastic computing architecture shows that the 4-bit stochastic architecture achieves the highest throughput per sub-array (122 Ops/Cycle), 4.36x higher than at 16-bit stochastic precision, while still maintaining a small average error of 2.25%.
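As a rough illustration of the general idea behind stochastic multiplication with binary accumulation, the sketch below uses textbook unipolar stochastic computing: a value in [0, 1] becomes a bitstream whose fraction of ones equals the value, multiplication is a bitwise AND of two independent streams, and each product stream is reduced to a count that is summed in the binary domain. It is not the paper's stochastic number generator, seeding scheme, or in-memory read path. The function names (`to_stream`, `sc_multiply`, `sc_vector_matrix`), the software pseudo-random generator, and the `length` and `seed` parameters are all assumptions made for illustration.

```python
import random


def to_stream(value, length, rng):
    """Encode a value in [0, 1] as a unipolar bitstream with P(bit == 1) = value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def sc_multiply(x_stream, w_stream):
    """Stochastic multiply: bitwise AND of two independent unipolar streams."""
    return [a & b for a, b in zip(x_stream, w_stream)]


def sc_vector_matrix(x, W, length=1024, seed=0):
    """Estimate y = x @ W for entries in [0, 1] (illustrative sketch only).

    Products are formed in the stochastic domain (AND), then each product
    stream is reduced to a count of ones and summed in the binary domain.
    """
    # Separate generators keep the input and weight streams uncorrelated,
    # which the AND-based multiply relies on.
    rng_x = random.Random(seed)
    rng_w = random.Random(seed + 1)
    x_streams = [to_stream(v, length, rng_x) for v in x]

    y = []
    for j in range(len(W[0])):
        acc = 0  # binary accumulator for column j
        for i, xs in enumerate(x_streams):
            ws = to_stream(W[i][j], length, rng_w)
            acc += sum(sc_multiply(xs, ws))  # count of ones in the product stream
        y.append(acc / length)  # scale the count back to a real value
    return y


if __name__ == "__main__":
    x = [0.5, 0.25, 1.0]
    W = [[0.5, 1.0], [0.5, 0.0], [0.25, 1.0]]
    exact = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]
    print("exact      :", exact)
    print("stochastic :", sc_vector_matrix(x, W, length=4096, seed=1))
```

In this sketch the estimate gets noisier as `length` shrinks, which is the accuracy versus throughput trade-off the abstract refers to; how the paper's generator and seeding approach keep the error low at short streams is not reproduced here.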
