Data-Pattern-Driven LUT for Efficient In-Cache Computing in CNNs Acceleration

IEEE Computer Architecture Letters, vol. 24, no. 1, pp. 81-84. Published 2025-03-05. DOI: 10.1109/LCA.2025.3548080
Impact factor 1.4; JCR Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE); CAS Zone 3 (Computer Science)

Zhengpan Fei; Mingchuan Lyu; Satoshi Kawakami; Koji Inoue

IEEE Xplore: https://ieeexplore.ieee.org/document/10910157/

Citations: 0

Abstract

Lookup table (LUT)-based Processing-in-Memory (PIM) solutions perform computations by retrieving precomputed results stored in LUTs. This makes complex operations such as multiplication highly efficient and well suited to energy- and latency-efficient Convolutional Neural Network (CNN) inference. However, naively storing every possible result in the LUT requires hardware resources that grow exponentially with operand bit width, which significantly limits parallelism and increases hardware area, latency, and power overhead. Decomposition and compression techniques can reduce LUT size, but they introduce considerable memory-access overhead and additional operations. To address these challenges, we conduct an extensive analysis to identify which portions of the data significantly affect accuracy in CNNs. Based on the insight that the key data is concentrated in a small range, we propose a data-pattern-driven (DPD) optimization strategy that approximates less critical data to drastically reduce LUT size, preserving computational efficiency with acceptable accuracy loss.
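The abstract does not say how DPD selects or approximates data, so the following is only a hypothetical sketch of the size trade-off it describes, not the authors' method. It assumes signed 8-bit operands, treats first-operand values in [-16, 16) as the "key" range that is looked up exactly, and approximates any other first operand by rounding it to a multiple of 16. That rounded value is then served from the same small table plus a left shift. The constants `R` and `SHIFT` and all function names are illustrative assumptions.

```python
"""Hypothetical LUT multiplier: exact entries only for a small, frequently used
operand range; out-of-range operands are approximated and reuse the same table.

Assumptions (not from the paper): signed 8-bit operands, an exact range of
[-16, 16) for the first operand, and rounding to a multiple of 16 otherwise.
"""

BITS = 8
LO, HI = -(1 << (BITS - 1)), 1 << (BITS - 1)  # signed 8-bit range [-128, 128)
R = 16       # hypothetical "key data" range: a in [-R, R) is looked up exactly
SHIFT = 4    # out-of-range a is rounded to a multiple of 2**SHIFT (= 16)


def full_lut_entries(bits: int) -> int:
    """Entries needed to store every bits x bits product: 2**(2*bits)."""
    return 1 << (2 * bits)


def build_small_lut() -> dict[tuple[int, int], int]:
    """Precompute a*b only for a in [-R, R) and every signed 8-bit b."""
    return {(a, b): a * b for a in range(-R, R) for b in range(LO, HI)}


def lut_multiply(lut: dict[tuple[int, int], int], a: int, b: int) -> int:
    """Return a*b exactly when a is in the key range, approximately otherwise."""
    if -R <= a < R:
        return lut[(a, b)]  # exact: direct table lookup
    # Round a / 16 to the nearest integer (ties round up); for 8-bit a this
    # yields a_hi in [-8, 8], which lies inside the small table's range.
    a_hi = (a + (1 << (SHIFT - 1))) >> SHIFT
    return lut[(a_hi, b)] << SHIFT  # reuse the small table, then scale back by 16


if __name__ == "__main__":
    lut = build_small_lut()
    print(full_lut_entries(8), len(lut))  # 65536 8192
    print(lut_multiply(lut, 5, -7))       # -35 (exact lookup)
    print(lut_multiply(lut, 100, 3))      # 288 (100 approximated as 96; exact is 300)
```

In this sketch the table shrinks from 65,536 to 8,192 entries. Out-of-range products carry an absolute error of at most 8·|b|. Whether such an error is acceptable depends on how concentrated the real operand distribution is, which is the property the paper's analysis examines.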


Source journal: IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 4.60

Self-citation rate: 4.30%

Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.
