A Quantitative Analysis of State Space Model-Based Large Language Model: Study of Hungry Hungry Hippos

IF 1.4 | Zone 3, Computer Science | Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Computer Architecture Letters | Pub Date: 2024-07-03 | DOI: 10.1109/LCA.2024.3422492

Dongho Yoon; Taehun Kim; Jae W. Lee; Minsoo Rhu

Citations: 0

Abstract

As the need to process long contexts in large language models (LLMs) grows, attention-based LLMs face significant challenges due to their high computation and memory requirements. To overcome these challenges, several recent works seek to alleviate attention's system-level bottlenecks. One approach that has recently gained considerable traction is state space models (SSMs), thanks to their ability to substantially reduce computational complexity and memory footprint. Despite the excitement around SSMs, an in-depth characterization and analysis of this important model architecture is lacking. In this paper, we delve into a representative SSM named Hungry Hungry Hippos (H3), examining its advantages as well as its current limitations. We also discuss future research directions for improving the efficiency of SSMs via hardware architectural support.
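The memory argument in the abstract can be illustrated with a minimal sketch. It is not the H3 layer from the paper: it is a generic single-channel linear SSM run as a recurrence, alongside two back-of-the-envelope counters. The function names and the sizes used (`d_model = 768`, `d_state = 64`) are assumptions chosen for illustration only.

```python
def kv_cache_elems(seq_len, d_model):
    """Elements cached per attention layer: one key and one value vector per token."""
    return 2 * seq_len * d_model


def ssm_state_elems(d_state, d_model):
    """Elements kept per SSM layer: a fixed-size state per channel, independent of seq_len."""
    return d_state * d_model


def ssm_scan(a, b, c, inputs):
    """Run a single-channel diagonal linear SSM as a recurrence.

    x_t = a * x_{t-1} + b * u_t   (elementwise over the state dimension)
    y_t = sum(c * x_t)
    Only the current state is kept, so memory does not grow with len(inputs).
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        state = [ai * xi + bi * u for ai, xi, bi in zip(a, state, b)]
        outputs.append(sum(ci * xi for ci, xi in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    for n in (1024, 8192, 65536):
        print(n, kv_cache_elems(n, 768), ssm_state_elems(64, 768))
```

Under these assumed sizes, the attention cache grows linearly with the sequence length while the SSM state stays fixed, which is the intuition behind the reduced memory footprint.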

Source journal

IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore

4.60

Self-citation rate

4.30%

Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.

Latest articles from this journal

DAWN: Efficient Distribution of Attention Workload in PIM-Enabled Systems for LLM Inference

2025 Reviewers List*

Driving the Core Frontend With LiteBTB

CTL: A Case for CXL Device-Managed Hugepages

H3: Hybrid Architecture Using High Bandwidth Memory and High Bandwidth Flash for Cost-Efficient LLM Inference