{"title":"The Expressive Capacity of State Space Models: A Formal Language Perspective","authors":"Yash Sarrof, Yana Veitsman, Michael Hahn","doi":"arxiv-2405.17394","DOIUrl":null,"url":null,"abstract":"Recently, recurrent models based on linear state space models (SSMs) have\nshown promising performance in language modeling (LM), competititve with\ntransformers. However, there is little understanding of the in-principle\nabilities of such models, which could provide useful guidance to the search for\nbetter LM architectures. We present a comprehensive theoretical study of the\ncapacity of such SSMs as it compares to that of transformers and traditional\nRNNs. We find that SSMs and transformers have overlapping but distinct\nstrengths. In star-free state tracking, SSMs implement straightforward and\nexact solutions to problems that transformers struggle to represent exactly.\nThey can also model bounded hierarchical structure with optimal memory even\nwithout simulating a stack. On the other hand, we identify a design choice in\ncurrent SSMs that limits their expressive power. We discuss implications for\nSSM and LM research, and verify results empirically on a recent SSM, Mamba.","PeriodicalId":501124,"journal":{"name":"arXiv - CS - Formal Languages and Automata Theory","volume":"98 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Formal Languages and Automata Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.17394","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance in the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify the results empirically on a recent SSM, Mamba.