CompactDFA: Generic State Machine Compression for Scalable Pattern Matching

A. Bremler-Barr, David Hay, Yaron Koral
Published in: 2010 Proceedings IEEE INFOCOM
Publication date: 2010-03-14
DOI: 10.1109/INFCOM.2010.5462160 (https://doi.org/10.1109/INFCOM.2010.5462160)
Citations: 46

Abstract

Pattern matching algorithms lie at the core of all contemporary Intrusion Detection Systems (IDS), making it essential to reduce their running time and memory requirements. This paper focuses on the most popular class of pattern-matching algorithms, the Aho-Corasick-like algorithms, which construct and traverse a Deterministic Finite Automaton (DFA) that represents the patterns. While this approach provides deterministic time guarantees, modern IDSs need to deal with hundreds of patterns and must therefore store very large DFAs, which usually do not fit in fast memory. This creates a major bottleneck for the throughput of the IDS and increases its power consumption and cost. We propose a novel method to compress DFAs, based on the observation that the names of the states are meaningless. Whereas regular DFAs store each transition between two states separately, we exploit this degree of freedom and encode states so that all transitions to a specific state can be represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be characterized by simple properties. The problem of pattern matching is then reduced to the well-studied problem of Longest Prefix Matching (LPM), which can be solved in TCAM, in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gbps with low power consumption.
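
The reduction to Longest Prefix Matching described in the abstract can be illustrated with a small software sketch. This is not the authors' encoding. As a simplifying assumption, each state's code here is its reversed label (a variable-length string), not the compact fixed-width bit code a TCAM would require, and the LPM lookup is a plain linear scan over prefix lengths. The function names `build_rules`, `lpm_step`, and `find_matches` are hypothetical. What the sketch does preserve is the key property: every non-root state is reached through exactly one rule of the form (input symbol, prefix of current state code).

```python
def build_rules(patterns):
    """Build one rule per non-root state: (symbol, code prefix) -> next code.

    A state's label is a prefix of some pattern. In this sketch its code is
    the reversed label, so states whose labels share a suffix share a code
    prefix (an assumption made for illustration only).
    """
    rules = {}
    for pat in patterns:
        for i in range(1, len(pat) + 1):
            label = pat[:i]
            prev_code = label[:-1][::-1]   # code prefix covering all sources
            rules[(label[-1], prev_code)] = label[::-1]
    return rules


def lpm_step(rules, code, symbol):
    """Return the next state code via longest-prefix match on `code`."""
    for k in range(len(code), -1, -1):     # try the longest prefix first
        nxt = rules.get((symbol, code[:k]))
        if nxt is not None:
            return nxt
    return ""                              # default rule: go to the root


def find_matches(patterns, text):
    """Scan `text` and return (start_index, pattern) for every occurrence."""
    rules = build_rules(patterns)
    pats = set(patterns)
    code, hits = "", []
    for pos, ch in enumerate(text):
        code = lpm_step(rules, code, ch)
        label = code[::-1]
        # Every pattern ending at `pos` is a suffix of the current label.
        for k in range(len(label)):
            if label[k:] in pats:
                hits.append((pos - (len(label) - k) + 1, label[k:]))
    return hits


if __name__ == "__main__":
    pats = ["he", "she", "his", "hers"]
    print(len(build_rules(pats)))          # number of rules = non-root states
    print(find_matches(pats, "ushers"))
```

With the four patterns above, the automaton has nine non-root states and the table holds nine rules, one per state, whereas a full DFA table would store a transition for every state and symbol pair. A hardware version would replace the dictionary scan with a TCAM or IP-lookup chip, as the abstract suggests.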