CompactDFA: Generic State Machine Compression for Scalable Pattern Matching

2010 Proceedings IEEE INFOCOM | Pub Date: 2010-03-14 | DOI: 10.1109/INFCOM.2010.5462160

A. Bremler-Barr, David Hay, Yaron Koral

Citations: 46

Abstract

Pattern matching algorithms lie at the core of all contemporary Intrusion Detection Systems (IDS), making it essential to reduce their running time and memory requirements. This paper focuses on the most popular class of pattern-matching algorithms, the Aho-Corasick-like algorithms, which are based on constructing and traversing a Deterministic Finite Automaton (DFA) that represents the patterns. While this approach provides deterministic time guarantees, modern IDSs need to deal with hundreds of patterns and must therefore store very large DFAs, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, and also raises its power consumption and cost. We propose a novel method to compress DFAs by observing that the names of the states are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom and encode states in such a way that all transitions to a specific state can be represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be characterized by simple properties. The problem of pattern matching is then reduced to the well-studied problem of Longest Prefix Matching (LPM), which can be solved in TCAM, in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gbps with low power consumption.
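The central idea, one prefix rule per target state resolved by longest prefix matching, can be illustrated with a small software sketch. This is not the authors' construction, which the abstract does not spell out; it is one possible realization for an Aho-Corasick automaton under stated assumptions. The names `build_compact_rules`, `step`, and `find_matches` are hypothetical. States are named by their string labels, and each state's code is its path in the tree of suffix (failure) links, stored as a tuple of child indices rather than as the fixed-width bit string a TCAM would need.

```python
from collections import deque


def build_compact_rules(patterns):
    """Return (code, rules) for an Aho-Corasick automaton over `patterns`.

    code[label]            -> tuple code of the state with that label
    rules[(prefix, symbol)] -> label of the next state
    Assumption: tuple codes stand in for binary codes.
    """
    # 1. States are all prefixes of the patterns (the trie nodes).
    labels = {""}
    for p in patterns:
        for i in range(1, len(p) + 1):
            labels.add(p[:i])

    # 2. Failure tree: the parent of a state is its longest proper
    #    suffix that is also a state (the empty label is the root).
    def fail(label):
        for i in range(1, len(label) + 1):
            if label[i:] in labels:
                return label[i:]

    children = {label: [] for label in labels}
    for label in sorted(labels):
        if label:
            children[fail(label)].append(label)

    # 3. A state's code is its path from the root of the failure tree,
    #    so code[u] is a prefix of code[q] exactly when u is a suffix
    #    of q among the states.
    code = {"": ()}
    queue = deque([""])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children[u]):
            code[v] = code[u] + (i,)
            queue.append(v)

    # 4. One rule per non-root state s = u + c: any current state whose
    #    code starts with code[u], reading symbol c, moves to s.
    rules = {}
    for label in labels:
        if label:
            rules[(code[label[:-1]], label[-1])] = label
    return code, rules


def step(code, rules, state, symbol):
    """Longest prefix match on the current state's code."""
    path = code[state]
    for k in range(len(path), -1, -1):
        nxt = rules.get((path[:k], symbol))
        if nxt is not None:
            return nxt
    return ""  # default rule: no prefix matched, go to the root


def find_matches(patterns, text):
    """Return sorted (end_index, pattern) pairs for all occurrences."""
    code, rules = build_compact_rules(patterns)
    state, found = "", []
    for i, ch in enumerate(text):
        state = step(code, rules, state, ch)
        for p in sorted(set(patterns)):
            if p and state.endswith(p):
                found.append((i, p))
    return found
```

For example, `find_matches(["he", "she", "his", "hers"], "ushers")` scans the text with nine rules, one per non-root state, instead of a full transition table with one entry per state and symbol. A hardware version would replace the loop in `step` with a single TCAM or IP-lookup query.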
