Optimizing Regular Expression Matching with SR-NFA on Multi-Core Systems

2011 International Conference on Parallel Architectures and Compilation Techniques. Pub Date: 2011-10-10. DOI: 10.1109/PACT.2011.73

Y. Yang, V. Prasanna

Citations: 9

Abstract

Conventionally, regular expression matching (REM) has been performed by sequentially comparing the regular expression (regex) to the input stream, which can be slow due to excessive backtracking (smith:acsac06). Alternatively, the regex can be converted to a deterministic finite automaton (DFA) for efficient matching, which, however, may require an extremely large state transition table (STT) due to exponential state explosion (meyer:swat71, yu:ancs06). We propose the segmented regex-NFA (SR-NFA) architecture, where the regex is first compiled into modular nondeterministic finite automata (NFA), then partitioned, optimized, and matched efficiently on modern multi-core processors. SR-NFA offers attack-resilient, multi-gigabit-per-second matching throughput, does not suffer from either backtracking or state explosion, and can be rapidly constructed. For regex sets that construct a DFA with moderate state explosion, i.e., on average 200k states in the STT, the proposed SR-NFA is 367k times faster to construct and update and uses 23k times less memory than the DFA approach. Running on an 8-core 2.6 GHz Opteron platform, our prototype achieves 2.2 Gbps average matching throughput for regex sets with up to 4,000 SR-NFA states per regex set.
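The trade-off described in the abstract can be illustrated with a textbook example. The sketch below is not the SR-NFA construction itself, whose details the abstract does not give. It is a minimal, generic NFA simulation under assumed names (`build_nfa`, `nfa_match`, and `count_dfa_states` are hypothetical helpers). It uses the pattern (a|b)*a(a|b){n-1}, i.e. "the n-th symbol from the end is a", whose NFA needs only n+1 states while the equivalent DFA obtained by subset construction needs 2^n.

```python
def build_nfa(n):
    """Hypothetical helper: NFA for (a|b)*a(a|b){n-1}.

    States are 0..n; state 0 is the start state and state n is accepting.
    Returns a dict mapping (state, symbol) -> set of next states.
    """
    delta = {
        # State 0 loops on every symbol; on 'a' it also "guesses" that
        # this 'a' is the n-th symbol from the end of the input.
        (0, "a"): {0, 1},
        (0, "b"): {0},
    }
    # States 1..n-1 advance on any symbol; state n has no outgoing edges.
    for s in range(1, n):
        for c in "ab":
            delta[(s, c)] = {s + 1}
    return delta


def nfa_match(delta, accept, text):
    """Simulate the NFA by tracking the set of active states.

    Each input symbol is read exactly once (no backtracking), and the
    work per symbol is bounded by the number of NFA states.
    """
    active = {0}
    for c in text:
        nxt = set()
        for s in active:
            nxt |= delta.get((s, c), set())
        active = nxt
    return accept in active


def count_dfa_states(delta, alphabet="ab"):
    """Count the DFA states reachable via the subset construction."""
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for c in alphabet:
            nxt = frozenset(t for s in cur for t in delta.get((s, c), ()))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


if __name__ == "__main__":
    n = 10
    delta = build_nfa(n)
    print(nfa_match(delta, n, "a" + "b" * (n - 1)))  # n-th from end is 'a'
    print(count_dfa_states(delta))                   # DFA size for this NFA
```

For n = 10 the NFA has 11 states while the subset construction reaches 1,024 DFA states. This is the kind of exponential growth the abstract refers to, and simulating the NFA directly avoids both it and backtracking.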
