Optimizing Regular Expression Matching with SR-NFA on Multi-Core Systems

Y. Yang, V. Prasanna
Published in: 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011-10-10
DOI: 10.1109/PACT.2011.73 (https://doi.org/10.1109/PACT.2011.73)
Citations: 9

Abstract

Conventionally, regular expression matching (REM) has been performed by sequentially comparing the regular expression (regex) to the input stream, which can be slow due to excessive backtracking (smith:acsac06). Alternatively, the regex can be converted to a deterministic finite automaton (DFA) for efficient matching; however, the DFA may require an extremely large state transition table (STT) due to exponential state explosion (meyer:swat71, yu:ancs06). We propose the segmented regex-NFA (SR-NFA) architecture, in which the regex is first compiled into modular nondeterministic finite automata (NFAs), then partitioned, optimized, and matched efficiently on modern multi-core processors. SR-NFA offers attack-resilient, multi-gigabit-per-second matching throughput, does not suffer from either backtracking or state explosion, and can be rapidly constructed. For regex sets whose DFA exhibits moderate state explosion, i.e., on average 200k states in the STT, the proposed SR-NFA is 367k times faster to construct and update and uses 23k times less memory than the DFA approach. Running on an 8-core 2.6 GHz Opteron platform, our prototype achieves 2.2 Gbps average matching throughput for regex sets with up to 4,000 SR-NFA states per regex set.
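The trade-off the abstract describes between NFA and DFA matching can be illustrated with a textbook example. The sketch below is not the authors' SR-NFA and involves no partitioning or multi-core scheduling. It assumes the classic pattern (a|b)*a(a|b){n-1}, and all function names are hypothetical. It shows that tracking the set of active NFA states consumes each input character exactly once (no backtracking), while the subset construction for the same NFA reaches 2^n DFA states.

```python
from collections import deque

def build_nfa(n):
    """Hypothetical helper: NFA for (a|b)*a(a|b){n-1}, i.e. the n-th
    character from the end is 'a'. States are 0..n; 0 is the start
    state and n is the only accepting state."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for s in range(1, n):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta, 0, n

def step(delta, active, ch):
    """Advance every active NFA state on one input character."""
    nxt = set()
    for s in active:
        nxt |= delta.get((s, ch), set())
    return frozenset(nxt)

def nfa_match(delta, start, accept, text):
    """Set-based NFA simulation: each character is read once, so the
    work per character is bounded by the number of NFA states."""
    active = frozenset({start})
    for ch in text:
        active = step(delta, active, ch)
    return accept in active

def count_dfa_states(delta, start, alphabet="ab"):
    """Subset construction: count the reachable DFA states."""
    seen = {frozenset({start})}
    queue = deque(seen)
    while queue:
        cur = queue.popleft()
        for ch in alphabet:
            nxt = step(delta, cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

if __name__ == "__main__":
    delta, start, accept = build_nfa(10)
    print(nfa_match(delta, start, accept, "a" + "b" * 9))  # True
    print(count_dfa_states(delta, start))                  # 1024
```

With n = 10 the NFA has 11 states while the equivalent DFA has 1024, which is the kind of growth that makes the state transition table expensive to build and store.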