深度包检测中多个正则表达式匹配加速算法

S. Sushanth Kumar, Sarang Dharmapurikar, Fang Yu, P. Crowley, J. Turner
{"title":"深度包检测中多个正则表达式匹配加速算法","authors":"S. Sushanth Kumar, Sarang Dharmapurikar, Fang Yu, P. Crowley, J. Turner","doi":"10.1145/1159913.1159952","DOIUrl":null,"url":null,"abstract":"There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. owever, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application.In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space equirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space equirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide ostffective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.","PeriodicalId":109155,"journal":{"name":"Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications","volume":"193 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"557","resultStr":"{\"title\":\"Algorithms to accelerate multiple regular expressions matching for deep packet inspection\",\"authors\":\"S. Sushanth Kumar, Sarang Dharmapurikar, Fang Yu, P. Crowley, J. Turner\",\"doi\":\"10.1145/1159913.1159952\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. owever, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application.In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space equirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space equirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide ostffective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.\",\"PeriodicalId\":109155,\"journal\":{\"name\":\"Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications\",\"volume\":\"193 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"557\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1159913.1159952\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1159913.1159952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 557

摘要

为了提高网络安全性和提供特定于应用程序的服务,对能够检查数据包内容的网络设备的需求日益增长。大多数执行深度包检测的高性能系统实现简单的字符串匹配算法,以根据大量但有限的字符串集匹配数据包。然而,人们对使用基于正则表达式的模式匹配越来越感兴趣,因为正则表达式提供了优越的表达能力和灵活性。确定性有限自动机(DFA)表示通常用于实现正则表达式。然而,在网络应用程序中产生的正则表达式集的DFA表示需要大量内存,限制了它们的实际应用。在本文中,我们引入了一种新的正则表达式表示,称为延迟输入DFA (D2FA),与DFA相比,它大大减少了空间需求。D2FA是通过用单个默认转换增量地替换自动机的多个转换来转换DFA来构造的。我们的方法极大地减少了状态之间不同转换的数量。对于来自当前商业和学术系统的正则表达式集合,D2FA表示减少了95%以上的转换。考虑到大幅减少的空间需求,我们描述了一种能够以千兆位速率执行深度数据包检测的高效架构。我们的架构使用多个片上存储器,使每个存储器在短时间内保持均匀占用和访问,从而有效地分配负载并实现高吞吐量。我们的架构可以在OC-192速率下提供有效的数据包内容扫描,其内存要求与当前的ASIC技术一致。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Algorithms to accelerate multiple regular expressions matching for deep packet inspection
There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. owever, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application.In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space equirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space equirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide ostffective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Session details: Wireless Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications Session details: Applications Session details: Measurement Session details: Routing 1
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1