Finite State Machine Parsing for Internet Protocols: Faster Than You Think

R. Graham, Peter C. Johnson
Published in: 2014 IEEE Security and Privacy Workshops, 2014-05-17
DOI: 10.1109/SPW.2014.34 (https://doi.org/10.1109/SPW.2014.34)
Citations: 13

Abstract

A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain "practical" reasons for avoiding FSM-based parsers are invalid.
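As a rough illustration of the FSM parsing style the abstract refers to, the sketch below validates a deliberately simplified request line (uppercase method, space, target, space, version, CRLF) one byte at a time using lookup tables. It is not the authors' code: the grammar, the state and class names, and the `RequestLineFSM` class are all assumptions made for this example. Python shows only the structure; the cache and branch-prediction concerns the abstract mentions would matter in a compiled implementation.

```python
# Character classes (hypothetical names chosen for this sketch).
OTHER, UPPER, VCHAR, SP, CR, LF = range(6)

# States of the machine (also hypothetical).
(START, METHOD, TARGET_START, TARGET,
 VERSION_START, VERSION, EXPECT_LF, DONE, ERROR) = range(9)


def _build_class_table():
    """Map each of the 256 byte values to a character class."""
    table = [OTHER] * 256
    for b in range(0x21, 0x7F):          # visible ASCII
        table[b] = VCHAR
    for b in range(ord("A"), ord("Z") + 1):
        table[b] = UPPER
    table[0x20] = SP
    table[0x0D] = CR
    table[0x0A] = LF
    return table


def _build_transitions():
    """Build next_state = table[state][char_class]; the default is ERROR."""
    t = [[ERROR] * 6 for _ in range(9)]
    t[START][UPPER] = METHOD
    t[METHOD][UPPER] = METHOD
    t[METHOD][SP] = TARGET_START
    for cls in (UPPER, VCHAR):
        t[TARGET_START][cls] = TARGET
        t[TARGET][cls] = TARGET
        t[VERSION_START][cls] = VERSION
        t[VERSION][cls] = VERSION
    t[TARGET][SP] = VERSION_START
    t[VERSION][CR] = EXPECT_LF
    t[EXPECT_LF][LF] = DONE
    # DONE and ERROR rows stay all-ERROR: trailing bytes are rejected.
    return t


CLASS_OF = _build_class_table()
TRANSITIONS = _build_transitions()


class RequestLineFSM:
    """Holds only the current state, so input may arrive in fragments."""

    def __init__(self):
        self.state = START

    def feed(self, chunk: bytes) -> int:
        state = self.state
        for byte in chunk:               # two table lookups per byte
            state = TRANSITIONS[state][CLASS_OF[byte]]
        self.state = state
        return state


if __name__ == "__main__":
    fsm = RequestLineFSM()
    for fragment in (b"GE", b"T / HT", b"TP/1.1\r", b"\n"):
        fsm.feed(fragment)
    print("accepted" if fsm.state == DONE else "rejected")
```

Because the only thing carried between calls to `feed` is a single integer, the parser can resume at any byte boundary without buffering or reassembling earlier fragments. That property is a design choice of this sketch, not something the abstract claims.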