Finite State Machine Parsing for Internet Protocols: Faster Than You Think

2014 IEEE Security and Privacy Workshops. Pub Date: 2014-05-17. DOI: 10.1109/SPW.2014.34

R. Graham, Peter C. Johnson


Citations: 13

Abstract

A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain "practical" reasons for avoiding FSM-based parsers are invalid.
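To make the idea of an FSM-based protocol parser concrete, here is a minimal sketch of a byte-at-a-time state machine for an HTTP-style request line. It is not the authors' implementation: the class name `RequestLineParser`, the state names, and the accepted character ranges are all assumptions chosen for illustration, and Python cannot show the cache and branch-prediction tuning the abstract describes. What it does show is the structural property that matters: all parser state lives in one small object, so input can arrive in arbitrary fragments and malformed input is rejected the moment it appears.

```python
from enum import Enum, auto


class State(Enum):
    METHOD = auto()   # reading the method token, e.g. b"GET"
    TARGET = auto()   # reading the request target, e.g. b"/index.html"
    VERSION = auto()  # reading the version token, e.g. b"HTTP/1.1"
    LF = auto()       # saw CR, expecting LF
    DONE = auto()     # full request line accepted
    ERROR = auto()    # malformed input; the machine stays here


class RequestLineParser:
    """Hypothetical byte-at-a-time FSM for an HTTP-style request line."""

    def __init__(self):
        self.state = State.METHOD
        self.method = bytearray()
        self.target = bytearray()
        self.version = bytearray()

    def feed(self, chunk: bytes) -> State:
        """Consume one fragment; the state persists between calls."""
        for byte in chunk:
            self._step(byte)
        return self.state

    def _step(self, byte: int) -> None:
        s = self.state
        if s in (State.DONE, State.ERROR):
            return  # terminal states ignore further input
        if s is State.METHOD:
            if byte == 0x20 and self.method:      # space ends the method
                self.state = State.TARGET
            elif 0x41 <= byte <= 0x5A:            # 'A'..'Z' only (assumption)
                self.method.append(byte)
            else:
                self.state = State.ERROR
        elif s is State.TARGET:
            if byte == 0x20 and self.target:      # space ends the target
                self.state = State.VERSION
            elif 0x21 <= byte <= 0x7E:            # printable, non-space ASCII
                self.target.append(byte)
            else:
                self.state = State.ERROR
        elif s is State.VERSION:
            if byte == 0x0D and self.version:     # CR ends the version
                self.state = State.LF
            elif 0x21 <= byte <= 0x7E:
                self.version.append(byte)
            else:
                self.state = State.ERROR
        elif s is State.LF:
            self.state = State.DONE if byte == 0x0A else State.ERROR
```

Feeding the fragments `b"GE"`, `b"T /ind"`, `b"ex.html HTTP/1.1\r"`, and `b"\n"` in sequence walks the machine through its states without any reassembly buffer, while a lowercase method such as `b"get"` drives it straight to `State.ERROR`.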





Latest papers from 2014 IEEE Security and Privacy Workshops

Analysis of Unintentional Insider Threats Deriving from Social Engineering Exploits

Detecting Unknown Insider Threat Scenarios

Can We Identify NAT Behavior by Analyzing Traffic Flows?

A Case Study in Malware Research Ethics Education: When Teaching Bad is Good

Resilience as a New Enforcement Model for IT Security Based on Usage Control