互联网协议的有限状态机解析:比你想象的要快

2014 IEEE Security and Privacy Workshops Pub Date : 2014-05-17 DOI:10.1109/SPW.2014.34

R. Graham, Peter C. Johnson

{"title":"互联网协议的有限状态机解析:比你想象的要快","authors":"R. Graham, Peter C. Johnson","doi":"10.1109/SPW.2014.34","DOIUrl":null,"url":null,"abstract":"A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain ``practical'' reasons for avoiding FSM-based parsers are invalid.","PeriodicalId":142224,"journal":{"name":"2014 IEEE Security and Privacy Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Finite State Machine Parsing for Internet Protocols: Faster Than You Think\",\"authors\":\"R. Graham, Peter C. Johnson\",\"doi\":\"10.1109/SPW.2014.34\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain ``practical'' reasons for avoiding FSM-based parsers are invalid.\",\"PeriodicalId\":142224,\"journal\":{\"name\":\"2014 IEEE Security and Privacy Workshops\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE Security and Privacy Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPW.2014.34\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Security and Privacy Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPW.2014.34","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

解析器的工作是获取非结构化、不透明的数据，并将其转换为结构化、语义上有意义的格式。因此，解析器经常在不受信任的数据源(例如Internet)和计算机系统的软的、有嚼劲的中心之间的边界上操作，在那里性能和安全性是至关重要的。例如，防火墙就是为Internet协议创建信任的解析器，它允许有效的数据包通过，并丢弃或主动拒绝不正确的数据包。尽管有限状态机(fsm)在协议规范和协议实现中都很流行，但它们在这类协议的解析器代码中却很少受到关注。避免使用FSM计算模型的典型原因是性能差、可扩展性差、可表达性差、编程困难或耗时。在本研究报告中，我们介绍了有限状态机的动机和设计，以解析各种现有的互联网协议，包括二进制和ASCII。我们手工编写的解析器明确地围绕L1缓存命中延迟、分支错误预测惩罚和程序范围内的内存开销进行优化，以实现积极的性能和可伸缩性目标。我们的工作表明，与普遍的看法相反，这样的解析器对于有意义的协议具有足够的表现力，对于高吞吐量的应用程序具有足够的性能，并且构造和维护足够简单。我们希望，鉴于其他研究证明了这种解析器相对于更复杂的图灵完备代码的安全性优势，我们的工作可以作为证据，证明避免基于fsm的解析器的某些“实际”理由是无效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Finite State Machine Parsing for Internet Protocols: Faster Than You Think

A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain ``practical'' reasons for avoiding FSM-based parsers are invalid.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助