{"title":"Finite State Machine Parsing for Internet Protocols: Faster Than You Think","authors":"R. Graham, Peter C. Johnson","doi":"10.1109/SPW.2014.34","DOIUrl":null,"url":null,"abstract":"A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain ``practical'' reasons for avoiding FSM-based parsers are invalid.","PeriodicalId":142224,"journal":{"name":"2014 IEEE Security and Privacy Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Security and Privacy Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPW.2014.34","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
A parser's job is to take unstructured, opaque data and convert it to a structured, semantically meaningful format. As such, parsers often operate at the border between untrusted data sources (e.g., the Internet) and the soft, chewy center of computer systems, where performance and security are paramount. A firewall, for instance, is precisely a trust-creating parser for Internet protocols, permitting valid packets to pass through and dropping or actively rejecting malformed packets. Despite the prevalence of finite state machines (FSMs) in both protocol specifications and protocol implementations, they have gained little traction in parser code for such protocols. Typical reasons for avoiding the FSM computation model claim poor performance, poor scalability, poor expressibility, and difficult or time-consuming programming. In this research report, we present our motivations for and designs of finite state machines to parse a variety of existing Internet protocols, both binary and ASCII. Our hand-written parsers explicitly optimize around L1 cache hit latency, branch misprediction penalty, and program-wide memory overhead to achieve aggressive performance and scalability targets. Our work demonstrates that such parsers are, contrary to popular belief, sufficiently expressive for meaningful protocols, sufficiently performant for high-throughput applications, and sufficiently simple to construct and maintain. We hope that, in light of other research demonstrating the security benefits of such parsers over more complex, Turing-complete codes, our work serves as evidence that certain ``practical'' reasons for avoiding FSM-based parsers are invalid.