Ian Erik Varatalu, Margus Veanes, Juhan-Peep Ernits
arXiv - CS - Formal Languages and Automata Theory, 2024-07-30. DOI: arxiv-2407.20479 (https://doi.org/arxiv-2407.20479)
RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement and Lookarounds
We present a tool and theory RE# for regular expression matching that is
built on symbolic derivatives, does not use backtracking, and, in addition to
the classical operators, also supports complement, intersection and
lookarounds. We develop the theory formally and show that the main matching
algorithm has input-linear complexity, both in theory and experimentally.
We conduct a thorough evaluation on popular benchmarks, which shows that RE# is over
71% faster than the next fastest regex engine in Rust on the baseline, and
outperforms all state-of-the-art engines on extensions of the benchmarks, often
by several orders of magnitude.
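
To illustrate why derivatives accommodate intersection and complement without backtracking, the following is a minimal sketch of a classical Brzozowski-style derivative matcher in Python. It is not the RE# algorithm: it works on single characters rather than symbolic predicates, omits lookarounds, decides only whole-string membership, and performs no term simplification, so it does not have the input-linear behaviour claimed above. All class and function names (`Any`, `And`, `Not`, `deriv`, `matches`, and so on) are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass

# Regex terms (illustrative names, not RE#'s API).
@dataclass(frozen=True)
class Empty:            # matches no string
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Any:              # matches any single character
    pass

@dataclass(frozen=True)
class Chr:              # matches one specific character
    c: str

@dataclass(frozen=True)
class Cat:              # concatenation
    l: object
    r: object

@dataclass(frozen=True)
class Alt:              # union
    l: object
    r: object

@dataclass(frozen=True)
class And:              # intersection
    l: object
    r: object

@dataclass(frozen=True)
class Not:              # complement
    r: object

@dataclass(frozen=True)
class Star:             # Kleene star
    r: object

def nullable(r) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Any, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.l) and nullable(r.r)
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Not):
        return not nullable(r.r)
    raise TypeError(f"unknown term: {r!r}")

def deriv(r, a: str):
    """Derivative of r with respect to character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Any):
        return Eps()
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, And):
        # Intersection: take the derivative of both sides.
        return And(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Not):
        # Complement: the derivative commutes with negation.
        return Not(deriv(r.r, a))
    if isinstance(r, Star):
        return Cat(deriv(r.r, a), r)
    if isinstance(r, Cat):
        left = Cat(deriv(r.l, a), r.r)
        return Alt(left, deriv(r.r, a)) if nullable(r.l) else left
    raise TypeError(f"unknown term: {r!r}")

def matches(r, s: str) -> bool:
    """Whole-string match: derive once per character, then test nullability."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

def contains(c: str):
    """Regex for 'any string containing character c'."""
    return Cat(Star(Any()), Cat(Chr(c), Star(Any())))

# Strings that contain 'a' and do not contain 'b'.
a_and_not_b = And(contains("a"), Not(contains("b")))
```

With these definitions, `matches(a_and_not_b, "xay")` holds while `matches(a_and_not_b, "xaby")` does not. The rules for `And` and `Not` are one line each, which is the property that makes derivative-based engines a natural fit for these operators. Because this sketch never simplifies terms, they grow with every input character; a practical engine would need to normalize and cache derivatives.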