RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement and Lookarounds
Ian Erik Varatalu, Margus Veanes, Juhan-Peep Ernits
arXiv - CS - Formal Languages and Automata Theory, 2024-07-30 (arxiv-2407.20479)
Abstract
We present RE#, a tool and theory for regular expression matching that is
built on symbolic derivatives, does not use backtracking, and, in addition to
the classical operators, supports complement, intersection and lookarounds.
We develop the theory formally and show that the main matching algorithm has
input-linear complexity, both in theory and in experiments. A thorough
evaluation on popular benchmarks shows that RE# is over 71% faster than the
next fastest regex engine in Rust on the baseline, and that it outperforms all
state-of-the-art engines on extensions of the benchmarks, often by several
orders of magnitude.
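
To illustrate why derivatives accommodate intersection and complement without backtracking, here is a minimal sketch of a classical Brzozowski-derivative matcher in Python. It is not the RE# algorithm: it works on single characters rather than symbolic character sets, performs no simplification of derivative terms (so it is not input-linear), and omits lookarounds. All names in it (`char`, `both`, `neg`, `deriv`, `matches`, and so on) are hypothetical and chosen only for this example.

```python
# Regex terms are plain tuples; every constructor name below is hypothetical.
EMPTY = ("empty",)  # matches no string
EPS = ("eps",)      # matches only the empty string

def char(c): return ("chr", c)
def cat(a, b): return ("cat", a, b)
def alt(a, b): return ("alt", a, b)
def both(a, b): return ("and", a, b)   # intersection
def neg(a): return ("not", a)          # complement
def star(a): return ("star", a)

def nullable(r):
    """Return True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(tag)

def deriv(r, c):
    """Return a term matching w exactly when r matches c followed by w."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "and":
        # Intersection and complement commute with the derivative.
        return both(deriv(r[1], c), deriv(r[2], c))
    if tag == "not":
        return neg(deriv(r[1], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    if tag == "cat":
        left = cat(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    raise ValueError(tag)

def matches(r, s):
    """Whole-string match: take one derivative per input character."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# Strings over {a, b} that contain "ab" and do not contain "bb".
any_ab = star(alt(char("a"), char("b")))
has_ab = cat(any_ab, cat(cat(char("a"), char("b")), any_ab))
has_bb = cat(any_ab, cat(cat(char("b"), char("b")), any_ab))
pattern = both(has_ab, neg(has_bb))

print(matches(pattern, "aab"), matches(pattern, "abb"))
```

Each input character is consumed exactly once and no input position is revisited. A practical engine would additionally need to normalize and cache the derivative terms so that they behave like automaton states, which this sketch does not attempt.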