The Complexity of Aggregates over Extractions by Regular Expressions
Johannes Doleschal, Noa Bratman, Benny Kimelfeld, Wim Martens
Logical Methods in Computer Science, published 2023-08-09
DOI: 10.46298/lmcs-19(3:12)2023 (https://doi.org/10.46298/lmcs-19(3:12)2023)
Citation count: 1
Abstract
Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex-formulas under the relational algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation. More precisely, we show that in a restricted case, all studied aggregate functions can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).
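As a rough illustration of the setting described in the abstract, the following Python sketch extracts spans with a regular expression that uses named capture groups and then computes a sum, an average, and a median (the 0.5-quantile) over the captured numbers. It is not the paper's formalism: the sample text, the pattern, and the helper name `extract_spans` are assumptions made for illustration. Python's `re.finditer` returns only non-overlapping leftmost matches, whereas a document spanner produces every valid assignment of spans, and spans here are 0-indexed half-open intervals, which may differ from the paper's convention.

```python
import re
import statistics


def extract_spans(pattern, text, var):
    """Return the (start, end) span of named group `var` for each match.

    Hypothetical helper: it approximates a regex-formula with Python's
    non-overlapping matching, not full spanner semantics.
    """
    return [m.span(var) for m in re.finditer(pattern, text)]


# Illustrative document and pattern (assumptions, not from the paper).
text = "apples 3, pears 10, plums 5"
pattern = r"(?P<item>[a-z]+) (?P<qty>\d+)"

# Spans are positions in the text, not the substrings themselves.
spans = extract_spans(pattern, text, "qty")

# Interpret each extracted span as a number, then aggregate.
values = [int(text[start:end]) for start, end in spans]
total = sum(values)
average = total / len(values)
median = statistics.median(values)

print(spans)    # [(7, 8), (16, 18), (26, 27)]
print(total, average, median)  # 18 6.0 5
```

Keeping spans as index pairs rather than substrings mirrors the abstract's description of spans as intervals: two occurrences of the same number at different positions remain distinct tuples, which matters when aggregating.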
About the journal:
Logical Methods in Computer Science is a fully refereed, open access, free, electronic journal. It welcomes papers on theoretical and practical areas in computer science involving logical methods, taken in a broad sense; some particular areas within its scope are listed below. Papers are refereed in the traditional way, with two or more referees per paper. Copyright is retained by the author.
Topics of Logical Methods in Computer Science:
Algebraic methods
Automata and logic
Automated deduction
Categorical models and logic
Coalgebraic methods
Computability and Logic
Computer-aided verification
Concurrency theory
Constraint programming
Cyber-physical systems
Database theory
Defeasible reasoning
Domain theory
Emerging topics: Computational systems in biology
Emerging topics: Quantum computation and logic
Finite model theory
Formalized mathematics
Functional programming and lambda calculus
Inductive logic and learning
Interactive proof checking
Logic and algorithms
Logic and complexity
Logic and games
Logic and probability
Logic for knowledge representation
Logic programming
Logics of programs
Modal and temporal logics
Program analysis and type checking
Program development and specification
Proof complexity
Real time and hybrid systems
Reasoning about actions and planning
Satisfiability
Security
Semantics of programming languages
Term rewriting and equational logic
Type theory and constructive mathematics.