The Complexity of Aggregates over Extractions by Regular Expressions

Logical Methods in Computer Science. Pub Date: 2023-08-09. DOI: 10.46298/lmcs-19(3:12)2023. Impact factor: 0.6. Zone 4 (Mathematics). JCR Q4 (Computer Science, Theory & Methods).
Johannes Doleschal, Noa Bratman, Benny Kimelfeld, Wim Martens
Citations: 1

Abstract

Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex-formulas under the relational algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation. More precisely, we show that in a restricted case, all studied aggregate functions can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).
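
As a rough illustration of the setting described in the abstract, the following Python sketch extracts spans with a capture variable and then aggregates over the extracted values. It is only an approximation under stated assumptions: Python's `re.finditer` returns non-overlapping matches rather than the full span relation of a regex-formula, spans here are 0-based and half-open (Python's convention, not necessarily the paper's), and the helper names `extract_spans` and `aggregate_over_spans` and the sample document are hypothetical.

```python
import re
import statistics


def extract_spans(pattern, text, var):
    """Collect the (start, end) span of capture variable `var` per match.

    Assumption: non-overlapping matches via re.finditer, which is narrower
    than the all-matches semantics of a document spanner.
    """
    return [m.span(var) for m in re.finditer(pattern, text)]


def aggregate_over_spans(text, spans):
    """Map each span to the integer it covers, then compute aggregates."""
    values = [int(text[start:end]) for start, end in spans]
    if not values:
        raise ValueError("no spans extracted, aggregates are undefined")
    return {
        "count": len(values),
        "sum": sum(values),
        "average": sum(values) / len(values),
        "median": statistics.median(values),  # the 0.5-quantile
    }


# Hypothetical sample document; the capture variable is named x.
doc = "apples: 3, pears: 12, plums: 7"
spans = extract_spans(r"(?P<x>\d+)", doc, "x")
result = aggregate_over_spans(doc, spans)
print(spans)
print(result)
```

In this toy case the relation has one tuple per number in the text, so every aggregate is cheap to compute. The intractability the abstract refers to concerns general regular document spanners, which this sketch does not model.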

Source journal
Logical Methods in Computer Science (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 1.80
Self-citation rate: 0.00%
Articles published: 105
Review time: 6-12 weeks
Journal description: Logical Methods in Computer Science is a fully refereed, open access, free, electronic journal. It welcomes papers on theoretical and practical areas in computer science involving logical methods, taken in a broad sense; some particular areas within its scope are listed below. Papers are refereed in the traditional way, with two or more referees per paper. Copyright is retained by the author.
Topics of Logical Methods in Computer Science: Algebraic methods; Automata and logic; Automated deduction; Categorical models and logic; Coalgebraic methods; Computability and Logic; Computer-aided verification; Concurrency theory; Constraint programming; Cyber-physical systems; Database theory; Defeasible reasoning; Domain theory; Emerging topics: Computational systems in biology; Emerging topics: Quantum computation and logic; Finite model theory; Formalized mathematics; Functional programming and lambda calculus; Inductive logic and learning; Interactive proof checking; Logic and algorithms; Logic and complexity; Logic and games; Logic and probability; Logic for knowledge representation; Logic programming; Logics of programs; Modal and temporal logics; Program analysis and type checking; Program development and specification; Proof complexity; Real time and hybrid systems; Reasoning about actions and planning; Satisfiability; Security; Semantics of programming languages; Term rewriting and equational logic; Type theory and constructive mathematics.