Regular Expressions with Backreferences on Multiple Context-Free Languages, and the Closed-Star Condition

Taisei Nogami, Tachio Terauchi
arXiv - CS - Formal Languages and Automata Theory, published 2024-06-27. DOI: arxiv-2406.18918 (https://doi.org/arxiv-2406.18918)

Abstract

Backreferences are a well-known practical extension of regular expressions, and most modern programming languages, including Java, Python, and JavaScript, support regular expressions with backreferences (rewbs) in their standard libraries for string processing. A difficulty with backreferences is non-regularity: unlike some other extensions, backreferences strictly increase the expressive power of regular expressions, so rewbs can describe non-regular (in fact, even non-context-free) languages. In this paper, we investigate the expressive power of rewbs by comparing them to multiple context-free languages (MCFL) and parallel multiple context-free languages (PMCFL). First, we prove that the language class of rewbs is a proper subclass of unary-PMCFLs. The class of unary-PMCFLs coincides with that of EDT0L languages, and our result strictly improves the known upper bound for rewbs. However, we also show that the language class of rewbs is not contained in that of MCFLs, even when restricted to rewbs with only one capturing group and no captured references. Therefore, in general, parallelism seems essential for rewbs. Backed by these results, we define a novel syntactic condition on rewbs, which we call closed-star, and observe that it provides an upper bound on the number of times a rewb references the same captured string. The closed-star condition allows dispensing with parallelism: we prove that the language class of closed-star rewbs falls inside the class of unary-MCFLs, which is equivalent to that of EDT0L systems of finite index. Furthermore, as additional evidence for the robustness of the condition, we show that the language class of closed-star rewbs also falls inside the class of nonerasing stack languages (NESL).
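
The non-regularity mentioned in the abstract can be seen with a very small pattern. The sketch below is an illustration only and is not taken from the paper: it uses Python's standard `re` module, and the pattern, the alphabet {a, b}, and the helper name `is_square` are assumptions chosen for the example. The pattern has one capturing group followed by one reference to it, so it matches exactly the strings of the form ww, a textbook example of a language that is not context-free.

```python
import re

# Illustrative pattern (an assumption, not from the paper):
# group 1 captures some w over {a, b}; \1 then requires the same w again.
SQUARE = re.compile(r"([ab]*)\1")


def is_square(s: str) -> bool:
    """Return True if s has the form ww for some w over the alphabet {a, b}."""
    # fullmatch anchors the pattern at both ends of the string.
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["", "abab", "aabaab", "aba", "abba"]:
        print(repr(candidate), is_square(candidate))
```

Here the reference sits outside any star, so the captured string is reused only once per match; the abstract's results concern what happens to expressive power when such reuse is or is not bounded.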