arXiv - CS - Formal Languages and Automata Theory: Latest Papers

History-Determinism vs Fair Simulation

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-11 DOI: arxiv-2407.08620

Udi Boker, Thomas A. Henzinger, Karoliina Lehtinen, Aditya Prakash

An automaton is history-deterministic if its nondeterminism can be resolved on the fly, only using the prefix of the word read so far. This mild form of nondeterminism has attracted particular attention for its applications in synthesis problems. An automaton $A$ is guidable with respect to a class $C$ of automata if it can fairly simulate every automaton in $C$ whose language is contained in that of $A$. In other words, guidable automata are those for which inclusion and simulation coincide, making them particularly interesting for model-checking. We study the connection between these two notions, and specifically the question of when they coincide. For classes of automata on which they do, deciding guidability, an otherwise challenging decision problem, reduces to deciding history-determinism, a problem that is starting to be well-understood for many classes. We provide a selection of sufficient criteria for a class of automata to guarantee the coincidence of the notions, and use them to show that the notions coincide for the most common automata classes, among which are $\omega$-regular automata and many infinite-state automata with safety and reachability acceptance conditions, including vector addition systems with states, one-counter nets, pushdown-, Parikh-, and timed-automata. We also demonstrate that history-determinism and guidability do not always coincide, for example, for the classes of timed automata with a fixed number of clocks.

Citations: 0

Automata-based constraints for language model decoding

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-11 DOI: arxiv-2407.08103

Terry Koo, Frederick Liu, Luheng He

LMs are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor. Finally, we extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. In spite of its flexibility and representative power, our approach only requires access to per-token decoding logits and lowers into simple calculations that are independent of LM size, making it both efficient and easy to apply to almost any LM architecture.
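
The general idea of constraining decoding with an automaton can be sketched in a few lines of Python. This is a simplified illustration, not the construction from the paper: the character-level DFA (for the hypothetical language `[0-9]+(,[0-9]+)*`), the toy vocabulary, and the hand-written logits are all assumptions made for the example. Each multi-character token is accepted only if the DFA can consume all of its characters from the current state; every other token has its logit set to negative infinity before the greedy choice.

```python
import math

# Hypothetical character-level DFA for the regular language [0-9]+(,[0-9]+)*
# State 0: a digit is required next; state 1: a digit was just read (accepting).
DIGITS = "0123456789"
TRANSITIONS = {(0, d): 1 for d in DIGITS}
TRANSITIONS.update({(1, d): 1 for d in DIGITS})
TRANSITIONS[(1, ",")] = 0
START, ACCEPTING = 0, {1}


def step_token(state, token):
    """Run the DFA over every character of a token; None means the token is invalid here."""
    for ch in token:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return None
    return state


def mask_logits(state, vocab, logits):
    """Replace the logit of every token the DFA cannot consume from `state` with -inf."""
    return [
        logit if step_token(state, tok) is not None else -math.inf
        for tok, logit in zip(vocab, logits)
    ]


def constrained_greedy_decode(vocab, logits_per_step):
    """Greedy decoding where each step only considers tokens allowed by the DFA."""
    state, out = START, []
    for logits in logits_per_step:
        masked = mask_logits(state, vocab, logits)
        best = max(range(len(vocab)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            break  # no token is valid from this state
        out.append(vocab[best])
        state = step_token(state, vocab[best])
    return "".join(out), state in ACCEPTING


# Toy vocabulary whose tokens do not align with the grammar's symbols.
vocab = ["12", ",", "3,", "a", ",,"]
steps = [
    [0.1, 2.0, 0.5, 3.0, 1.0],  # unconstrained argmax would pick "a"
    [1.0, 5.0, 0.2, 0.0, 0.0],  # unconstrained argmax would pick ","
]
print(constrained_greedy_decode(vocab, steps))  # ('3,12', True)
```

The mask depends only on the DFA state and the vocabulary, not on the model, which is consistent with the abstract's remark that only per-token logits are needed. A practical system would precompute the allowed-token set per state rather than rescanning the vocabulary at every step.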

Citations: 0

More on Maximally Permissive Similarity Control of Discrete Event Systems

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-10 DOI: arxiv-2407.08068

Yu Wang, Zhaohui Zhu, Rob van Glabbeek, Jinjin Zhang, Lixing Tan

Takai proposed a method for constructing a maximally permissive supervisor for the similarity control problem (IEEE Transactions on Automatic Control, 66(7):3197-3204, 2021). This paper points out flaws in his results by providing a counterexample. Inspired by Takai's construction, the notion of a (saturated) (G, R)-automaton is introduced and metatheorems concerning (maximally permissive) supervisors for the similarity control problem are provided in terms of this notion. As an application of these metatheorems, the flaws in Takai's work are corrected.

Citations: 0

Generalized Parikh Matrices For Tracking Subsequence Occurrences

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-05 DOI: arxiv-2407.04462

Szilárd Zsolt Fazekas, Xinhao Huang

We introduce and study a generalized Parikh matrix mapping based on tracking the occurrence counts of special types of subsequences. These matrices retain more information about a word than the original Parikh matrix mapping while preserving the homomorphic property. We build the generalization by first introducing the Parikh factor matrix mapping and extend it to the Parikh sequence matrix mapping. We establish an interesting connection between the generalized Parikh matrices and the original ones and use it to prove that certain important minors of a Parikh sequence matrix have nonnegative determinant. Finally, we generalize the concept of subword histories and show that each generalized subword history is equivalent to a linear one.
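
For orientation, the original (classical) Parikh matrix mapping that the abstract builds on can be computed as follows. This sketch covers only the classical mapping, not the generalized factor or sequence matrices introduced in the paper; the function names are chosen for the example. For an ordered alphabet $a_1 < \dots < a_k$, each letter $a_q$ maps to the $(k+1) \times (k+1)$ identity matrix with one extra 1 at position $(q, q+1)$, and a word maps to the product of its letters' matrices. The entry in row $i$, column $j+1$ then counts the occurrences of $a_i a_{i+1} \cdots a_j$ as a scattered subword.

```python
def parikh_matrix(word, alphabet):
    """Classical Parikh matrix of `word` over the ordered `alphabet` (a string or list)."""
    k = len(alphabet)
    index = {a: i for i, a in enumerate(alphabet)}
    m = [[int(i == j) for j in range(k + 1)] for i in range(k + 1)]
    for ch in word:
        q = index[ch]
        # Right-multiplying by the matrix of letter a_q adds column q to column q+1.
        for row in m:
            row[q + 1] += row[q]
    return m


def matmul(a, b):
    """Plain integer matrix product, used to check the homomorphism property."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


# |abb|_a = 1, |abb|_b = 2, |abb|_ab = 2 (scattered occurrences of "ab")
print(parikh_matrix("abb", "ab"))  # [[1, 1, 2], [0, 1, 2], [0, 0, 1]]

# Homomorphism: the matrix of a concatenation is the product of the matrices.
assert matmul(parikh_matrix("ab", "ab"), parikh_matrix("b", "ab")) == parikh_matrix("abb", "ab")

# "ab" and "ba" share a Parikh vector but have different Parikh matrices.
assert parikh_matrix("ab", "ab") != parikh_matrix("ba", "ab")
```

The last assertion shows why the matrix retains more information than the Parikh vector: the top-right entry records the order-sensitive count of `ab`.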

Citations: 0

Complex Event Recognition with Symbolic Register Transducers: Extended Technical Report

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-03 DOI: arxiv-2407.02884

Elias Alevizos, Alexander Artikis, Georgios Paliouras

We present a system for Complex Event Recognition (CER) based on automata. While multiple such systems have been described in the literature, they typically suffer from a lack of clear and denotational semantics, a limitation which often leads to confusion with respect to their expressive power. In order to address this issue, our system is based on an automaton model which is a combination of symbolic and register automata. We extend previous work on these types of automata, in order to construct a formalism with clear semantics and a corresponding automaton model whose properties can be formally investigated. We call such automata Symbolic Register Transducers (SRT). We show that SRT are closed under various operators, but are not in general closed under complement and they are not determinizable. However, they are closed under these operations when a window operator, quintessential in Complex Event Recognition, is used. We show how SRT can be used in CER in order to detect patterns upon streams of events, using our framework that provides declarative and compositional semantics, and that allows for a systematic treatment of such automata. For SRT to work in pattern detection, we allow them to mark events from the input stream as belonging to a complex event or not, hence the name "transducers". We also present an implementation of SRT which can perform CER. We compare our SRT-based CER engine against other state-of-the-art CER systems and show that it is both more expressive and more efficient.

Citations: 0

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-03 DOI: arxiv-2407.03203

Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang

Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. Similar methods have shown promising results in code generation. However, most modern LLMs exhibit suboptimal performance due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data. This scarcity results in a paucity of methodologies for training LLMs and techniques to fully utilize their capabilities in composing formal proofs. To address the challenges, this paper proposes **TheoremLlama**, an end-to-end framework to train a general-purpose LLM to become a Lean4 expert. This framework encompasses NL-FL aligned dataset generation methods, training approaches for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. Using the dataset generation method, we provide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped dataset. A key innovation in this framework is the NL-FL bootstrapping method, where NL proofs are integrated into Lean4 code for training datasets, leveraging the NL reasoning ability of LLMs for formal reasoning. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline of 22.95% and 25.41%. We have also open-sourced our model checkpoints and generated dataset, and will soon make all the code publicly available.
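
To give a rough sense of what "NL proofs integrated into Lean4 code" could look like, here is a small hand-written Lean 4 example in which the natural-language argument appears as comments next to the formal step it justifies. It is a hypothetical illustration only: the theorem name and the comment layout are assumptions, and it is not taken from the OBT dataset or from the paper.

```lean
-- Hypothetical illustration; not an entry from the OBT dataset.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- NL proof: addition of natural numbers is commutative,
  -- so swapping the two summands does not change the sum.
  exact Nat.add_comm a b
```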

Citations: 0

Monads, Comonads, and Transducers

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-02 DOI: arxiv-2407.02704

Rafał Stefański

This paper proposes a definition of recognizable transducers over monads and comonads, which bridges two important ongoing efforts in the current research on regularity. The first effort is the study of regular transductions, which extends the notion of regularity from languages into word-to-word functions. The other important effort is generalizing the notion of regular languages from words to arbitrary monads, introduced in arXiv:1502.04898. In this paper, we present a number of examples of transducer classes that fit the proposed framework. In particular we show that our class generalizes the classes of Mealy machines and rational transductions. We also present examples of recognizable transducers for infinite words and a specific type of trees called terms. The main result of this paper is a theorem, which states the class of recognizable transductions is closed under composition, subject to some coherence axioms between the structure of a monad and the structure of a comonad. Due to its complexity, we formalize the proof of the theorem in Coq Proof Assistant. In the proof, we introduce the concepts of a context and a generalized wreath product for Eilenberg-Moore algebras, which could be valuable tools for studying these algebras.

Citations: 0

On Shuffling and Splitting Automata

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-02 DOI: arxiv-2407.02660

Ignacio Mollo Cunningham

We consider a class of finite state three-tape transducers which models the operation of shuffling and splitting words. We present them as automata over the so-called Shuffling Monoid. These automata can be seen as either shufflers or splitters interchangeably. We prove that functionality is decidable for splitters, and we also show that the equivalence between functional splitters is decidable. Moreover, in the deterministic case, the algorithm for equivalence is polynomial on the number of states of the splitter.
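
The underlying word operation can be illustrated directly. The sketch below is only the standard shuffle of two words, with function names chosen for the example; it does not model the paper's three-tape transducers or the Shuffling Monoid. A word `w` is a shuffle of `u` and `v` when its letters can be split into two subsequences that spell `u` and `v` in order, so the same relation read in one direction shuffles and in the other splits.

```python
from functools import lru_cache


def shuffles(u, v):
    """Set of all interleavings of u and v that keep each word's letter order."""
    if not u:
        return {v}
    if not v:
        return {u}
    return {u[0] + s for s in shuffles(u[1:], v)} | {v[0] + s for s in shuffles(u, v[1:])}


def is_shuffle(u, v, w):
    """Decide whether w can be split into u and v, by dynamic programming over prefixes."""
    if len(u) + len(v) != len(w):
        return False

    @lru_cache(maxsize=None)
    def ok(i, j):
        # i letters of u and j letters of v have been matched against w[:i + j].
        if i == len(u) and j == len(v):
            return True
        k = i + j
        take_u = i < len(u) and u[i] == w[k] and ok(i + 1, j)
        take_v = j < len(v) and v[j] == w[k] and ok(i, j + 1)
        return take_u or take_v

    return ok(0, 0)


print(sorted(shuffles("ab", "c")))  # ['abc', 'acb', 'cab']
print(is_shuffle("ab", "c", "acb"))  # True
print(is_shuffle("ab", "c", "bac"))  # False
```

A single word can usually be split in several ways, which is why functionality (each input having at most one output) is a property worth deciding for splitters.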

Citations: 0

Some Remarks on First-Order Definable Tree Languages

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-07-01 DOI: arxiv-2407.01169

Achim Blumensath

We study the question of whether a given regular language of finite trees can be defined in first-order logic. We develop an algebraic approach to address this question and we use it to derive several necessary and sufficient conditions for definability (but unfortunately no condition that is both). The main difference of our results to those from the literature is that our conditions are decidable.

Citations: 0

Regular Expressions with Backreferences on Multiple Context-Free Languages, and the Closed-Star Condition

arXiv - CS - Formal Languages and Automata Theory

Pub Date : 2024-06-27 DOI: arxiv-2406.18918

Taisei Nogami, Tachio Terauchi

Backreference is a well-known practical extension of regular expressions and most modern programming languages, such as Java, Python, JavaScript and more, support regular expressions with backreferences (rewb) in their standard libraries for string processing. A difficulty of backreference is non-regularity: unlike some other extensions, backreference strictly enhances the expressive power of regular expressions and thus rewbs can describe non-regular (in fact, even non-context-free) languages. In this paper, we investigate the expressive power of rewbs by comparing rewbs to multiple context-free languages (MCFL) and parallel multiple context-free languages (PMCFL). First, we prove that the language class of rewbs is a proper subclass of unary-PMCFLs. The class of unary-PMCFLs coincides with that of EDT0L languages, and our result strictly improves the known upper bound of rewbs. Additionally, we show that, however, the language class of rewbs is not contained in that of MCFLs even when restricted to rewbs with only one capturing group and no captured references. Therefore, in general, the parallelism seems essential for rewbs. Backed by these results, we define a novel syntactic condition on rewbs that we call closed-star and observe that it provides an upper bound on the number of times a rewb references the same captured string. The closed-star condition allows dispensing with the parallelism: that is, we prove that the language class of closed-star rewbs falls inside the class of unary-MCFLs, which is equivalent to that of EDT0L systems of finite index. Furthermore, as additional evidence for the robustness of the condition, we show that the language class of closed-star rewbs also falls inside the class of nonerasing stack languages (NESL).
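
The claim that backreferences reach beyond the context-free languages can be seen with Python's standard `re` module. The pattern below is a textbook example chosen for this note, not one taken from the paper: `([ab]*)\1` matches exactly the words of the form $ww$ over $\{a, b\}$, the copy language, which is known not to be context-free.

```python
import re

# Group 1 captures an arbitrary word w over {a, b}; \1 must then repeat it exactly.
COPY = re.compile(r"([ab]*)\1")


def is_square(s):
    """True if s = ww for some word w over {a, b}."""
    return COPY.fullmatch(s) is not None


print(is_square("abab"))    # True  (w = "ab")
print(is_square("aabaab"))  # True  (w = "aab")
print(is_square("abba"))    # False (second half differs from the first)
print(is_square("aba"))     # False (odd length)
```

Here the captured string is referenced once, outside any star. The closed-star condition in the abstract concerns how often a rewb can reference the same captured string, so patterns that place a backreference under a star are where that restriction matters.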

Citations: 0
