
arXiv - CS - Formal Languages and Automata Theory: Latest Papers

Parameterized Verification of Systems with Precise (0,1)-Counter Abstraction
Pub Date : 2024-08-12 DOI: arxiv-2408.05954
Paul Eichler, Swen Jacobs, Chana Weil-Kennedy
We introduce a new framework for verifying systems with a parametric number of concurrently running processes. The systems we consider are well-structured with respect to a specific well-quasi order. This allows us to decide a wide range of verification problems, including control-state reachability, coverability, and target, in a fixed finite abstraction of the infinite state-space, called a 01-counter system. We show that several systems from the parameterized verification literature fall into this class, including reconfigurable broadcast networks (or systems with lossy broadcast), disjunctive systems, synchronizations and systems with a fixed number of shared finite-domain variables. Our framework provides a simple and unified explanation for the properties of these systems, which have so far been investigated separately. Additionally, it extends and improves on a range of the existing results, and gives rise to other systems with similar properties.
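To give a feel for what a finite 0/1 abstraction looks like, here is a minimal sketch for one of the listed classes, disjunctive systems. It is not the paper's formal definition: the encoding of a transition as a hypothetical triple (source, guard, destination), the function name `coverable`, and the rule that a guard needs another process in the guard state are all assumptions made for illustration. An abstract configuration records only which states hold at least one process.

```python
from collections import deque

def coverable(initial, transitions, target):
    """Check whether `target` can be occupied by some process.

    Abstract configurations are frozensets of states that hold at least
    one process (counter value 1); every other state has counter value 0.
    `transitions` is a list of (source, guard, destination) triples, where
    guard is a state that another process must occupy, or None.
    """
    start = frozenset([initial])
    seen = {start}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if target in config:
            return True
        for src, guard, dst in transitions:
            if src not in config:
                continue
            if guard is not None and guard not in config:
                continue
            # Case 1: some processes stay behind in src.
            successors = [config | {dst}]
            # Case 2: the last process leaves src. This is ruled out when
            # the guard is src itself, since another process must remain.
            if guard != src:
                successors.append((config - {src}) | {dst})
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical process template used only as an example.
example = [("init", None, "a"), ("init", "a", "b"), ("b", "b", "goal")]
print(coverable("init", example, "goal"))
```

Because the number of abstract configurations is at most 2 to the number of states, the search always terminates, whatever the number of concrete processes.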
Citations: 0
High-order observers and high-order state-estimation-based properties of discrete-event systems
Pub Date : 2024-08-12 DOI: arxiv-2408.06141
Kuize Zhang, Xiaoguang Han, Alessandro Giua, Carla Seatzu
State-estimation-based properties are central properties in discrete-event systems modeled by labeled finite-state automata studied over the past 3 decades. Most existing results are based on a single agent who knows the structure of a system and can observe a subset of events and estimate the system's state based on the system's structure and the agent's observation of the system. The main tool used to do state estimation and verify state-estimation-based properties is called \emph{observer}, which is the powerset construction originally proposed by Rabin and Scott in 1959, used to determinize a nondeterministic finite automaton with $\varepsilon$-transitions. In this paper, we consider labeled finite-state automata, extend the state-estimation-based properties from a single agent to a finite ordered set of agents and also extend the original observer to \emph{high-order observer} based on the original observer and our \emph{concurrent composition}. As a result, a general framework on high-order state-estimation-based properties has been built and a basic tool has also been built to verify such properties. This general framework contains many basic properties as its members such as state-based opacity, critical observability, determinism, high-order opacity, etc. Special cases for which verification can be done more efficiently are also discussed. In our general framework, the system's structure is publicly known to all agents $A_1,\dots,A_n$, each agent $A_i$ has its own observable event set $E_i$, and additionally knows all its preceding agents' observable events but can only observe its own observable events. The intuitive meaning of our high-order observer is what agent $A_n$ knows about what $A_{n-1}$ knows about $\dots$ what $A_2$ knows about $A_1$'s state estimate of the system.
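The single-agent observer mentioned in the abstract is the classical powerset construction, which can be sketched as follows. This covers only the standard first-order observer, not the high-order observer or the concurrent composition of the paper. The dictionary encoding of the transition relation and the names `closure` and `build_observer` are assumptions chosen for this example.

```python
def closure(states, delta, unobservable):
    """Close a set of states under transitions labelled by unobservable events."""
    closed = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for e in unobservable:
            for r in delta.get((q, e), ()):
                if r not in closed:
                    closed.add(r)
                    stack.append(r)
    return frozenset(closed)

def build_observer(initial, delta, observable, unobservable):
    """Powerset construction: each observer state is a state estimate.

    `delta` maps (state, event) to a set of successor states.
    Returns the initial estimate and a map estimate -> {event: estimate}.
    """
    start = closure(initial, delta, unobservable)
    observer = {}
    todo = [start]
    while todo:
        estimate = todo.pop()
        if estimate in observer:
            continue
        observer[estimate] = {}
        for e in sorted(observable):
            step = {r for q in estimate for r in delta.get((q, e), ())}
            if step:
                nxt = closure(step, delta, unobservable)
                observer[estimate][e] = nxt
                todo.append(nxt)
    return start, observer

# Hypothetical automaton: event "u" is unobservable, "a" and "b" are observable.
delta = {(0, "u"): {1}, (0, "a"): {2}, (1, "a"): {1}, (2, "b"): {0}}
start, observer = build_observer({0}, delta, {"a", "b"}, {"u"})
print(sorted(start), sorted(observer[start]["a"]))
```

Each reachable observer state is the set of system states consistent with what the agent has observed so far, which is the state estimate the properties above are defined on.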
Citations: 0
Parameterized Verification of Timed Networks with Clock Invariants
Pub Date : 2024-08-09 DOI: arxiv-2408.05190
Étienne André, Swen Jacobs, Shyam Lal Karra, Ocan Sankur
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with clock invariants, and establish the decidability of the parameterized verification problem for local trace properties and for reachability of global configurations. Moreover, we prove that, surprisingly and unlike in other settings, this model is equivalent to lossy broadcast networks.
Citations: 0
Alternating Nominal Automata with Name Allocation
Pub Date : 2024-08-07 DOI: arxiv-2408.03658
Florian Frank, Daniel Hausmann, Stefan Milius, Lutz Schröder, Henning Urbat
Formal languages over infinite alphabets serve as abstractions of structures and processes carrying data. Automata models over infinite alphabets, such as classical register automata or, equivalently, nominal orbit-finite automata, tend to have computationally hard or even undecidable reasoning problems unless stringent restrictions are imposed on either the power of control or the number of registers. This has been shown to be ameliorated in automata models with name allocation such as regular nondeterministic nominal automata, which allow for deciding language inclusion in elementary complexity even with unboundedly many registers while retaining a reasonable level of expressiveness. In the present work, we demonstrate that elementary complexity survives under extending the power of control to alternation: We introduce regular alternating nominal automata (RANAs), and show that their non-emptiness and inclusion problems have elementary complexity even when the number of registers is unbounded. Moreover, we show that RANAs allow for nearly complete de-alternation, specifically de-alternation up to a single deadlocked universal state.
Citations: 0
LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning
Pub Date : 2024-08-06 DOI: arxiv-2408.02999
Lekai Chen, Ashutosh Trivedi, Alvaro Velasquez
The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which leverages a probabilistic oracle that may randomly give persistent errors when answering the membership queries for deterministic finite automata (DFA) learning. Given the tendency of LLMs to produce hallucinatory content, we have developed techniques to improve answer accuracy and ensure the correctness of the learned automata. We propose the $\mathtt{Discrimination}$ prompt as well as the $\mathtt{Verification}$ prompt and explore their advantages over common prompts. Additionally, we compare DFA learning performance between the TTT algorithm and common active learning algorithms. To address the exponential number of persistent errors, we implement a dynamic query cache refinement algorithm that identifies and corrects conflicting queries by combining the active and passive learning algorithms. The empirical results demonstrate the robustness and efficiency of our approach, providing a theoretical foundation for automata learning with LLMs in the loop.
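As a rough illustration of a membership oracle with persistent random errors, the sketch below wraps a ground-truth acceptance function and flips each answer with a fixed probability, then caches the result so that repeating a query returns the same (possibly wrong) answer. Reading "persistent" as "stable under re-querying" is an assumption, and the class name `NoisyMembershipOracle`, its parameters, and the example language are all hypothetical. No LLM or prompt is involved, and this is not the paper's cache refinement algorithm.

```python
import random

class NoisyMembershipOracle:
    """Membership oracle whose mistakes, once made, never change."""

    def __init__(self, accepts, error_rate, seed=0):
        self._accepts = accepts          # ground-truth membership function
        self._error_rate = error_rate    # probability that an answer is flipped
        self._rng = random.Random(seed)
        self._cache = {}                 # word -> answer already given

    def query(self, word):
        if word not in self._cache:
            truth = self._accepts(word)
            wrong = self._rng.random() < self._error_rate
            self._cache[word] = (not truth) if wrong else truth
        return self._cache[word]

def even_number_of_a(word):
    """Example target language: words with an even number of 'a'."""
    return word.count("a") % 2 == 0

oracle = NoisyMembershipOracle(even_number_of_a, error_rate=0.2, seed=42)
first = oracle.query("abba")
print(first == oracle.query("abba"))  # re-asking cannot repair an error
```

Since re-asking the same question cannot repair such an error, a learner has to detect mistakes through conflicts between different answers, which is the situation the abstract's cache refinement is aimed at.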
Citations: 0
Hierarchical Clustering using Reversible Binary Cellular Automata for High-Dimensional Data
Pub Date : 2024-08-05 DOI: arxiv-2408.02250
Baby C. J., Kamalika Bhattacharjee
This work proposes a hierarchical clustering algorithm for high-dimensional datasets using the cyclic space of reversible finite cellular automata. In cellular automaton (CA) based clustering, if two objects belong to the same cycle, they are closely related and considered as part of the same cluster. However, if a high-dimensional dataset is clustered using the cycles of one CA, closely related objects may belong to different cycles. This paper identifies the relationship between objects in two different cycles based on the median of all elements in each cycle so that they can be grouped in the next stage. Further, to minimize the number of intermediate clusters, which in turn reduces the computational cost, a rule selection strategy is taken to find the best rules based on information propagation and cycle structure. After encoding the dataset using frequency-based encoding such that the consecutive data elements maintain a minimum Hamming distance in encoded form, our proposed clustering algorithm iterates over three stages to finally cluster the data elements into the desired number of clusters given by the user. This algorithm can be applied to various fields, including healthcare, sports, chemical research, agriculture, etc. When verified over standard benchmark datasets with various performance metrics, our algorithm is on par with the existing algorithms with quadratic time complexity.
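The basic idea that objects on the same cycle of a reversible CA form one cluster can be sketched as below. The sketch assumes an elementary (three-neighbourhood) binary CA with a periodic boundary and uses rule 51 (bitwise complement) only because it is easy to verify by hand. The boundary condition, the rule, and the function names are illustrative assumptions, and the median-based merging, rule selection and encoding stages of the paper are not shown.

```python
def eca_step(config, rule, n):
    """Apply one step of an elementary CA to an n-bit configuration (periodic boundary)."""
    nxt = 0
    for i in range(n):
        left = (config >> ((i + 1) % n)) & 1
        centre = (config >> i) & 1
        right = (config >> ((i - 1) % n)) & 1
        neighbourhood = (left << 2) | (centre << 1) | right
        nxt |= ((rule >> neighbourhood) & 1) << i
    return nxt

def cycles(rule, n):
    """Split all 2**n configurations into the cycles of a reversible CA."""
    size = 1 << n
    successor = [eca_step(c, rule, n) for c in range(size)]
    if len(set(successor)) != size:
        raise ValueError("rule is not reversible for this lattice size")
    seen = set()
    result = []
    for c in range(size):
        if c in seen:
            continue
        cycle = []
        x = c
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = successor[x]
        result.append(cycle)
    return result

# Each cycle is one initial cluster of encoded objects.
for cycle in cycles(51, 3):
    print([format(c, "03b") for c in cycle])
```

With rule 51 every configuration is paired with its complement, so this toy CA yields clusters of size two. Such tiny cycles mirror the abstract's point that related objects can land in different cycles of a single CA, which is why a later stage is needed to merge them.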
Citations: 0
A Tree Sampler for Bounded Context-Free Languages
Pub Date : 2024-08-03 DOI: arxiv-2408.01849
Breandan Considine
In the following paper, we present a simple method for sampling trees with or without replacement from BCFLs. A BCFL is a context-free language (CFL) corresponding to an incomplete string with holes, which can be completed by valid terminals. To solve this problem, we introduce an algebraic datatype that compactly represents candidate parse forests for porous strings. Once constructed, sampling trees is a straightforward matter of sampling integers uniformly without replacement, then lazily decoding them into trees.
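The last step, sampling integers without replacement and decoding each one into a tree, can be illustrated on a much simpler family than parse forests. The sketch below uses plain binary tree shapes with a fixed number of leaves as a stand-in. The counting and unranking scheme and the names `count_trees`, `decode_tree` and `sample_trees` are assumptions for this example, not the paper's datatype.

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(leaves):
    """Number of binary tree shapes with the given number of leaves."""
    if leaves == 1:
        return 1
    return sum(count_trees(k) * count_trees(leaves - k) for k in range(1, leaves))

def decode_tree(leaves, index):
    """Decode an integer in range(count_trees(leaves)) into a nested-tuple tree."""
    if leaves == 1:
        return "*"
    for k in range(1, leaves):
        right_count = count_trees(leaves - k)
        block = count_trees(k) * right_count
        if index < block:
            left_index, right_index = divmod(index, right_count)
            return (decode_tree(k, left_index), decode_tree(leaves - k, right_index))
        index -= block
    raise IndexError("index out of range")

def sample_trees(leaves, how_many, seed=None):
    """Yield distinct trees: draw distinct integers, decode each one on demand."""
    rng = random.Random(seed)
    total = count_trees(leaves)
    for index in rng.sample(range(total), how_many):
        yield decode_tree(leaves, index)

for tree in sample_trees(4, 3, seed=1):
    print(tree)
```

Because the decoding is a bijection between integers and trees, distinct integers give distinct trees, and a tree is only built when the generator is advanced.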
Citations: 0
Complex event recognition meets hierarchical conjunctive queries
Pub Date : 2024-08-03 DOI: arxiv-2408.01652
Dante Pinto, Cristian Riveros
Hierarchical conjunctive queries (HCQ) are a subclass of conjunctive queries (CQ) with robust algorithmic properties. Among others, Berkholz, Keppeler, and Schweikardt have shown that HCQ is the subclass of CQ (without projection) that admits dynamic query evaluation with constant update time and constant delay enumeration. On a different but related setting stands Complex Event Recognition (CER), a prominent technology for evaluating sequence patterns over streams. Since one can interpret a data stream as an unbounded sequence of inserts in dynamic query evaluation, it is natural to ask to which extent CER can take advantage of HCQ to find a robust class of queries that can be evaluated efficiently. In this paper, we search to combine HCQ with sequence patterns to find a class of CER queries that can get the best of both worlds. To reach this goal, we propose a class of complex event automata model called Parallelized Complex Event Automata (PCEA) for evaluating CER queries with correlation (i.e., joins) over streams. This model allows us to express sequence patterns and compare values among tuples, but it also allows us to express conjunctions by incorporating a novel form of non-determinism that we call parallelization. We show that for every HCQ (under bag semantics), we can construct an equivalent PCEA. Further, we show that HCQ is the biggest class of acyclic CQ that this automata model can define. Then, PCEA stands as a sweet spot that precisely expresses HCQ (i.e., among acyclic CQ) and extends them with sequence patterns. Finally, we show that PCEA also inherits the good algorithmic properties of HCQ by presenting a streaming evaluation algorithm under sliding windows with logarithmic update time and output-linear delay for the class of PCEA with equality predicates.
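For readers unfamiliar with the hierarchical condition, the following sketch checks it for a conjunctive query without projection, using the usual definition: for any two variables, the sets of atoms containing them are either nested or disjoint. The query encoding as (relation name, variables) pairs and the function name `is_hierarchical` are assumptions for this example.

```python
from itertools import combinations

def is_hierarchical(atoms):
    """Check the hierarchical condition for a CQ without projection.

    `atoms` is a list of (relation_name, variables) pairs. For every pair
    of variables, the sets of atoms mentioning them must be comparable
    by inclusion or disjoint.
    """
    occurrences = {}
    for position, (_, variables) in enumerate(atoms):
        for v in variables:
            occurrences.setdefault(v, set()).add(position)
    for x, y in combinations(occurrences, 2):
        ax, ay = occurrences[x], occurrences[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True

# R(x, y), S(x, z): the atoms of y and z are nested inside the atoms of x.
print(is_hierarchical([("R", ("x", "y")), ("S", ("x", "z"))]))
# R(x), S(x, y), T(y): the atoms of x and y overlap without nesting.
print(is_hierarchical([("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]))
```

The second query is the standard example of a non-hierarchical CQ, since x and y share the atom S while each also occurs in an atom the other does not.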
Citations: 0
Regular Grammars for Graph Sets of Tree-Width $\leq 2$
Pub Date : 2024-08-02 DOI: arxiv-2408.01226
Marius Bozga, Radu Iosif, Florian Zuleger
Regular and context-free languages form a central pillar of formal language theory. This is because a variety of formalisms are known that define these classes of languages. For example, we have that finite automata, monoids, algebraic recognizability, regular expressions, regular grammars, monadic-second order logic, etc., can be used to represent regular word languages. However, the situation is less clear for formal languages over graphs, and open problems persist. This is because generalizing notions from words to graphs has been more successful for some of the cited formalisms than for the other ones. Bruno Courcelle has introduced hyper-edge replacement (hr) algebras for generalizing the notion of context-free languages from words to graphs. At the same time, hr-algebras support the generalization of algebraic recognizability from words to graphs, a notion that has been proven to be equivalent to definability in (counting) monadic-second order logic (cmso) over graphs of bounded tree-width. In this paper, we deal with generalizing regular word grammars to graphs. We propose regular grammars for (unordered and unranked) trees, series-parallel graphs, and graphs of tree-width $\le 2$, where the qualifier regular is justified because these grammars define exactly the recognizable resp. cmso-definable subsets of the respective graph classes.
Citations: 0
RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement and Lookarounds
Pub Date : 2024-07-30 DOI: arxiv-2407.20479
Ian Erik Varatalu, Margus Veanes, Juhan-Peep Ernits
We present a tool and theory RE# for regular expression matching that is built on symbolic derivatives, does not use backtracking, and, in addition to the classical operators, also supports complement, intersection and lookarounds. We develop the theory formally and show that the main matching algorithm has input-linear complexity both in theory and experimentally. A thorough evaluation on popular benchmarks shows that RE# is over 71% faster than the next fastest regex engine in Rust on the baseline, and outperforms all state-of-the-art engines on extensions of the benchmarks, often by several orders of magnitude.
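To illustrate the general idea of derivative-based matching without backtracking, here is a minimal Python sketch of classical Brzozowski derivatives extended with intersection and complement. It is not the RE# algorithm: the class names (`Empty`, `Eps`, `Chr`, `Cat`, `Alt`, `And`, `Not`, `Star`) and the functions `nullable`, `deriv` and `matches` are hypothetical names chosen for this example. It works on single characters rather than symbolic character sets, performs no term simplification (so it does not achieve input-linear time), and omits lookarounds.

```python
from dataclasses import dataclass


class Re:
    """Base class for regex terms (illustrative only)."""


@dataclass(frozen=True)
class Empty(Re):      # matches no string
    pass


@dataclass(frozen=True)
class Eps(Re):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):        # matches the single character c
    c: str


@dataclass(frozen=True)
class Cat(Re):        # concatenation a b
    a: Re
    b: Re


@dataclass(frozen=True)
class Alt(Re):        # union a | b
    a: Re
    b: Re


@dataclass(frozen=True)
class And(Re):        # intersection a & b
    a: Re
    b: Re


@dataclass(frozen=True)
class Not(Re):        # complement ~a
    a: Re


@dataclass(frozen=True)
class Star(Re):       # Kleene star a*
    a: Re


def nullable(r: Re) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not):
        return not nullable(r.a)
    raise TypeError(r)


def deriv(r: Re, ch: str) -> Re:
    """Return a regex for the suffixes w such that ch + w matches r."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Cat):
        left = Cat(deriv(r.a, ch), r.b)
        return Alt(left, deriv(r.b, ch)) if nullable(r.a) else left
    if isinstance(r, Alt):
        return Alt(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, And):
        return And(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, Not):
        return Not(deriv(r.a, ch))
    if isinstance(r, Star):
        return Cat(deriv(r.a, ch), r)
    raise TypeError(r)


def matches(r: Re, s: str) -> bool:
    """Consume s one character at a time; never backtracks."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)


# Strings over {a, b} that contain "ab" and do not end in "b".
any_ab = Star(Alt(Chr("a"), Chr("b")))
contains_ab = Cat(any_ab, Cat(Chr("a"), Cat(Chr("b"), any_ab)))
ends_b = Cat(any_ab, Chr("b"))
r = And(contains_ab, Not(ends_b))

print(matches(r, "aba"))  # True
print(matches(r, "ab"))   # False
```

Intersection and complement fit naturally here because the derivative distributes over both, which is why derivative-based matchers can support them without converting to an automaton first. A practical engine would additionally normalize terms so that the set of distinct derivatives stays small.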
Citations: 0