We introduce a new framework for verifying systems with a parametric number of concurrently running processes. The systems we consider are well-structured with respect to a specific well-quasi order. This allows us to decide a wide range of verification problems, including control-state reachability, coverability, and target, in a fixed finite abstraction of the infinite state space, called a 01-counter system. We show that several systems from the parameterized verification literature fall into this class, including reconfigurable broadcast networks (or systems with lossy broadcast), disjunctive systems, synchronizations, and systems with a fixed number of shared finite-domain variables. Our framework provides a simple and unified explanation for the properties of these systems, which have so far been investigated separately. Additionally, it extends and improves on a range of existing results, and gives rise to other systems with similar properties.
Parameterized Verification of Systems with Precise (0,1)-Counter Abstraction. Paul Eichler, Swen Jacobs, Chana Weil-Kennedy. arXiv - CS - Formal Languages and Automata Theory, 2024-08-12. https://doi.org/arxiv-2408.05954
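As a rough illustration of what a 01-counter system looks like, the sketch below explores the finite abstraction of a disjunctive system in which each abstract configuration records only which locations are occupied. The transition format `(src, guard, dst)`, the function names, and the treatment of the case `guard == src` are assumptions made for this example; they are not taken from the paper.

```python
from collections import deque

def abstract_successors(support, transitions):
    """Yield 01-abstract successors of a support (frozenset of occupied locations)."""
    for src, guard, dst in transitions:
        # A process in src may move if some other process occupies guard.
        if src in support and guard in support:
            # Case 1: at least one process stays behind in src.
            yield support | {dst}
            # Case 2: the last process leaves src. Assumption of this sketch:
            # impossible when src itself must stay occupied as the guard.
            if guard != src:
                yield (support - {src}) | {dst}

def reachable_supports(init, transitions):
    """Breadth-first exploration of the finite abstract state space."""
    start = frozenset({init})
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in abstract_successors(current, transitions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def coverable(loc, init, transitions):
    """Can some process reach loc, for some number of processes?"""
    return any(loc in s for s in reachable_supports(init, transitions))

def target_reachable(loc, init, transitions):
    """Can all processes be in loc at the same time?"""
    return frozenset({loc}) in reachable_supports(init, transitions)

# Hypothetical example system with locations a, b, c.
EXAMPLE = [("a", "a", "b"), ("b", "a", "c"), ("a", "c", "c")]
```

Because there are at most 2^n supports over n locations, the exploration always terminates, which is the point of working in a fixed finite abstraction.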
State-estimation-based properties have been central to the study of discrete-event systems modeled by labeled finite-state automata over the past three decades. Most existing results assume a single agent who knows the structure of the system, observes a subset of its events, and estimates the system's state from that structure and those observations. The main tool used to do state estimation and verify state-estimation-based properties is called the \emph{observer}, which is the powerset construction originally proposed by Rabin and Scott in 1959 to determinize a nondeterministic finite automaton with $\varepsilon$-transitions. In this paper, we consider labeled finite-state automata, extend the state-estimation-based properties from a single agent to a finite ordered set of agents, and extend the original observer to a \emph{high-order observer} based on the original observer and our \emph{concurrent composition}. As a result, we obtain a general framework for high-order state-estimation-based properties, together with a basic tool for verifying such properties. This general framework contains many basic properties as its members, such as state-based opacity, critical observability, determinism, and high-order opacity. Special cases for which verification can be done more efficiently are also discussed. In our general framework, the system's structure is publicly known to all agents $A_1,\dots,A_n$; each agent $A_i$ has its own observable event set $E_i$ and additionally knows the observable event sets of all its preceding agents, but can only observe its own observable events. The intuitive meaning of our high-order observer is what agent $A_n$ knows about what $A_{n-1}$ knows about $\dots$ what $A_2$ knows about $A_1$'s state estimate of the system.
High-order observers and high-order state-estimation-based properties of discrete-event systems. Kuize Zhang, Xiaoguang Han, Alessandro Giua, Carla Seatzu. arXiv - CS - Formal Languages and Automata Theory, 2024-08-12. https://doi.org/arxiv-2408.06141
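The classical single-agent observer mentioned in the abstract can be sketched as a powerset construction in which unobservable events play the role of $\varepsilon$-transitions. The encoding of the automaton as a dictionary and the names `unobservable_reach` and `build_observer` are assumptions made for this example; the high-order observer of the paper is not reproduced here.

```python
def unobservable_reach(states, delta, observable):
    """All states reachable from `states` via unobservable events only."""
    reach = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for event, target in delta.get(q, []):
            if event not in observable and target not in reach:
                reach.add(target)
                stack.append(target)
    return frozenset(reach)

def build_observer(initial, delta, observable):
    """Return (initial estimate, transition map) of the observer.

    delta maps a state to a list of (event, target) pairs.
    Each observer state is a frozenset: the agent's current state estimate.
    """
    start = unobservable_reach(initial, delta, observable)
    observer = {}
    todo = [start]
    while todo:
        estimate = todo.pop()
        if estimate in observer:
            continue
        observer[estimate] = {}
        for event in observable:
            # States reached by firing the observed event from the estimate.
            moved = {t for q in estimate
                     for e, t in delta.get(q, []) if e == event}
            if moved:
                nxt = unobservable_reach(moved, delta, observable)
                observer[estimate][event] = nxt
                todo.append(nxt)
    return start, observer

# Hypothetical automaton: events a and b are observable, u is not.
DELTA = {0: [("a", 1), ("u", 2)], 1: [("b", 0)], 2: [("a", 2), ("b", 3)], 3: []}
START, OBSERVER = build_observer({0}, DELTA, {"a", "b"})
```

Each key of `OBSERVER` is one possible state estimate, so a property such as state-based opacity can be checked by inspecting these sets.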
We consider parameterized verification problems for networks of timed automata (TAs) that communicate via disjunctive guards or lossy broadcast. To this end, we first consider disjunctive timed networks (DTNs), i.e., networks of TAs that communicate via location guards that enable a transition only if there is another process in a certain location. We solve for the first time the general case with clock invariants, and establish the decidability of the parameterized verification problem for local trace properties and for reachability of global configurations. Moreover, we prove that, surprisingly and unlike in other settings, this model is equivalent to lossy broadcast networks.
Parameterized Verification of Timed Networks with Clock Invariants. Étienne André, Swen Jacobs, Shyam Lal Karra, Ocan Sankur. arXiv - CS - Formal Languages and Automata Theory, 2024-08-09. https://doi.org/arxiv-2408.05190
Florian Frank, Daniel Hausmann, Stefan Milius, Lutz Schröder, Henning Urbat
Formal languages over infinite alphabets serve as abstractions of structures and processes carrying data. Automata models over infinite alphabets, such as classical register automata or, equivalently, nominal orbit-finite automata, tend to have computationally hard or even undecidable reasoning problems unless stringent restrictions are imposed on either the power of control or the number of registers. This has been shown to be ameliorated in automata models with name allocation such as regular nondeterministic nominal automata, which allow for deciding language inclusion in elementary complexity even with unboundedly many registers while retaining a reasonable level of expressiveness. In the present work, we demonstrate that elementary complexity survives under extending the power of control to alternation: We introduce regular alternating nominal automata (RANAs), and show that their non-emptiness and inclusion problems have elementary complexity even when the number of registers is unbounded. Moreover, we show that RANAs allow for nearly complete de-alternation, specifically de-alternation up to a single deadlocked universal state.
Alternating Nominal Automata with Name Allocation. Florian Frank, Daniel Hausmann, Stefan Milius, Lutz Schröder, Henning Urbat. arXiv - CS - Formal Languages and Automata Theory, 2024-08-07. https://doi.org/arxiv-2408.03658
The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which uses a probabilistic oracle that may randomly give persistently wrong answers to membership queries during deterministic finite automata (DFA) learning. Given the tendency of LLMs to produce hallucinatory content, we have developed techniques to improve answer accuracy and ensure the correctness of the learned automata. We propose the $\mathtt{Discrimination}$ prompt as well as the $\mathtt{Verification}$ prompt and explore their advantages over common prompts. Additionally, we compare DFA learning performance between the TTT algorithm and common active learning algorithms. To address the exponential number of persistent errors, we implement a dynamic query cache refinement algorithm that identifies and corrects conflicting queries by combining active and passive learning algorithms. The empirical results demonstrate the robustness and efficiency of our approach, providing a theoretical foundation for automata learning with LLMs in the loop.
LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning. Lekai Chen, Ashutosh Trivedi, Alvaro Velasquez. arXiv - CS - Formal Languages and Automata Theory, 2024-08-06. https://doi.org/arxiv-2408.02999
This work proposes a hierarchical clustering algorithm for high-dimensional datasets using the cyclic space of reversible finite cellular automata. In cellular automaton (CA) based clustering, if two objects belong to the same cycle, they are closely related and considered part of the same cluster. However, if a high-dimensional dataset is clustered using the cycles of one CA, closely related objects may belong to different cycles. This paper identifies the relationship between objects in two different cycles based on the median of all elements in each cycle, so that they can be grouped in the next stage. Further, to minimize the number of intermediate clusters, which in turn reduces the computational cost, a rule selection strategy is used to find the best rules based on information propagation and cycle structure. After encoding the dataset using frequency-based encoding such that consecutive data elements maintain a minimum Hamming distance in encoded form, our proposed clustering algorithm iterates over three stages to finally cluster the data elements into the number of clusters requested by the user. This algorithm can be applied to various fields, including healthcare, sports, chemical research, and agriculture. When verified over standard benchmark datasets with various performance metrics, our algorithm is on par with existing algorithms, with quadratic time complexity.
Hierarchical Clustering using Reversible Binary Cellular Automata for High-Dimensional Data. Baby C. J., Kamalika Bhattacharjee. arXiv - CS - Formal Languages and Automata Theory, 2024-08-05. https://doi.org/arxiv-2408.02250
In this paper, we present a simple method for sampling trees with or without replacement from BCFLs. A BCFL is a context-free language (CFL) corresponding to an incomplete string with holes, which can be completed by valid terminals. To solve this problem, we introduce an algebraic datatype that compactly represents candidate parse forests for porous strings. Once this datatype is constructed, sampling trees is a straightforward matter of sampling integers uniformly without replacement, then lazily decoding them into trees.
A Tree Sampler for Bounded Context-Free Languages. Breandan Considine. arXiv - CS - Formal Languages and Automata Theory, 2024-08-03. https://doi.org/arxiv-2408.01849
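The idea of sampling integers and decoding them into trees can be sketched with a small forest datatype built from leaves, choices (sums), and nodes (products). The classes `Leaf`, `Choice`, and `Node` and the mixed-radix decoding below are assumptions chosen for illustration; the paper's actual datatype and decoding scheme may differ.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Leaf:
    label: str

@dataclass(frozen=True)
class Choice:
    options: tuple  # alternative sub-forests (a sum)

@dataclass(frozen=True)
class Node:
    label: str
    children: tuple  # one sub-forest per child position (a product)

def count(forest):
    """Number of distinct trees represented by the forest."""
    if isinstance(forest, Leaf):
        return 1
    if isinstance(forest, Choice):
        return sum(count(o) for o in forest.options)
    total = 1
    for child in forest.children:
        total *= count(child)
    return total

def decode(forest, index):
    """Map an integer in range(count(forest)) to a unique tree."""
    if isinstance(forest, Leaf):
        return forest.label
    if isinstance(forest, Choice):
        for option in forest.options:
            n = count(option)
            if index < n:
                return decode(option, index)
            index -= n
        raise IndexError("index out of range")
    kids = []
    for child in forest.children:
        # Mixed-radix digit for this child position.
        index, digit = divmod(index, count(child))
        kids.append(decode(child, digit))
    if index != 0:
        raise IndexError("index out of range")
    return (forest.label, *kids)

def sample_without_replacement(forest, k, rng):
    """Lazily yield up to k distinct trees, chosen uniformly."""
    total = count(forest)
    for index in rng.sample(range(total), min(k, total)):
        yield decode(forest, index)

# Hypothetical forest with 2 * 3 = 6 trees.
FOREST = Node("S", (Choice((Leaf("a"), Leaf("b"))),
                    Choice((Leaf("x"), Leaf("y"), Leaf("z")))))
```

This sketch recomputes `count` on every call for brevity; caching the counts in the datatype would avoid that repeated work.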
Hierarchical conjunctive queries (HCQ) are a subclass of conjunctive queries (CQ) with robust algorithmic properties. Among other results, Berkholz, Keppeler, and Schweikardt have shown that HCQ is the subclass of CQ (without projection) that admits dynamic query evaluation with constant update time and constant delay enumeration. In a different but related setting, Complex Event Recognition (CER) is a prominent technology for evaluating sequence patterns over streams. Since one can interpret a data stream as an unbounded sequence of inserts in dynamic query evaluation, it is natural to ask to what extent CER can take advantage of HCQ to find a robust class of queries that can be evaluated efficiently. In this paper, we seek to combine HCQ with sequence patterns to find a class of CER queries that gets the best of both worlds. To reach this goal, we propose a class of complex event automata called Parallelized Complex Event Automata (PCEA) for evaluating CER queries with correlation (i.e., joins) over streams. This model allows us to express sequence patterns and compare values among tuples, but it also allows us to express conjunctions by incorporating a novel form of non-determinism that we call parallelization. We show that for every HCQ (under bag semantics), we can construct an equivalent PCEA. Further, we show that HCQ is the biggest class of acyclic CQ that this automata model can define. PCEA thus stands as a sweet spot that precisely expresses HCQ (i.e., among acyclic CQ) and extends them with sequence patterns. Finally, we show that PCEA also inherits the good algorithmic properties of HCQ by presenting a streaming evaluation algorithm under sliding windows with logarithmic update time and output-linear delay for the class of PCEA with equality predicates.
Complex event recognition meets hierarchical conjunctive queries. Dante Pinto, Cristian Riveros. arXiv - CS - Formal Languages and Automata Theory, 2024-08-03. https://doi.org/arxiv-2408.01652
Regular and context-free languages form a central pillar of formal language theory. This is because a variety of formalisms are known that define these classes of languages. For example, finite automata, monoids, algebraic recognizability, regular expressions, regular grammars, and monadic second-order logic can all be used to represent regular word languages. However, the situation is less clear for formal languages over graphs, and open problems persist. This is because generalizing notions from words to graphs has been more successful for some of the cited formalisms than for others. Bruno Courcelle has introduced hyper-edge replacement (hr) algebras for generalizing the notion of context-free languages from words to graphs. At the same time, hr-algebras support the generalization of algebraic recognizability from words to graphs, a notion that has been proven to be equivalent to definability in (counting) monadic second-order logic (cmso) over graphs of bounded tree-width. In this paper, we deal with generalizing regular word grammars to graphs. We propose regular grammars for (unordered and unranked) trees, series-parallel graphs, and graphs of tree-width $\le 2$, where the qualifier regular is justified because these grammars define exactly the recognizable, respectively cmso-definable, subsets of the respective graph classes.
Regular Grammars for Graph Sets of Tree-Width $\leq 2$. Marius Bozga, Radu Iosif, Florian Zuleger. arXiv - CS - Formal Languages and Automata Theory, 2024-08-02. https://doi.org/arxiv-2408.01226
Ian Erik Varatalu, Margus Veanes, Juhan-Peep Ernits
We present a tool and theory RE# for regular expression matching that is built on symbolic derivatives, does not use backtracking, and, in addition to the classical operators, also supports complement, intersection and lookarounds. We develop the theory formally and show that the main matching algorithm has input-linear complexity both in theory and in experiments. A thorough evaluation on popular benchmarks shows that RE# is over 71% faster than the next fastest regex engine in Rust on the baseline, and outperforms all state-of-the-art engines on extensions of the benchmarks, often by several orders of magnitude.
RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement and Lookarounds. Ian Erik Varatalu, Margus Veanes, Juhan-Peep Ernits. arXiv - CS - Formal Languages and Automata Theory, 2024-07-30. https://doi.org/arxiv-2407.20479
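To illustrate why derivatives handle complement and intersection without backtracking, here is a minimal matcher based on classical Brzozowski derivatives. It is not the RE# algorithm: it works on concrete characters rather than symbolic character sets, omits lookarounds, and performs no simplification of derivatives. The tuple encoding of regexes is an assumption made for this sketch.

```python
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty string
ANY = ("any",)       # matches any single character

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr", "any"):
        return False
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(f"unknown regex node: {tag}")

def derive(r, c):
    """Regex matching exactly the suffixes w such that r matches c + w."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "any":
        return EPS
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag in ("alt", "and"):
        # Derivatives distribute over union and intersection.
        return (tag, derive(r[1], c), derive(r[2], c))
    if tag == "not":
        # The derivative of a complement is the complement of the derivative.
        return ("not", derive(r[1], c))
    if tag == "star":
        return ("cat", derive(r[1], c), r)
    if tag == "cat":
        left = ("cat", derive(r[1], c), r[2])
        return ("alt", left, derive(r[2], c)) if nullable(r[1]) else left
    raise ValueError(f"unknown regex node: {tag}")

def matches(r, text):
    """Take one derivative per input character, then test nullability."""
    for c in text:
        r = derive(r, c)
    return nullable(r)

def contains(word):
    """Regex for strings that contain `word` as a substring."""
    r = ("star", ANY)
    for ch in reversed(word):
        r = ("cat", ("chr", ch), r)
    return ("cat", ("star", ANY), r)

# Strings that contain "ab" but do not contain "bb".
EXAMPLE = ("and", contains("ab"), ("not", contains("bb")))
```

Each input character is consumed exactly once, with no backtracking. Without normalizing the derivative terms, however, their size can grow with the input, so this sketch does not by itself achieve the input-linear bound claimed for RE#.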