{"title":"LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning","authors":"Lekai Chen, Ashutosh Trivedi, Alvaro Velasquez","doi":"arxiv-2408.02999","DOIUrl":null,"url":null,"abstract":"The emergence of intelligence in large language models (LLMs) has inspired\ninvestigations into their integration into automata learning. This paper\nintroduces the probabilistic Minimally Adequate Teacher (pMAT) formulation,\nwhich leverages a probabilistic oracle that could give persistent errors\nrandomly during answering the membership queries for deterministic finite\nautomata (DFA) learning. Given the tendency of LLMs to produce hallucinatory\ncontent, we have developed techniques to improve answer accuracy and ensure the\ncorrectness of the learned automata. We propose the $\\mathtt{Discrimination}$\nprompt as well as the $\\mathtt{Verification}$ prompt and explore their\nadvantages over common prompts. Additionally, we compare DFA learning\nperformance between the TTT algorithm and common active learning algorithms. To\naddress the exponential number of persistent errors, we implement a dynamic\nquery cache refinement algorithm that identifies and corrects conflicting\nqueries by combining the active and passive learning algorithms. The empirical\nresults demonstrate the robustness and efficiency of our approach, providing a\ntheoretical foundation for automata learning with LLMs in the loop.","PeriodicalId":501124,"journal":{"name":"arXiv - CS - Formal Languages and Automata Theory","volume":"62 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Formal Languages and Automata Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.02999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which leverages a probabilistic oracle that may randomly give persistent errors when answering membership queries during deterministic finite automaton (DFA) learning. Given the tendency of LLMs to produce hallucinated content, we develop techniques to improve answer accuracy and ensure the correctness of the learned automata. We propose the $\mathtt{Discrimination}$ prompt and the $\mathtt{Verification}$ prompt and explore their advantages over common prompts. Additionally, we compare DFA learning performance between the TTT algorithm and common active learning algorithms. To address the exponentially many persistent errors, we implement a dynamic query cache refinement algorithm that identifies and corrects conflicting queries by combining active and passive learning. The empirical results demonstrate the robustness and efficiency of our approach, providing a theoretical foundation for automata learning with LLMs in the loop.
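To make the pMAT setting concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation) of a membership oracle whose errors are persistent: answers are cached, so re-asking the same query returns the same, possibly wrong, label, and the cache can only be corrected by explicit refinement, e.g. after an equivalence query returns a counterexample. The class name `ProbabilisticMAT`, the `refine` method, and the noisy stub `noisy_even_zeros` standing in for an LLM answering a membership prompt are all illustrative assumptions.

```python
import random
from typing import Callable, Dict

class ProbabilisticMAT:
    """Sketch of a membership oracle with persistent (cached) errors."""

    def __init__(self, llm_membership: Callable[[str], bool]):
        # `llm_membership` stands in for an LLM answering a membership prompt.
        self.llm_membership = llm_membership
        self.cache: Dict[str, bool] = {}  # persistent answers: word -> label

    def membership(self, word: str) -> bool:
        # Persistent-error model: once the oracle has answered a word, it
        # never changes its mind unless the cache is explicitly refined.
        if word not in self.cache:
            self.cache[word] = self.llm_membership(word)
        return self.cache[word]

    def refine(self, word: str, correct_label: bool) -> None:
        # Dynamic cache refinement: overwrite a cached answer that a
        # counterexample has shown to conflict with the target language.
        self.cache[word] = correct_label


def noisy_even_zeros(word: str, error_rate: float = 0.1) -> bool:
    # Stub "LLM": target language = binary strings with an even number of 0s,
    # answered incorrectly with probability `error_rate`.
    truth = word.count("0") % 2 == 0
    return truth if random.random() > error_rate else not truth


if __name__ == "__main__":
    random.seed(0)
    oracle = ProbabilisticMAT(noisy_even_zeros)
    w = "0010"
    first = oracle.membership(w)
    # Persistence: repeating the query returns the same (possibly wrong) label.
    assert oracle.membership(w) == first
    # Suppose an equivalence query later exposes w as a counterexample; the
    # learner refines the cache before continuing to learn.
    oracle.refine(w, correct_label=(w.count("0") % 2 == 0))
    print(w, "->", oracle.membership(w))
```

The point of the cache is that, unlike i.i.d. noise, persistent errors cannot be averaged away by repeated queries; corrections must instead come from conflict detection, which is why the abstract pairs the active learner with a passive, counterexample-driven refinement step.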