FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving
Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang
arXiv - CS - Mathematical Software, 2024-06-20
arXiv:2406.14408
Abstract
Formal verification (FV) has grown in significance as evolving large language
models (LLMs) make program synthesis increasingly common. However, current
formal verification relies mainly on symbolic verifiers or hand-crafted rules,
which limits how extensive and flexible verification can be. Meanwhile, formal
languages for automated theorem proving, such as Isabelle, offer another line
of rigorous verification and are maintained with comprehensive rules and
theorems. In this paper, we propose
FVEL, an interactive Formal Verification Environment with LLMs. Specifically,
FVEL transforms the code to be verified into Isabelle and then conducts
verification via neural automated theorem proving with an LLM. The joint
paradigm leverages the rigorous, abundant, and well-organized rules in
Isabelle and also makes it convenient to introduce and adapt cutting-edge
LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER
dataset includes code dependencies and verification processes that are
formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646
proof steps in total with in-depth dependencies. We benchmark FVELER in the
FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them
on Code2Inv and SV-COMP. The results show that FVEL with FVELER fine-tuned
Llama3-8B solves 17.39% (69 -> 81) more problems, and with Mistral-7B 12% (75 ->
84) more problems, in SV-COMP. The proportion of proof errors is also reduced.
Project page: https://fveler.github.io/.
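
The SV-COMP percentages quoted in the abstract read as relative gains over the baseline number of solved problems. The following is a minimal sketch of that arithmetic, assuming the figures in parentheses are solved-problem counts before and after fine-tuning; the function name `relative_gain` is illustrative and does not come from the paper.

```python
def relative_gain(before: int, after: int) -> float:
    """Percentage increase in solved problems relative to the baseline count."""
    return (after - before) / before * 100


# Counts as quoted in the abstract (assumed to be solved problems in SV-COMP).
llama_gain = relative_gain(69, 81)    # Llama3-8B: 69 -> 81
mistral_gain = relative_gain(75, 84)  # Mistral-7B: 75 -> 84

print(f"Llama3-8B: {llama_gain:.2f}%")
print(f"Mistral-7B: {mistral_gain:.2f}%")
```

Under this reading, 12 additional problems over a baseline of 69 gives about 17.39%, and 9 additional problems over a baseline of 75 gives 12%, which matches the reported values.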