{"title":"TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts","authors":"Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang","doi":"arxiv-2407.03203","DOIUrl":null,"url":null,"abstract":"Proving mathematical theorems using computer-verifiable formal languages like\nLean significantly impacts mathematical reasoning. One approach to formal\ntheorem proving involves generating complete proofs using Large Language Models\n(LLMs) based on Natural Language (NL) proofs. Similar methods have shown\npromising results in code generation. However, most modern LLMs exhibit\nsuboptimal performance due to the scarcity of aligned NL and Formal Language\n(FL) theorem-proving data. This scarcity results in a paucity of methodologies\nfor training LLMs and techniques to fully utilize their capabilities in\ncomposing formal proofs. To address the challenges, this paper proposes\n**TheoremLlama**, an end-to-end framework to train a general-purpose LLM to\nbecome a Lean4 expert. This framework encompasses NL-FL aligned dataset\ngeneration methods, training approaches for the LLM formal theorem prover, and\ntechniques for LLM Lean4 proof writing. Using the dataset generation method, we\nprovide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped\ndataset. A key innovation in this framework is the NL-FL bootstrapping method,\nwhere NL proofs are integrated into Lean4 code for training datasets,\nleveraging the NL reasoning ability of LLMs for formal reasoning. The\n**TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61%\non MiniF2F-Valid and Test datasets respectively, surpassing the GPT-4 baseline\nof 22.95% and 25.41%. We have also open-sourced our model checkpoints and\ngenerated dataset, and will soon make all the code publicly available.","PeriodicalId":501124,"journal":{"name":"arXiv - CS - Formal Languages and Automata Theory","volume":"22 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Formal Languages and Automata Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.03203","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Proving mathematical theorems in computer-verifiable formal languages such as Lean significantly impacts mathematical reasoning. One approach to formal theorem proving is to generate complete proofs with Large Language Models (LLMs) guided by Natural Language (NL) proofs; similar methods have shown promising results in code generation. However, most modern LLMs perform poorly at this task because aligned NL and Formal Language (FL) theorem-proving data is scarce. This scarcity leaves few established methodologies for training LLMs as formal provers and few techniques for fully exploiting their capabilities when composing formal proofs. To address these challenges, this paper proposes **TheoremLlama**, an end-to-end framework that trains a general-purpose LLM to become a Lean4 expert. The framework encompasses NL-FL aligned dataset generation, training approaches for the LLM formal theorem prover, and techniques for LLM Lean4 proof writing. Using the dataset generation method, we provide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped dataset. A key innovation of the framework is the NL-FL bootstrapping method, in which NL proofs are integrated into Lean4 code to form the training data, leveraging the NL reasoning ability of LLMs for formal reasoning. **TheoremLlama** achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. We have open-sourced our model checkpoints and the generated dataset, and will soon make all code publicly available.
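
To make the NL-FL bootstrapping idea concrete, below is a minimal sketch of what a bootstrapped training example could look like: the informal proof steps are interleaved as comments with the corresponding Lean4 tactics. The theorem, lemma names, and comment placement are illustrative assumptions, not an excerpt from the OBT dataset.

```lean
-- Illustrative NL-FL bootstrapped example (hypothetical, not from OBT):
-- the natural-language proof is embedded as comments next to each tactic.
theorem add_assoc_comm (a b c : Nat) : a + (b + c) = (c + b) + a := by
  -- NL: First, commutativity of addition lets us rewrite b + c as c + b.
  rw [Nat.add_comm b c]
  -- NL: The goal is now a + (c + b) = (c + b) + a, which is exactly
  -- NL: commutativity of addition applied to a and (c + b).
  exact Nat.add_comm a (c + b)
```

Training on data interleaved in this way is meant to let the model reuse its natural-language reasoning while emitting the formal tactics, rather than learning Lean4 syntax in isolation.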