Title: A Comparison of Large Language Models and Genetic Programming for Program Synthesis
Authors: Dominik Sobania; Justyna Petke; Martin Briesch; Franz Rothlauf
Journal: IEEE Transactions on Evolutionary Computation, vol. 29, no. 4, pp. 1434-1448
Published: 2024-06-06
DOI: 10.1109/TEVC.2024.3410873
URL: https://ieeexplore.ieee.org/document/10551744/
Citation count: 0
Abstract
Large language models have recently become known for their ability to generate computer programs, especially through tools such as GitHub Copilot, in a domain where genetic programming (GP) has been very successful so far. Although the two approaches require different inputs (free text versus input/output examples), their goal is the same: program synthesis. Therefore, in this work, we compare how well GitHub Copilot and GP perform on common program synthesis benchmark problems. We study the structure and diversity of the generated programs by using well-known software metrics. We find that GitHub Copilot and GP solve a similar number of benchmark problems (85.2% versus 77.8%, respectively). We find that GitHub Copilot generates smaller and less complex programs than GP, while GP is able to find new and unique problem-solving strategies. This increase in diversity of solutions comes at a cost. When analyzing the success rates for 100 runs per problem, GitHub Copilot outperforms GP on over 50% of the problems.
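As a rough illustration of the last comparison in the abstract, the sketch below computes a per-problem success rate over repeated runs and then the share of problems on which one method beats the other. It is not the authors' evaluation code: the function names, the problem labels, and every outcome value are made up for the example, and the abstract does not say how ties or unsolved problems were treated (here a tie counts as a win for neither method).

```python
from typing import Dict, List


def success_rate(run_outcomes: List[bool]) -> float:
    """Fraction of runs whose generated program passed all test cases."""
    if not run_outcomes:
        raise ValueError("need at least one run")
    return sum(run_outcomes) / len(run_outcomes)


def fraction_where_first_wins(
    first: Dict[str, List[bool]], second: Dict[str, List[bool]]
) -> float:
    """Share of common problems where `first` has a strictly higher success rate."""
    problems = first.keys() & second.keys()
    if not problems:
        raise ValueError("no problems in common")
    wins = sum(
        1 for p in problems if success_rate(first[p]) > success_rate(second[p])
    )
    return wins / len(problems)


# Hypothetical outcomes for 100 runs on three made-up problems.
# These numbers are illustrative only and are NOT results from the paper.
method_a = {
    "p1": [True] * 90 + [False] * 10,
    "p2": [True] * 20 + [False] * 80,
    "p3": [False] * 100,
}
method_b = {
    "p1": [True] * 50 + [False] * 50,
    "p2": [True] * 60 + [False] * 40,
    "p3": [False] * 100,
}

share_a = fraction_where_first_wins(method_a, method_b)
share_b = fraction_where_first_wins(method_b, method_a)
print(f"A wins on {share_a:.1%} of problems, B wins on {share_b:.1%}")
```

Using a strict inequality means that a problem neither method solves (such as the hypothetical `p3`) is counted for neither side, so the two shares need not sum to 100%.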
About the journal:
The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.