{"title":"Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models","authors":"Guang Yang;Yu Zhou;Xiang Chen;Xiangyu Zhang;Terry Yue Zhuo;Taolue Chen","doi":"10.1109/TSE.2024.3440503","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (\n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs), which are defined to have fewer than 10 billion parameters. Empirically, we find that most \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach \n<monospace>COTTON</monospace>\n which can leverage \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by \n<monospace>COTTON</monospace>\n outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by \n<monospace>COTTON</monospace>\n boost various \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. 
The results also reveal that \n<monospace>COTTON</monospace>\n not only improves the performance of \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs, but also enhances the performance of LLMs. Our study showcases the potential of \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs in software engineering applications.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2437-2457"},"PeriodicalIF":6.5000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10634302/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citation count: 0
Abstract
Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain-of-Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (ℓLMs), defined as models with fewer than 10 billion parameters. Empirically, we find that most ℓLMs cannot generate high-quality CoTs when prompted with the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach, COTTON, which leverages ℓLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various ℓLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. The results also reveal that COTTON not only improves the performance of ℓLMs but also enhances the performance of LLMs. Our study showcases the potential of ℓLMs in software engineering applications.
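The abstract's core idea — feeding an ℓLM a CoT generated elsewhere rather than asking it to produce one few-shot — can be illustrated with a minimal prompt-construction sketch. The prompt format and function name below are hypothetical illustrations, not COTTON's actual templates:

```python
def build_prompt(problem: str, cot: str = "") -> str:
    """Assemble a code-generation prompt, optionally prefixed with an
    externally generated chain-of-thought (CoT).

    With cot="", this is a plain prompt an lLM would receive directly;
    with a non-empty cot (e.g., produced by a CoT generator such as
    COTTON), the lLM can condition its code on the reasoning steps.
    """
    parts = [f"# Problem:\n{problem}"]
    if cot:
        # Prepend the high-quality CoT so the lightweight model does not
        # have to produce the reasoning itself.
        parts.append(f"# Reasoning steps:\n{cot}")
    parts.append("# Solution (Python):")
    return "\n\n".join(parts)


plain = build_prompt("Return the sum of a list of integers.")
augmented = build_prompt(
    "Return the sum of a list of integers.",
    cot=(
        "1. Initialize an accumulator to 0.\n"
        "2. Add each element of the list to the accumulator.\n"
        "3. Return the accumulator."
    ),
)
```

Here `augmented` contains the extra "Reasoning steps" section while `plain` does not; in the paper's setting, the augmented prompt is what lets an ℓLM benefit from CoTs it could not generate itself.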
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.