{"title":"Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models","authors":"Guang Yang;Yu Zhou;Xiang Chen;Xiangyu Zhang;Terry Yue Zhuo;Taolue Chen","doi":"10.1109/TSE.2024.3440503","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (\n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs), which are defined to have fewer than 10 billion parameters. Empirically, we find that most \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach \n<monospace>COTTON</monospace>\n which can leverage \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by \n<monospace>COTTON</monospace>\n outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by \n<monospace>COTTON</monospace>\n boost various \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. 
The results also reveal that \n<monospace>COTTON</monospace>\n not only improves the performance of \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs, but also enhances the performance of LLMs. Our study showcases the potential of \n<inline-formula><tex-math>$\\ell$</tex-math></inline-formula>\nLMs in software engineering applications.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2437-2457"},"PeriodicalIF":6.5000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10634302/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citation count: 0
Abstract
Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain-of-Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models (ℓLMs), defined as models with fewer than 10 billion parameters. Empirically, we find that most ℓLMs cannot generate high-quality CoTs when prompted with the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach, COTTON, which leverages ℓLMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various ℓLMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. The results also reveal that COTTON not only improves the performance of ℓLMs but also enhances the performance of LLMs. Our study showcases the potential of ℓLMs in software engineering applications.
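The abstract's core idea — feeding an ℓLM a CoT generated elsewhere rather than asking it to produce one few-shot — can be illustrated with a minimal prompt-construction sketch. The prompt format and function name below are hypothetical illustrations, not COTTON's actual templates:

```python
def build_prompt(problem: str, cot: str = "") -> str:
    """Assemble a code-generation prompt, optionally prefixed with an
    externally generated chain-of-thought (CoT).

    With cot="", this is a plain prompt an lLM would receive directly;
    with a non-empty cot (e.g., produced by a CoT generator such as
    COTTON), the lLM can condition its code on the reasoning steps.
    """
    parts = [f"# Problem:\n{problem}"]
    if cot:
        # Prepend the high-quality CoT so the lightweight model does not
        # have to produce the reasoning itself.
        parts.append(f"# Reasoning steps:\n{cot}")
    parts.append("# Solution (Python):")
    return "\n\n".join(parts)


plain = build_prompt("Return the sum of a list of integers.")
augmented = build_prompt(
    "Return the sum of a list of integers.",
    cot=(
        "1. Initialize an accumulator to 0.\n"
        "2. Add each element of the list to the accumulator.\n"
        "3. Return the accumulator."
    ),
)
```

Here `augmented` contains the extra "Reasoning steps" section while `plain` does not; in the paper's setting, the augmented prompt is what lets an ℓLM benefit from CoTs it could not generate itself.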
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.