Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models

IEEE Transactions on Software Engineering · IF 6.5 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Software Engineering) · Pub Date: 2024-08-12 · DOI: 10.1109/TSE.2024.3440503
Guang Yang;Yu Zhou;Xiang Chen;Xiangyu Zhang;Terry Yue Zhuo;Taolue Chen
IEEE Transactions on Software Engineering, vol. 50, no. 9, pp. 2437-2457. Journal Article. https://ieeexplore.ieee.org/document/10634302/
Citation count: 0

Abstract

Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often rely on CoTs that are written manually or generated by LLMs with over 100 billion parameters, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models ($\ell$LMs), defined as models with fewer than 10 billion parameters. Empirically, we find that most $\ell$LMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach, COTTON, which leverages $\ell$LMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various $\ell$LMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. The results also reveal that COTTON not only improves the performance of $\ell$LMs, but also enhances the performance of LLMs. Our study showcases the potential of $\ell$LMs in software engineering applications.
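The two-stage idea in the abstract (one model writes a CoT, another model consumes it when generating code) can be sketched roughly as follows. This is a minimal illustration only, not the COTTON implementation: the prompt layout, the `TextModel` type, and the function names `build_cot_prompt`, `build_code_prompt`, and `generate_code_with_cot` are all hypothetical, and the models are passed in as plain callables so that no particular library API is assumed.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: any function that maps a prompt string to generated text.
TextModel = Callable[[str], str]


def build_cot_prompt(task: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Build a few-shot prompt asking a model to write a CoT (a step-by-step plan)."""
    parts = []
    for example_task, example_cot in examples:
        # Each demonstration pairs a task description with a reference plan.
        parts.append(f"### Task:\n{example_task}\n### Plan:\n{example_cot}\n")
    # The new task is appended with an empty plan for the model to complete.
    parts.append(f"### Task:\n{task}\n### Plan:\n")
    return "\n".join(parts)


def build_code_prompt(task: str, cot: str) -> str:
    """Attach an externally generated CoT to the task before asking for code."""
    return f"### Task:\n{task}\n### Plan:\n{cot}\n### Code:\n"


def generate_code_with_cot(
    task: str,
    examples: Sequence[Tuple[str, str]],
    cot_model: TextModel,
    code_model: TextModel,
) -> str:
    """Stage 1: cot_model writes the plan. Stage 2: code_model writes code from it."""
    cot = cot_model(build_cot_prompt(task, examples)).strip()
    return code_model(build_code_prompt(task, cot))
```

Because the CoT generator and the code generator are separate arguments, the same sketch covers the setting the abstract describes, where a CoT produced by one model is handed to a different (possibly much smaller or larger) model for code generation.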
Source journal
IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.