用于训练程序的神经代理的复杂性引导数据采样

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Proceedings of the ACM on Programming Languages Pub Date : 2023-10-16 DOI:10.1145/3622856

Renda, Alex, Ding, Yi, Carbin, Michael

{"title":"用于训练程序的神经代理的复杂性引导数据采样","authors":"Renda, Alex, Ding, Yi, Carbin, Michael","doi":"10.1145/3622856","DOIUrl":null,"url":null,"abstract":"Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"3 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs\",\"authors\":\"Renda, Alex, Ding, Yi, Carbin, Michael\",\"doi\":\"10.1145/3622856\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.\",\"PeriodicalId\":20697,\"journal\":{\"name\":\"Proceedings of the ACM on Programming Languages\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3622856\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622856","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

程序员和研究人员越来越多地开发程序的替代品，即给定程序的可观察行为子集的模型，以解决各种软件开发挑战。程序员通过在输入示例数据集上对程序行为的测量来训练代理。构建代理的一个关键挑战是确定使用什么训练数据来训练给定程序的代理。我们提出了一种采样数据集的方法来训练基于神经网络的程序代理。我们首先根据学习相应执行路径的代理的复杂性来表征程序输入空间的每个区域(对应于程序的不同执行路径)的样本数据比例。接下来，我们将提供一个程序分析，以确定程序中不同路径的复杂性。我们在一系列现实世界的程序中评估了这些结果，证明了复杂性引导的采样结果在准确性方面的经验改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs

Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊