Program representations for predictive compilation: State of affairs in the early 20’s

Journal of Computer Languages, vol. 73, Article 101171 · Published 2022-12-01 · DOI: 10.1016/j.cola.2022.101171
Impact factor 1.7 · Zone 3 (Computer Science) · JCR Q3 (Computer Science, Software Engineering)

Anderson Faustino da Silva , Edson Borin , Fernando Magno Quintão Pereira , Nilton Luiz Queiroz Junior , Otávio Oliveira Napoli

Citations: 2

Abstract

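In the last five years, predictive compilation has made great strides. Contributions to the field include new program embeddings, new learning architectures, and datasets with millions of programs. This paper evaluates 25 state-of-the-art program embeddings, three of them new, plus two learning models from previous work. We trained this apparatus on three large datasets and applied it to three classification problems. When classifying programs according to the problem they solve, we reproduced the high-accuracy results reported in previous work. However, we could not repeat these results in the two new classification challenges that we study: determining the depth of the most deeply nested loop in a program, and determining the best sequence of optimizations for reducing code size. These negative results held despite the large number of classifiers (25) that we evaluated. Surprisingly, the histogram of instruction opcodes, a very simple program embedding, led to about the same classification accuracy as embeddings such as Ir2Vec or Inst2Vec, which were designed to solve stochastic compilation tasks.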
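The abstract's most striking finding concerns the simplest embedding: a histogram of instruction opcodes. The following is a minimal sketch of how such a vector could be computed from textual LLVM IR, the representation that Ir2Vec and Inst2Vec also operate on. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which IR, opcode vocabulary, or normalization the paper uses, so the truncated `VOCAB` list, the helper names `opcode_of` and `opcode_histogram`, and the choice to normalize counts into frequencies are all hypothetical. The parser is also deliberately naive (it strips comments by splitting on `;` and does not handle quoted value names).

```python
import re
from collections import Counter

# Hypothetical, deliberately truncated opcode vocabulary; a full embedding
# would enumerate every opcode of the target IR.
VOCAB = ["add", "sub", "mul", "icmp", "br", "load", "store",
         "call", "ret", "phi", "alloca", "getelementptr"]

# Matches "%name = <rest>" so the opcode can be read from <rest>.
_ASSIGN = re.compile(r"^%[\w.\-]+\s*=\s*(.*)$")
# Call-site markers that precede the real opcode ("tail call ...").
_CALL_PREFIXES = {"tail", "musttail", "notail"}


def opcode_of(line):
    """Return the opcode of one LLVM IR instruction, or None for labels/blank lines."""
    line = line.split(";", 1)[0].strip()  # drop trailing comments
    if not line or line.endswith(":"):
        return None  # blank line or basic-block label
    match = _ASSIGN.match(line)
    tokens = (match.group(1) if match else line).split()
    while tokens and tokens[0] in _CALL_PREFIXES:
        tokens = tokens[1:]
    return tokens[0] if tokens else None


def opcode_histogram(ir_text, vocab=VOCAB, normalize=True):
    """Count opcodes inside function bodies; return one value per vocab entry."""
    counts = Counter()
    in_function = False
    for raw in ir_text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("define"):
            in_function = True  # function header; the body starts on the next line
            continue
        if in_function and stripped == "}":
            in_function = False
            continue
        if in_function:
            op = opcode_of(stripped)
            if op is not None:
                counts[op] += 1
    # Opcodes outside the (hypothetical) vocabulary are silently dropped.
    vector = [counts[op] for op in vocab]
    if normalize:
        total = sum(vector)
        return [c / total if total else 0.0 for c in vector]
    return vector


if __name__ == "__main__":
    example = """
define i32 @sum(i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %acc.next = add nsw i32 %acc, %i
  %next = add nsw i32 %i, 1
  %done = icmp sge i32 %next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret i32 %acc.next
}
"""
    print(opcode_histogram(example))
    # → [0.25, 0.0, 0.0, 0.125, 0.25, 0.0, 0.0, 0.0, 0.125, 0.25, 0.0, 0.0]
```

Every program maps to a fixed-length vector regardless of its size, which is what lets an off-the-shelf classifier consume it; the trade-off is that instruction order, control flow, and data flow are all discarded.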

Source journal

Journal of Computer Languages · Computer Science: Computer Networks and Communications

CiteScore

5.00

Self-citation rate

13.60%

Articles published