Anderson Faustino da Silva, Edson Borin, Fernando Magno Quintão Pereira, Nilton Luiz Queiroz Junior, Otávio Oliveira Napoli
Journal of Computer Languages, Volume 73, Article 101171. Published 2022-12-01. DOI: 10.1016/j.cola.2022.101171. https://www.sciencedirect.com/science/article/pii/S2590118422000685
Program representations for predictive compilation: State of affairs in the early 20’s
In the last five years, predictive compilation has advanced in long strides. Contributions to the field include new program embeddings, new learning architectures, and datasets with millions of programs. This paper evaluates 25 state-of-the-art program embeddings, three of them new, plus two learning models from previous work. We trained this apparatus on three large datasets and applied it to three classification problems. When classifying programs according to the problem that they solve, we reproduced the high-accuracy results seen in previous work. However, we were not able to repeat these results in the two new classification challenges that we study: determining the depth of the most deeply nested loop in a program, and determining the best sequence of optimizations to reduce the code size of programs. These negative results emerged despite the large number of classifiers, 25, that we evaluated. Surprisingly, the histogram of instruction opcodes, a very simple program embedding, led to about the same classification accuracy as embeddings like Ir2Vec or Inst2Vec, which were designed to solve stochastic compilation tasks.
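To make the "histogram of instruction opcodes" embedding concrete, the following is a minimal sketch of how such a vector could be computed. It is not the authors' implementation: the vocabulary `VOCAB`, the function name `opcode_histogram`, and the toy instruction stream are all assumptions made for illustration, and a real pipeline would first extract the opcodes from a compiler's intermediate representation.

```python
from collections import Counter

# Hypothetical, tiny opcode vocabulary (an assumption for this sketch);
# a real vocabulary would list every opcode of the chosen IR.
VOCAB = ["add", "br", "call", "icmp", "load", "mul", "ret", "store"]


def opcode_histogram(opcodes, vocab=VOCAB):
    """Return a fixed-length vector of occurrence counts, one per vocabulary opcode.

    Opcodes that are not in the vocabulary are ignored.
    """
    counts = Counter(opcodes)
    return [counts[op] for op in vocab]


# Toy instruction stream, made up for illustration (not taken from the paper).
program = ["load", "load", "add", "store", "icmp", "br", "load", "mul", "store", "ret"]

embedding = opcode_histogram(program)
print(embedding)  # [1, 1, 0, 1, 3, 1, 1, 2]
```

The resulting vector has one position per opcode, so programs of any length map to the same number of features and can be fed to a standard classifier. The order of instructions and the control flow are discarded entirely, which is why the abstract calls this a very simple embedding.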