Program representations for predictive compilation: State of affairs in the early 20's

Journal of Computer Languages; Pub Date: 2022-12-01; DOI: 10.1016/j.cola.2022.101171; IF 1.7; Zone 3 (Computer Science); JCR Q3 (Computer Science, Software Engineering)
Anderson Faustino da Silva, Edson Borin, Fernando Magno Quintão Pereira, Nilton Luiz Queiroz Junior, Otávio Oliveira Napoli
Volume 73, Article 101171. Full text: https://www.sciencedirect.com/science/article/pii/S2590118422000685
Citations: 2

Program representations for predictive compilation: State of affairs in the early 20’s

In the last five years, predictive compilation has advanced in long strides. Contributions in the field include new program embeddings, new learning architectures, and datasets with millions of programs. This paper evaluates 25 state-of-the-art program embeddings, three of them new, plus two learning models from previous work. We trained this apparatus on three large datasets and applied it to three classification problems. When classifying programs according to the problem that they solve, we reproduced the high-accuracy results seen in previous work. However, we were not able to repeat these results in the two new classification challenges that we study: determining the depth of the most deeply nested loop in a program, and determining the best sequence of optimizations to reduce the code size of programs. Negative results emerged despite the large number of classifiers, 25, that we evaluated. Surprisingly, the histogram of instruction opcodes, a very simple program embedding, led to about the same classification accuracy as embeddings like Ir2Vec or Inst2Vec, which were designed to solve stochastic compilation tasks.
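To make the "histogram of instruction opcodes" embedding concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the opcode vocabulary or the input format, so the short `OPCODES` list, the regular expression, the LLVM-like textual IR in `EXAMPLE_IR`, and the function name `opcode_histogram` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small opcode vocabulary (an assumption for this
# sketch; the abstract does not list the vocabulary used in the paper).
OPCODES = ["add", "sub", "mul", "load", "store", "icmp", "br", "call", "ret"]

# An instruction line may start with "%name = " and is then followed by the
# opcode. Labels, "define" lines, and braces do not yield a known opcode.
_INSTR = re.compile(r"^\s*(?:%[\w.]+\s*=\s*)?([a-z]+)\b")


def opcode_histogram(ir_text, vocabulary=OPCODES):
    """Return a fixed-length list of opcode counts for LLVM-like textual IR."""
    counts = Counter()
    for line in ir_text.splitlines():
        match = _INSTR.match(line)
        if match and match.group(1) in vocabulary:
            counts[match.group(1)] += 1
    # One position per vocabulary entry, so every program maps to a vector
    # of the same length regardless of its size.
    return [counts[op] for op in vocabulary]


# Illustrative input only; not taken from any dataset named in the paper.
EXAMPLE_IR = """
define i32 @f(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  %prod = mul i32 %sum, %b
  %cmp = icmp sgt i32 %prod, 0
  br i1 %cmp, label %then, label %else
then:
  ret i32 %prod
else:
  %neg = sub i32 0, %prod
  ret i32 %neg
}
"""

EXAMPLE_VECTOR = opcode_histogram(EXAMPLE_IR)
```

The resulting vector discards instruction order, operands, and control flow entirely, which is what makes it "very simple" compared with learned embeddings. A fixed-length count vector like this could then be fed to any off-the-shelf classifier.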

Source journal: Journal of Computer Languages (Computer Science: Computer Networks and Communications)
CiteScore: 5.00
Self-citation rate: 13.60%
Articles published: 36