Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Proceedings of the ACM on Programming Languages Pub Date : 2023-10-16 DOI:10.1145/3622825
Fangke Ye, Jisheng Zhao, Jun Shirako, Vivek Sarkar
{"title":"Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving","authors":"Fangke Ye, Jisheng Zhao, Jun Shirako, Vivek Sarkar","doi":"10.1145/3622825","DOIUrl":null,"url":null,"abstract":"Despite the widespread popularity of dynamically typed languages such as Python, it is well known that they pose significant challenges to code optimization due to the lack of concrete type information. To overcome this limitation, many ahead-of-time optimizing compiler approaches for Python rely on programmers to provide optional type information as a prerequisite for extensive code optimization. Since few programmers provide this information, a large majority of Python applications are executed without the benefit of code optimization, thereby contributing collectively to a significant worldwide wastage of compute and energy resources. In this paper, we introduce a new approach to concrete type inference that is shown to be effective in enabling code optimization for dynamically typed languages, without requiring the programmer to provide any type information. We explore three kinds of type inference algorithms in our approach based on: 1) machine learning models including GPT-4, 2) constraint-based inference based on SMT solving, and 3) a combination of 1) and 2). Our approach then uses the output from type inference to generate multi-version code for a bounded number of concrete type options, while also including a catch-all untyped version for the case when no match is found. The typed versions are then amenable to code optimization. Experimental results show that the combined algorithm in 3) delivers far superior precision and performance than the separate algorithms for 1) and 2). The performance improvement due to type inference, in terms of geometric mean speedup across all benchmarks compared to standard Python, when using 3) is 26.4× with Numba as an AOT optimizing back-end and 62.2× with the Intrepydd optimizing compiler as a back-end. These vast performance improvements can have a significant impact on programmers’ productivity, while also reducing their applications’ use of compute and energy resources.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622825","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Despite the widespread popularity of dynamically typed languages such as Python, it is well known that they pose significant challenges to code optimization due to the lack of concrete type information. To overcome this limitation, many ahead-of-time optimizing compiler approaches for Python rely on programmers to provide optional type information as a prerequisite for extensive code optimization. Since few programmers provide this information, a large majority of Python applications are executed without the benefit of code optimization, thereby contributing collectively to a significant worldwide wastage of compute and energy resources. In this paper, we introduce a new approach to concrete type inference that is shown to be effective in enabling code optimization for dynamically typed languages, without requiring the programmer to provide any type information. We explore three kinds of type inference algorithms in our approach based on: 1) machine learning models including GPT-4, 2) constraint-based inference based on SMT solving, and 3) a combination of 1) and 2). Our approach then uses the output from type inference to generate multi-version code for a bounded number of concrete type options, while also including a catch-all untyped version for the case when no match is found. The typed versions are then amenable to code optimization. Experimental results show that the combined algorithm in 3) delivers far superior precision and performance than the separate algorithms for 1) and 2). The performance improvement due to type inference, in terms of geometric mean speedup across all benchmarks compared to standard Python, when using 3) is 26.4× with Numba as an AOT optimizing back-end and 62.2× with the Intrepydd optimizing compiler as a back-end. These vast performance improvements can have a significant impact on programmers’ productivity, while also reducing their applications’ use of compute and energy resources.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于SMT求解的机器学习代码优化的具体类型推断
尽管Python等动态类型语言广泛流行,但众所周知,由于缺乏具体的类型信息,它们对代码优化构成了重大挑战。为了克服这一限制,许多Python的提前优化编译器方法依赖于程序员提供可选的类型信息,作为广泛的代码优化的先决条件。由于很少有程序员提供此信息,因此大多数Python应用程序在执行时没有获得代码优化的好处,从而共同造成了全球范围内计算和能源资源的重大浪费。在本文中,我们介绍了一种具体类型推断的新方法,该方法被证明可以有效地实现动态类型语言的代码优化,而不需要程序员提供任何类型信息。我们在我们的方法中探索了三种类型推理算法:1)包括GPT-4在内的机器学习模型,2)基于SMT求解的基于约束的推理,以及3)1)和2)的组合。然后,我们的方法使用类型推理的输出为有限数量的具体类型选项生成多版本代码,同时还包括一个捕获所有无类型的版本,用于没有找到匹配的情况。然后,类型化版本可以进行代码优化。实验结果表明,3)中的组合算法比1)和2)中的单独算法提供了更高的精度和性能。与标准Python相比,在所有基准测试中,类型推断带来的性能提升,在使用Numba作为AOT优化后端时为26.4倍,使用Intrepydd优化编译器作为后端时为62.2倍。这些巨大的性能改进可以对程序员的生产力产生重大影响,同时也减少了应用程序对计算和能源的使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
期刊最新文献
ReLU Hull Approximation An Axiomatic Basis for Computer Programming on the Relaxed Arm-A Architecture: The AxSL Logic The Essence of Generalized Algebraic Data Types Explicit Effects and Effect Constraints in ReML Indexed Types for a Statically Safe WebAssembly
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1