Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving

Proceedings of the ACM on Programming Languages (IF 2.8, Q2, Computer Science, Software Engineering). Pub Date: 2023-10-16. DOI: 10.1145/3622825

Fangke Ye, Jisheng Zhao, Jun Shirako, Vivek Sarkar



Abstract

Despite the widespread popularity of dynamically typed languages such as Python, it is well known that they pose significant challenges to code optimization because they lack concrete type information. To overcome this limitation, many ahead-of-time (AOT) optimizing compiler approaches for Python rely on programmers to provide optional type information as a prerequisite for extensive code optimization. Since few programmers provide this information, a large majority of Python applications are executed without the benefit of code optimization, thereby contributing collectively to a significant worldwide waste of compute and energy resources. In this paper, we introduce a new approach to concrete type inference that is shown to be effective in enabling code optimization for dynamically typed languages, without requiring the programmer to provide any type information. We explore three kinds of type inference algorithms in our approach: 1) machine learning models, including GPT-4; 2) constraint-based inference using SMT solving; and 3) a combination of 1) and 2). Our approach then uses the output of type inference to generate multi-version code for a bounded number of concrete type options, along with a catch-all untyped version for the case when no match is found. The typed versions are then amenable to code optimization. Experimental results show that the combined algorithm in 3) delivers far better precision and performance than the separate algorithms in 1) and 2). With 3), the geometric mean speedup across all benchmarks over standard Python due to type inference is 26.4× with Numba as an AOT optimizing back-end and 62.2× with the Intrepydd optimizing compiler as a back-end. These large performance improvements can have a significant impact on programmers’ productivity, while also reducing their applications’ use of compute and energy resources.
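
The multi-version idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`add_int`, `add_float`, `add_untyped`, `make_multiversion`) are hypothetical, the set of inferred type options is hard-coded rather than produced by a model or an SMT solver, and the typed variants are plain Python functions standing in for code that a back-end would compile ahead of time. It only shows the dispatch shape: a bounded table of concrete type signatures, plus a catch-all untyped version used when no signature matches.

```python
from typing import Any, Callable, Dict, Tuple

Signature = Tuple[type, ...]


# Hypothetical typed variants. In a real pipeline each of these would be
# handed to an optimizing back-end; here they are ordinary functions.
def add_int(x: int, y: int) -> int:
    return x + y


def add_float(x: float, y: float) -> float:
    return x + y


# Catch-all untyped version: always correct, never specialized.
def add_untyped(x: Any, y: Any) -> Any:
    return x + y


def make_multiversion(
    typed_versions: Dict[Signature, Callable[..., Any]],
    fallback: Callable[..., Any],
):
    """Build a selector and a dispatcher over a bounded set of typed versions."""

    def select(*arg_types: type) -> Callable[..., Any]:
        # Exact match on the concrete argument types; anything else falls back.
        return typed_versions.get(tuple(arg_types), fallback)

    def dispatch(*args: Any) -> Any:
        impl = select(*(type(a) for a in args))
        return impl(*args)

    return select, dispatch


# Assumed output of type inference: two concrete options for (x, y).
select_add, add = make_multiversion(
    {(int, int): add_int, (float, float): add_float},
    add_untyped,
)

if __name__ == "__main__":
    print(add(1, 2))        # handled by add_int
    print(add(1.5, 2.0))    # handled by add_float
    print(add("a", "b"))    # no typed match, handled by add_untyped
```

Matching on `type(a)` exactly, rather than with `isinstance`, keeps each typed version tied to one concrete type; a `bool` argument, for example, does not match the `(int, int)` entry and goes to the untyped version.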


