Ali-Reza Adl-Tabatabai, Michal Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, J. Stichnoth
A "Just-In-Time" (JIT) Java compiler produces native code from Java byte code instructions during program execution. As such, compilation speed is more important in a Java JIT compiler than in a traditional compiler, requiring optimization algorithms to be lightweight and effective. We present the structure of a Java JIT compiler for the Intel Architecture, describe the lightweight implementation of JIT compiler optimizations (e.g., common subexpression elimination, register allocation, and elimination of array bounds checking), and evaluate the performance benefits and tradeoffs of the optimizations. This JIT compiler has been shipped with version 2.5 of Intel's VTune for Java product.
“即时”(JIT) Java编译器在程序执行期间从Java字节码指令生成本机代码。因此,在Java JIT编译器中,编译速度比在传统编译器中更为重要,这就要求优化算法是轻量级和有效的。我们介绍了一个用于Intel架构的Java JIT编译器的结构,描述了JIT编译器优化的轻量级实现(例如,消除公共子表达式、寄存器分配和消除数组边界检查),并评估了优化的性能优势和权衡。这个JIT编译器已经随Intel的VTune for Java产品的2.5版一起发布了。
{"title":"Fast, effective code generation in a just-in-time Java compiler","authors":"Ali-Reza Adl-Tabatabai, Michal Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, J. Stichnoth","doi":"10.1145/277650.277740","DOIUrl":"https://doi.org/10.1145/277650.277740","url":null,"abstract":"A \"Just-In-Time\" (JIT) Java compiler produces native code from Java byte code instructions during program execution. As such, compilation speed is more important in a Java JIT compiler than in a traditional compiler, requiring optimization algorithms to be lightweight and effective. We present the structure of a Java JIT compiler for the Intel Architecture, describe the lightweight implementation of JIT compiler optimizations (e.g., common subexpression elimination, register allocation, and elimination of array bounds checking), and evaluate the performance benefits and tradeoffs of the optimizations. This JIT compiler has been shipped with version 2.5 of Intel's VTune for Java product.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114314412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A pointer-analysis algorithm can be either flow-sensitive or flow-insensitive. While flow-sensitive analysis usually provides more precise information, it is also usually considerably more costly in terms of time and space. The main contribution of this paper is the presentation of another option in the form of an algorithm that can be 'tuned' to provide a range of results that fall between the results of flow-insensitive and flow-sensitive analysis. The algorithm combines a flow-insensitive pointer analysis with static single assignment (SSA) form and uses an iterative process to obtain progressively better results.
{"title":"Using static single assignment form to improve flow-insensitive pointer analysis","authors":"Rebecca Hasti, Susan Horwitz","doi":"10.1145/277650.277668","DOIUrl":"https://doi.org/10.1145/277650.277668","url":null,"abstract":"A pointer-analysis algorithm can be either flow-sensitive or flow-insensitive. While flow-sensitive analysis usually provides more precise information, it is also usually considerably more costly in terms of time and space. The main contribution of this paper is the presentation of another option in the form of an algorithm that can be 'tuned' to provide a range of results that fall between the results of flow-insensitive and flow-sensitive analysis. The algorithm combines a flow-insensitive pointer analysis with static single assignment (SSA) form and uses an iterative process to obtain progressively better results.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125179186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing array region representation techniques are sensitive to the complexity of array subscripts. In general, these techniques are very accurate and efficient for simple subscript expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. We found that in scientific applications, many access patterns are simple even when the subscript expressions are complex. In this work, we present a new, general array access representation and define operations for it. This allows us to aggregate and simplify the representation enough that precise region operations may be applied to enable compiler optimizations. Our experiments show that these techniques hold promise for speeding up applications.
{"title":"Simplification of array access patterns for compiler optimizations","authors":"Y. Paek, J. Hoeflinger, D. Padua","doi":"10.1145/277650.277664","DOIUrl":"https://doi.org/10.1145/277650.277664","url":null,"abstract":"Existing array region representation techniques are sensitive to the complexity of array subscripts. In general, these techniques are very accurate and efficient for simple subscript expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. We found that in scientific applications, many access patterns are simple even when the subscript expressions are complex. In this work, we present a new, general array access representation and define operations for it. This allows us to aggregate and simplify the representation enough that precise region operations may be applied to enable compiler optimizations. Our experiments show that these techniques hold promise for speeding up applications.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124698485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A linear-scan algorithm directs the global allocation of register candidates to registers based on a simple linear sweep over the program being compiled. This approach to register allocation makes sense for systems, such as those for dynamic compilation, where compilation speed is important. In contrast, most commercial and research optimizing compilers rely on a graph-coloring approach to global register allocation. In this paper, we compare the performance of a linear-scan method against a modern graph-coloring method. We implement both register allocators within the Machine SUIF extension of the Stanford SUIF compiler system. Experimental results show that linear scan is much faster than coloring on benchmarks with large numbers of register candidates. We also describe improvements to the linear-scan approach that do not change its linear character, but allow it to produce code of a quality near to that produced by graph coloring.
{"title":"Quality and speed in linear-scan register allocation","authors":"Omri Traub, G. Holloway, Michael D. Smith","doi":"10.1145/277650.277714","DOIUrl":"https://doi.org/10.1145/277650.277714","url":null,"abstract":"A linear-scan algorithm directs the global allocation of register candidates to registers based on a simple linear sweep over the program being compiled. This approach to register allocation makes sense for systems, such as those for dynamic compilation, where compilation speed is important. In contrast, most commercial and research optimizing compilers rely on a graph-coloring approach to global register allocation. In this paper, we compare the performance of a linear-scan method against a modern graph-coloring method. We implement both register allocators within the Machine SUIF extension of the Stanford SUIF compiler system. Experimental results show that linear scan is much faster than coloring on benchmarks with large numbers of register candidates. We also describe improvements to the linear-scan approach that do not change its linear character, but allow it to produce code of a quality near to that produced by graph coloring.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125365821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A module system ought to enable assembly-line programming using separate compilation and an expressive linking language. Separate compilation allows programmers to develop parts of a program independently. A linking language gives programmers precise control over the assembly of parts into a whole. This paper presents models of program units, MzScheme's module language for assembly-line programming. Units support separate compilation, independent module reuse, cyclic dependencies, hierarchical structuring, and dynamic linking. The models explain how to integrate units with untyped and typed languages such as Scheme and ML.
{"title":"Units: cool modules for HOT languages","authors":"M. Flatt, M. Felleisen","doi":"10.1145/277650.277730","DOIUrl":"https://doi.org/10.1145/277650.277730","url":null,"abstract":"A module system ought to enable assembly-line programming using separate compilation and an expressive linking language. Separate compilation allows programmers to develop parts of a program independently. A linking language gives programmers precise control over the assembly of parts into a whole. This paper presents models of program units, MzScheme's module language for assembly-line programming. Units support separate compilation, independent module reuse, cyclic dependencies, hierarchical structuring, and dynamic linking. The models explain how to integrate units with untyped and typed languages such as Scheme and ML.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125423837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most existing reference-based distributed object systems include some kind of acyclic garbage collection, but fail to provide acceptable collection of cyclic garbage. Those that do provide such GC currently suffer from one or more problems: synchronous operation, the need for expensive global consensus or termination algorithms, susceptibility to communication problems, or an algorithm that does not scale. We present a simple, complete, fault-tolerant, asynchronous extension to the (acyclic) cleanup protocol of the SSP Chains system. This extension is scalable, consumes few resources, and could easily be adapted to work in other reference-based distributed object systems---rendering them usable for very large-scale applications.
{"title":"An implementation of complete, asynchronous, distributed garbage collection","authors":"Fabrice Le Fessant, Ian Piumarta, M. Shapiro","doi":"10.1145/277650.277715","DOIUrl":"https://doi.org/10.1145/277650.277715","url":null,"abstract":"Most existing reference-based distributed object systems include some kind of acyclic garbage collection, but fail to provide acceptable collection of cyclic garbage. Those that do provide such GC currently suffer from one or more problems: synchronous operation, the need for expensive global consensus or termination algorithms, susceptibility to communication problems, or an algorithm that does not scale. We present a simple, complete, fault-tolerant, asynchronous extension to the (acyclic) cleanup protocol of the SSP Chains system. This extension is scalable, consumes few resources, and could easily be adapted to work in other reference-based distributed object systems---rendering them usable for very large-scale applications.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125456567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well-studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult.
{"title":"Memory management with explicit regions","authors":"David E. Gay, A. Aiken","doi":"10.1145/277650.277748","DOIUrl":"https://doi.org/10.1145/277650.277748","url":null,"abstract":"Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well-studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134164742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper evaluates three alias analyses based on programming language types. The first analysis uses type compatibility to determine aliases. The second extends the first by using additional high-level information such as field names. The third extends the second with a flow-insensitive analysis. Although other researchers suggests using types to disambiguate memory references, none evaluates its effectiveness. We perform both static and dynamic evaluations of type-based alias analyses for Modula-3, a statically-typed type-safe language. The static analysis reveals that type compatibility alone yields a very imprecise alias analysis, but the other two analyses significantly improve alias precision. We use redundant load elimination (RLE) to demonstrate the effectiveness of the three alias algorithms in terms of the opportunities for optimization, the impact on simulated execution times, and to compute an upper bound on what a perfect alias analysis would yield. We show modest dynamic improvements for (RLE), and more surprisingly, that on average our alias analysis is within 2.5% of a perfect alias analysis with respect to RLE on 8 Modula-3 programs. These results illustrate that to explore thoroughly the effectiveness of alias analyses, researchers need static, dynamic, and upper-bound analysis. In addition, we show that for type-safe languages like Modula-3 and Java, a fast and simple alias analysis may be sufficient for many applications.
{"title":"Type-based alias analysis","authors":"Amer Diwan, K. McKinley, J. Moss","doi":"10.1145/277650.277670","DOIUrl":"https://doi.org/10.1145/277650.277670","url":null,"abstract":"This paper evaluates three alias analyses based on programming language types. The first analysis uses type compatibility to determine aliases. The second extends the first by using additional high-level information such as field names. The third extends the second with a flow-insensitive analysis. Although other researchers suggests using types to disambiguate memory references, none evaluates its effectiveness. We perform both static and dynamic evaluations of type-based alias analyses for Modula-3, a statically-typed type-safe language. The static analysis reveals that type compatibility alone yields a very imprecise alias analysis, but the other two analyses significantly improve alias precision. We use redundant load elimination (RLE) to demonstrate the effectiveness of the three alias algorithms in terms of the opportunities for optimization, the impact on simulated execution times, and to compute an upper bound on what a perfect alias analysis would yield. We show modest dynamic improvements for (RLE), and more surprisingly, that on average our alias analysis is within 2.5% of a perfect alias analysis with respect to RLE on 8 Modula-3 programs. These results illustrate that to explore thoroughly the effectiveness of alias analyses, researchers need static, dynamic, and upper-bound analysis. In addition, we show that for type-safe languages like Modula-3 and Java, a fast and simple alias analysis may be sufficient for many applications.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134510793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a typed programming language and compiler for run-time code generation. The language, called ML', extends ML with modal operators in the style of the Mini-ML'e language of Davies and Pfenning. ML' allows programmers to use types to specify precisely the stages of computation in a program. The types also guide the compiler in generating target code that exploits the staging information through the use of run-time code generation. The target machine is currently a version of the Categorical Abstract Machine, called the CCAM, which we have extended with facilities for run-time code generation.This approach allows the programmer to express the staging that he wants directly to the compiler. It also provides a typed framework in which to verify the correctness of his staging intentions, and to discuss his staging decisions with other programmers. Finally, it supports in a natural way multiple stages of run-time specialization, so that dynamically generated code can be used in the generation of yet further specialized code.This paper presents an overview of the language, with several examples of programs that illustrate key concepts and programming techniques. Then, it discusses the CCAM and the compilation of ML' programs into CCAM code. Finally, the results of some experiments are shown, to demonstrate the benefits of this style of run-time code generation for some applications.
{"title":"Run-time code generation and modal-ML","authors":"P. Wickline, Peter Lee, F. Pfenning","doi":"10.1145/277650.277727","DOIUrl":"https://doi.org/10.1145/277650.277727","url":null,"abstract":"This paper presents a typed programming language and compiler for run-time code generation. The language, called ML', extends ML with modal operators in the style of the Mini-ML'e language of Davies and Pfenning. ML' allows programmers to use types to specify precisely the stages of computation in a program. The types also guide the compiler in generating target code that exploits the staging information through the use of run-time code generation. The target machine is currently a version of the Categorical Abstract Machine, called the CCAM, which we have extended with facilities for run-time code generation.This approach allows the programmer to express the staging that he wants directly to the compiler. It also provides a typed framework in which to verify the correctness of his staging intentions, and to discuss his staging decisions with other programmers. Finally, it supports in a natural way multiple stages of run-time specialization, so that dynamically generated code can be used in the generation of yet further specialized code.This paper presents an overview of the language, with several examples of programs that illustrate key concepts and programming techniques. Then, it discusses the CCAM and the compilation of ML' programs into CCAM code. Finally, the results of some experiments are shown, to demonstrate the benefits of this style of run-time code generation for some applications.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"2021 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115685188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The IEEE/ANSI standard for Scheme requires implementations to be properly tail recursive. This ensures that portable code can rely upon the space efficiency of continuation-passing style and other idioms. On its face, proper tail recursion concerns the efficiency of procedure calls that occur within a tail context. When examined closely, proper tail recursion also depends upon the fact that garbage collection can be asymptotically more space-efficient than Algol-like stack allocation.Proper tail recursion is not the same as ad hoc tail call optimization in stack-based languages. Proper tail recursion often precludes stack allocation of variables, but yields a well-defined asymptotic space complexity that can be relied upon by portable programs.This paper offers a formal and implementation-independent definition of proper tail recursion for Scheme. It also shows how an entire family of reference implementations can be used to characterize related safe-for-space properties, and proves the asymptotic inequalities that hold between them.
{"title":"Proper tail recursion and space efficiency","authors":"William D. Clinger","doi":"10.1145/277650.277719","DOIUrl":"https://doi.org/10.1145/277650.277719","url":null,"abstract":"The IEEE/ANSI standard for Scheme requires implementations to be properly tail recursive. This ensures that portable code can rely upon the space efficiency of continuation-passing style and other idioms. On its face, proper tail recursion concerns the efficiency of procedure calls that occur within a tail context. When examined closely, proper tail recursion also depends upon the fact that garbage collection can be asymptotically more space-efficient than Algol-like stack allocation.Proper tail recursion is not the same as ad hoc tail call optimization in stack-based languages. Proper tail recursion often precludes stack allocation of variables, but yields a well-defined asymptotic space complexity that can be relied upon by portable programs.This paper offers a formal and implementation-independent definition of proper tail recursion for Scheme. It also shows how an entire family of reference implementations can be used to characterize related safe-for-space properties, and proves the asymptotic inequalities that hold between them.","PeriodicalId":365404,"journal":{"name":"Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124863549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}