R. Kessler, John C. Peterson, H. Carr, Gerald P. Duggan, J. Knell, Jed J. Krohnfeldt
The Experimental Portable Standard Lisp Compiler (EPIC) is a compiler testbed for experimentation with, and development of, Lisp compilation strategies. EPIC uses an architectural description of the target machine to increase portability, and performs extensive optimizations in the form of source-to-source transformations, register allocation, and peephole optimization. It introduces machine-specific instructions early to enable machine-specific optimizations in the initial passes. EPIC produces better code than the original Portable Standard Lisp compiler, has an improved portability model, and is easier to maintain.
{"title":"EPIC - a retargetable, highly optimizing Lisp compiler","authors":"R. Kessler, John C. Peterson, H. Carr, Gerald P. Duggan, J. Knell, Jed J. Krohnfeldt","doi":"10.1145/12276.13323","DOIUrl":"https://doi.org/10.1145/12276.13323","url":null,"abstract":"The Experimental Portable Standard Lisp Compiler (EPIC) is a compiler testbed for experimentation with, and development of, Lisp compilation strategies. EPIC uses an architectural description of the target machine to increase portability, and performs extensive optimizations in the form of source-to-source transformations, register allocation, and peephole optimization. It introduces machine-specific instructions early to enable machine-specific optimizations in the initial passes. EPIC produces better code than the original Portable Standard Lisp compiler, has an improved portability model, and is easier to maintain.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128619721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe, and give experience with, a new method of intraprocedural data flow analysis on reducible flow-graphs[9]. The method is advantageous in imbedded applications where the added value of improved performance justifies substantial optimization effort, but extremely powerful data flow analysis is required due to the code profile. The method is unusual in that (1) various kinds of forward data flow analysis are done simultaneously, so that benefit is derived from informative interactions (e.g. between constant propagation[7] and alias analysis[8]) and (2) we abandon insistence on reaching a least fixed point, allowing us to handle extremely broad classes of information (e.g., inequality of array indices, which implies non-aliasing in array references). We argue that the gain from using a very rich framework more than offsets the loss due to non-minimal fixed points, and justify this with a 'thought experiment' and practical results.
{"title":"Data flow analysis for `intractable' system software","authors":"H. Johnson","doi":"10.1145/12276.13322","DOIUrl":"https://doi.org/10.1145/12276.13322","url":null,"abstract":"We describe, and give experience with, a new method of intraprocedural data flow analysis on reducible flow-graphs[9]. The method is advantageous in imbedded applications where the added value of improved performance justifies substantial optimization effort, but extremely powerful data flow analysis is required due to the code profile.\u0000The method is unusual in that (1) various kinds of forward data flow analysis are done simultaneously, so that benefit is derived from informative interactions (e.g. between constant propagation[7] and alias analysis[8]) and (2) we abandon insistence on reaching a least fixed point, allowing us to handle extremely broad classes of information (e.g., inequality of array indices, which implies non-aliasing in array references).\u0000We argue that the gain from using a very rich framework more than offsets the loss due to non-minimal fixed points, and justify this with a 'thought experiment' and practical results.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130135660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers. The resulting parse time is negligible compared to the time required by the remainder of a typical compiler containing the parser. A parsing speed of 1/2 million lines per minute on a computer similar to a VAX 11/780 was achieved, up from an interpretive speed of 40,000 lines per minute. A speed of 240,000 lines per minute on an Intel 80286 was achieved, up from an interpretive speed of 37,000 lines per minute. The improvement is obtained by translating the parser's finite state control into assembly language. States become code memory addresses. The current input symbol resides in a register and a quick sequence of register-constant comparisons determines the next state, which is merely jumped to. The parser's push-down stack is implemented directly on a hardware stack. The stack contains code memory addresses rather than the traditional state numbers. The strongly-connected components of the directed graph induced by the parser's terminal and nonterminal transitions are examined to determine a typically small subset of the states that require parse-time stack-overflow-check code when hardware does not provide the check automatically. The increase in speed is at the expense of space: a factor of 2 to 4 increase in total table size can be expected, depending upon whether syntactic error recovery is required.
{"title":"Very fast LR parsing","authors":"T. Pennello","doi":"10.1145/12276.13326","DOIUrl":"https://doi.org/10.1145/12276.13326","url":null,"abstract":"LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers. The resulting parse time is negligible compared to the time required by the remainder of a typical compiler containing the parser.\u0000A parsing speed of 1/2 million lines per minute on a computer similar to a VAX 11/780 was achieved, up from an interpretive speed of 40,000 lines per minute. A speed of 240,000 lines per minute on an Intel 80286 was achieved, up from an interpretive speed of 37,000 lines per minute.\u0000The improvement is obtained by translating the parser's finite state control into assembly language. States become code memory addresses. The current input symbol resides in a register and a quick sequence of register-constant comparisons determines the next state, which is merely jumped to. The parser's push-down stack is implemented directly on a hardware stack. The stack contains code memory addresses rather than the traditional state numbers.\u0000The strongly-connected components of the directed graph induced by the parser's terminal and nonterminal transitions are examined to determine a typically small subset of the states that require parse-time stack-overflow-check code when hardware does not provide the check automatically.\u0000The increase in speed is at the expense of space: a factor of 2 to 4 increase in total table size can be expected, depending upon whether syntactic error recovery is required.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116138857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We report on a compiler for Warp, a high-performance systolic array developed at Carnegie Mellon. This compiler enhances the usefulness of Warp significantly and allows application programmers to code substantial algorithms. The compiler combines a novel programming model, which is based on a model of skewed computation for the array, with powerful optimization techniques. Programming in W2 (the language accepted by the compiler) is orders of magnitude easier than coding in microcode, the only alternative available previously.
{"title":"Compilation for a high-performance systolic array","authors":"T. Gross, M. Lam","doi":"10.1145/12276.13314","DOIUrl":"https://doi.org/10.1145/12276.13314","url":null,"abstract":"We report on a compiler for Warp, a high-performance systolic array developed at Carnegie Mellon. This compiler enhances the usefulness of Warp significantly and allows application programmers to code substantial algorithms.\u0000The compiler combines a novel programming model, which is based on a model of skewed computation for the array, with powerful optimization techniques. Programming in W2 (the language accepted by the compiler) is orders of magnitude easier than coding in microcode, the only alternative available previously.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132177540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Register allocation is an important component of most compilers, particularly those for RISC machines. The SPUR Lisp compiler uses a sophisticated, graph-coloring algorithm developed by Fredrick Chow [Chow84]. This paper describes the algorithm and the techniques used to implement it efficiently and evaluates its performance on several large programs. The allocator successfully assigned most temporaries and local variables to registers in a wide variety of functions. Its execution cost is moderate.
{"title":"Register allocation in the SPUR Lisp compiler","authors":"J. Larus, P. Hilfinger","doi":"10.1145/12276.13337","DOIUrl":"https://doi.org/10.1145/12276.13337","url":null,"abstract":"Register allocation is an important component of most compilers, particularly those for RISC machines. The SPUR Lisp compiler uses a sophisticated, graph-coloring algorithm developed by Fredrick Chow [Chow84]. This paper describes the algorithm and the techniques used to implement it efficiently and evaluates its performance on several large programs. The allocator successfully assigned most temporaries and local variables to registers in a wide variety of functions. Its execution cost is moderate.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122147103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chaining is the ability to pipeline two or more vector instructions on Cray-1 like machines. We show how to optimally use this feature to compute (vector) expression trees, in the context of automatic code-generation. We present a linear-time scheduling algorithm for finding an optimal order of evaluation for a machine with a bounded number of registers.
{"title":"Optimal chaining in expression trees","authors":"D. Bernstein, H. Boral, R. Pinter","doi":"10.1145/12276.13311","DOIUrl":"https://doi.org/10.1145/12276.13311","url":null,"abstract":"Chaining is the ability to pipeline two or more vector instructions on Cray-1 like machines. We show how to optimally use this feature to compute (vector) expression trees, in the context of automatic code-generation. We present a linear-time scheduling algorithm for finding an optimal order of evaluation for a machine with a bounded number of registers.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121635069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extant peephole optimizers can perform many optimizations that are handled by higher-level optimizers. This paper describes a retargetable instruction reorganizer that performs targeting and evaluation order determination by applying a well known algorithm for optimal code generation for expressions to object code. By applying the algorithm to object code after code generation and optimization, a phase ordering problem often encountered by higher-level optimizers is avoided. It minimizes the number of registers and temporaries required to compile expressions by rearranging machine instructions. For some machines, this can result in smaller and faster programs. By generalizing its operation, the reorganizer can also be used to reorder instructions to avoid delays in pipelined machines. For one pipelined machine, it has provided a 5 to 10 percent improvement in the execution speed of benchmark programs.
{"title":"A retargetable instruction reorganizer","authors":"J. Davidson","doi":"10.1145/12276.13334","DOIUrl":"https://doi.org/10.1145/12276.13334","url":null,"abstract":"Extant peephole optimizers can perform many optimizations that are handled by higher-level optimizers. This paper describes a retargetable instruction reorganizer that performs targeting and evaluation order determination by applying a well known algorithm for optimal code generation for expressions to object code. By applying the algorithm to object code after code generation and optimization, a phase ordering problem often encountered by higher-level optimizers is avoided. It minimizes the number of registers and temporaries required to compile expressions by rearranging machine instructions. For some machines, this can result in smaller and faster programs. By generalizing its operation, the reorganizer can also be used to reorder instructions to avoid delays in pipelined machines. For one pipelined machine, it has provided a 5 to 10 percent improvement in the execution speed of benchmark programs.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128377776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partitioning and scheduling techniques are necessary to implement parallel languages on multiprocessors. Multiprocessor performance is maximized when parallelism between tasks is optimally traded off with communication and synchronization overhead. We present compile-time partitioning and scheduling techniques to achieve this trade-off.
{"title":"Compile-time partitioning and scheduling of parallel programs","authors":"Vivek Sarkar, J. Hennessy","doi":"10.1145/12276.13313","DOIUrl":"https://doi.org/10.1145/12276.13313","url":null,"abstract":"Partitioning and scheduling techniques are necessary to implement parallel languages on multiprocessors. Multiprocessor performance is maximized when parallelism between tasks is optimally traded off with communication and synchronization overhead. We present compile-time partitioning and scheduling techniques to achieve this trade-off.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121257114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whilst widely recognised as an excellent means for solving problems and for designing software, functional programming languages have suffered from their inefficient implementations on conventional computers. A route to improved runtime performance is to transform recursively defined functions into programs which execute more quickly and/or consume less space. We derive equivalent imperative programming language loops for a large class of linear recursive functions of which the tail-recursive functions form a very small subset. We first identify a small set of primitive function defining expressions for which we determine the corresponding loop-expressions. We then determine the loop-expressions for linear functions defined by any expressions which are formed from those primitives. In this way, a very general class of linear functions can be transformed automatically into loops in the parsing phase of a compiler, since the parser has in any case to determine the hierarchical structure of function definitions. Further transformation may involve specific properties of particular defining expressions, and adopt previous schemes. In addition, equivalent linear functions can be found for many non-linear ones which can therefore also be transformed into loops.
{"title":"Efficient compilation of linear recursive functions into object level loops","authors":"P. Harrison, H. Khoshnevisan","doi":"10.1145/12276.13332","DOIUrl":"https://doi.org/10.1145/12276.13332","url":null,"abstract":"Whilst widely recognised as an excellent means for solving problems and for designing software, functional programming languages have suffered from their inefficient implementations on conventional computers. A route to improved runtime performance is to transform recursively defined functions into programs which execute more quickly and/or consume less space. We derive equivalent imperative programming language loops for a large class of linear recursive functions of which the tail-recursive functions form a very small subset. We first identify a small set of primitive function defining expressions for which we determine the corresponding loop-expressions. We then determine the loop-expressions for linear functions defined by any expressions which are formed from those primitives. In this way, a very general class of linear functions can be transformed automatically into loops in the parsing phase of a compiler, since the parser has in any case to determine the hierarchical structure of function definitions. Further transformation may involve specific properties of particular defining expressions, and adopt previous schemes. In addition, equivalent linear functions can be found for many non-linear ones which can therefore also be transformed into loops.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124375947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a practical technique that provides an LR (0) parser with either fixed or arbitrary look-ahead. The construction algorithm is based on certain paths in the LR (0) state diagram, which must be restricted to a maximum length m. The technique determines the amount of look-ahead required, and the user is spared the task of guessing it. Instead, the user provides m. In situations where single symbol look-ahead is sufficient, the corresponding grammar class (called LAR (m )) is identical to the NQLALR (1) class. For practical grammars that require arbitrary look-ahead, the storage requirements typically do not exceed an amount linear in the size of the corresponding LR (0) parser. The technique is shown to work for a practical programming language grammar, and has been used to solve particular cases of the PL/1 keyword problem.
{"title":"A practical arbitrary look-ahead LR parsing technique","authors":"M. Bermudez, Karl M. Schimpf","doi":"10.1145/12276.13325","DOIUrl":"https://doi.org/10.1145/12276.13325","url":null,"abstract":"We present a practical technique that provides an <italic>LR</italic> (0) parser with either fixed or arbitrary look-ahead. The construction algorithm is based on certain paths in the <italic>LR</italic> (0) state diagram, which must be restricted to a maximum length <italic>m</italic>. The technique determines the amount of look-ahead required, and the user is spared the task of guessing it. Instead, the user provides <italic>m</italic>. In situations where single symbol look-ahead is sufficient, the corresponding grammar class (called <italic>LAR</italic> (<italic>m</italic> )) is identical to the <italic>NQLALR</italic> (1) class. For practical grammars that require arbitrary look-ahead, the storage requirements typically do not exceed an amount linear in the size of the corresponding <italic>LR</italic> (0) parser. The technique is shown to work for a practical programming language grammar, and has been used to solve particular cases of the PL/1 keyword problem.","PeriodicalId":414056,"journal":{"name":"SIGPLAN Conferences and Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1986-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122960402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}