Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation: latest papers

Optimizing direct threaded code by selective inlining
Ian Piumarta, F. Riccardi
Achieving good performance in bytecoded language interpreters is difficult without sacrificing both simplicity and portability. This is due to the complexity of dynamic translation ("just-in-time compilation") of bytecodes into native code, which is the mechanism employed universally by high-performance interpreters. We demonstrate that a few simple techniques make it possible to create highly portable dynamic translators that can attain as much as 70% of the performance of optimized C for certain numerical computations. Translators based on such techniques can offer respectable performance without sacrificing either the simplicity or portability of much slower "pure" bytecode interpreters.
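The abstract does not spell out the mechanism, so the following is only a structural sketch of the two ideas named in the title, under stated assumptions. It assumes a hypothetical three-opcode stack machine (`PUSH`, `ADD`, `MUL`); `translate`, `inline`, and `run` are illustrative names, not the authors' API. Threading replaces each bytecode with a direct reference to its handler, and inlining fuses a straight-line run of handlers so the outer loop dispatches once for the whole run.

```python
from typing import Callable, List, Tuple

Stack = List[int]
Op = Callable[[Stack], None]


def make_push(value: int) -> Op:
    def push(stack: Stack) -> None:
        stack.append(value)
    return push


def add(stack: Stack) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mul(stack: Stack) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


def translate(bytecode: List[Tuple[str, int]]) -> List[Op]:
    """Threaded form: each instruction becomes a direct handler reference."""
    table = {"ADD": add, "MUL": mul}  # hypothetical opcode set
    return [make_push(arg) if name == "PUSH" else table[name]
            for name, arg in bytecode]


def inline(ops: List[Op]) -> Op:
    """Fuse a straight-line run of handlers into a single handler."""
    def fused(stack: Stack) -> None:
        for op in ops:
            op(stack)
    return fused


def run(threaded: List[Op]) -> Stack:
    stack: Stack = []
    for op in threaded:  # one dispatch per list element
        op(stack)
    return stack


# (2 + 3) * 4 on the hypothetical machine
program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]
threaded = translate(program)   # five dispatches in run()
inlined = [inline(threaded)]    # one dispatch: the program is one basic block
print(run(threaded), run(inlined))  # [20] [20]
```

In Python the fused handler still calls each piece internally, so this shows the shape of the transformation (five outer dispatches become one) rather than any speedup; a translator of the kind the abstract describes works with native code.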
DOI: https://doi.org/10.1145/277650.277743; published 1998-05-01
Citations: 139
Partial online cycle elimination in inclusion constraint graphs
Manuel Fähndrich, J. Foster, Z. Su, A. Aiken
Many program analyses are naturally formulated and implemented using inclusion constraints. We present new results on the scalable implementation of such analyses based on two insights: first, that online elimination of cyclic constraints yields orders-of-magnitude improvements in analysis time for large problems; second, that the choice of constraint representation affects the quality and efficiency of online cycle elimination. We present an analytical model that explains our design choices and show that the model's predictions match well with results from a substantial experiment.
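As an illustration of what eliminating a cyclic constraint means, here is a minimal sketch; it is not the paper's algorithm. Variables are graph nodes and a constraint X ⊆ Y is an edge X → Y. If adding an edge closes a cycle, every variable on that cycle must denote the same set, so the nodes can be merged into one representative. The class and method names are hypothetical, and the sketch runs a full reachability search on every insertion, whereas the paper's detection is partial.

```python
from typing import Dict, List, Set


class ConstraintGraph:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}     # union-find forest
        self.succ: Dict[str, Set[str]] = {}  # x -> {y : x ⊆ y}

    def find(self, v: str) -> str:
        """Return the representative of v, registering v if it is new."""
        self.parent.setdefault(v, v)
        self.succ.setdefault(v, set())
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def _path(self, src: str, dst: str) -> List[str]:
        """Representatives along some path src -> dst, or [] if none."""
        stack = [(src, [src])]
        seen = {src}
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            for nxt in {self.find(s) for s in self.succ[node]}:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
        return []

    def add_subset(self, x: str, y: str) -> None:
        """Record x ⊆ y; if that closes a cycle, collapse the cycle."""
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        cycle = self._path(y, x)  # does y ⊆ ... ⊆ x already hold?
        if not cycle:
            self.succ[x].add(y)
            return
        for v in cycle[1:]:       # cycle[0] is y, the new representative
            self.parent[v] = y
            self.succ[y] |= self.succ[v]

    def same(self, x: str, y: str) -> bool:
        return self.find(x) == self.find(y)


g = ConstraintGraph()
g.add_subset("a", "b")
g.add_subset("b", "c")
g.add_subset("c", "a")   # closes the cycle a ⊆ b ⊆ c ⊆ a
g.add_subset("d", "a")   # no cycle: d stays separate
print(g.same("a", "c"), g.same("d", "a"))  # True False
```

After the collapse, later constraint propagation visits one node instead of three, which is the intuition behind the analysis-time improvement the abstract reports.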
DOI: https://doi.org/10.1145/277650.277667; published 1998-05-01
Citations: 241
A study of dead data members in C++ applications
P. Sweeney, F. Tip
Object-oriented applications may contain data members that can be removed from the application without affecting program behavior. Such "dead" data members may occur due to unused functionality in class libraries, or due to the programmer losing track of member usage as the application changes over time. We present a simple and efficient algorithm for detecting dead data members in C++ applications. This algorithm has been implemented using a prototype version of the IBM VisualAge C++ compiler, and applied to a number of realistic benchmark programs ranging from 600 to 58,000 lines of code. For the non-trivial benchmarks, we found that up to 27.3% of the data members in the benchmarks are dead (average 12.5%), and that up to 11.6% of the object space of these applications may be occupied by dead data members at run-time (average 4.4%).
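The abstract does not give the algorithm, so the sketch below rests on a simplified criterion that is an assumption here: a data member is live if some function reachable from the entry point reads it, and dead otherwise. The inputs (`calls`, `reads`) stand for a precomputed call graph and per-function read sets; the function name and the toy program are hypothetical.

```python
from typing import Dict, List, Set


def dead_members(
    members: Set[str],
    calls: Dict[str, List[str]],
    reads: Dict[str, Set[str]],
    entry: str = "main",
) -> Set[str]:
    """Members never read by any function reachable from `entry`."""
    reachable: Set[str] = set()
    worklist = [entry]
    while worklist:
        fn = worklist.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        worklist.extend(calls.get(fn, []))

    live: Set[str] = set()
    for fn in reachable:
        live |= reads.get(fn, set())
    return members - live


# Toy program: `debug_dump` is library code that nothing calls.
members = {"Point.x", "Point.y", "Point.cache"}
calls = {"main": ["norm"], "norm": [], "debug_dump": []}
reads = {"norm": {"Point.x", "Point.y"}, "debug_dump": {"Point.cache"}}
print(dead_members(members, calls, reads))  # {'Point.cache'}
```

`Point.cache` is read only from unreachable code, which matches the abstract's first cause of dead members: unused functionality in class libraries.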
DOI: https://doi.org/10.1145/277650.277750; published 1998-05-01
Citations: 45
Scalable cross-module optimization
A. Ayers, Stuart de Jong, John Peyton, R. Schooler
Large applications are typically partitioned into separately compiled modules. Large performance gains in these applications are available by optimizing across module boundaries. One barrier to applying cross-module optimization (CMO) to large applications is the potentially enormous amount of time and space consumed by the optimization process. We describe a framework for scalable CMO that provides large gains in performance on applications that contain millions of lines of code. Two major techniques are described. First, careful management of in-memory data structures results in sub-linear memory occupancy when compared to the number of lines of code being optimized. Second, profile data is used to focus optimization effort on the performance-critical portions of applications. We also present practical issues that arise in deploying this framework in a production environment. These issues include debuggability and compatibility with existing development tools, such as make. Our framework is deployed in Hewlett-Packard's (HP) UNIX compiler products and speeds up shipped independent software vendors' applications by as much as 71%.
DOI: https://doi.org/10.1145/277650.277745; published 1998-05-01
Citations: 40
Register promotion by sparse partial redundancy elimination of loads and stores
R. Lo, Fred C. Chow, Robert Kennedy, Shin-Ming Liu, P. Tu
An algorithm for register promotion is presented based on the observation that the circumstances for promoting a memory location's value to a register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for eliminating partial redundancy using a sparse SSA representation forms the foundation for the present algorithm to eliminate redundancy among memory accesses, enabling us to achieve both computational and live range optimality in our register promotion results. We discuss how to effect speculative code motion in the SSAPRE framework. We present two different algorithms for performing speculative code motion: the conservative speculation algorithm used in the absence of profile data, and the profile-driven speculation algorithm used when profile data are available. We define the static single use (SSU) form and develop the dual of the SSAPRE algorithm, called SSUPRE, to perform the partial redundancy elimination of stores. We provide measurement data on the SPECint95 benchmark suite to demonstrate the effectiveness of our register promotion approach in removing loads and stores. We also study the relative performance of the different speculative code motion strategies when applied to scalar loads and stores.
DOI: https://doi.org/10.1145/277650.277659; published 1998-05-01
Citations: 112
Generational stack collection and profile-driven pretenuring
P. Cheng, R. Harper, Peter Lee
This paper presents two techniques for improving garbage collection performance: generational stack collection and profile-driven pretenuring. The first is applicable to stack-based implementations of functional languages while the second is useful for any generational collector. We have implemented both techniques in a generational collector used by the TIL compiler (Tarditi, Morrisett, Cheng, Stone, Harper, and Lee 1996), and have observed decreases in garbage collection times of as much as 70% and 30%, respectively. Functional languages encourage the use of recursion which can lead to a long chain of activation records. When a collection occurs, these activation records must be scanned for roots. We show that scanning many activation records can take so long as to become the dominant cost of garbage collection. However, most deep stacks unwind very infrequently, so most of the root information obtained from the stack remains unchanged across successive garbage collections. Generational stack collection greatly reduces the stack scan cost by reusing information from previous scans. Generational techniques have been successful in reducing the cost of garbage collection (Ungar 1984). Various complex heap arrangements and tenuring policies have been proposed to increase the effectiveness of generational techniques by reducing the cost and frequency of scanning and copying. In contrast, we show that by using profile information to make lifetime predictions, pretenuring can avoid copying data altogether. In essence, this technique uses a refinement of the generational hypothesis (most data die young) with a locality principle concerning the age of data: most allocation sites produce data that immediately dies, while a few allocation sites consistently produce data that survives many collections.
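The reuse of stack-scan information can be sketched as follows. This is a toy model rather than the TIL collector: it assumes frames below the shallowest depth reached since the last scan have not changed, and it tracks that depth with a hypothetical `low_water` mark. Frames are modelled as sets of object names, and `frames_scanned` exists only to make the saving visible.

```python
from typing import List, Set


class Stack:
    def __init__(self) -> None:
        self.frames: List[Set[str]] = []        # heap references held by each frame
        self.low_water = 0                      # shallowest depth since the last scan
        self.cached_roots: List[Set[str]] = []  # per-frame roots from the last scan
        self.frames_scanned = 0                 # illustration only

    def push(self, refs: Set[str]) -> None:
        self.frames.append(set(refs))

    def pop(self) -> None:
        self.frames.pop()
        self.low_water = min(self.low_water, len(self.frames))

    def scan_roots(self) -> Set[str]:
        """Rescan only frames at or above the low-water mark."""
        keep = self.cached_roots[: self.low_water]
        fresh = [set(f) for f in self.frames[self.low_water:]]
        self.frames_scanned += len(fresh)
        self.cached_roots = keep + fresh
        self.low_water = len(self.frames)
        return set().union(*self.cached_roots)


s = Stack()
for ref in ("a", "b", "c"):
    s.push({ref})
first = s.scan_roots()    # scans all 3 frames
s.pop()                   # the stack unwinds by one frame...
s.push({"d"})             # ...and grows again
second = s.scan_roots()   # reuses 2 cached frames, scans 1
print(sorted(first), sorted(second), s.frames_scanned)
# ['a', 'b', 'c'] ['a', 'b', 'd'] 4
```

Two full scans would have visited six frames; the cached version visits four, and the gap widens as the unchanged part of the stack gets deeper.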
DOI: https://doi.org/10.1145/277650.277718; published 1998-05-01
Citations: 116
Improving performance by branch reordering
Minghui Yang, Gang-Ryung Uh, D. Whalley
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed can often result in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often included the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. The results of applying the transformation were significant reductions in the dynamic number of instructions and branches, as well as decreases in execution time.
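The ordering step can be illustrated with a simple cost model; this is a sketch under assumptions, not the paper's selection procedure (which, per the abstract, may also insert new branches). It assumes the branches test mutually exclusive conditions, that branch i exits the sequence with profiled probability p_i, and that testing it costs c_i > 0. Under that model the expected cost is minimized by sorting on p_i / c_i in descending order. The labels and numbers below are made up.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Branch = Tuple[str, float, float]  # (label, exit probability, test cost)


def expected_cost(order: Sequence[Branch]) -> float:
    """Sum of each test's cost weighted by the chance of reaching it."""
    reach = 1.0
    total = 0.0
    for _, p, c in order:
        total += reach * c
        reach -= p  # conditions are assumed mutually exclusive
    return total


def reorder(branches: Sequence[Branch]) -> List[Branch]:
    """Most likely exit per unit of cost goes first."""
    return sorted(branches, key=lambda b: b[1] / b[2], reverse=True)


# Hypothetical profile for a character-classification chain.
branches = [
    ("space", 0.05, 1.0),
    ("newline", 0.02, 1.0),
    ("alpha", 0.80, 2.0),
]
best = reorder(branches)
print([label for label, _, _ in best])  # ['alpha', 'space', 'newline']
print(expected_cost(best) < expected_cost(branches))  # True

# Brute-force check that the ratio rule is optimal for this input.
brute = min(expected_cost(p) for p in permutations(branches))
assert abs(expected_cost(best) - brute) < 1e-9
```

The ratio rule follows from an exchange argument: placing branch i directly before branch j is cheaper exactly when p_i / c_i > p_j / c_j, so any adjacent pair out of ratio order can be swapped to lower the expected cost.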
DOI: https://doi.org/10.1145/277650.277711; published 1998-05-01
Citations: 20
A new algorithm for scalar register promotion based on SSA form
A. V. S. Sastry, R. Ju
We present a new register promotion algorithm based on Static Single Assignment (SSA) form. Register promotion is aimed at promoting program names from memory locations to registers. Our algorithm is profile-driven and is based on the scope of intervals. In cases where a complete promotion is not possible because of the presence of function calls or pointer references, the proposed algorithm is capable of eliminating loads and stores on frequently executed paths by placing loads and stores on less frequently executed paths. We also describe an efficient method to incrementally update SSA form when new definitions are cloned from an existing name during register promotion. On SPECint95 benchmarks, our algorithm removes about 12% of memory operations which access scalar variables.
DOI: https://doi.org/10.1145/277650.277656; published 1998-05-01
Citations: 47
The design and implementation of a certifying compiler
G. Necula, Peter Lee
This paper presents the design and implementation of a compiler that translates programs written in a type-safe subset of the C programming language into highly optimized DEC Alpha assembly language programs, and a certifier that automatically checks the type safety and memory safety of any assembly language program produced by the compiler. The result of the certifier is either a formal proof of type safety or a counterexample pointing to a potential violation of the type system by the target program. The ensemble of the compiler and the certifier is called a certifying compiler. Several advantages of certifying compilation over previous approaches can be claimed. The notion of a certifying compiler is significantly easier to employ than a formal compiler verification, in part because it is generally easier to verify the correctness of the result of a computation than to prove the correctness of the computation itself. Also, the approach can be applied even to highly optimizing compilers, as demonstrated by the fact that our compiler generates target code, for a range of realistic C programs, which is competitive with both the cc and gcc compilers with all optimizations enabled. The certifier also drastically improves the effectiveness of compiler testing because, for each test case, it statically signals compilation errors that might otherwise require many executions to detect. Finally, this approach is a practical way to produce the safety proofs for a Proof-Carrying Code system, and thus may be useful in a system for safe mobile code.
DOI: https://doi.org/10.1145/277650.277752; published 1998-05-01
Citations: 405
Eliminating array bound checking through dependent types
H. Xi, F. Pfenning
We present a type-based approach to eliminating array bound checking and list tag checking by conservatively extending Standard ML with a restricted form of dependent types. This enables the programmer to capture more invariants through types while type-checking remains decidable in theory and can still be performed efficiently in practice. We illustrate our approach through concrete examples and present the results of our preliminary experiments, which support the feasibility and effectiveness of our approach.
DOI: https://doi.org/10.1145/277650.277732 (published 1998-05-01)
Citations: 334