Verified peephole optimizations for CompCert
Eric Mullen, Daryl Zuniga, Zachary Tatlock, D. Grossman
Transformations over assembly code are common in many compilers. These transformations are also some of the most bug-dense compiler components. Such bugs could be eliminated by formally verifying the compiler, but state-of-the-art formally verified compilers like CompCert do not support assembly-level program transformations. This paper presents Peek, a framework for expressing, verifying, and running meaning-preserving assembly-level program transformations in CompCert. Peek contributes four new components: a lower level semantics for CompCert x86 syntax, a liveness analysis, a library for expressing and verifying peephole optimizations, and a verified peephole optimization pass built into CompCert. Each of these is accompanied by a correctness proof in Coq against realistic assumptions about the calling convention and the system memory allocator. Verifying peephole optimizations in Peek requires proving only a set of local properties, which we have proved are sufficient to ensure global transformation correctness. We have proven these local properties for 28 peephole transformations from the literature. We discuss the development of our new assembly semantics, liveness analysis, representation of program transformations, and execution engine; describe the verification challenges of each component; and detail techniques we applied to mitigate the proof burden.
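To make the flavor of these transformations concrete, here is a minimal sketch of a classic peephole rewrite from the literature: eliminating a redundant register-to-register move. The list-of-tuples instruction encoding is our own illustration, not Peek's Coq representation of CompCert x86 syntax.

```python
# Minimal peephole pass over (op, dst, src) triples with a 2-instruction
# window: after "mov a, b", the following "mov b, a" is a no-op (mov does
# not set flags), so it can be dropped. This models the *kind* of local
# rewrite Peek verifies; Peek itself works on CompCert x86 inside Coq.

def peephole(instrs):
    out = []
    i = 0
    while i < len(instrs):
        if (i + 1 < len(instrs)
                and instrs[i][0] == "mov" and instrs[i + 1][0] == "mov"
                and instrs[i][1] == instrs[i + 1][2]     # mov a, b
                and instrs[i][2] == instrs[i + 1][1]):   # mov b, a
            out.append(instrs[i])    # keep the first mov, drop the second
            i += 2
        else:
            out.append(instrs[i])
            i += 1
    return out

print(peephole([("mov", "eax", "ebx"), ("mov", "ebx", "eax"), ("add", "eax", "1")]))
# [('mov', 'eax', 'ebx'), ('add', 'eax', '1')]
```

Proving such a rewrite correct requires exactly the kind of local reasoning Peek packages up: the dropped instruction's effects must be dead or already established at that program point.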
Latte: a language, compiler, and runtime for elegant and efficient deep neural networks
L. Truong, R. Barik, E. Totoni, Hai Liu, Chick Markley, A. Fox, T. Shpeisman
Deep neural networks (DNNs) have undergone a surge in popularity with consistent advances in the state of the art for tasks including image recognition, natural language processing, and speech recognition. The computationally expensive nature of these networks has led to the proliferation of implementations that sacrifice abstraction for high performance. In this paper, we present Latte, a domain-specific language for DNNs that provides a natural abstraction for specifying new layers without sacrificing performance. Users of Latte express DNNs as ensembles of neurons with connections between them. The Latte compiler synthesizes a program based on the user specification, applies a suite of domain-specific and general optimizations, and emits efficient machine code for heterogeneous architectures. Latte also includes a communication runtime for distributed memory data-parallelism. Using networks described using Latte, we demonstrate 3-6x speedup over Caffe (C++/MKL) on the three state-of-the-art ImageNet models executing on an Intel Xeon E5-2699 v3 x86 CPU.
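As a rough illustration of the ensemble-of-neurons programming model described above, the sketch below wires two ensembles together through a connection mapping. The `Ensemble` and `connect` names are our own assumptions for exposition, not Latte's actual API.

```python
# Illustrative sketch of the "ensembles of neurons with connections"
# model; Ensemble/connect are hypothetical names, not Latte's API.

class Ensemble:
    """A group of neurons; incoming connections are stored per ensemble."""
    def __init__(self, size):
        self.size = size
        self.inputs = []      # list of (source ensemble, mapping function)

def connect(src, dst, mapping):
    """Connect each neuron j of dst to the src neuron indices mapping(j)."""
    dst.inputs.append((src, mapping))

# A 1-D convolution-style layer: output neuron j reads a 3-wide window
# of the input ensemble.
inp = Ensemble(10)
conv = Ensemble(8)
connect(inp, conv, lambda j: [j, j + 1, j + 2])

print(conv.inputs[0][1](0))   # neuron 0 reads input indices [0, 1, 2]
```

A compiler over such a specification can inspect the mapping to discover structure (here, a sliding window) and emit vectorized, fused code, which is the kind of domain-specific optimization the abstract alludes to.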
Coverage-directed differential testing of JVM implementations
Yuting Chen, Ting Su, Chengnian Sun, Z. Su, Jianjun Zhao
The Java virtual machine (JVM) is a core technology whose reliability is critical. Testing JVM implementations requires painstaking effort in designing test classfiles (*.class) along with their test oracles. An alternative is to employ binary fuzzing to differentially test JVMs by blindly mutating seeding classfiles and then executing the resulting mutants on different JVM binaries to reveal inconsistent behaviors. However, this blind approach is not cost-effective in practice because most of the mutants are invalid and redundant. This paper tackles this challenge by introducing classfuzz, a coverage-directed fuzzing approach that focuses on representative classfiles for differential testing of JVMs’ startup processes. Our core insight is to (1) mutate seeding classfiles using a set of predefined mutation operators (mutators) and employ Markov Chain Monte Carlo (MCMC) sampling to guide mutator selection, and (2) execute the mutants on a reference JVM implementation and use coverage uniqueness as a discipline for accepting representative ones. The accepted classfiles are used as inputs to differentially test JVM implementations and find defects. We have implemented classfuzz and conducted an extensive evaluation against existing fuzz testing algorithms. Our results show that classfuzz increases the ratio of discrepancy-triggering classfiles from 1.7% to 11.9%. We have also reported 62 JVM discrepancies, along with the test classfiles, to JVM developers. Many of our reported issues have already been confirmed as JVM defects, and some even match recent clarifications and changes to the Java SE 8 edition of the JVM specification.
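The MCMC-guided mutator selection can be pictured as a Metropolis-Hastings walk over the mutator set, biased toward mutators that recently produced new coverage. The acceptance rule and score update below are our simplified assumptions, not the paper's exact formulation, and the mutator names are invented for illustration.

```python
# Hedged sketch of MCMC mutator selection: mutators that earn coverage
# reward are visited more often under the Metropolis-Hastings walk.
import random

mutators = ["flip_access_flag", "drop_attribute", "swap_constant", "truncate_method"]
score = {m: 1.0 for m in mutators}       # reward mass per mutator

def mh_step(current):
    """Propose a uniformly random mutator; accept with probability
    min(1, score[proposal] / score[current])."""
    proposal = random.choice(mutators)
    if random.random() < min(1.0, score[proposal] / score[current]):
        return proposal
    return current

current = random.choice(mutators)
for _ in range(100):
    current = mh_step(current)
    new_coverage = random.random() < 0.1   # stand-in for running the mutant
    if new_coverage:                       # on the reference JVM
        score[current] += 1.0              # reinforce productive mutators
```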
From Datalog to Flix: a declarative language for fixed points on lattices
Magnus Madsen, Ming-Ho Yee, O. Lhoták
We present Flix, a declarative programming language for specifying and solving least fixed point problems, particularly static program analyses. Flix is inspired by Datalog and extends it with lattices and monotone functions. Using Flix, implementors of static analyses can express a broader range of analyses than is currently possible in pure Datalog, while retaining its familiar rule-based syntax. We define a model-theoretic semantics of Flix as a natural extension of the Datalog semantics. This semantics captures the declarative meaning of Flix programs without imposing any specific evaluation strategy. An efficient strategy is semi-naive evaluation, which we adapt for Flix. We have implemented a compiler and runtime for Flix, and used it to express several well-known static analyses, including the IFDS and IDE algorithms. The declarative nature of Flix clearly exposes the similarity between these two algorithms.
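To see what "Datalog extended with lattices" buys, here is a tiny least-fixed-point computation in that spirit: distance facts carry lattice elements that are joined with min until the computation stabilizes. The imperative Python encoding is our own illustration; Flix expresses the same idea declaratively with rules.

```python
# Naive fixpoint iteration over a (min, +inf) lattice: shortest
# distances from "a". Flix generalizes this pattern to arbitrary
# lattices and monotone functions.
import math

edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
dist = {"a": 0}                  # base fact: dist(a) = 0

changed = True
while changed:
    changed = False
    for (u, v, w) in edges:
        if u in dist:
            d = dist[u] + w
            if d < dist.get(v, math.inf):   # lattice join: keep the minimum
                dist[v] = d
                changed = True

print(dist)   # {'a': 0, 'b': 1, 'c': 3}
```

Semi-naive evaluation, which the abstract mentions, improves on this loop by re-deriving only from facts that changed in the previous iteration rather than rescanning every rule.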
A design and verification methodology for secure isolated regions
Rohit Sinha, Manuel Costa, A. Lal, Nuno P. Lopes, S. Rajamani, S. Seshia, K. Vaswani
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
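A minimal sketch of the narrow-interface discipline, assuming a single sanctioned `send` entry point; the class and function names are hypothetical, not the paper's runtime API.

```python
# The application computes over secrets freely inside isolated memory,
# but every release of information goes through one narrow channel --
# the shape of program the automatic IRC verifier checks at the
# machine-code level. Names here are our own illustration.

class NarrowChannel:
    """The only sanctioned way for application code to release data."""
    def send(self, payload: bytes) -> None:
        # In the real runtime this would be an encrypted, authenticated
        # message to the outside world; here we just record its size.
        print("released", len(payload), "bytes")

def application(secret: bytes, channel: NarrowChannel) -> None:
    digest = bytes(b ^ 0xFF for b in secret)   # stand-in computation
    channel.send(digest)                       # the sole release point

application(b"top secret", NarrowChannel())
```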
Lightweight computation tree tracing for lazy functional languages
M. Faddegon, O. Chitil
A computation tree of a program execution describes computations of functions and their dependencies. A computation tree describes how a program works and is at the heart of algorithmic debugging. To generate a computation tree, existing algorithmic debuggers either use a complex implementation or yield a less informative approximation. We present a method for lazy functional languages that requires only a simple tracing library to generate a detailed computation tree. With our algorithmic debugger a programmer can debug any Haskell program by only importing our library and annotating suspected functions.
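As a rough, eager-language rendition of a computation tree, the Python decorator below records each traced call's arguments, result, and child calls. The paper targets lazy Haskell programs, where building this tree is much subtler; this sketch only conveys the shape of the structure an algorithmic debugger walks.

```python
# Build a computation tree by tracing: each node is
# (function name, args, result, children).
import functools

stack = [[]]   # top of stack collects the current call's children

def traced(f):
    @functools.wraps(f)
    def wrapper(*args):
        stack.append([])                        # children of this call
        result = f(*args)
        node = (f.__name__, args, result, stack.pop())
        stack[-1].append(node)                  # attach node to its parent
        return result
    return wrapper

@traced
def square(x): return x * x

@traced
def sum_squares(a, b): return square(a) + square(b)

sum_squares(3, 4)
print(stack[0][0])
# ('sum_squares', (3, 4), 25, [('square', (3,), 9, []), ('square', (4,), 16, [])])
```

An algorithmic debugger then asks the programmer whether each node's result is correct, narrowing the fault to a subtree.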
Verifying bit-manipulations of floating-point
Wonyeol Lee, Rahul Sharma, A. Aiken
Reasoning about floating-point is difficult and becomes only more so if there is an interplay between floating-point and bit-level operations. Even though real-world floating-point libraries use implementations that have such mixed computations, no systematic technique to verify the correctness of the implementations of such computations is known. In this paper, we present the first general technique for verifying the correctness of mixed binaries, which combines abstraction, analytical optimization, and testing. The technique provides a method to compute an error bound of a given implementation with respect to its mathematical specification. We apply our technique to Intel's implementations of transcendental functions and prove formal error bounds for these widely used routines.
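A small example of the float/bit interplay in question: constructing 2**k directly from its IEEE-754 bit pattern rather than calling a power function. This is our own illustration of the style of code involved, not one of Intel's routines; verifying that such tricks meet a mathematical error bound is exactly the problem the paper addresses.

```python
# Build the double 2**k by placing the biased exponent field directly
# (IEEE-754 binary64: 1 sign bit, 11 exponent bits with bias 1023,
# 52 mantissa bits). Valid for normal exponents -1022 <= k <= 1023.
import struct

def two_to_the(k: int) -> float:
    bits = (k + 1023) << 52      # sign and mantissa bits are zero
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

assert two_to_the(10) == 1024.0
assert two_to_the(-3) == 0.125
```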
Fast synthesis of fast collections
Calvin Loncaric, E. Torlak, Michael D. Ernst
Many applications require specialized data structures not found in the standard libraries, but implementing new data structures by hand is tedious and error-prone. This paper presents a novel approach for synthesizing efficient implementations of complex collection data structures from high-level specifications that describe the desired retrieval operations. Our approach handles a wider range of data structures than previous work, including structures that maintain an order among their elements or have complex retrieval methods. We have prototyped our approach in a data structure synthesizer called Cozy. Four large, real-world case studies compare structures generated by Cozy against handwritten implementations in terms of correctness and performance. Structures synthesized by Cozy match the performance of handwritten data structures while avoiding human error.
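The gap such a synthesizer closes can be shown in miniature: a declarative retrieval specification next to the kind of indexed implementation a tool would generate. The spec/implementation pairing below is our illustration, not Cozy's input language.

```python
# Specification vs. synthesized-style implementation of the same query.
from collections import defaultdict

# Spec: "return all entries with the given key" -- clear but O(n).
def query_spec(entries, key):
    return [e for e in entries if e[0] == key]

# Indexed implementation: a hash index makes the same retrieval O(1)
# expected time, at the cost of maintaining the index on insertion.
class Indexed:
    def __init__(self):
        self.by_key = defaultdict(list)
    def add(self, entry):
        self.by_key[entry[0]].append(entry)
    def query(self, key):
        return self.by_key[key]

entries = [("a", 1), ("b", 2), ("a", 3)]
idx = Indexed()
for e in entries:
    idx.add(e)
assert query_spec(entries, "a") == idx.query("a")
```

The synthesizer's job is to derive the right-hand version (and its invariants) automatically, with a proof that it agrees with the specification.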
Cartesian Hoare logic for verifying k-safety properties
Marcelo Sousa, Işıl Dillig
Unlike safety properties, which require the absence of a “bad” program trace, k-safety properties stipulate the absence of a “bad” interaction between k traces. Examples of k-safety properties include transitivity, associativity, anti-symmetry, and monotonicity. This paper presents a sound and relatively complete calculus, called Cartesian Hoare Logic (CHL), for verifying k-safety properties. We also present an automated verification algorithm based on CHL and implement it in a tool called DESCARTES. We use DESCARTES to analyze user-defined relational operators in Java and demonstrate that DESCARTES is effective at verifying (or finding violations of) multiple k-safety properties.
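For intuition, transitivity of a user-defined comparator is a 3-safety property: it relates three executions of the same code. The bounded search below can only find violations, whereas CHL proves their absence; the epsilon-tolerance comparator is our own illustrative bug.

```python
# A comparator that treats values within 1.0 as equal -- a common
# mistake that breaks transitivity of the "equal" outcome.
from itertools import product

def cmp_eps(x, y):
    if abs(x - y) < 1.0:
        return 0
    return -1 if x < y else 1

# 3-safety check: cmp(a,b) == 0 and cmp(b,c) == 0 must imply cmp(a,c) == 0.
vals = [i * 0.6 for i in range(5)]
for a, b, c in product(vals, repeat=3):
    if cmp_eps(a, b) == 0 and cmp_eps(b, c) == 0 and cmp_eps(a, c) != 0:
        print("transitivity violated:", a, b, c)   # e.g. 0.0, 0.6, 1.2
        break
```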
Effective padding of multidimensional arrays to avoid cache conflict misses
Changwan Hong, Wenlei Bao, Albert Cohen, S. Krishnamoorthy, L. Pouchet, F. Rastello, J. Ramanujam, P. Sadayappan
Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
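The effect is easy to reproduce in numbers. The sketch below maps the addresses of one array column onto the sets of a hypothetical direct-mapped cache: with a power-of-two row length every element of the column lands in the same set, while padding the row by one cache line spreads the accesses out. All cache parameters are made up for illustration; the paper's contribution is choosing such pads optimally for set-associative and multi-level caches.

```python
# Made-up direct-mapped cache: 64 sets of 64-byte lines; 8-byte elements.
LINE_BYTES = 64
NUM_SETS = 64
ELEM_BYTES = 8

def column_sets(row_len, rows=8, col=0):
    """Cache sets touched while walking down column `col` of a
    rows x row_len array stored in row-major order."""
    return [((r * row_len + col) * ELEM_BYTES // LINE_BYTES) % NUM_SETS
            for r in range(rows)]

print(column_sets(512))  # power-of-two row: [0, 0, 0, ...] -- all conflict
print(column_sets(520))  # row padded by one line (8 elems): [0, 1, 2, ...]
```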