Asynchronous programming, analysis and testing with state machines
Pantazis Deligiannis, A. Donaldson, J. Ketema, A. Lal, Paul Thomson (PLDI 2015, doi:10.1145/2737924.2737996)
Programming efficient asynchronous systems is challenging because it can often be hard to express the design declaratively, or to defend against data races and interleaving-dependent assertion violations. Previous work has addressed these challenges only in isolation, by designing a new declarative language, a new data race detection tool, or a new testing technique. We present P#, a language for high-reliability asynchronous programming co-designed with a static data race analysis and systematic concurrency testing infrastructure. We describe our experience using P# to write several distributed protocols and port an industrial-scale system internal to Microsoft, showing that the combined techniques, by leveraging the design of P#, are effective in finding bugs.
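P# itself extends C# with machine, state, and event declarations; purely as an illustration of the programming model it promotes (every name below is ours, not P#'s API), here is a minimal Python sketch of communicating state machines with private mailboxes. Confining all cross-machine interaction to send() is what lets a systematic concurrency tester enumerate message interleavings.

```python
import queue
import threading

class Machine:
    """A machine owns a private mailbox and handles one event at a time;
    there is no shared mutable state between machines."""
    def __init__(self):
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, event):
        self.inbox.put(event)          # the only cross-machine interaction

    def _run(self):
        while True:
            self.handle(self.inbox.get())

    def handle(self, event):
        raise NotImplementedError

class Server(Machine):
    def handle(self, event):
        kind, client = event
        if kind == "ping":
            client.send(("pong", self))

class Client(Machine):
    def __init__(self, server, done):
        super().__init__()
        self.done = done
        server.send(("ping", self))    # the pong can only arrive after this

    def handle(self, event):
        kind, _ = event
        assert kind == "pong"          # interleaving-dependent assertion
        self.done.set()                # lives in handler code

done = threading.Event()
Client(Server(), done)
assert done.wait(timeout=1.0)
```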
{"title":"Asynchronous programming, analysis and testing with state machines","authors":"Pantazis Deligiannis, A. Donaldson, J. Ketema, A. Lal, Paul Thomson","doi":"10.1145/2737924.2737996","DOIUrl":"https://doi.org/10.1145/2737924.2737996","url":null,"abstract":"Programming efficient asynchronous systems is challenging because it can often be hard to express the design declaratively, or to defend against data races and interleaving-dependent assertion violations. Previous work has only addressed these challenges in isolation, by either designing a new declarative language, a new data race detection tool or a new testing technique. We present P#, a language for high-reliability asynchronous programming co-designed with a static data race analysis and systematic concurrency testing infrastructure. We describe our experience using P# to write several distributed protocols and port an industrial-scale system internal to Microsoft, showing that the combined techniques, by leveraging the design of P#, are effective in finding bugs.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128580629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A simpler, safer programming and execution model for intermittent systems
Brandon Lucia, Benjamin Ransford (PLDI 2015, doi:10.1145/2737924.2737978)
Energy harvesting enables novel devices and applications without batteries, but intermittent operation under energy harvesting poses new challenges to memory consistency that threaten to leave applications in failed states not reachable in continuous execution. This paper presents analytical models that aid in reasoning about intermittence. Using these, we develop DINO (Death Is Not an Option), a programming and execution model that simplifies programming for intermittent systems and ensures volatile and nonvolatile data consistency despite near-constant interruptions. DINO is the first system to address these consistency problems in the context of intermittent execution. We evaluate DINO on three energy-harvesting hardware platforms running different applications. The applications fail and exhibit erroneous behavior without DINO, but run correctly with DINO’s modest 1.8–2.7× run-time overhead. DINO also dramatically simplifies programming, reducing the set of possible failure-related control transfers by 5–9×.
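DINO itself targets C tasks on energy-harvesting microcontrollers; the Python sketch below (all names ours) only simulates the core mechanism: state is versioned at task boundaries, so a power failure rolls execution back to the last boundary instead of stranding the program in a state unreachable under continuous execution.

```python
import copy
import random

class NVMem:
    """Simulated nonvolatile memory: 'committed' survives power failures,
    'working' is the copy tasks mutate and may be left half-updated."""
    def __init__(self, data):
        self.committed = copy.deepcopy(data)
        self.working = copy.deepcopy(data)

    def checkpoint(self):   # executed atomically at a task boundary
        self.committed = copy.deepcopy(self.working)

    def rollback(self):     # executed on reboot
        self.working = copy.deepcopy(self.committed)

def run_intermittently(tasks, nv, failure_rate=0.3):
    # A real system would keep the progress index in nonvolatile memory
    # too; here this loop itself plays the role of the reboot loop.
    i = 0
    while i < len(tasks):
        try:
            tasks[i](nv.working)
            if random.random() < failure_rate:
                raise RuntimeError("power failure")
            nv.checkpoint()
            i += 1
        except RuntimeError:
            nv.rollback()   # the task re-executes from a consistent state

random.seed(1)
nv = NVMem({"count": 0})
run_intermittently([lambda d: d.update(count=d["count"] + 1)] * 5, nv)
assert nv.committed["count"] == 5
```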
{"title":"A simpler, safer programming and execution model for intermittent systems","authors":"Brandon Lucia, Benjamin Ransford","doi":"10.1145/2737924.2737978","DOIUrl":"https://doi.org/10.1145/2737924.2737978","url":null,"abstract":"Energy harvesting enables novel devices and applications without batteries, but intermittent operation under energy harvesting poses new challenges to memory consistency that threaten to leave applications in failed states not reachable in continuous execution. This paper presents analytical models that aid in reasoning about intermittence. Using these, we develop DINO (Death Is Not an Option), a programming and execution model that simplifies programming for intermittent systems and ensures volatile and nonvolatile data consistency despite near-constant interruptions. DINO is the first system to address these consistency problems in the context of intermittent execution. We evaluate DINO on three energy-harvesting hardware platforms running different applications. The applications fail and exhibit error without DINO, but run correctly with DINO’s modest 1.8–2.7× run-time overhead. DINO also dramatically simplifies programming, reducing the set of possible failure- related control transfers by 5–9×.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124682753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient execution of recursive programs on commodity vector hardware
Bin Ren, Youngjoon Jo, S. Krishnamoorthy, Kunal Agrawal, Milind Kulkarni (PLDI 2015, doi:10.1145/2737924.2738004)
The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to efficiently execute data-parallel computations in a vectorized manner. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This paper presents a set of novel code transformations that expose the data parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel’s SSE4.2 vector units, as well as accelerators using Intel’s AVX512 units.
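As a hand-worked miniature of the idea (the paper performs this automatically, on real SIMD units; NumPy merely stands in for vector lanes here): a recursive divide-and-conquer task program can be flattened into a breadth-first frontier of independent subproblems, and each level of the frontier then executes as one data-parallel operation. The paper's scheduling policies additionally bound the frontier to limit space usage, which this sketch omits.

```python
import numpy as np

def count_leaves(lo, hi):
    """Plain recursive, task-parallel style: two subtasks per node."""
    if hi - lo <= 1:
        return 1
    mid = (lo + hi) // 2
    return count_leaves(lo, mid) + count_leaves(mid, hi)

def count_leaves_frontier(lo, hi):
    """Breadth-first re-expression: the frontier holds many independent
    instances of the same recursive call, so each step below is a single
    data-parallel operation over the whole frontier."""
    frontier = np.array([[lo, hi]])
    leaves = 0
    while frontier.size:
        width = frontier[:, 1] - frontier[:, 0]
        leaves += int(np.count_nonzero(width <= 1))  # all base cases at once
        active = frontier[width > 1]
        if active.size == 0:
            break
        mid = (active[:, 0] + active[:, 1]) // 2     # all splits at once
        frontier = np.concatenate([np.stack([active[:, 0], mid], axis=1),
                                   np.stack([mid, active[:, 1]], axis=1)])
    return leaves

assert count_leaves_frontier(0, 37) == count_leaves(0, 37) == 37
```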
{"title":"Efficient execution of recursive programs on commodity vector hardware","authors":"Bin Ren, Youngjoon Jo, S. Krishnamoorthy, Kunal Agrawal, Milind Kulkarni","doi":"10.1145/2737924.2738004","DOIUrl":"https://doi.org/10.1145/2737924.2738004","url":null,"abstract":"The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to efficiently execute data-parallel computations in a vectorized manner. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This paper presents a set of novel code transformations that expose the data parallelism latent in recursive, task-parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel’s SSE4.2 vector units, as well as accelerators using Intel’s AVX512 units.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"48 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115280671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of ranking functions using extremal counterexamples
L. Gonnord, D. Monniaux, Gabriel Radanne (PLDI 2015, doi:10.1145/2737924.2737976)
We present a complete method for synthesizing lexicographic linear ranking functions (and thus proving termination), supported by inductive invariants, in the case where the transition relation of the program includes disjunctions and existentials (large block encoding of control flow). Previous work would either synthesize a ranking function at every basic block head, not just loop headers, which reduces the scope of programs that may be proved to be terminating, or expand large block transitions including tests into (exponentially many) elementary transitions, prior to computing the ranking function, resulting in a very large global constraint system. In contrast, our algorithm incrementally refines a global linear constraint system according to extremal counterexamples: only constraints that exclude spurious solutions are included. Experiments with our tool Termite show marked performance and scalability improvements compared to other systems.
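A toy rendition of the counterexample-guided loop (the real algorithm solves linear programs derived via Farkas' lemma and verifies candidates exactly; the grid enumeration and random sampling below are crude stand-ins, and all names are ours):

```python
import itertools
import random

# Toy loop: while x > 0: (x, y) = (x - 1, y + nondet(-2..2))
def step(state):
    x, y = state
    return (x - 1, y + random.randint(-2, 2))

def is_ranking(coeffs, transitions):
    """rho(x, y) = a*x + b*y must be nonnegative at the source of every
    transition and decrease by at least 1 across it."""
    a, b = coeffs
    rho = lambda s: a * s[0] + b * s[1]
    return all(rho(s) >= 0 and rho(s) - rho(t) >= 1 for s, t in transitions)

def synthesize(max_coeff=3):
    """Keep only the counterexample transitions that excluded an earlier
    candidate, so the constraint system stays small."""
    counterexamples = []
    grid = list(itertools.product(range(-max_coeff, max_coeff + 1), repeat=2))
    while True:
        survivors = [c for c in grid if is_ranking(c, counterexamples)]
        if not survivors:
            return None                   # no linear ranking function found
        cand = survivors[0]
        for _ in range(1000):             # sampling stands in for an exact
            s = (random.randint(1, 50),   # LP/SMT verification query
                 random.randint(-50, 50))
            t = step(s)
            if not is_ranking(cand, [(s, t)]):
                counterexamples.append((s, t))
                break
        else:
            return cand                   # no violation found: accept

random.seed(0)
print(synthesize())   # (1, 0): rho(x, y) = x proves termination
```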
{"title":"Synthesis of ranking functions using extremal counterexamples","authors":"L. Gonnord, D. Monniaux, Gabriel Radanne","doi":"10.1145/2737924.2737976","DOIUrl":"https://doi.org/10.1145/2737924.2737976","url":null,"abstract":"We present a complete method for synthesizing lexicographic linear ranking functions (and thus proving termination), supported by inductive invariants, in the case where the transition relation of the program includes disjunctions and existentials (large block encoding of control flow). Previous work would either synthesize a ranking function at every basic block head, not just loop headers, which reduces the scope of programs that may be proved to be terminating, or expand large block transitions including tests into (exponentially many) elementary transitions, prior to computing the ranking function, resulting in a very large global constraint system. In contrast, our algorithm incrementally refines a global linear constraint system according to extremal counterexamples: only constraints that exclude spurious solutions are included. Experiments with our tool Termite show marked performance and scalability improvements compared to other systems.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"48 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120920953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Relaxing safely: verified on-the-fly garbage collection for x86-TSO
Peter Gammie, Antony Lloyd Hosking, Kai Engelhardt (PLDI 2015, doi:10.1145/2737924.2738006)
We report on a machine-checked verification of safety for a state-of-the-art, on-the-fly, concurrent, mark-sweep garbage collector that is designed for multi-core architectures with weak memory consistency. The proof explicitly incorporates the relaxed memory semantics of x86 multiprocessors. To our knowledge, this is the first fully machine-checked proof of safety for such a garbage collector. We couch the proof in a framework that system implementers will find appealing, with the fundamental components of the system specified in a simple and intuitive programming language. The abstract model is detailed enough for its correspondence with an assembly language implementation to be straightforward.
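For flavor only: at the heart of such collectors is the tri-color invariant, maintained by a write barrier so the mutator can run while marking is in progress. This sequential Python sketch (our code, far removed from the verified collector, and silent on the x86-TSO subtleties the proof actually confronts) shows the shape of the mark and sweep phases:

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    def __init__(self):
        self.fields = []
        self.color = WHITE

class Heap:
    def __init__(self):
        self.objects, self.roots, self.worklist = [], [], []

    def new(self):
        obj = Obj()
        self.objects.append(obj)
        return obj

    def write(self, src, idx, ref):
        """Mutator store with a Dijkstra-style write barrier: shade the
        target grey before installing the edge, so marking never misses
        a reachable white object (no black -> white edges)."""
        if ref.color == WHITE:
            ref.color = GREY
            self.worklist.append(ref)
        src.fields[idx] = ref

    def mark(self):
        for r in self.roots:
            if r.color == WHITE:
                r.color = GREY
                self.worklist.append(r)
        while self.worklist:
            obj = self.worklist.pop()
            for f in obj.fields:
                if f.color == WHITE:
                    f.color = GREY
                    self.worklist.append(f)
            obj.color = BLACK

    def sweep(self):
        self.objects = [o for o in self.objects if o.color == BLACK]
        for o in self.objects:
            o.color = WHITE     # reset for the next cycle
```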
{"title":"Relaxing safely: verified on-the-fly garbage collection for x86-TSO","authors":"Peter Gammie, Antony Lloyd Hosking, Kai Engelhardt","doi":"10.1145/2737924.2738006","DOIUrl":"https://doi.org/10.1145/2737924.2738006","url":null,"abstract":"We report on a machine-checked verification of safety for a state-of-the-art, on-the-fly, concurrent, mark-sweep garbage collector that is designed for multi-core architectures with weak memory consistency. The proof explicitly incorporates the relaxed memory semantics of x86 multiprocessors. To our knowledge, this is the first fully machine-checked proof of safety for such a garbage collector. We couch the proof in a framework that system implementers will find appealing, with the fundamental components of the system specified in a simple and intuitive programming language. The abstract model is detailed enough for its correspondence with an assembly language implementation to be straightforward.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123584273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Defining the undefinedness of C
Chris Hathhorn, Chucky Ellison, Grigore Roşu (PLDI 2015, doi:10.1145/2737924.2737979)
We present a "negative" semantics of the C11 language: a semantics that does not just give meaning to correct programs, but also rejects undefined programs. We investigate undefined behavior in C and discuss the techniques and special considerations needed for formally specifying it. We have used these techniques to modify and extend a semantics of C into one that captures undefined behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end nearly doubling the size of the original semantics. From our semantics, we have automatically extracted an undefinedness checker, which we evaluate against other popular analysis tools, using our own test suite in addition to a third-party test suite. Our checker is capable of detecting examples of all 77 categories of core language undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of undefined behavior in C, and thus of the C language itself.
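The flavor of a "negative" semantics can be seen in miniature: instead of choosing some meaning for an undefined operation, evaluation simply has no rule for it and gets stuck. Below is a small Python expression evaluator (ours; the paper's artifact is a full C semantics in the K framework) that rejects three of the 77 core-language categories:

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

class Undefined(Exception):
    """No meaning is assigned: evaluation is rejected, not repaired."""

def check(v):
    if not (INT_MIN <= v <= INT_MAX):
        raise Undefined("signed overflow")        # C11 6.5p5
    return v

def ev(e):
    op, *args = e
    if op == "lit":
        return check(args[0])
    a, b = ev(args[0]), ev(args[1])
    if op == "add":
        return check(a + b)
    if op == "div":
        if b == 0 or (a == INT_MIN and b == -1):
            raise Undefined("division")           # C11 6.5.5
        q = abs(a) // abs(b)                      # truncate toward zero,
        return q if (a >= 0) == (b >= 0) else -q  # as C requires
    if op == "shl":
        if b < 0 or b >= 32 or a < 0:
            raise Undefined("shift")              # C11 6.5.7
        return check(a << b)
    raise ValueError("unknown operator: " + op)

try:
    ev(("add", ("lit", INT_MAX), ("lit", 1)))     # INT_MAX + 1
except Undefined as u:
    print("rejected:", u)                         # rejected: signed overflow
```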
{"title":"Defining the undefinedness of C","authors":"Chris Hathhorn, Chucky Ellison, Grigore Roşu","doi":"10.1145/2737924.2737979","DOIUrl":"https://doi.org/10.1145/2737924.2737979","url":null,"abstract":"We present a ``negative'' semantics of the C11 language---a semantics that does not just give meaning to correct programs, but also rejects undefined programs. We investigate undefined behavior in C and discuss the techniques and special considerations needed for formally specifying it. We have used these techniques to modify and extend a semantics of C into one that captures undefined behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end nearly doubling the size of the original semantics. From our semantics, we have automatically extracted an undefinedness checker, which we evaluate against other popular analysis tools, using our own test suite in addition to a third-party test suite. Our checker is capable of detecting examples of all 77 categories of core language undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of undefined behavior in C, and thus of the C language itself.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128571343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Profile-guided meta-programming
W. J. Bowman, Swaha Miller, Vincent St-Amour, R. Dybvig (PLDI 2015, doi:10.1145/2737924.2737990)
Contemporary compiler systems such as GCC, .NET, and LLVM incorporate profile-guided optimizations (PGOs) on low-level intermediate code and basic blocks, with impressive results over purely static heuristics. Recent work shows that profile information is also useful for performing source-to-source optimizations via meta-programming. For example, using profiling information to inform decisions about data structures and algorithms can potentially lead to asymptotic improvements in performance. We present a design for profile-guided meta-programming in a general-purpose meta-programming system. Our design is parametric over the particular profiler and meta-programming system. We implement this design in two different meta-programming systems, the syntactic extension systems of Chez Scheme and Racket, and provide several profile-guided meta-programs as usability case studies.
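The paper's systems expose profile data to Scheme/Racket macros at expansion time; as a loose analogue (all names ours), here is the two-phase pattern in Python: a profiling run records the operation mix, and a "meta-program", here just a factory function, uses those counts to pick a data representation for the next build.

```python
import collections

class ProfiledSequence:
    """Phase 1: an instrumented stand-in that counts the operation mix."""
    counts = collections.Counter()

    def __init__(self):
        self.items = []

    def append(self, x):
        ProfiledSequence.counts["append"] += 1
        self.items.append(x)

    def member(self, x):
        ProfiledSequence.counts["member"] += 1
        return x in self.items

def specialized_sequence():
    """Phase 2: the profile-guided choice. The paper makes this decision
    at macro-expansion time; a factory function approximates that here."""
    c = ProfiledSequence.counts
    # membership-heavy profile -> set-backed; append-heavy -> list-backed
    return set() if c["member"] > c["append"] else list()

s = ProfiledSequence()
s.append(1)
for _ in range(10):
    s.member(1)
print(type(specialized_sequence()))   # <class 'set'>
```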
{"title":"Profile-guided meta-programming","authors":"W. J. Bowman, Swaha Miller, Vincent St-Amour, R. Dybvig","doi":"10.1145/2737924.2737990","DOIUrl":"https://doi.org/10.1145/2737924.2737990","url":null,"abstract":"Contemporary compiler systems such as GCC, .NET, and LLVM incorporate profile-guided optimizations (PGOs) on low-level intermediate code and basic blocks, with impressive results over purely static heuristics. Recent work shows that profile information is also useful for performing source-to-source optimizations via meta-programming. For example, using profiling information to inform decisions about data structures and algorithms can potentially lead to asymptotic improvements in performance. We present a design for profile-guided meta-programming in a general-purpose meta-programming system. Our design is parametric over the particular profiler and meta-programming system. We implement this design in two different meta-programming systems---the syntactic extensions systems of Chez Scheme and Racket---and provide several profile-guided meta-programs as usability case studies.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126268848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Loop and data transformations for sparse matrix code
Anand Venkat, Mary W. Hall, M. Strout (PLDI 2015, doi:10.1145/2737924.2738003)
This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations to generate code that is on average within 5% of, and often exceeds, the performance of the manually-tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI’s and perform comparably to CUSP’s.
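The inspector/executor division the paper automates can be sketched by hand (our code; the paper's compiler derives such pairs, and far more sophisticated layout changes, from polyhedral descriptions plus run-time inspection):

```python
def inspect_coo_to_csr(rows, cols, vals, n):
    """Run-time 'inspector': examine the nonzero structure once and build
    the transformed (CSR) representation the executor will consume."""
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    colidx = [cols[k] for k in order]
    data = [vals[k] for k in order]
    rowptr = [0] * (n + 1)
    for k in order:
        rowptr[rows[k] + 1] += 1        # count nonzeros per row
    for i in range(n):
        rowptr[i + 1] += rowptr[i]      # prefix sum -> row offsets
    return rowptr, colidx, data

def spmv_csr(rowptr, colidx, data, x):
    """'Executor': the SpMV loop specialized to the CSR layout."""
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += data[k] * x[colidx[k]]
    return y

# 2x2 matrix [[1, 2], [0, 3]] times x = [1, 1] -> [3, 3]
rp, ci, d = inspect_coo_to_csr([0, 0, 1], [0, 1, 1], [1.0, 2.0, 3.0], 2)
assert spmv_csr(rp, ci, d, [1.0, 1.0]) == [3.0, 3.0]
```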
{"title":"Loop and data transformations for sparse matrix code","authors":"Anand Venkat, Mary W. Hall, M. Strout","doi":"10.1145/2737924.2738003","DOIUrl":"https://doi.org/10.1145/2737924.2738003","url":null,"abstract":"This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations to generate code that is on average within 5% and often exceeds manually-tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5 faster than OSKI and perform comparably to CUSP, respectively.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"os-37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127776530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic induction proofs of data-structures in imperative programs
D. Chu, J. Jaffar, Minh-Thai Trinh (PLDI 2015, doi:10.1145/2737924.2737984)
We consider the problem of automated reasoning about dynamically manipulated data structures. Essential properties are encoded as predicates whose definitions are formalized via user-defined recursive rules. Traditionally, proving relationships between such properties is limited to the unfold-and-match (U+M) paradigm, which employs systematic transformation steps of folding/unfolding the rules. A proof, using U+M, succeeds when we find a sequence of transformations that produces a final formula which is obviously provable by simply matching terms. Our contribution here is the addition of the fundamental principle of induction to this automated process. We first show that some proof obligations dynamically generated in the process can be used as induction hypotheses later, and then we show how to use these hypotheses in an induction step which generates a new proof obligation aside from those obtained by the fold/unfold operations. While adding induction is an obvious need in general, no automated method has managed to incorporate it in a systematic and general way. The main reason for this is the problem of avoiding circular reasoning. We overcome this with a novel checking condition. In summary, our contribution is a proof method that, going beyond U+M, performs automatic formula rewriting by treating previously encountered obligations in each proof path as possible induction hypotheses. In the practical evaluation part of this paper, we show on realistic benchmarks how the common technique of relying on unproven lemmas can be avoided. This not only removes the burden of coming up with appropriate lemmas, but also significantly speeds up the verification process, since lemma applications, coupled with unfolding, often induce a large search space. In the end, our method can automatically reason about a new class of formulas arising from practical program verification.
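A standard textbook instance (not drawn from the paper) shows where induction enters. To prove len(app(xs, ys)) = len(xs) + len(ys), unfold-and-match alone discharges the xs = nil case: both sides unfold to len(ys). Unfolding the xs = cons(h, t) case yields 1 + len(app(t, ys)) = 1 + len(t) + len(ys), which no amount of matching closes; but it is exactly the original obligation instantiated at the tail t. Treating that earlier obligation as an induction hypothesis finishes the proof, and the strict decrease |t| < |cons(h, t)| is the kind of side condition that rules out circular reasoning.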
{"title":"Automatic induction proofs of data-structures in imperative programs","authors":"D. Chu, J. Jaffar, Minh-Thai Trinh","doi":"10.1145/2737924.2737984","DOIUrl":"https://doi.org/10.1145/2737924.2737984","url":null,"abstract":"We consider the problem of automated reasoning about dynamically manipulated data structures. Essential properties are encoded as predicates whose definitions are formalized via user-defined recursive rules. Traditionally, proving relationships between such properties is limited to the unfold-and-match (U+M) paradigm which employs systematic transformation steps of folding/unfolding the rules. A proof, using U+M, succeeds when we find a sequence of transformations that produces a final formula which is obviously provable by simply matching terms. Our contribution here is the addition of the fundamental principle of induction to this automated process. We first show that some proof obligations that are dynamically generated in the process can be used as induction hypotheses in the future, and then we show how to use these hypotheses in an induction step which generates a new proof obligation aside from those obtained by using the fold/unfold operations. While the adding of induction is an obvious need in general, no automated method has managed to include this in a systematic and general way. The main reason for this is the problem of avoiding circular reasoning. We overcome this with a novel checking condition. In summary, our contribution is a proof method which – beyond U+M – performs automatic formula re-writing by treating previously encountered obligations in each proof path as possible induction hypotheses. In the practical evaluation part of this paper, we show how the commonly used technique of using unproven lemmas can be avoided, using realistic benchmarks. This not only removes the current burden of coming up with the appropriate lemmas, but also significantly boosts up the verification process, since lemma applications, coupled with unfolding, often induce a large search space. In the end, our method can automatically reason about a new class of formulas arising from practical program verification.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121954407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autotuning algorithmic choice for input sensitivity
Yufei Ding, Jason Ansel, K. Veeramachaneni, Xipeng Shen, Una-May O’Reilly, Saman P. Amarasinghe (PLDI 2015, doi:10.1145/2737924.2737969)
A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
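A dependency-free sketch of the two-level shape (all names and the toy cost function are ours; the paper's feature selection, classifier construction, and accuracy handling are much richer): level one clusters inputs by cheap feature vectors; level two benchmarks candidate configurations on one representative per cluster; deployment maps each new input to its cluster's winner.

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return tuple(sum(xs) / len(xs) for xs in zip(*pts))

def kmeans(points, k, iters=25):
    """Level one: cluster inputs by their feature vectors."""
    centers = random.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for p in points:
            buckets[min(range(k), key=lambda j: dist(p, centers[j]))].append(p)
        centers = [mean(b) if b else centers[j] for j, b in enumerate(buckets)]
    return centers

def autotune(inputs, features, configs, cost, k=2):
    """Level two: benchmark each configuration on one representative
    input per cluster, then map new inputs to their cluster's winner."""
    feats = [features(i) for i in inputs]
    centers = kmeans(feats, k)
    best = {}
    for c in centers:
        rep = inputs[min(range(len(inputs)), key=lambda i: dist(feats[i], c))]
        best[c] = min(configs, key=lambda cfg: cost(cfg, rep))
    return lambda inp: best[min(centers, key=lambda c: dist(features(inp), c))]

# Hypothetical use: choose a cutoff parameter per input-size regime.
random.seed(0)
inputs = [list(range(n)) for n in (10, 12, 1000, 1200)]
cost = lambda cfg, xs: abs(len(xs) - 100 * cfg)   # stand-in for a benchmark
choose = autotune(inputs, lambda xs: (float(len(xs)),), [1, 2, 10, 12], cost)
print(choose(list(range(11))), choose(list(range(1100))))
```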
{"title":"Autotuning algorithmic choice for input sensitivity","authors":"Yufei Ding, Jason Ansel, K. Veeramachaneni, Xipeng Shen, Una-May O’Reilly, Saman P. Amarasinghe","doi":"10.1145/2737924.2737969","DOIUrl":"https://doi.org/10.1145/2737924.2737969","url":null,"abstract":"A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121151927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}