Frequent background polling on a shared thread, using light-weight compiler interrupts
Nilanjana Basu, C. Montanari, Jakob Eriksson
DOI: https://doi.org/10.1145/3453483.3454107

Recent work in networking, storage and multi-threading has demonstrated improved performance and scalability by replacing kernel-mode interrupts with high-rate user-space polling. Typically, such polling is performed by a dedicated core. Compiler Interrupts (CIs) instead enable efficient, automatic high-rate polling on a shared thread, which performs other work between polls. CIs are instrumentation-based and light-weight, allowing frequent interrupts with little performance impact. For example, when targeting a 5,000 cycle interval, the median overhead of our fastest CI design is 4% vs. 800% for hardware interrupts, across programs in the SPLASH-2, Phoenix and Parsec benchmark suites running with 32 threads. We evaluate CIs on three systems-level applications: (a) kernel bypass networking with mTCP, (b) joint kernel bypass networking and CPU scheduling with Shenango, and (c) delegation, a message-passing alternative to locking, with FFWD. For each application, we find that CIs offer compelling qualitative and quantitative improvements over the current state of the art. For example, CI-based mTCP achieves ≈2× stock mTCP throughput on a sample HTTP application.
{"title":"Frequent background polling on a shared thread, using light-weight compiler interrupts","authors":"Nilanjana Basu, C. Montanari, Jakob Eriksson","doi":"10.1145/3453483.3454107","DOIUrl":"https://doi.org/10.1145/3453483.3454107","url":null,"abstract":"Recent work in networking, storage and multi-threading has demonstrated improved performance and scalability by replacing kernel-mode interrupts with high-rate user-space polling. Typically, such polling is performed by a dedicated core. Compiler Interrupts (CIs) instead enable efficient, automatic high-rate polling on a shared thread, which performs other work between polls. CIs are instrumentation-based and light-weight, allowing frequent interrupts with little performance impact. For example, when targeting a 5,000 cycle interval, the median overhead of our fastest CI design is 4% vs. 800% for hardware interrupts, across programs in the SPLASH-2, Phoenix and Parsec benchmark suites running with 32 threads. We evaluate CIs on three systems-level applications: (a) kernel bypass networking with mTCP, (b) joint kernel bypass networking and CPU scheduling with Shenango, and (c) delegation, a message-passing alternative to locking, with FFWD. For each application, we find that CIs offer compelling qualitative and quantitative improvements over the current state of the art. For example, CI-based mTCP achieves ≈2× stock mTCP throughput on a sample HTTP application.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82170783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical smart contract sharding with ownership and commutativity analysis
George Pîrlea, Amrit Kumar, Ilya Sergey
DOI: https://doi.org/10.1145/3453483.3454112

Sharding is a popular way to achieve scalability in blockchain protocols, increasing their throughput by partitioning the set of transaction validators into a number of smaller committees, splitting the workload. Existing approaches for blockchain sharding, however, do not scale well when concurrent transactions alter the same replicated state component—a common scenario in Ethereum-style smart contracts. We propose a novel approach for efficiently sharding such transactions. It is based on a folklore idea: state-manipulating atomic operations that commute can be processed in parallel, with their cumulative result defined deterministically, while executing non-commuting operations requires one to own the state they alter. We present CoSplit—a static program analysis tool that soundly infers ownership and commutativity summaries for smart contracts and translates those summaries to sharding signatures that are used by the blockchain protocol to maximise parallelism. Our evaluation shows that using CoSplit introduces negligible overhead to the transaction validation cost, while the inferred signatures allow the system to achieve a significant increase in transaction processing throughput for real-world smart contracts.
{"title":"Practical smart contract sharding with ownership and commutativity analysis","authors":"George Pîrlea, Amrit Kumar, Ilya Sergey","doi":"10.1145/3453483.3454112","DOIUrl":"https://doi.org/10.1145/3453483.3454112","url":null,"abstract":"Sharding is a popular way to achieve scalability in blockchain protocols, increasing their throughput by partitioning the set of transaction validators into a number of smaller committees, splitting the workload. Existing approaches for blockchain sharding, however, do not scale well when concurrent transactions alter the same replicated state component—a common scenario in Ethereum-style smart contracts. We propose a novel approach for efficiently sharding such transactions. It is based on a folklore idea: state-manipulating atomic operations that commute can be processed in parallel, with their cumulative result defined deterministically, while executing non-commuting operations requires one to own the state they alter. We present CoSplit—a static program analysis tool that soundly infers ownership and commutativity summaries for smart contracts and translates those summaries to sharding signatures that are used by the blockchain protocol to maximise parallelism. Our evaluation shows that using CoSplit introduces negligible overhead to the transaction validation cost, while the inferred signatures allow the system to achieve a significant increase in transaction processing throughput for real-world smart contracts.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"430 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83600411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesizing data structure refinements from integrity constraints
Shankara Pailoor, Yuepeng Wang, Xinyu Wang, Işıl Dillig
DOI: https://doi.org/10.1145/3453483.3454063

Implementations of many data structures use several correlated fields to improve their performance; however, inconsistencies between these fields can be a source of serious program errors. To address this problem, we propose a new technique for automatically refining data structures from integrity constraints. In particular, consider a data structure D with fields F and methods M, as well as a new set of auxiliary fields F′ that should be added to D. Given this input and an integrity constraint Φ relating F and F′, our method automatically generates a refinement of D that satisfies the provided integrity constraint. Our method is based on a modular instantiation of the CEGIS paradigm and uses a novel inductive synthesizer that augments top-down search with three key ideas. First, it computes necessary preconditions of partial programs to dramatically prune its search space. Second, it augments the grammar with promising new productions by leveraging the computed preconditions. Third, it guides top-down search using a probabilistic context-free grammar obtained by statically analyzing the integrity checking function and the original code base. We evaluated our method on 25 data structures from popular Java projects and show that our method can successfully refine 23 of them. We also compare our method against two state-of-the-art synthesis tools and perform an ablation study to justify our design choices. Our evaluation shows that (1) our method is successful at refining many data structure implementations in the wild, (2) it advances the state-of-the-art in synthesis, and (3) our proposed ideas are crucial for making this technique practical.
{"title":"Synthesizing data structure refinements from integrity constraints","authors":"Shankara Pailoor, Yuepeng Wang, Xinyu Wang, Işıl Dillig","doi":"10.1145/3453483.3454063","DOIUrl":"https://doi.org/10.1145/3453483.3454063","url":null,"abstract":"Implementations of many data structures use several correlated fields to improve their performance; however, inconsistencies between these fields can be a source of serious program errors. To address this problem, we propose a new technique for automatically refining data structures from integrity constraints. In particular, consider a data structure D with fields F and methods M, as well as a new set of auxiliary fields F′ that should be added to D. Given this input and an integrity constraint Φ relating F and F′, our method automatically generates a refinement of D that satisfies the provided integrity constraint. Our method is based on a modular instantiation of the CEGIS paradigm and uses a novel inductive synthesizer that augments top-down search with three key ideas. First, it computes necessary preconditions of partial programs to dramatically prune its search space. Second, it augments the grammar with promising new productions by leveraging the computed preconditions. Third, it guides top-down search using a probabilistic context-free grammar obtained by statically analyzing the integrity checking function and the original code base. We evaluated our method on 25 data structures from popular Java projects and show that our method can successfully refine 23 of them. We also compare our method against two state-of-the-art synthesis tools and perform an ablation study to justify our design choices. Our evaluation shows that (1) our method is successful at refining many data structure implementations in the wild, (2) it advances the state-of-the-art in synthesis, and (3) our proposed ideas are crucial for making this technique practical.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91082715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fluid: a framework for approximate concurrency via controlled dependency relaxation
Huaipan Jiang, Haibo Zhang, Xulong Tang, V. Govindaraj, J. Sampson, M. Kandemir, Danfeng Zhang
DOI: https://doi.org/10.1145/3453483.3454042

In this work, we introduce the Fluid framework, a set of language, compiler, and runtime extensions that allow for the expression of regions within which dataflow dependencies can be approximated in a disciplined manner. Our framework allows the eager execution of dependent tasks before their inputs have finalized, in order to capitalize on situations where an eagerly consumed input has a high probability of sufficiently resembling the value or structure of the final value that would have been produced in a conservative/precise execution schedule. We introduce controlled access to the early consumption of intermediate values and provide hooks for user-specified quality assurance mechanisms that can automatically enforce re-execution of eagerly executed tasks if their output values do not meet heuristic expectations. Our experimental analysis indicates that the fluidized versions of the applications bring an average execution-time improvement of 22.2% over their original counterparts under the default values of our fluidization parameters. The Fluid approach is largely orthogonal to approaches that aim to reduce the task effort itself, and we show that utilizing the Fluid framework can yield benefits for both originally precise and originally approximate versions of computation.
{"title":"Fluid: a framework for approximate concurrency via controlled dependency relaxation","authors":"Huaipan Jiang, Haibo Zhang, Xulong Tang, V. Govindaraj, J. Sampson, M. Kandemir, Danfeng Zhang","doi":"10.1145/3453483.3454042","DOIUrl":"https://doi.org/10.1145/3453483.3454042","url":null,"abstract":"In this work, we introduce the Fluid framework, a set of language, compiler and runtime extensions that allow for the expression of regions within which dataflow dependencies can be approximated in a disciplined manner. Our framework allows the eager execution of dependent tasks before their inputs have finalized in order to capitalize on situations where an eagerly-consumed input has a high probability of sufficiently resembling the value or structure of the final value that would have been produced in a conservative/precise execution schedule. We introduce controlled access to the early consumption of intermediate values and provide hooks for user-specified quality assurance mechanisms that can automatically enforce re-execution of eagerly-executed tasks if their output values do not meet heuristic expectations. Our experimental analysis indicates that the fluidized versions of the applications bring 22.2% average execution time improvements, over their original counterparts, under the default values of our fluidization parameters. The Fluid approach is largely orthogonal to approaches that aim to reduce the task effort itself and we show that utilizing the Fluid framework can yield benefits for both originally precise and originally approximate versions of computation.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74373132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models
Rohan Basu Roy, Tirthak Patel, V. Gadepally, Devesh Tiwari
DOI: https://doi.org/10.1145/3453483.3454109

As parallel applications become more complex, auto-tuning becomes more desirable, challenging, and time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications that requires no a priori information about applications, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find a near-optimal parameter setting 1.64× faster than state-of-the-art approaches.
{"title":"Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models","authors":"Rohan Basu Roy, Tirthak Patel, V. Gadepally, Devesh Tiwari","doi":"10.1145/3453483.3454109","DOIUrl":"https://doi.org/10.1145/3453483.3454109","url":null,"abstract":"As parallel applications become more complex, auto-tuning becomes more desirable, challenging, and time-consuming. We propose, Bliss, a novel solution for auto-tuning parallel applications without requiring apriori information about applications, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find the near-optimal parameter setting 1.64× faster than the state-of-the-art approaches.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74820034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phased synthesis of divide and conquer programs
Azadeh Farzan, Victor Nicolet
DOI: https://doi.org/10.1145/3453483.3454089

We propose a fully automated method that takes as input an iterative or recursive reference implementation and produces divide-and-conquer implementations that are functionally equivalent to the input. Three interdependent components have to be synthesized: a function that divides the original problem instance, a function that solves each sub-instance, and a function that combines the results of sub-computations. We propose a methodology that splits the synthesis problem into three successive phases, each with a substantially reduced state space compared to the original monolithic task, and therefore substantially more tractable. Our methodology is implemented as an addition to the existing synthesis tool Parsynt, and we demonstrate its efficacy by fully automatically synthesizing highly nontrivial divide-and-conquer implementations of a set of benchmarks.
{"title":"Phased synthesis of divide and conquer programs","authors":"Azadeh Farzan, Victor Nicolet","doi":"10.1145/3453483.3454089","DOIUrl":"https://doi.org/10.1145/3453483.3454089","url":null,"abstract":"We propose a fully automated method that takes as input an iterative or recursive reference implementation and produces divide-and-conquer implementations that are functionally equivalent to the input. Three interdependent components have to be synthesized: a function that divides the original problem instance, a function that solves each sub-instance, and a function that combines the results of sub-computations. We propose a methodology that splits the synthesis problem into three successive phases, each with a substantially reduced state space compared to the original monolithic task, and therefore substantially more tractable. Our methodology is implemented as an addition to the existing synthesis tool Parsynt, and we demonstrate the efficacy of it by synthesizing highly nontrivial divide-and-conquer implementations of a set of benchmarks fully automatically.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81611812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specification synthesis with constrained Horn clauses
Sumanth Prabhu, Grigory Fedyukovich, Kumar Madhukar, D. D'Souza
DOI: https://doi.org/10.1145/3453483.3454104

The problem of synthesizing specifications of undefined procedures has a broad range of applications, but the usefulness of the generated specifications depends on their quality. In this paper, we propose a technique for finding maximal and non-vacuous specifications. Maximality allows for more choices for implementations of undefined procedures, and non-vacuity ensures that safety assertions are reachable. To handle programs with complex control flow, our technique discovers not only specifications but also inductive invariants. Our iterative algorithm lazily generalizes non-vacuous specifications in a counterexample-guided loop. The key component of our technique is an effective non-vacuous specification synthesis algorithm. We have implemented the approach in a tool called HornSpec, taking as input systems of constrained Horn clauses. We have experimentally demonstrated the tool's effectiveness, efficiency, and the quality of generated specifications on a range of benchmarks.
{"title":"Specification synthesis with constrained Horn clauses","authors":"Sumanth Prabhu, Grigory Fedyukovich, Kumar Madhukar, D. D'Souza","doi":"10.1145/3453483.3454104","DOIUrl":"https://doi.org/10.1145/3453483.3454104","url":null,"abstract":"The problem of synthesizing specifications of undefined procedures has a broad range of applications, but the usefulness of the generated specifications depends on their quality. In this paper, we propose a technique for finding maximal and non-vacuous specifications. Maximality allows for more choices for implementations of undefined procedures, and non-vacuity ensures that safety assertions are reachable. To handle programs with complex control flow, our technique discovers not only specifications but also inductive invariants. Our iterative algorithm lazily generalizes non-vacuous specifications in a counterexample-guided loop. The key component of our technique is an effective non-vacuous specification synthesis algorithm. We have implemented the approach in a tool called HornSpec, taking as input systems of constrained Horn clauses. We have experimentally demonstrated the tool's effectiveness, efficiency, and the quality of generated specifications on a range of benchmarks.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"165 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86216757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cyclic program synthesis
Shachar Itzhaky, Hila Peleg, N. Polikarpova, R. Rowe, Ilya Sergey
DOI: https://doi.org/10.1145/3453483.3454087

We describe the first approach to automatically synthesizing heap-manipulating programs with auxiliary recursive procedures. Such procedures occur routinely in data structure transformations (e.g., flattening a tree into a list) or traversals of composite structures (e.g., n-ary trees). Our approach, dubbed cyclic program synthesis, enhances deductive program synthesis with a novel application of cyclic proofs. Specifically, we observe that the machinery used to form cycles in cyclic proofs can be reused to systematically and efficiently abduce recursive auxiliary procedures. We develop the theory of cyclic program synthesis by extending Synthetic Separation Logic (SSL), a logical framework for deductive synthesis of heap-manipulating programs from Separation Logic specifications. We implement our approach as a tool called Cypress, and showcase it by automatically synthesizing a number of programs manipulating linked data structures using recursive auxiliary procedures and mutual recursion, many of which were beyond the reach of existing program synthesis tools.
{"title":"Cyclic program synthesis","authors":"Shachar Itzhaky, Hila Peleg, N. Polikarpova, R. Rowe, Ilya Sergey","doi":"10.1145/3453483.3454087","DOIUrl":"https://doi.org/10.1145/3453483.3454087","url":null,"abstract":"We describe the first approach to automatically synthesizing heap-manipulating programs with auxiliary recursive procedures. Such procedures occur routinely in data structure transformations (e.g., flattening a tree into a list) or traversals of composite structures (e.g., n-ary trees). Our approach, dubbed cyclic program synthesis, enhances deductive program synthesis with a novel application of cyclic proofs. Specifically, we observe that the machinery used to form cycles in cyclic proofs can be reused to systematically and efficiently abduce recursive auxiliary procedures. We develop the theory of cyclic program synthesis by extending Synthetic Separation Logic (SSL), a logical framework for deductive synthesis of heap-manipulating programs from Separation Logic specifications. We implement our approach as a tool called Cypress, and showcase it by automatically synthesizing a number of programs manipulating linked data structures using recursive auxiliary procedures and mutual recursion, many of which were beyond the reach of existing program synthesis tools.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73728754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive restarts for stochastic synthesis
Jason R. Koenig, O. Padon, A. Aiken
DOI: https://doi.org/10.1145/3453483.3454071

We consider the problem of program synthesis from input-output examples via stochastic search. We identify a robust feature of stochastic synthesis: The search often progresses through a series of discrete plateaus. We observe that the distribution of synthesis times is often heavy-tailed and analyze how these distributions arise. Based on these insights, we present an algorithm that speeds up synthesis by an order of magnitude over the naive algorithm currently used in practice. Our experimental results are obtained in part using a new program synthesis benchmark for superoptimization distilled from widely used production code.
{"title":"Adaptive restarts for stochastic synthesis","authors":"Jason R. Koenig, O. Padon, A. Aiken","doi":"10.1145/3453483.3454071","DOIUrl":"https://doi.org/10.1145/3453483.3454071","url":null,"abstract":"We consider the problem of program synthesis from input-output examples via stochastic search. We identify a robust feature of stochastic synthesis: The search often progresses through a series of discrete plateaus. We observe that the distribution of synthesis times is often heavy-tailed and analyze how these distributions arise. Based on these insights, we present an algorithm that speeds up synthesis by an order of magnitude over the naive algorithm currently used in practice. Our experimental results are obtained in part using a new program synthesis benchmark for superoptimization distilled from widely used production code.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87171308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to find naming issues with big code and small supervision
Jingxuan He, Cheng-Chun Lee, Veselin Raychev, Martin T. Vechev
DOI: https://doi.org/10.1145/3453483.3454045

We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms, where program fragments violating such idioms indicate likely naming issues; and (ii) supervised learning of a classifier on a small labeled dataset that filters potential false positives from the violations. We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective in finding naming mistakes in real-world repositories with high precision (~70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ~16%).
{"title":"Learning to find naming issues with big code and small supervision","authors":"Jingxuan He, Cheng-Chun Lee, Veselin Raychev, Martin T. Vechev","doi":"10.1145/3453483.3454045","DOIUrl":"https://doi.org/10.1145/3453483.3454045","url":null,"abstract":"We introduce a new approach for finding and fixing naming issues in source code. The method is based on a careful combination of unsupervised and supervised procedures: (i) unsupervised mining of patterns from Big Code that express common naming idioms. Program fragments violating such idioms indicates likely naming issues, and (ii) supervised learning of a classifier on a small labeled dataset which filters potential false positives from the violations. We implemented our method in a system called Namer and evaluated it on a large number of Python and Java programs. We demonstrate that Namer is effective in finding naming mistakes in real world repositories with high precision (~70%). Perhaps surprisingly, we also show that existing deep learning methods are not practically effective and achieve low precision in finding naming issues (up to ~16%).","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"64 2-3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72590097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}