Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115706
Yuexing Wang, Min Zhou, Yu Jiang, Xiaoyu Song, M. Gu, Jiaguang Sun
To reduce the false positives of static analysis, many tools collect path constraints and integrate SMT solvers to filter unreachable execution paths. However, the accumulated calling and computing of SMT solvers are time and resource consuming. This paper presents TsmartLW, an alternate static analysis tool in which we implement a path constraint solving engine to speed up reachability determination. Within the engine, typical types of constraint-patterns are firstly defined based on an empirical study of a large number of code repositories. For each pattern, a constraint solving algorithm is designed and implemented. For each program, the engine predicts the most suitable strategy and then applies the strategy to solve path constraints. The experimental results on some well-known benchmarks and real-world applications show that TsmartLW is faster than some state-of-the-art static analysis tools. For example, it is 1.32× faster than CPAchecker and our engine is 369× faster than SMT solvers in solving path constraints. The demo video is available at https://www.youtube.com/watch?v=5c3ARhFclHA&t=2s.
{"title":"A static analysis tool with optimizations for reachability determination","authors":"Yuexing Wang, Min Zhou, Yu Jiang, Xiaoyu Song, M. Gu, Jiaguang Sun","doi":"10.1109/ASE.2017.8115706","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115706","url":null,"abstract":"To reduce the false positives of static analysis, many tools collect path constraints and integrate SMT solvers to filter unreachable execution paths. However, the accumulated calling and computing of SMT solvers are time and resource consuming. This paper presents TsmartLW, an alternate static analysis tool in which we implement a path constraint solving engine to speed up reachability determination. Within the engine, typical types of constraint-patterns are firstly defined based on an empirical study of a large number of code repositories. For each pattern, a constraint solving algorithm is designed and implemented. For each program, the engine predicts the most suitable strategy and then applies the strategy to solve path constraints. The experimental results on some well-known benchmarks and real-world applications show that TsmartLW is faster than some state-of-the-art static analysis tools. For example, it is 1.32× faster than CPAchecker and our engine is 369× faster than SMT solvers in solving path constraints. The demo video is available at https://www.youtube.com/watch?v=5c3ARhFclHA&t=2s.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126885627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115652
Yun Lin, Guozhu Meng, Yinxing Xue, Zhenchang Xing, Jun Sun, Xin Peng, Yang Liu, Wenyun Zhao, J. Dong
In this paper, we propose an approach to detecting project-specific recurring designs in code base and abstracting them into design templates as reuse opportunities. The mined templates allow programmers to make further customization for generating new code. The generated code involves the code skeleton of recurring design as well as the semi-implemented code bodies annotated with comments to remind programmers of necessary modification. We implemented our approach as an Eclipse plugin called MICoDe. We evaluated our approach with a reuse simulation experiment and a user study involving 16 participants. The results of our simulation experiment on 10 open source Java projects show that, to create a new similar feature with a design template, (1) on average 69% of the elements in the template can be reused and (2) on average 60% code of the new feature can be adopted from the template. Our user study further shows that, compared to the participants adopting the copy-paste-modify strategy, the ones using MICoDe are more effective to understand a big design picture and more efficient to accomplish the code reuse task.
{"title":"Mining implicit design templates for actionable code reuse","authors":"Yun Lin, Guozhu Meng, Yinxing Xue, Zhenchang Xing, Jun Sun, Xin Peng, Yang Liu, Wenyun Zhao, J. Dong","doi":"10.1109/ASE.2017.8115652","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115652","url":null,"abstract":"In this paper, we propose an approach to detecting project-specific recurring designs in code base and abstracting them into design templates as reuse opportunities. The mined templates allow programmers to make further customization for generating new code. The generated code involves the code skeleton of recurring design as well as the semi-implemented code bodies annotated with comments to remind programmers of necessary modification. We implemented our approach as an Eclipse plugin called MICoDe. We evaluated our approach with a reuse simulation experiment and a user study involving 16 participants. The results of our simulation experiment on 10 open source Java projects show that, to create a new similar feature with a design template, (1) on average 69% of the elements in the template can be reused and (2) on average 60% code of the new feature can be adopted from the template. Our user study further shows that, compared to the participants adopting the copy-paste-modify strategy, the ones using MICoDe are more effective to understand a big design picture and more efficient to accomplish the code reuse task.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115651
Matúš Sulír, J. Porubän
Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch is demonstrated on a medium-sized Java project.
{"title":"RuntimeSearch: Ctrl+F for a running program","authors":"Matúš Sulír, J. Porubän","doi":"10.1109/ASE.2017.8115651","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115651","url":null,"abstract":"Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch is demonstrated on a medium-sized Java project.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133337679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115705
A. Abate, I. Bessa, Dario Cattaruzza, Lennon C. Chaves, L. Cordeiro, C. David, Pascal Kesseli, D. Kroening, E. Polgreen
We present an automated MATLAB Toolbox, named DSSynth (Digital-System Synthesizer), to synthesize sound digital controllers for physical plants that are represented as linear timeinvariant systems with single input and output. In particular, DSSynth synthesizes digital controllers that are sound w.r.t. stability and safety specifications. DSSynth considers the complete range of approximations, including time discretization, quantization effects and finite-precision arithmetic (and its rounding errors). We demonstrate the practical value of this toolbox by automatically synthesizing stable and safe controllers for intricate physical plant models from the digital control literature. The resulting toolbox enables the application of program synthesis to real-world control engineering problems. A demonstration can be found at https://youtu.be_hLQslRcee8.
{"title":"DSSynth: An automated digital controller synthesis tool for physical plants","authors":"A. Abate, I. Bessa, Dario Cattaruzza, Lennon C. Chaves, L. Cordeiro, C. David, Pascal Kesseli, D. Kroening, E. Polgreen","doi":"10.1109/ASE.2017.8115705","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115705","url":null,"abstract":"We present an automated MATLAB Toolbox, named DSSynth (Digital-System Synthesizer), to synthesize sound digital controllers for physical plants that are represented as linear timeinvariant systems with single input and output. In particular, DSSynth synthesizes digital controllers that are sound w.r.t. stability and safety specifications. DSSynth considers the complete range of approximations, including time discretization, quantization effects and finite-precision arithmetic (and its rounding errors). We demonstrate the practical value of this toolbox by automatically synthesizing stable and safe controllers for intricate physical plant models from the digital control literature. The resulting toolbox enables the application of program synthesis to real-world control engineering problems. A demonstration can be found at https://youtu.be_hLQslRcee8.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133108371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115619
Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, V. Filkov, Bogdan Vasilescu
Continuous Integration (CI) has become a disruptive innovation in software development: with proper tool support and adoption, positive effects have been demonstrated for pull request throughput and scaling up of project sizes. As any other innovation, adopting CI implies adapting existing practices in order to take full advantage of its potential, and "best practices" to that end have been proposed. Here we study the adaptation and evolution of code writing and submission, issue and pull request closing, and testing practices as Travis CI is adopted by hundreds of established projects on GitHub. To help essentialize the quantitative results, we also survey a sample of GITHUB developers about their experiences with adopting Travis CI. Our findings suggest a more nuanced picture of how GitHub teams are adapting to, and benefiting from, continuous integration technology than suggested by prior work.
{"title":"The impact of continuous integration on other software development practices: A large-scale empirical study","authors":"Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, V. Filkov, Bogdan Vasilescu","doi":"10.1109/ASE.2017.8115619","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115619","url":null,"abstract":"Continuous Integration (CI) has become a disruptive innovation in software development: with proper tool support and adoption, positive effects have been demonstrated for pull request throughput and scaling up of project sizes. As any other innovation, adopting CI implies adapting existing practices in order to take full advantage of its potential, and \"best practices\" to that end have been proposed. Here we study the adaptation and evolution of code writing and submission, issue and pull request closing, and testing practices as Travis CI is adopted by hundreds of established projects on GitHub. To help essentialize the quantitative results, we also survey a sample of GITHUB developers about their experiences with adopting Travis CI. Our findings suggest a more nuanced picture of how GitHub teams are adapting to, and benefiting from, continuous integration technology than suggested by prior work.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114661773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115718
Xi Cheng, Min Zhou, Xiaoyu Song, M. Gu, Jiaguang Sun
Integer errors in C/C++ are caused by arithmetic operations yielding results which are unrepresentable in certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widely harbored in real-world programs and it is error-prone to repair them even for experts. An automatic tool is desired to 1) automatically generate fixes which assist developers to correct the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool IntPTI that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from the successful human-written patches. IntPTI provides a user-friendly web interface which allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at: https://youtu.be/9Tgd4A_FgZM.
{"title":"IntPTI: Automatic integer error repair with proper-type inference","authors":"Xi Cheng, Min Zhou, Xiaoyu Song, M. Gu, Jiaguang Sun","doi":"10.1109/ASE.2017.8115718","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115718","url":null,"abstract":"Integer errors in C/C++ are caused by arithmetic operations yielding results which are unrepresentable in certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widely harbored in real-world programs and it is error-prone to repair them even for experts. An automatic tool is desired to 1) automatically generate fixes which assist developers to correct the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool IntPTI that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from the successful human-written patches. IntPTI provides a user-friendly web interface which allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at: https://youtu.be/9Tgd4A_FgZM.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121448162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115623
Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi
Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.
{"title":"SentiCR: A customized sentiment analysis tool for code review interactions","authors":"Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi","doi":"10.1109/ASE.2017.8115623","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115623","url":null,"abstract":"Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124050374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115635
F. Corradini, Fabrizio Fornari, A. Polini, B. Re, F. Tiezzi, Andrea Vandin
Business Process Modelling has acquired increasing relevance in software development. Available notations, such as BPMN, permit to describe activities of complex organisations. On the one hand, this shortens the communication gap between domain experts and IT specialists. On the other hand, this permits to clarify the characteristics of software systems introduced to provide automatic support for such activities. Nevertheless, the lack of formal semantics hinders the automatic verification of relevant properties. This paper presents a novel verification framework for BPMN 2.0, called BProVe. It is based on an operational semantics, implemented using MAUDE, devised to make the verification general and effective. A complete tool chain, based on the Eclipse modelling environment, allows for rigorous modelling and analysis of Business Processes. The approach has been validated using more than one thousand models available on a publicly accessible repository. Besides showing the performance of BProVe, this validation demonstrates its practical benefits in identifying correctness issues in real models.
{"title":"BProVe: A formal verification framework for business process models","authors":"F. Corradini, Fabrizio Fornari, A. Polini, B. Re, F. Tiezzi, Andrea Vandin","doi":"10.1109/ASE.2017.8115635","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115635","url":null,"abstract":"Business Process Modelling has acquired increasing relevance in software development. Available notations, such as BPMN, permit to describe activities of complex organisations. On the one hand, this shortens the communication gap between domain experts and IT specialists. On the other hand, this permits to clarify the characteristics of software systems introduced to provide automatic support for such activities. Nevertheless, the lack of formal semantics hinders the automatic verification of relevant properties. This paper presents a novel verification framework for BPMN 2.0, called BProVe. It is based on an operational semantics, implemented using MAUDE, devised to make the verification general and effective. A complete tool chain, based on the Eclipse modelling environment, allows for rigorous modelling and analysis of Business Processes. The approach has been validated using more than one thousand models available on a publicly accessible repository. Besides showing the performance of BProVe, this validation demonstrates its practical benefits in identifying correctness issues in real models.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131636212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115723
Chris Mills
A wide range of text-based artifacts contribute to software projects (e.g., source code, test cases, use cases, project requirements, interaction diagrams, etc.). Traceability Link Recovery (TLR) is the software task in which relevant documents in these various sets are linked to one another, uncovering information about the project that is not available when considering only the documents themselves. This information is helpful for enabling other tasks such as improving test coverage, impact analysis, and ensuring that system or regulatory requirements are met. However, while traceability links are useful, performing TLR manually is time consuming and fraught with error. Previous work has applied Information Retrieval (IR) and other techniques to reduce the human effort involved; however, that effort remains significant. In this research we seek to take the next step in reducing it by using machine learning (ML) classification models to predict whether a candidate link is valid or invalid without human oversight. Preliminary results show that this approach has promise for accurately recommending valid links; however, there are several challenges that still must be addressed in order to achieve a technique with high enough performance to consider it a viable, completely automated solution.
广泛的基于文本的工件有助于软件项目(例如,源代码、测试用例、用例、项目需求、交互图等)。可追溯性链接恢复(Traceability Link Recovery, TLR)是一项软件任务,在该任务中,这些不同集合中的相关文档相互链接,揭示了仅考虑文档本身时无法获得的有关项目的信息。这些信息有助于实现其他任务,例如改进测试覆盖率、影响分析,以及确保系统或法规需求得到满足。然而,尽管可追溯性链接很有用,但手动执行TLR既耗时又充满错误。以前的工作已经应用了信息检索(IR)和其他技术来减少所涉及的人力;然而,这一努力仍然意义重大。在这项研究中,我们试图采取下一步措施,通过使用机器学习(ML)分类模型来预测候选链接在没有人为监督的情况下是有效还是无效。初步结果表明,该方法有望准确推荐有效链接;然而,为了实现具有足够高性能的技术,将其视为可行的、完全自动化的解决方案,仍然必须解决几个挑战。
{"title":"Towards the automatic classification of traceability links","authors":"Chris Mills","doi":"10.1109/ASE.2017.8115723","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115723","url":null,"abstract":"A wide range of text-based artifacts contribute to software projects (e.g., source code, test cases, use cases, project requirements, interaction diagrams, etc.). Traceability Link Recovery (TLR) is the software task in which relevant documents in these various sets are linked to one another, uncovering information about the project that is not available when considering only the documents themselves. This information is helpful for enabling other tasks such as improving test coverage, impact analysis, and ensuring that system or regulatory requirements are met. However, while traceability links are useful, performing TLR manually is time consuming and fraught with error. Previous work has applied Information Retrieval (IR) and other techniques to reduce the human effort involved; however, that effort remains significant. In this research we seek to take the next step in reducing it by using machine learning (ML) classification models to predict whether a candidate link is valid or invalid without human oversight. Preliminary results show that this approach has promise for accurately recommending valid links; however, there are several challenges that still must be addressed in order to achieve a technique with high enough performance to consider it a viable, completely automated solution.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128053424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-10-30DOI: 10.1109/ASE.2017.8115665
Olaf Leßenich, S. Apel, Christian Kästner, Georg Seibt, J. Siegmund
Diffing and merging of source-code artifacts is an essential task when integrating changes in software versions. While state-of-the-art line-based merge tools (e.g., git merge) are fast and independent of the programming language used, they have only a low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language-aware, structured approach that works on abstract syntax trees. But, precise structured merging is NP hard, especially, when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed, but their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4,878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent of cases while maintaining performance.
{"title":"Renaming and shifted code in structured merging: Looking ahead for precision and performance","authors":"Olaf Leßenich, S. Apel, Christian Kästner, Georg Seibt, J. Siegmund","doi":"10.1109/ASE.2017.8115665","DOIUrl":"https://doi.org/10.1109/ASE.2017.8115665","url":null,"abstract":"Diffing and merging of source-code artifacts is an essential task when integrating changes in software versions. While state-of-the-art line-based merge tools (e.g., git merge) are fast and independent of the programming language used, they have only a low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language-aware, structured approach that works on abstract syntax trees. But, precise structured merging is NP hard, especially, when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed, but their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4,878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent of cases while maintaining performance.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132257936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}