Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115650
Nicolas Coppik, Oliver Schwahn, Stefan Winter, N. Suri
Modern operating systems (OSs) consist of numerous interacting components, many of which are developed and maintained independently of one another. In monolithic systems, the boundaries of and interfaces between such components are not strictly enforced at runtime. Therefore, faults in individual components may directly affect other parts of the system in various ways. Software fault injection (SFI) is a testing technique to assess the resilience of a software system in the presence of faulty components. Unfortunately, SFI tests of OSs are inconclusive if they do not lead to observable failures, as corruptions of the internal software state may not be visible at its interfaces and yet affect the subsequent execution of the OS beyond the duration of the test. In this paper, we present TrEKer, a fully automated approach for identifying how faulty OS components affect other parts of the system. TrEKer combines static and dynamic analyses to achieve efficient tracing at the granularity of memory accesses. We demonstrate TrEKer's ability to support SFI oracles by accurately tracing the effects of faults injected into three widely used Linux kernel modules.
Title: TrEKer: Tracing error propagation in operating system kernels
Venue: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115711
Jonathan A. Saddler, Myra B. Cohen
Most automated testing techniques for graphical user interfaces (GUIs) produce test cases that are only concerned with covering the elements (widgets, menus, etc.) on the interface, or the underlying program code, with little consideration of test case semantics. This is effective for functional testing where the aim is to find as many faults as possible. However, when one wants to mimic a real user for evaluating usability, or when it is necessary to extensively test important end-user tasks of a system, or to generate examples of how to use an interface, this generation approach fails. Capture and replay techniques can be used; however, there are often multiple ways to achieve a particular goal, and capturing all of these is usually too time-consuming and unrealistic. Prior work on human performance regression testing introduced a constraint-based method to filter test cases created by a functional test case generator. However, that work did not capture the specifications or directly generate only the required tests, and it considered only a single type of test goal. In this paper, we present EventFlowSlicer, a tool that allows the GUI tester to specify and generate all realistic test cases relevant to achieving a stated goal. The user first captures relevant events on the interface, then adds constraints to provide restrictions on the task. An event flow graph is extracted containing only the widgets of interest for that goal. Next, all test cases are generated for edges in the graph which respect the constraints. The test cases can then be replayed using a modified version of GUITAR. A video demonstration of EventFlowSlicer can be found at https://youtu.be/hw7WYz8WYVU.
Title: EventFlowSlicer: A tool for generating realistic goal-driven GUI tests
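The generation step described in the EventFlowSlicer abstract (enumerate test cases over a reduced event flow graph, keeping only those that respect the tester's constraints) can be illustrated with a small sketch. This is not the tool's implementation: the graph, the event names, and the two constraint kinds (`required` and `ordered`) are hypothetical and chosen only to make the idea concrete.

```python
def goal_paths(graph, start, goal, required=(), ordered=()):
    """Enumerate simple paths from start to goal that respect the constraints.

    graph:    dict mapping each event to the events that may follow it
    required: events every test case must contain (hypothetical constraint)
    ordered:  (a, b) pairs; if both occur, a must precede b (hypothetical)
    """
    def satisfies(path):
        position = {event: i for i, event in enumerate(path)}
        if any(event not in position for event in required):
            return False
        return all(
            position[a] < position[b]
            for a, b in ordered
            if a in position and b in position
        )

    paths = []

    def walk(node, path):
        if node == goal:
            if satisfies(path):
                paths.append(list(path))
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep paths simple so enumeration terminates
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(start, [start])
    return paths


# Toy event flow graph for a "make text bold, then confirm" task.
graph = {
    "open_menu": ["select_bold", "select_italic"],
    "select_bold": ["select_italic", "click_ok"],
    "select_italic": ["select_bold", "click_ok"],
    "click_ok": [],
}

tests = goal_paths(graph, "open_menu", "click_ok", required={"select_bold"})
for test in tests:
    print(" -> ".join(test))
```

Filtering after enumeration keeps the sketch short; a practical generator would more likely prune partial paths as soon as a constraint can no longer be met.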
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115705
A. Abate, I. Bessa, Dario Cattaruzza, Lennon C. Chaves, L. Cordeiro, C. David, Pascal Kesseli, D. Kroening, E. Polgreen
We present an automated MATLAB Toolbox, named DSSynth (Digital-System Synthesizer), to synthesize sound digital controllers for physical plants that are represented as linear time-invariant systems with single input and output. In particular, DSSynth synthesizes digital controllers that are sound w.r.t. stability and safety specifications. DSSynth considers the complete range of approximations, including time discretization, quantization effects and finite-precision arithmetic (and its rounding errors). We demonstrate the practical value of this toolbox by automatically synthesizing stable and safe controllers for intricate physical plant models from the digital control literature. The resulting toolbox enables the application of program synthesis to real-world control engineering problems. A demonstration can be found at https://youtu.be/_hLQslRcee8.
Title: DSSynth: An automated digital controller synthesis tool for physical plants
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115651
Matúš Sulír, J. Porubän
Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch are demonstrated on a medium-sized Java project.
Title: RuntimeSearch: Ctrl+F for a running program
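The idea in the RuntimeSearch abstract, searching for a term in string values while the program runs, can be approximated in a few lines. The sketch below is not the authors' tool, which targets Java and pauses in a debugger. It is a simplified Python analogue built on the standard `sys.settrace` hook. It inspects only local variables rather than all string expressions, and it records matches instead of pausing. The helper name `runtime_search` and the sample function `greet` are hypothetical.

```python
import sys


def runtime_search(term, func, *args, **kwargs):
    """Run func under a tracer and record local string variables containing term."""
    hits = []
    seen = set()

    def tracer(frame, event, arg):
        # 'line' fires before each line runs; 'return' catches the final state.
        if event in ("line", "return"):
            for name, value in frame.f_locals.items():
                if isinstance(value, str) and term in value:
                    key = (frame.f_code.co_name, name)
                    if key not in seen:  # report each variable once per function
                        seen.add(key)
                        hits.append(
                            (frame.f_code.co_name, frame.f_lineno, name, value)
                        )
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, hits


def greet(user):
    message = "Hello, " + user
    return message.upper()


result, hits = runtime_search("Hello", greet, "Ada")
for function, line, name, value in hits:
    print(f"{function}:{line} {name} = {value!r}")
```

To get closer to the behaviour the abstract describes, the point where a hit is recorded is where one would call `breakpoint()` to hand control to a debugger.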
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115619
Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, V. Filkov, Bogdan Vasilescu
Continuous Integration (CI) has become a disruptive innovation in software development: with proper tool support and adoption, positive effects have been demonstrated for pull request throughput and scaling up of project sizes. As with any other innovation, adopting CI implies adapting existing practices in order to take full advantage of its potential, and "best practices" to that end have been proposed. Here we study the adaptation and evolution of code writing and submission, issue and pull request closing, and testing practices as Travis CI is adopted by hundreds of established projects on GitHub. To help essentialize the quantitative results, we also survey a sample of GitHub developers about their experiences with adopting Travis CI. Our findings suggest a more nuanced picture of how GitHub teams are adapting to, and benefiting from, continuous integration technology than suggested by prior work.
Title: The impact of continuous integration on other software development practices: A large-scale empirical study
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115718
Xi Cheng, Min Zhou, Xiaoyu Song, M. Gu, Jiaguang Sun
Integer errors in C/C++ are caused by arithmetic operations yielding results that are unrepresentable in a certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widespread in real-world programs, and repairing them is error-prone even for experts. An automatic tool is desired to 1) automatically generate fixes which assist developers in correcting the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool, IntPTI, that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from successful human-written patches. IntPTI provides a user-friendly web interface which allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at: https://youtu.be/9Tgd4A_FgZM.
Title: IntPTI: Automatic integer error repair with proper-type inference
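To make the notion of a result being "unrepresentable in a certain type" from the IntPTI abstract concrete, the following sketch models fixed-width signed integers with Python's arbitrary-precision `int`. It is only an illustration and not IntPTI's inference algorithm. The helper names are hypothetical, and the wraparound shown is what two's-complement hardware commonly produces, whereas signed overflow is undefined behaviour in C itself.

```python
def fits_signed(value, bits):
    """Return True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def wrap_signed(value, bits):
    """Reduce value modulo 2**bits and reinterpret it as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def narrowest_signed_width(value, widths=(8, 16, 32, 64)):
    """Pick the smallest listed width that can hold value, or None if none can."""
    for bits in widths:
        if fits_signed(value, bits):
            return bits
    return None


a = 2_000_000_000
b = 2_000_000_000
exact = a + b  # mathematically exact sum: 4_000_000_000

print(fits_signed(exact, 32))         # False: the sum does not fit in 32 bits
print(wrap_signed(exact, 32))         # -294967296: the value a wrapping add yields
print(narrowest_signed_width(exact))  # 64: a wider type represents the sum exactly
```

Choosing a wider type for the result is one way a representation issue can be eliminated; the abstract does not say which fix patterns the tool actually applies.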
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115623
Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi
Sentiment analysis tools developed for analyzing social media text or product reviews work poorly on Software Engineering (SE) datasets. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. To this end, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performance of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool designed specifically for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found that a model trained using the Gradient Boosting Tree (GBT) algorithm provided the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.
Title: SentiCR: A customized sentiment analysis tool for code review interactions
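The SentiCR abstract reports accuracy, precision, and recall for the negative class under 10-fold cross-validation. The sketch below shows how those three metrics and a simple fold split are computed, in plain Python. The labels are invented toy data and the helper names are hypothetical; nothing here reproduces the paper's classifier or its reported numbers.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k (train, test) pairs by striding."""
    folds = []
    for i in range(k):
        test = [j for j in range(n_samples) if j % k == i]
        train = [j for j in range(n_samples) if j % k != i]
        folds.append((train, test))
    return folds


def negative_class_metrics(y_true, y_pred, negative="negative"):
    """Return (accuracy, precision, recall), treating `negative` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == negative and p == negative for t, p in pairs)
    fp = sum(t != negative and p == negative for t, p in pairs)
    fn = sum(t == negative and p != negative for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels for eight review comments (invented for illustration).
y_true = ["negative"] * 3 + ["non-negative"] * 5
y_pred = ["negative", "negative", "non-negative",
          "negative", "non-negative", "non-negative",
          "non-negative", "non-negative"]

accuracy, precision, recall = negative_class_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy can look high while precision and recall for the negative class stay modest when negative comments are the minority, which is why the abstract reports all three.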
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115635
F. Corradini, Fabrizio Fornari, A. Polini, B. Re, F. Tiezzi, Andrea Vandin
Business Process Modelling has acquired increasing relevance in software development. Available notations, such as BPMN, make it possible to describe the activities of complex organisations. On the one hand, this narrows the communication gap between domain experts and IT specialists. On the other hand, it helps clarify the characteristics of software systems introduced to provide automatic support for such activities. Nevertheless, the lack of formal semantics hinders the automatic verification of relevant properties. This paper presents a novel verification framework for BPMN 2.0, called BProVe. It is based on an operational semantics, implemented using MAUDE, devised to make the verification general and effective. A complete tool chain, based on the Eclipse modelling environment, allows for rigorous modelling and analysis of Business Processes. The approach has been validated using more than one thousand models available in a publicly accessible repository. Besides showing the performance of BProVe, this validation demonstrates its practical benefits in identifying correctness issues in real models.
Title: BProVe: A formal verification framework for business process models
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115723
Chris Mills
A wide range of text-based artifacts contribute to software projects (e.g., source code, test cases, use cases, project requirements, interaction diagrams, etc.). Traceability Link Recovery (TLR) is the software task in which relevant documents in these various sets are linked to one another, uncovering information about the project that is not available when considering only the documents themselves. This information is helpful for enabling other tasks such as improving test coverage, impact analysis, and ensuring that system or regulatory requirements are met. However, while traceability links are useful, performing TLR manually is time-consuming and fraught with error. Previous work has applied Information Retrieval (IR) and other techniques to reduce the human effort involved; however, that effort remains significant. In this research, we seek to take the next step in reducing it by using machine learning (ML) classification models to predict whether a candidate link is valid or invalid without human oversight. Preliminary results show that this approach has promise for accurately recommending valid links; however, there are several challenges that still must be addressed in order to achieve a technique with high enough performance to consider it a viable, completely automated solution.
Title: Towards the automatic classification of traceability links
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115665
Olaf Leßenich, S. Apel, Christian Kästner, Georg Seibt, J. Siegmund
Diffing and merging of source-code artifacts is an essential task when integrating changes in software versions. While state-of-the-art line-based merge tools (e.g., git merge) are fast and independent of the programming language used, they have low precision. Recently, it has been shown that the precision of merging can be substantially improved by using a language-aware, structured approach that works on abstract syntax trees. However, precise structured merging is NP-hard, especially when considering the notoriously difficult scenarios of renamings and shifted code. To address these scenarios without compromising scalability, we propose a syntax-aware, heuristic optimization for structured merging that employs a lookahead mechanism during tree matching. The key idea is that renamings and shifted code are not arbitrarily distributed, but their occurrence follows patterns, which we address with a syntax-specific lookahead. Our experiments with 48 real-world open-source projects (4,878 merge scenarios with over 400 million lines of code) demonstrate that we can significantly improve matching precision in 28 percent of cases while maintaining performance.
Title: Renaming and shifted code in structured merging: Looking ahead for precision and performance