Synthesis of Infinite-State Systems with Random Behavior
Andreas Katis, Grigory Fedyukovich, Jeffrey Chen, D. Greve, Sanjai Rayadurgam, M. Whalen
DOI: 10.1145/3324884.3416586
Diversity in the exhibited behavior of a given system is a desirable characteristic in a variety of application contexts. Synthesis of conformant implementations often proceeds by discovering witnessing Skolem functions, which are traditionally deterministic. In this paper, we present a novel Skolem extraction algorithm that enables the synthesis of witnesses with random behavior, and demonstrate its applicability in the context of reactive systems. The synthesized solutions are guaranteed by design to meet the given specification, while exhibiting a high degree of diversity in their responses to external stimuli. Case studies demonstrate a novel application of our framework to model-based fuzz testing, generating fuzzers whose performance is competitive with general-purpose alternatives, as well as the practical utility of synthesized controllers in robot motion planning problems.
{"title":"Synthesis of Infinite-State Systems with Random Behavior","authors":"Andreas Katis, Grigory Fedyukovich, Jeffrey Chen, D. Greve, Sanjai Rayadurgam, M. Whalen","doi":"10.1145/3324884.3416586","DOIUrl":"https://doi.org/10.1145/3324884.3416586","url":null,"abstract":"Diversity in the exhibited behavior of a given system is a desirable characteristic in a variety of application contexts. Synthesis of conformant implementations often proceeds by discovering witnessing Skolem functions, which are traditionally deterministic. In this paper, we present a novel Skolem extraction algorithm to enable synthesis of witnesses with random behavior and demonstrate its applicability in the context of reactive systems. The synthesized solutions are guaranteed by design to meet the given specification, while exhibiting a high degree of diversity in their responses to external stimuli. Case studies demonstrate how our proposed framework unveils a novel application of synthesis in model-based fuzz testing to generate fuzzers of competitive performance to general-purpose alternatives, as well as the practical utility of synthesized controllers in robot motion planning problems.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121233251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Third-Party Library Detection for Android Applications: Are We There Yet?
Xian Zhan, Lingling Fan, Tianming Liu, Sen Chen, Li Li, Haoyu Wang, Yifei Xu, Xiapu Luo, Yang Liu
DOI: 10.1145/3324884.3416582
Third-party libraries (TPLs) have become a significant part of the Android ecosystem. Developers can employ various TPLs with different functionalities to facilitate their app development. Unfortunately, the popularity of TPLs also brings new challenges and even threats. TPLs may carry malicious or vulnerable code, which can infect popular apps and pose threats to mobile users. Besides, the code of third-party libraries can constitute noise in downstream tasks such as malware and repackaged-app detection. Thus, researchers have developed various tools to identify TPLs. However, no existing work has studied these TPL detection tools in detail: different tools target different applications and differ in performance, but little is known about them. To better understand existing TPL detection tools and dissect TPL detection techniques, we conduct a comprehensive empirical study that evaluates and compares all publicly available TPL detection tools on four criteria: effectiveness, efficiency, resilience to code obfuscation, and ease of use. We reveal their advantages and disadvantages based on a systematic and thorough empirical study. Furthermore, we conduct a user study to evaluate the usability of each tool. The results show that LibScout outperforms the others in effectiveness, LibRadar takes the least time and is regarded as the easiest to use, and LibPecker performs best in defending against code obfuscation techniques. We further summarize the lessons learned from different perspectives, including users, tool implementation, and researchers. Besides, we enhance these open-source tools by fixing their limitations to improve their detection ability. We also build an extensible framework that integrates all existing available TPL detection tools and provides an online service for the research community. We make the evaluation dataset and enhanced tools publicly available. We believe our work provides a clear picture of existing TPL detection techniques and a roadmap for future directions.
{"title":"Automated Third-Party Library Detection for Android Applications: Are We There Yet?","authors":"Xian Zhan, Lingling Fan, Tianming Liu, Sen Chen, Li Li, Haoyu Wang, Yifei Xu, Xiapu Luo, Yang Liu","doi":"10.1145/3324884.3416582","DOIUrl":"https://doi.org/10.1145/3324884.3416582","url":null,"abstract":"Third-party libraries (TPLs) have become a significant part of the Android ecosystem. Developers can employ various TPLs with different functionalities to facilitate their app development. Unfortunately, the popularity of TPLs also brings new challenges and even threats. TPLs may carry malicious or vulnerable code, which can infect popular apps to pose threats to mobile users. Besides, the code of third-party libraries could constitute noises in some downstream tasks (e.g., malware and repackaged app detection). Thus, researchers have developed various tools to identify TPLs. However, no existing work has studied these TPL detection tools in detail; different tools focus on different applications with performance differences, but little is known about them. To better understand existing TPL detection tools and dissect TPL detection techniques, we conduct a comprehensive empirical study to fill the gap by evaluating and comparing all publicly available TPL detection tools based on four criteria: effectiveness, efficiency, code obfuscation-resilience capability, and ease of use. We reveal their advantages and disadvantages based on a systematic and thorough empirical study. Furthermore, we also conduct a user study to evaluate the usability of each tool. The results showthat LibScout outperforms others regarding effectiveness, LibRadar takes less time than others and is also regarded as the most easy-to-use one, and LibPecker performs the best in defending against code obfuscation techniques. We further summarize the lessons learned from different perspectives, including users, tool implementation, and researchers. Besides, we enhance these open-sourced tools by fixing their limitations to improve their detection ability. We also build an extensible framework that integrates all existing available TPL detection tools, providing online service for the research community. We make publicly available the evaluation dataset and enhanced tools. We believe our work provides a clear picture of existing TPL detection techniques and also give a road-map for future directions.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"718 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122994433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering UI Display Issues with Visual Understanding
Zhe Liu
DOI: 10.1145/3324884.3418917
GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images frequently occur during GUI rendering on different devices due to software or hardware compatibility problems. They negatively influence app usability, resulting in poor user experience. To detect these issues, we propose a novel deep learning approach, OwlEye, for modelling the visual information of a GUI screenshot. OwlEye can detect GUIs with display issues and also localize the region of the issue in the given GUI to guide developers in fixing the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues, and develop a heuristics-based and a GAN-based data augmentation method to boost OwlEye's performance. Our evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing them.
{"title":"Discovering UI Display Issues with Visual Understanding","authors":"Zhe Liu","doi":"10.1145/3324884.3418917","DOIUrl":"https://doi.org/10.1145/3324884.3418917","url":null,"abstract":"GUI complexity posts a great challenge to the GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screen, missing image always occur during GUI rendering on difference devices due to the software or hardware compatibility. They negatively influence the app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling visual information of the GUI screenshot.Therefore, OwlEye can detect GUIs with display issues and also locate the detailed region of the issue in the given GUI for guiding developers to fix the bug. We manually construct a large-scale labelled dataset with 4,470 GUI screenshots with UI display issues. We develop a heuristics-based data augmentation method and a GAN-based data augmentation method for boosting the performance of our OwlEye. At present, the evaluation demonstrates that our OwlEye can achieve 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117145243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Extraction of Cause-Effect-Relations from Requirements Artifacts
Julian Frattini, Maximilian Junker, M. Unterkalmsteiner, D. Méndez
DOI: 10.1145/3324884.3416549
Background: The detection and extraction of causality from natural language sentences has shown great potential in various fields of application. The field of requirements engineering is eligible for multiple reasons: (1) requirements artifacts are primarily written in natural language, (2) causal sentences convey essential context about the subject of requirements, and (3) extracted and formalized causality relations are usable for a (semi-)automatic translation into further artifacts, such as test cases. Objective: We aim to understand the value of interactive causality extraction based on syntactic criteria in the context of requirements engineering. Method: We developed a prototype system for automatic causality extraction and evaluated it by applying it to a set of publicly available requirements artifacts, determining whether the automatic extraction reduces the manual effort of requirements formalization. Result: During the evaluation we analyzed 4,457 natural language sentences from 18 requirements documents, 558 of which were causal (12.52%). For the best-performing requirements document, 48.57% of the cause-effect graphs were extracted automatically on average, which demonstrates the feasibility of the approach. Limitation: The feasibility of the approach has been demonstrated in principle, but its scalability to practical use remains unexplored; evaluating the applicability of automatic causality extraction for a requirements engineer is left for future research. Conclusion: A syntactic approach to causality extraction is viable in the context of requirements engineering and can support a pipeline towards the automatic generation of further artifacts from requirements artifacts.
{"title":"Automatic Extraction of Cause-Effect-Relations from Requirements Artifacts","authors":"Julian Frattini, Maximilian Junker, M. Unterkalmsteiner, D. Méndez","doi":"10.1145/3324884.3416549","DOIUrl":"https://doi.org/10.1145/3324884.3416549","url":null,"abstract":"Background: The detection and extraction of causality from natural language sentences have shown great potential in various fields of application. The field of requirements engineering is eligible for multiple reasons: (1) requirements artifacts are primarily written in natural language, (2) causal sentences convey essential context about the subject of requirements, and (3) extracted and formalized causality relations are usable for a (semi-)automatic translation into further artifacts, such as test cases. Objective: We aim at understanding the value of interactive causality extraction based on syntactic criteria for the context of requirements engineering. Method: We developed a prototype of a system for automatic causality extraction and evaluate it by applying it to a set of publicly available requirements artifacts, determining whether the automatic extraction reduces the manual effort of requirements formalization. Result: During the evaluation we analyzed 4457 natural language sentences from 18 requirements documents, 558 of which were causal (12.52%). The best evaluation of a requirements document provided an automatic extraction of 48.57% cause-effect graphs on average, which demonstrates the feasibility of the approach. Limitation: The feasibility of the approach has been proven in theory but lacks exploration of being scaled up for practical use. Evaluating the applicability of the automatic causality extraction for a requirements engineer is left for future research. Conclusion: A syntactic approach for causality extraction is viable for the context of requirements engineering and can aid a pipeline towards an automatic generation of further artifacts from requirements artifacts.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"348 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132418197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Deep Multitask Learning Approach for Requirements Discovery and Annotation from Open Forum
Mingyang Li, Lin Shi, Ye Yang, Qing Wang
DOI: 10.1145/3324884.3416627
The ability to rapidly learn and adapt to evolving user needs is key to modern business success. Existing methods use text mining and machine learning techniques to analyze user comments and feedback, and are often constrained by heavy reliance on manually codified rules or insufficient training data. Multitask learning (MTL) is an effective approach with many successful applications and has the potential to address these limitations of requirements analysis tasks. In this paper, we propose a deep MTL-based approach, DEMAR, for discovering requirements from massive issue reports and annotating their sentences in support of automated requirements analysis. DEMAR consists of three main phases: (1) the data augmentation phase, for data preparation and data sharing beyond single-task learning; (2) the model construction phase, for constructing the MTL-based model for the requirements discovery and requirements annotation tasks; and (3) the model training phase, which enables eavesdropping through a loss function shared between the two related tasks. Evaluation results from eight open-source projects show that the proposed multitask learning approach outperforms two state-of-the-art approaches (CNC and FRA) and six common machine learning algorithms, with a precision of 91% and a recall of 83% for the requirements discovery task, and an overall accuracy of 83% for the requirements annotation task. The proposed approach provides a novel and effective way to jointly learn two related requirements analysis tasks. We believe that it also sheds light on further directions for exploring multitask learning in solving other software engineering problems.
{"title":"A Deep Multitask Learning Approach for Requirements Discovery and Annotation from Open Forum","authors":"Mingyang Li, Lin Shi, Ye Yang, Qing Wang","doi":"10.1145/3324884.3416627","DOIUrl":"https://doi.org/10.1145/3324884.3416627","url":null,"abstract":"The ability in rapidly learning and adapting to evolving user needs is key to modern business successes. Existing methods are based on text mining and machine learning techniques to analyze user comments and feedback, and often constrained by heavy reliance on manually codified rules or insufficient training data. Multitask learning (MTL) is an effective approach with many successful applications, with the potential to address these limitations associated with requirements analysis tasks. In this paper, we propose a deep MTL-based approach, DEMAR, to address these limitations when discovering requirements from massive issue reports and annotating the sentences in support of automated requirements analysis. DEMAR consists of three main phases: (1) data augmentation phase, for data preparation and allowing data sharing beyond single task learning; (2) model construction phase, for constructing the MTL-based model for requirements discovery and requirements annotation tasks; and (3) model training phase, enabling eavesdropping by shared loss function between the two related tasks. Evaluation results from eight open-source projects show that, the proposed multitask learning approach outperforms two state-of-the-art approaches (CNC and FRA) and six common machine learning algorithms, with the precision of 91 % and the recall of 83% for requirements discovery task, and the overall accuracy of 83% for requirements annotation task. The proposed approach provides a novel and effective way to jointly learn two related requirements analysis tasks. We believe that it also sheds light on further directions of exploring multitask learning in solving other software engineering problems.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"42 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131656236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Closer to the Edge: Testing Compilers More Thoroughly by Being Less Conservative About Undefined Behaviour
Karine Even-Mendoza, Cristian Cadar, A. Donaldson
DOI: 10.1145/3324884.3418933
Randomised compiler testing techniques require a means of generating programs that are free from undefined behaviour (UB) in order to reliably reveal miscompilation bugs. Existing program generators such as Csmith heavily restrict the form of generated programs in order to achieve UB-freedom. We hypothesise that the idiomatic nature of such programs limits the test coverage they can offer. Our idea is to generate less restricted programs that are still UB-free: programs that get closer to the edge of UB, but do not quite cross it. We present preliminary support for our idea via a prototype tool, Csmithedge, which uses simple dynamic analysis to determine where Csmith has been too conservative in its use of the safe math wrappers that guarantee UB-freedom for arithmetic operations. By eliminating redundant wrappers, Csmithedge was able to discover two new miscompilation bugs in GCC that could not be found via intensive testing using regular Csmith, and to achieve substantial differences in code coverage on GCC compared with regular Csmith.
{"title":"Closer to the Edge: Testing Compilers More Thoroughly by Being Less Conservative About Undefined Behaviour","authors":"Karine Even-Mendoza, Cristian Cadar, A. Donaldson","doi":"10.1145/3324884.3418933","DOIUrl":"https://doi.org/10.1145/3324884.3418933","url":null,"abstract":"Randomised compiler testing techniques require a means of generating programs that are free from undefined behaviour (UB) in order to reliably reveal miscompilation bugs. Existing program generators such as Csmith heavily restrict the form of generated programs in order to achieve DB-freedom. We hypothesise that the idiomatic nature of such programs limits the test coverage they can offer. Our idea is to generate less restricted programs that are still UB-free-programs that get closer to the edge of UB, but that do not quite cross the edge. We present preliminary support for our idea via a prototype tool, Csmithedge, which uses simple dynamic analysis to determine where Csmith has been too conservative in its use of safe math wrappers that guarantee UB-freedom for arithmetic operations. By eliminating redundant wrappers, Csmithedge was able to discover two new miscompilation bugs in GCC that could not be found via intensive testing using regular Csmith, and to achieve substantial differences in code coverage on GCC compared with regular Csmith. CCS CONCEPTS • Software and its engineering →Compilers; Software verification and validation.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131059284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Learning Techniques to Oracle Synthesis
F. Molina
DOI: 10.1145/3324884.3415287
Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a specification of the intended behavior of the software under analysis. Unfortunately, software often lacks such specifications. This issue seriously diminishes the analyzability of software with respect to its reliability. Thus, novel techniques for capturing the intended software behavior in the form of specifications would allow us to exploit them for automated reliability analysis. Our research focuses on the application of learning techniques to automatically distinguish correct from incorrect software behavior. The aim is to decrease the developer's effort in specifying oracles by instead generating them from actual software behaviors.
{"title":"Applying Learning Techniques to Oracle Synthesis","authors":"F. Molina","doi":"10.1145/3324884.3415287","DOIUrl":"https://doi.org/10.1145/3324884.3415287","url":null,"abstract":"Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a specification of the intended behavior of the software under analysis. Unfortunately, software many times lacks such specifications. This issue seriously diminishes the analyzability of software with respect to its reliability. Thus, finding novel techniques to capture the intended software behavior in the form of specifications would allow us to exploit them for automated reliability analysis. Our research focuses on the application of learning techniques to automatically distinguish correct from incorrect software behavior. The aim here is to decrease the developer's effort in specifying oracles, and instead generating them from actual software behaviors.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130652361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attend and Represent: A Novel View on Algorithm Selection for Software Verification
Cedric Richter, H. Wehrheim
DOI: 10.1145/3324884.3416633
Today, a plethora of different software verification tools exist. Faced with a concrete verification task, software developers thus confront the problem of algorithm selection. Existing algorithm selectors for software verification typically use handpicked program features together with (1) manually designed selection heuristics or (2) machine-learned strategies. The first approach suffers from not being transferable to other selection problems, while the second lacks interpretability, i.e., insights into the reasons for choosing particular tools. In this paper, we propose a novel approach to algorithm selection for software verification. Our approach employs representation learning together with an attention mechanism. Representation learning circumvents feature engineering, i.e., avoids the handpicking of program features. Attention permits a form of interpretability of the learned selectors. We have implemented our approach and experimentally evaluated and compared it with existing approaches. The evaluation shows that representation learning not only outperforms manual feature engineering, but also enables transferability of the learned model to other selection tasks.
{"title":"Attend and Represent: A Novel View on Algorithm Selection for Software Verification","authors":"Cedric Richter, H. Wehrheim","doi":"10.1145/3324884.3416633","DOIUrl":"https://doi.org/10.1145/3324884.3416633","url":null,"abstract":"Today, a plethora of different software verification tools exist. When having a concrete verification task at hand, software developers thus face the problem of algorithm selection. Existing algorithm selectors for software verification typically use handpicked program features together with (1) either manually designed selection heuristics or (2) machine learned strategies. While the first approach suffers from not being transferable to other selection problems, the second approach lacks interpretability, i.e., insights into reasons for choosing particular tools. In this paper, we propose a novel approach to algorithm selection for software verification. Our approach employs representation learning together with an attention mechanism. Representation learning circumvents feature engineering, i.e., avoids the handpicking of program features. Attention permits a form of interpretability of the learned selectors. We have implemented our approach and have experimentally evaluated and compared it with existing approaches. The evaluation shows that representation learning does not only outperform manual feature engineering, but also enables transferability of the learning model to other selection tasks.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speeding up GUI Testing by On-Device Test Generation
N. P. Borges, J. Rau, A. Zeller
DOI: 10.1145/3324884.3415302
When generating GUI tests for Android apps, interactions are typically generated on a separate test computer and then executed on an actual Android device. While this approach is efficient in the sense that apps and interactions execute quickly, the communication overhead between the test computer and the device slows down testing considerably. In this work, we present DD-2, a test generator for Android that tests other apps on the device itself using Android accessibility services. In our experiments, DD-2 proved to be 3.2 times faster than its computer-device counterpart, while sharing the same source code.
{"title":"Speeding up GUI Testing by On-Device Test Generation","authors":"N. P. Borges, J. Rau, A. Zeller","doi":"10.1145/3324884.3415302","DOIUrl":"https://doi.org/10.1145/3324884.3415302","url":null,"abstract":"When generating GUI tests for Android apps, it typically is a separate test computer that generates interactions, which are then executed on an actual Android device. While this approach is efficient in the sense that apps and interactions execute quickly, the communication overhead between test computer and device slows down testing considerably. In this work, we present DD-2, a test generator for Android that tests other apps on the device using Android accessibility services. In our experiments, DD-2 has shown to be 3.2 times faster than its computer-device counterpart, while sharing the same source code.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116977100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Retrieve and Refine: Exemplar-based Neural Comment Generation
Bolin Wei, Yongming Li, Ge Li, Xin Xia, Zhi Jin
DOI: 10.1145/3324884.3416578
Code comment generation, which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually crafted templates or information retrieval (IR) techniques to generate summaries for source code. In recent years, neural network-based methods, which leverage the encoder-decoder deep learning framework to learn comment generation patterns from a large-scale parallel code corpus, have achieved impressive results. However, these emerging methods take only code-related information as input. Software reuse is common in software development, meaning that comments of similar code snippets are helpful for comment generation. Inspired by the IR-based and template-based approaches, in this paper we propose a neural comment generation approach that uses the existing comments of similar code snippets as exemplars to guide comment generation. Specifically, given a piece of code, we first use an IR technique to retrieve a similar code snippet and treat its comment as an exemplar. Then we design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input, and leverages the information from the exemplar to assist in target comment generation based on the semantic similarity between the source code and the similar code. We evaluate our approach on a large-scale Java corpus containing about 2M samples, and experimental results demonstrate that our model outperforms the state-of-the-art methods by a substantial margin.
{"title":"Retrieve and Refine: Exemplar-based Neural Comment Generation","authors":"Bolin Wei, Yongming Li, Ge Li, Xin Xia, Zhi Jin","doi":"10.1145/3324884.3416578","DOIUrl":"https://doi.org/10.1145/3324884.3416578","url":null,"abstract":"Code comment generation which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually-crafted templates or information retrieval (IR) techniques to generate summaries for source code. In recent years, neural network-based methods which leveraged acclaimed encoder-decoder deep learning framework to learn comment generation patterns from a large-scale parallel code corpus, have achieved impressive results. However, these emerging methods only take code-related information as input. Software reuse is common in the process of software development, meaning that comments of similar code snippets are helpful for comment generation. Inspired by the IR-based and template-based approaches, in this paper, we propose a neural comment generation approach where we use the existing comments of similar code snippets as exemplars to guide comment generation. Specifically, given a piece of code, we first use an IR technique to retrieve a similar code snippet and treat its comment as an exemplar. Then we design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input, and leverages the information from the exemplar to assist in the target comment generation based on the semantic similarity between the source code and the similar code. We evaluate our approach on a large-scale Java corpus, which contains about 2M samples, and experimental results demonstrate that our model outperforms the state-of-the-art methods by a substantial margin.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124351321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}