Characterizing the Natural Language Descriptions in Software Logging Statements
Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu
Logging is a common programming practice of great importance in modern software development, because software logs have been widely used in various software maintenance tasks. To provide high-quality logs, developers need to design the description text in logging statements carefully. Inappropriate descriptions will slow down or even mislead the maintenance process, such as postmortem analysis. However, there is currently a lack of rigorous guides and specifications on developer logging behavior, which makes the construction of description text in logging statements a challenging problem. To fill this significant gap, in this paper, we systematically study what developers log, with a focus on the usage of natural language descriptions in logging statements. We obtain 6 valuable findings by conducting source code analysis on 10 Java projects and 7 C# projects, which contain 28,532,975 LOC and 115,159 logging statements in total. Furthermore, our study demonstrates the potential of automated description text generation for logging statements by obtaining a BLEU-4 score of up to 49.04 and a ROUGE-L score of up to 62.1 using a simple information retrieval method. To facilitate future research in this field, the datasets have been publicly released.
{"title":"Characterizing the Natural Language Descriptions in Software Logging Statements","authors":"Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu","doi":"10.1145/3238147.3238193","DOIUrl":"https://doi.org/10.1145/3238147.3238193","url":null,"abstract":"Logging is a common programming practice of great importance in modern software development, because software logs have been widely used in various software maintenance tasks. To provide high-quality logs, developers need to design the description text in logging statements carefully. Inappropriate descriptions will slow down or even mislead the maintenance process, such as postmortem analysis. However, there is currently a lack of rigorous guide and specifications on developer logging behaviors, which makes the construction of description text in logging statements a challenging problem. To fill this significant gap, in this paper, we systematically study what developers log, with focus on the usage of natural language descriptions in logging statements. We obtain 6 valuable findings by conducting source code analysis on 10 Java projects and 7 C# projects, which contain 28,532,975 LOC and 115,159 logging statements in total. Furthermore, our study demonstrates the potential of automated description text generation for logging statements by obtaining up to 49.04 BLEU-4 score and 62.1 ROUGE-L score using a simple information retrieval method. To facilitate future research in this field, the datasets have been publicly released.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"178-189"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73044014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AutoConfig: Automatic Configuration Tuning for Distributed Message Systems
Liang Bao, Xin Liu, Ziheng Xu, Baoyin Fang
Distributed message systems (DMSs) serve as the communication backbone for many real-time streaming data processing applications. To support the vast diversity of such applications, DMSs provide a large number of parameters to configure. However, configuring these parameters well for better performance is overwhelming for most users. Although many automatic configuration approaches have been proposed to address this issue, critical challenges remain: 1) training an accurate and robust performance prediction model from a limited number of samples, and 2) searching a high-dimensional parameter space efficiently within a time constraint. In this paper, we propose AutoConfig, an automatic configuration system that can optimize producer-side throughput on DMSs. AutoConfig constructs a novel comparison-based model (CBM) that is more robust than the prediction-based model (PBM) used by previous learning-based approaches. Furthermore, AutoConfig uses a weighted Latin hypercube sampling (wLHS) approach to select a set of samples that provides better coverage of the high-dimensional parameter space. wLHS allows AutoConfig to search for more promising configurations using the trained CBM. We have implemented AutoConfig on the Kafka platform and evaluated it using eight different testing scenarios deployed on a public cloud. Experimental results show that our CBM obtains better results than the PBM when both use the same random-forest-based model. Furthermore, AutoConfig outperforms default configurations by 215.40% on average, and five state-of-the-art configuration algorithms by 7.21%-64.56%.
{"title":"AutoConfig: Automatic Configuration Tuning for Distributed Message Systems","authors":"Liang Bao, Xin Liu, Ziheng Xu, Baoyin Fang","doi":"10.1145/3238147.3238175","DOIUrl":"https://doi.org/10.1145/3238147.3238175","url":null,"abstract":"Distributed message systems (DMSs) serve as the communication backbone for many real-time streaming data processing applications. To support the vast diversity of such applications, DMSs provide a large number of parameters to configure. However, It overwhelms for most users to configure these parameters well for better performance. Although many automatic configuration approaches have been proposed to address this issue, critical challenges still remain: 1) to train a better and robust performance prediction model using a limited number of samples, and 2) to search for a high-dimensional parameter space efficiently within a time constraint. In this paper, we propose AutoConfig – an automatic configuration system that can optimize producer-side throughput on DMSs. AutoConfig constructs a novel comparison-based model (CBM) that is more robust that the prediction-based model (PBM) used by previous learning-based approaches. Furthermore, AutoConfig uses a weighted Latin hypercube sampling (wLHS) approach to select a set of samples that can provide a better coverage over the high-dimensional parameter space. wLHS allows AutoConfig to search for more promising configurations using the trained CBM. We have implemented AutoConfig on the Kafka platform, and evaluated it using eight different testing scenarios deployed on a public cloud. Experimental results show that our CBM can obtain better results than that of PBM under the same random forests based model. Furthermore, AutoConfig outperforms default configurations by 215.40% on average, and five state-of-the-art configuration algorithms by 7.21%-64.56%.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"68 1","pages":"29-40"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74878102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PARTI: A Multi-interval Theory Solver for Symbolic Execution
Oscar Soria Dustmann, Klaus Wehrle, Cristian Cadar
Symbolic execution is an effective program analysis technique whose scalability largely depends on the ability to quickly solve large numbers of first-order logic queries. We propose an effective general technique for speeding up the solving of queries in the theory of arrays and bit-vectors with a specific structure, while otherwise falling back to a complete solver. The technique has two stages: a learning stage that determines the solution sets of each symbolic variable, and a decision stage that uses this information to quickly determine the satisfiability of certain types of queries. The main challenges involve deciding which operators to support and precisely dealing with integer type casts and arithmetic underflow and overflow. We implemented this technique in an incomplete solver called PARTI (“PARtial Theory solver for Intervals”), directly integrating it into the popular KLEE symbolic execution engine. We applied KLEE with PARTI and a state-of-the-art SMT solver to synthetic and real-world benchmarks. We found that PARTI incurs practically no performance penalty while in many cases achieving order-of-magnitude speedups.
{"title":"PARTI: A Multi-interval Theory Solver for Symbolic Execution","authors":"Oscar Soria Dustmann, Klaus Wehrle, Cristian Cadar","doi":"10.1145/3238147.3238179","DOIUrl":"https://doi.org/10.1145/3238147.3238179","url":null,"abstract":"Symbolic execution is an effective program analysis technique whose scalability largely depends on the ability to quickly solve large numbers of first-order logic queries. We propose an effective general technique for speeding up the solving of queries in the theory of arrays and bit-vectors with a specific structure, while otherwise falling back to a complete solver. The technique has two stages: a learning stage that determines the solution sets of each symbolic variable, and a decision stage that uses this information to quickly determine the satisfiability of certain types of queries. The main challenges involve deciding which operators to support and precisely dealing with integer type casts and arithmetic underflow and overflow. We implemented this technique in an incomplete solver called PARTI (“PARtial Theory solver for Intervals”), directly integrating it into the popular KLEE symbolic execution engine. We applied KLEE with PARTI and a state-of-the-art SMT solver to synthetic and real-world benchmarks. We found that PARTI practically does not hurt performance while many times achieving order-of-magnitude speedups.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"11 1","pages":"430-440"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81907127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Client-Specific Equivalence Checking
Federico Mora, Yi Li, J. Rubin, M. Chechik
Software is often built by integrating components created by different teams or even different organizations. With little understanding of changes in dependent components, it is challenging to maintain the correctness and robustness of the entire system. In this paper, we investigate the effect of component changes on the behavior of their clients. We observe that changes in a component are often irrelevant to a particular client and thus can be adopted without any delays or negative effects. Following this observation, we formulate the notion of client-specific equivalence checking (CSE) and develop an automated technique optimized for checking such equivalence. We evaluate our technique on a set of benchmarks, including those from the existing literature on equivalence checking, and show its applicability and effectiveness.
{"title":"Client-Specific Equivalence Checking","authors":"Federico Mora, Yi Li, J. Rubin, M. Chechik","doi":"10.1145/3238147.3238178","DOIUrl":"https://doi.org/10.1145/3238147.3238178","url":null,"abstract":"Software is often built by integrating components created by different teams or even different organizations. With little understanding of changes in dependent components, it is challenging to maintain correctness and robustness of the entire system. In this paper, we investigate the effect of component changes on the behavior of their clients. We observe that changes in a component are often irrelevant to a particular client and thus can be adopted without any delays or negative effects. Following this observation, we formulate the notion of client-specific equivalence checking (CSE) and develop an automated technique optimized for checking such equivalence. We evaluate our technique on a set of benchmarks, including those from the existing literature on equivalence checking, and show its applicability and effectiveness.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"441-451"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83177627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI
Keheliya Gallaba, Christian Macho, M. Pinzger, Shane McIntosh
Automated builds, which may pass or fail, provide feedback to a development team about changes to the codebase. A passing build indicates that the change compiles cleanly and tests (continue to) pass. A failing (a.k.a., broken) build indicates that there are issues that require attention. Without a closer analysis of the nature of build outcome data, practitioners and researchers are likely to make two critical assumptions: (1) build results are not noisy; however, passing builds may contain failing or skipped jobs that are actively or passively ignored; and (2) builds are equal; however, builds vary in terms of the number of jobs and configurations. To investigate the degree to which these assumptions about build breakage hold, we perform an empirical study of 3.7 million build jobs spanning 1,276 open source projects. We find that: (1) 12% of passing builds have an actively ignored failure; (2) 9% of builds have a misleading or incorrect outcome on average; and (3) at least 44% of the broken builds contain passing jobs, i.e., the breakage is local to a subset of build variants. Like other software archives, build data is noisy and complex. Analysis of build data requires nuance.
{"title":"Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI","authors":"Keheliya Gallaba, Christian Macho, M. Pinzger, Shane McIntosh","doi":"10.1145/3238147.3238171","DOIUrl":"https://doi.org/10.1145/3238147.3238171","url":null,"abstract":"Automated builds, which may pass or fail, provide feedback to a development team about changes to the codebase. A passing build indicates that the change compiles cleanly and tests (continue to) pass. A failing (a.k.a., broken) build indicates that there are issues that require attention. Without a closer analysis of the nature of build outcome data, practitioners and researchers are likely to make two critical assumptions: (1) build results are not noisy; however, passing builds may contain failing or skipped jobs that are actively or passively ignored; and (2) builds are equal; however, builds vary in terms of the number of jobs and configurations. To investigate the degree to which these assumptions about build breakage hold, we perform an empirical study of 3.7 million build jobs spanning 1,276 open source projects. We find that: (1) 12% of passing builds have an actively ignored failure; (2) 9% of builds have a misleading or incorrect outcome on average; and (3) at least 44% of the broken builds contain passing jobs, i.e., the breakage is local to a subset of build variants. Like other software archives, build data is noisy and complex. Analysis of build data requires nuance.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"87-97"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89710174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Model Repair for Alloy
Kaiyuan Wang, Allison Sullivan, S. Khurshid
Automated program repair is an active research area. However, existing research focuses mostly on imperative code, e.g., in Java. In this paper, we study the problem of repairing declarative models in Alloy, a first-order relational logic with transitive closure. We introduce ARepair, the first technique for repairing Alloy models. ARepair follows the spirit of traditional automated program repair techniques. Specifically, ARepair takes as input a faulty Alloy model and a test suite that contains some failing tests, and outputs a repaired model that is correct with respect to the given tests. ARepair integrates ideas from mutation testing and program synthesis to provide an effective solution for repairing Alloy models. The experimental results show that ARepair can fix 28 out of the 38 real-world faulty models we collected.
{"title":"Automated Model Repair for Alloy","authors":"Kaiyuan Wang, Allison Sullivan, S. Khurshid","doi":"10.1145/3238147.3238162","DOIUrl":"https://doi.org/10.1145/3238147.3238162","url":null,"abstract":"Automated program repair is an active research area. However, existing research focuses mostly on imperative code, e.g. in Java. In this paper, we study the problem of repairing declarative models in Alloy - a first order relational logic with transitive closure. We introduce ARepair, the first technique for repairing Alloy models. ARepair follows the spirit of traditional automated program repair techniques. Specifically, ARepair takes as input a faulty Alloy model and a test suite that contains some failing test, and outputs a repaired model that is correct with respect to the given tests. ARepair integrates ideas from mutation testing and program synthesis to provide an effective solution for repairing Alloy models. The experimental results show that ARepair can fix 28 out of 38 real-world faulty models we collected.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"110 1","pages":"577-588"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89291543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning Based Feature Envy Detection
Hui Liu, Zhifeng Xu, Yanzhen Zou
Software refactoring is widely employed to improve software quality. A key step in software refactoring is to identify which part of the software should be refactored. To facilitate the identification, a number of approaches have been proposed to identify certain structures in the code (called code smells) that suggest the possibility of refactoring. Most such approaches rely on manually designed heuristics to map manually selected source code metrics to predictions. However, it is challenging to manually select the best features, especially textual features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a novel deep learning based approach to detecting feature envy, one of the most common code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features (especially textual features) of source code for feature envy detection, and could automatically build the complex mapping between such features and predictions. We also propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state of the art in both detecting feature envy smells and recommending destinations for identified smelly methods.
{"title":"Deep Learning Based Feature Envy Detection","authors":"Hui Liu, Zhifeng Xu, Yanzhen Zou","doi":"10.1145/3238147.3238166","DOIUrl":"https://doi.org/10.1145/3238147.3238166","url":null,"abstract":"Software refactoring is widely employed to improve software quality. A key step in software refactoring is to identify which part of the software should be refactored. To facilitate the identification, a number of approaches have been proposed to identify certain structures in the code (called code smells) that suggest the possibility of refactoring. Most of such approaches rely on manually designed heuristics to map manually selected source code metrics to predictions. However, it is challenging to manually select the best features, especially textual features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a deep learning based novel approach to detecting feature envy, one of the most common code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features (especially textual features) of source code for feature envy detection, and could automatically build the complex mapping between such features and predictions. We also propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art in both detecting feature envy smells and recommending destinations for identified smelly methods.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"49 1","pages":"385-396"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85868249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Tale of Two Cities: How WebView Induces Bugs to Android Applications
Jiajun Hu, Lili Wei, Yepang Liu, S. Cheung, Huaxun Huang
WebView is a widely used Android component that augments a native app with web browser capabilities. It eases the interactions between an app's native code and web code. However, the interaction mechanism of WebView induces new types of bugs in Android apps. Understanding the characteristics and manifestation of these WebView-induced bugs (ωBugs for short) facilitates the correct usage of WebViews in Android apps. This motivates us to conduct the first empirical study on ωBugs based on those found in popular open-source Android apps. Our study identified the major root causes and consequences of ωBugs and made interesting observations that can be leveraged for detecting and diagnosing ωBugs. Based on the empirical study, we further propose an automated testing technique, ωDroid, to effectively expose ωBugs in Android apps. In our experiments, ωDroid successfully discovered 30 unique and previously unknown ωBugs when applied to 146 open-source Android apps. We reported the 30 ωBugs to the corresponding app developers. Out of these 30 ωBugs, 14 were confirmed and 7 of them were fixed. This shows that ωDroid can effectively detect ωBugs that are of the developers' concern.
{"title":"A Tale of Two Cities: How WebView Induces Bugs to Android Applications","authors":"Jiajun Hu, Lili Wei, Yepang Liu, S. Cheung, Huaxun Huang","doi":"10.1145/3238147.3238180","DOIUrl":"https://doi.org/10.1145/3238147.3238180","url":null,"abstract":"WebView is a widely used Android component that augments a native app with web browser capabilities. It eases the interactions between an app's native code and web code. However, the interaction mechanism of WebView induces new types of bugs in Android apps. Understanding the characteristics and manifestation of these WebView-induced bugs ($omega text{Bugs}$ for short) facilitates the correct usages of WebViews in Android apps. This motivates us to conduct the first empirical study on $omega text{Bugs}$ based on those found in popular open-source Android apps. Our study identified the major root causes and consequences of $omega text{Bugs}$ and made interesting observations that can be leveraged for detecting and diagnosing $omega text{Bugs}$. Based on the empirical study, we further propose an automated testing technique $omega text{Droid}$ to effectively expose $omega text{Bugs}$ in Android apps. In our experiments, $omega text{Droid}$ successfully discovered 30 unique and previously-unknown $omega text{Bugs}$ when applied to 146 open-source Android apps. We reported the 30 $omega text{Bugs}$ to the corresponding app developers. Out of these 30 $omega text{Bugs}$, 14 were confirmed and 7 of them were fixed. This shows that $omega text{Droid}$ can effectively detect $omega text{Bugs}$ that are of the developers' concern.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"69 1","pages":"702-713"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89100252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Navigating the Maze: The Impact of Configurability in Bioinformatics Software
Mikaela Cashman, Myra B. Cohen, P. Ranjan, R. Cottingham
The bioinformatics software domain contains thousands of applications for automating tasks such as the pairwise alignment of DNA sequences, building and reasoning about metabolic models, or simulating the growth of an organism. Its end users range from sophisticated developers to those with little computational experience. In response to their needs, developers provide many options to customize the way their algorithms are tuned. Yet there is little or no automated help for the user in determining the consequences or impact of the options they choose. In this paper we describe our experience working with configurable bioinformatics tools. We find limited documentation and help for combining and selecting options, along with variation in both functionality and performance. We also find previously undetected faults. We summarize our findings with a set of lessons learned, and present a roadmap for creating automated techniques to interact with bioinformatics software. We believe these will generalize to other types of scientific software.
{"title":"Navigating the Maze: The Impact of Configurability in Bioinformatics Software","authors":"Mikaela Cashman, Myra B. Cohen, P. Ranjan, R. Cottingham","doi":"10.1145/3238147.3240466","DOIUrl":"https://doi.org/10.1145/3238147.3240466","url":null,"abstract":"The bioinformatics software domain contains thousands of applications for automating tasks such as the pairwise alignment of DNA sequences, building and reasoning about metabolic models or simulating growth of an organism. Its end users range from sophisticated developers to those with little computational experience. In response to their needs, developers provide many options to customize the way their algorithms are tuned. Yet there is little or no automated help for the user in determining the consequences or impact of the options they choose. In this paper we describe our experience working with configurable bioinformatics tools. We find limited documentation and help for combining and selecting options along with variation in both functionality and performance. We also find previously undetected faults. We summarize our findings with a set of lessons learned, and present a roadmap for creating automated techniques to interact with bioinformatics software. We believe these will generalize to other types of scientific software.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"32 1","pages":"757-767"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77499179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PerfLearner: Learning from Bug Reports to Understand and Generate Performance Test Frames
Xue Han, Tingting Yu, D. Lo
Software performance is important for ensuring the quality of software products. Performance bugs, defined as programming errors that cause significant performance degradation, can lead to slow systems and poor user experience. While there has been some research on automated performance testing such as test case generation, the main idea is to select workload values that increase program execution time. These techniques often assume the initial test cases have the right combination of input parameters and focus on evolving the values of certain input parameters. However, such an assumption may not hold for highly configurable real-world applications, in which the combinations of input parameters can be very large. In this paper, we manually analyze 300 bug reports from three large open source projects: Apache HTTP Server, MySQL, and Mozilla Firefox. We found that 1) exposing performance bugs often requires combinations of multiple input parameters, and 2) certain input parameters are frequently involved in exposing performance bugs. Guided by these findings, we designed and evaluated an automated approach, PerfLearner, to extract execution commands and input parameters from the descriptions of performance bug reports and use them to generate test frames for guiding actual performance test case generation.
{"title":"PerfLearner: Learning from Bug Reports to Understand and Generate Performance Test Frames","authors":"Xue Han, Tingting Yu, D. Lo","doi":"10.1145/3238147.3238204","DOIUrl":"https://doi.org/10.1145/3238147.3238204","url":null,"abstract":"Software performance is important for ensuring the quality of software products. Performance bugs, defined as programming errors that cause significant performance degradation, can lead to slow systems and poor user experience. While there has been some research on automated performance testing such as test case generation, the main idea is to select workload values to increase the program execution times. These techniques often assume the initial test cases have the right combination of input parameters and focus on evolving values of certain input parameters. However, such an assumption may not hold for highly configurable real-word applications, in which the combinations of input parameters can be very large. In this paper, we manually analyze 300 bug reports from three large open source projects - Apache HTTP Server, MySQL, and Mozilla Firefox. We found that 1) exposing performance bugs often requires combinations of multiple input parameters, and 2) certain input parameters are frequently involved in exposing performance bugs. Guided by these findings, we designed and evaluated an automated approach, PerfLearner, to extract execution commands and input parameters from descriptions of performance bug reports and use them to generate test frames for guiding actual performance test case generation.","PeriodicalId":6622,"journal":{"name":"2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"99 1","pages":"17-28"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88568544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}