Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00138
Automatic unit test generation, which explores the input space and produces effective test cases for given programs, has been studied for decades, and many tools that generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question their practical effectiveness and usefulness for machine learning libraries, which are statistically oriented and fundamentally different in nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. We conducted an empirical study on five widely used machine learning libraries with two popular unit test case generation tools, EvoSuite and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite with respect to commonly applied quality metrics such as code coverage (34.1% on average) and mutation score (21.3% on average), (2) EvoSuite and Randoop lead to clear but limited improvements in code coverage and mutation score, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation.
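For intuition about why generated tests gain only limited ground on statistically oriented code, the following sketch contrasts a Randoop-style call sequence with a hand-written statistical property check. The OnlineMean class and both tests are invented for this illustration and are not taken from the paper:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class OnlineMeanTest {

    // Hypothetical ML-library class under test: a streaming mean.
    static class OnlineMean {
        private double sum;
        private long n;
        void add(double x) { sum += x; n++; }
        double mean() { return sum / n; }  // NaN before the first add()
    }

    // Randoop/EvoSuite-style test: a short random call sequence with a
    // regression assertion. It raises coverage but encodes no statistical
    // expectation about the computation.
    @Test
    public void generatedStyle() {
        OnlineMean m = new OnlineMean();
        m.add(-1.0);
        m.add(0.0);
        assertEquals(-0.5, m.mean(), 1e-9);
    }

    // Hand-written test: checks a property a random generator is unlikely
    // to target, namely that the mean of many equal values is that value.
    @Test
    public void handWrittenProperty() {
        OnlineMean m = new OnlineMean();
        for (int i = 0; i < 1_000_000; i++) m.add(3.14);
        assertEquals(3.14, m.mean(), 1e-9);
    }
}
```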
{"title":"Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?","authors":"Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan","doi":"10.1109/ICSE43902.2021.00138","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00138","url":null,"abstract":"Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134524495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00018
Peilun Zhang, Yanjie Jiang, Anjiang Wei, V. Stodden, D. Marinov, A. Shi
Library developers can provide classes and methods with underdetermined specifications that allow flexibility in future implementations. Library users may write code that relies on a specific implementation rather than on the specification, e.g., mistakenly assuming that the order of elements cannot change in the future. Prior work proposed the NonDex approach for detecting such wrong assumptions. We present a novel approach, called DexFix, to repair wrong assumptions on underdetermined specifications in an automated way. We run the NonDex tool on 200 open-source Java projects and detect 275 tests that fail due to wrong assumptions. The majority of failures come from iterating over HashMap/HashSet collections and from the getDeclaredFields method. We provide several new repair strategies that can fix these violations in both the test code and the main code. DexFix proposes fixes for 119 of the 275 detected tests. We have already reported fixes for 102 tests as GitHub pull requests: 74 have been merged, only 5 rejected, and the remainder are pending.
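To make the defect class concrete, here is a minimal sketch of the kind of wrong assumption NonDex flags and the order-imposing style of repair DexFix can propose; the class and tests are invented for illustration:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import org.junit.Test;

public class IterationOrderTest {

    private Set<String> tags() {
        return new HashSet<>(Arrays.asList("alpha", "beta", "gamma"));
    }

    // Wrong assumption: HashSet's specification leaves iteration order
    // undetermined. This may pass today by accident of the current hash
    // implementation, but NonDex's order shuffling makes it fail.
    @Test
    public void assumesHashSetOrder() {
        List<String> list = new ArrayList<>(tags());
        assertEquals(Arrays.asList("alpha", "beta", "gamma"), list);
    }

    // Repaired: impose a deterministic order before asserting on it.
    @Test
    public void orderIndependent() {
        List<String> list = new ArrayList<>(new TreeSet<>(tags()));
        assertEquals(Arrays.asList("alpha", "beta", "gamma"), list);
    }
}
```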
{"title":"Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications","authors":"Peilun Zhang, Yanjie Jiang, Anjiang Wei, V. Stodden, D. Marinov, A. Shi","doi":"10.1109/ICSE43902.2021.00018","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00018","url":null,"abstract":"Library developers can provide classes and methods with underdetermined specifications that allow flexibility in future implementations. Library users may write code that relies on a specific implementation rather than on the specification, e.g., assuming mistakenly that the order of elements cannot change in the future. Prior work proposed the NonDex approach that detects such wrong assumptions. We present a novel approach, called DexFix, to repair wrong assumptions on underdetermined specifications in an automated way. We run the NonDex tool on 200 open-source Java projects and detect 275 tests that fail due to wrong assumptions. The majority of failures are from iterating over HashMap/HashSet collections and the getDeclaredFields method. We provide several new repair strategies that can fix these violations in both the test code and the main code. DexFix proposes fixes for 119 tests from the detected 275 tests. We have already reported fixes for 102 tests as GitHub pull requests: 74 have been merged, with only 5 rejected, and the remaining pending.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115242013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Studying Test Annotation Maintenance in the Wild
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00019
Dong Jae Kim, Nikolaos Tsantalis, T. Chen, Jinqiu Yang
Since the introduction of annotations in Java 5, the majority of testing frameworks, such as JUnit, TestNG, and Mockito, have adopted annotations in their core design. This adoption affected testing practices in every step of the test life-cycle, from fixture setup and test execution to fixture teardown. Despite the importance of test annotations, research on test maintenance has so far focused mainly on test code quality and test assertions, so there is little empirical evidence on the evolution and maintenance of test annotations. To fill this gap, we perform the first fine-grained empirical study of annotation changes. We developed a tool to mine 82,810 commits and detected 23,936 instances of test annotation changes in 12 open-source Java projects. Our main findings are: (1) Test annotation changes are more frequent than rename and type change refactorings. (2) We recover various migration efforts within the same testing framework or between different frameworks by analyzing common annotation replacement patterns. (3) We create a taxonomy by manually inspecting and classifying a sample of 368 test annotation changes and documenting the motivations driving these changes. Finally, we present a list of actionable implications for developers, researchers, and framework designers.
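As an example of the annotation replacement patterns behind such migrations, the sketch below shows a JUnit 4 test fixture and its JUnit 5 rewrite, one of the common framework migrations; the Account class is invented for illustration:

```java
// Before: JUnit 4 annotations drive the fixture life-cycle.
import org.junit.Before;
import org.junit.Test;

// Hypothetical class under test.
class Account {
    void deposit(int amount) {
        if (amount < 0) throw new IllegalArgumentException("negative deposit");
    }
}

public class AccountTest {
    private Account account;

    @Before  // fixture setup before every test
    public void setUp() { account = new Account(); }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsNegativeDeposit() { account.deposit(-1); }
}
```

```java
// After: @Before becomes @BeforeEach, and the expected exception moves
// out of the annotation into an explicit assertThrows call.
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class AccountTest {
    private Account account;

    @BeforeEach
    void setUp() { account = new Account(); }

    @Test
    void rejectsNegativeDeposit() {
        assertThrows(IllegalArgumentException.class, () -> account.deposit(-1));
    }
}
```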
{"title":"Studying Test Annotation Maintenance in the Wild","authors":"Dong Jae Kim, Nikolaos Tsantalis, T. Chen, Jinqiu Yang","doi":"10.1109/ICSE43902.2021.00019","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00019","url":null,"abstract":"Since the introduction of annotations in Java 5, the majority of testing frameworks, such as JUnit, TestNG, and Mockito, have adopted annotations in their core design. This adoption affected the testing practices in every step of the test life-cycle, from fixture setup and test execution to fixture teardown. Despite the importance of test annotations, most research on test maintenance has mainly focused on test code quality and test assertions. As a result, there is little empirical evidence on the evolution and maintenance of test annotations. To fill this gap, we perform the first fine-grained empirical study on annotation changes. We developed a tool to mine 82,810 commits and detect 23,936 instances of test annotation changes from 12 open-source Java projects. Our main findings are: (1) Test annotation changes are more frequent than rename and type change refactorings. (2) We recover various migration efforts within the same testing framework or between different frameworks by analyzing common annotation replacement patterns. (3) We create a taxonomy by manually inspecting and classifying a sample of 368 test annotation changes and documenting the motivations driving these changes. Finally, we present a list of actionable implications for developers, researchers, and framework designers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"49 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115861014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Are Machine Learning Cloud APIs Used Correctly?
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00024
Chengcheng Wan, Shicheng Liu, H. Hoffmann, M. Maire, Shan Lu
Machine learning (ML) cloud APIs enable developers to easily incorporate learning solutions into software systems. Unfortunately, ML APIs are challenging to use correctly and efficiently, given their unique semantics, data requirements, and accuracy-performance tradeoffs. Much prior work has studied how to develop ML APIs or ML cloud services, but not how open-source applications use ML APIs. In this paper, we manually studied 360 representative open-source applications that use Google or AWS cloud-based ML APIs, and found that 70% of these applications contain API misuses in their latest versions that degrade the functional, performance, or economic quality of the software. We generalized 8 anti-patterns based on our manual study and developed automated checkers that identified hundreds more applications containing ML API misuses.
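A minimal sketch of one anti-pattern in the performance/cost flavor the paper describes: paying for repeated calls on unchanged inputs. The VisionClient interface below is an invented stand-in, not a real Google or AWS SDK type:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LabelCache {

    // Invented stand-in for a cloud vision client; a real SDK call is
    // billed per request and adds network latency.
    interface VisionClient {
        String labelImage(byte[] image);
    }

    private final VisionClient client;
    private final Map<String, String> cache = new HashMap<>();

    LabelCache(VisionClient client) { this.client = client; }

    // Misuse: re-sending an identical image on every refresh pays for the
    // same answer over and over.
    String labelEveryTime(byte[] image) {
        return client.labelImage(image);
    }

    // Better: key by image content so unchanged inputs hit a local cache.
    String labelCached(byte[] image) {
        String key = Arrays.hashCode(image) + ":" + image.length;
        return cache.computeIfAbsent(key, k -> client.labelImage(image));
    }
}
```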
{"title":"Are Machine Learning Cloud APIs Used Correctly?","authors":"Chengcheng Wan, Shicheng Liu, H. Hoffmann, M. Maire, Shan Lu","doi":"10.1109/ICSE43902.2021.00024","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00024","url":null,"abstract":"Machine learning (ML) cloud APIs enable developers to easily incorporate learning solutions into software systems. Unfortunately, ML APIs are challenging to use correctly and efficiently, given their unique semantics, data requirements, and accuracy-performance tradeoffs. Much prior work has studied how to develop ML APIs or ML cloud services, but not how open-source applications are using ML APIs. In this paper, we manually studied 360 representative open-source applications that use Google or AWS cloud-based ML APIs, and found 70% of these applications contain API misuses in their latest versions that degrade functional, performance, or economical quality of the software. We have generalized 8 anti-patterns based on our manual study and developed automated checkers that identify hundreds of more applications that contain ML API misuses.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130449565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IMGDroid: Detecting Image Loading Defects in Android Applications
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00080
Wei Song, Mengqi Han, Jeff Huang
Images are essential to many Android applications, or apps. Although images play a critical role in app functionality and user experience, inefficient or improper image loading and displaying operations may severely impact app performance and quality. Additionally, since these image loading defects may not manifest as immediate failures, e.g., app crashes, existing GUI testing approaches cannot detect them effectively. In this paper, we identify five anti-patterns of such image loading defects: image passing by intent, image decoding without resizing, local image loading without permission, repeated decoding without caching, and image decoding in the UI thread. Based on these anti-patterns, we propose a static analysis technique, IMGDroid, to automatically and effectively detect such defects. We have applied IMGDroid to a benchmark of 21 open-source Android apps and found that it not only successfully detects the 45 previously known image loading defects but also finds 15 new ones. Our empirical study on 1,000 commercial Android apps demonstrates that image loading defects are prevalent.
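"Image decoding without resizing" is the easiest of the five anti-patterns to show in code. The sketch below uses the standard Android BitmapFactory API; the target dimensions and the subsampling heuristic are arbitrary choices for the example:

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

public final class ImageLoading {

    // Anti-pattern: decodes the full-resolution image into memory even
    // when the view displaying it is only, say, 128x128 pixels.
    static Bitmap decodeWithoutResizing(String path) {
        return BitmapFactory.decodeFile(path);
    }

    // Fix: read only the image bounds first, then decode at a
    // subsampled size that still covers the requested dimensions.
    static Bitmap decodeResized(String path, int reqWidth, int reqHeight) {
        BitmapFactory.Options opts = new BitmapFactory.Options();
        opts.inJustDecodeBounds = true;   // no pixel allocation
        BitmapFactory.decodeFile(path, opts);

        int sample = 1;
        while (opts.outWidth / (sample * 2) >= reqWidth
                && opts.outHeight / (sample * 2) >= reqHeight) {
            sample *= 2;                  // power-of-two subsampling
        }
        opts.inSampleSize = sample;
        opts.inJustDecodeBounds = false;
        return BitmapFactory.decodeFile(path, opts);
    }
}
```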
{"title":"IMGDroid: Detecting Image Loading Defects in Android Applications","authors":"Wei Song, Mengqi Han, Jeff Huang","doi":"10.1109/ICSE43902.2021.00080","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00080","url":null,"abstract":"Images are essential for many Android applications or apps. Although images play a critical role in app functionalities and user experience, inefficient or improper image loading and displaying operations may severely impact the app performance and quality. Additionally, since these image loading defects may not be manifested by immediate failures, e.g., app crashes, existing GUI testing approaches cannot detect them effectively. In this paper, we identify five anti-patterns of such image loading defects, including image passing by intent, image decoding without resizing, local image loading without permission, repeated decoding without caching, and image decoding in UI thread. Based on these anti-patterns, we propose a static analysis technique, IMGDroid, to automatically and effectively detect such defects. We have applied IMGDroid to a benchmark of 21 open-source Android apps, and found that it not only successfully detects the 45 previously-known image loading defects but also finds 15 new such defects. Our empirical study on 1,000 commercial Android apps demonstrates that the image loading defects are prevalent.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115374625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Input Algebras
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00070
Rahul Gopinath, Hamed Nemati, A. Zeller
Grammar-based test generators are highly efficient in producing syntactically valid test inputs, and give their users precise control over which test inputs should be generated. Adapting a grammar or a test generator towards a particular testing goal can be tedious, though. We introduce the concept of a grammar transformer, specializing a grammar towards inclusion or exclusion of specific patterns: "The phone number must not start with 011 or +1". To the best of our knowledge, ours is the first approach to allow arbitrary Boolean combinations of patterns, giving testers unprecedented flexibility in creating targeted software tests. The resulting specialized grammars can be used with any grammar-based fuzzer for targeted test generation, but also as validators to check whether a given input meets the specialization, opening up additional usage scenarios. In our evaluation on real-world bugs, we show that specialized grammars are accurate both in producing and in validating targeted inputs.
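As a toy version of the phone-number example, the sketch below hand-specializes a tiny grammar so the forbidden prefixes cannot be produced, and reuses the specialized grammar as a validator. The real approach derives such specializations automatically for arbitrary Boolean combinations of patterns; everything here is invented for illustration:

```java
import java.util.List;
import java.util.Random;

public class PhoneGrammar {

    // <phone> ::= <prefix> <digit>{7}. The original grammar's prefix
    // alternatives include the two forbidden ones.
    static final List<String> ORIGINAL_PREFIXES =
            List.of("011", "+1", "+44", "020", "030");

    // Specialized grammar: alternatives matching the unwanted patterns
    // ("starts with 011 or +1") are removed from the rule.
    static final List<String> SPECIALIZED_PREFIXES =
            List.of("+44", "020", "030");

    static String generate(List<String> prefixes, Random rnd) {
        StringBuilder sb =
                new StringBuilder(prefixes.get(rnd.nextInt(prefixes.size())));
        for (int i = 0; i < 7; i++) sb.append(rnd.nextInt(10));
        return sb.toString();
    }

    // The specialized grammar doubles as a validator for targeted inputs.
    static boolean valid(String phone) {
        return SPECIALIZED_PREFIXES.stream().anyMatch(p ->
                phone.startsWith(p)
                        && phone.substring(p.length()).matches("\\d{7}"));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        String s = generate(SPECIALIZED_PREFIXES, rnd);
        System.out.println(s + " valid=" + valid(s));  // always valid
    }
}
```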
{"title":"Input Algebras","authors":"Rahul Gopinath, Hamed Nemati, A. Zeller","doi":"10.1109/ICSE43902.2021.00070","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00070","url":null,"abstract":"Grammar-based test generators are highly efficient in producing syntactically valid test inputs, and give their user precise control over which test inputs should be generated. Adapting a grammar or a test generator towards a particular testing goal can be tedious, though. We introduce the concept of a grammar transformer, specializing a grammar towards inclusion or exclusion of specific patterns: \"The phone number must not start with 011 or +1\". To the best of our knowledge, ours is the first approach to allow for arbitrary Boolean combinations of patterns, giving testers unprecedented flexibility in creating targeted software tests. The resulting specialized grammars can be used with any grammar-based fuzzer for targeted test generation, but also as validators to check whether the given specialization is met or not, opening up additional usage scenarios. In our evaluation on real-world bugs, we show that specialized grammars are accurate both in producing and validating targeted inputs.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122800279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abacus: Precise Side-Channel Analysis
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00078
Qinkun Bao, Zihao Wang, Xiaoting Li, J. Larus, Dinghao Wu
Side-channel attacks allow adversaries to infer sensitive information from non-functional characteristics. Prior side-channel detection work is able to identify numerous potential vulnerabilities. However, in practice, many such vulnerabilities leak a negligible amount of sensitive information, and thus developers are often reluctant to address them. Existing tools do not provide information to evaluate a leak's severity, such as the number of leaked bits. To address this issue, we propose a new program analysis method to precisely quantify the leaked information in a single-trace attack through side-channels. It can identify covert information flows in programs that expose confidential information and can reason about security flaws that would otherwise be difficult, if not impossible, for a developer to find. We model an attacker's observation of each leakage site as a constraint. We use symbolic execution to generate these constraints and then run Monte Carlo sampling to estimate the number of leaked bits for each leakage site. By applying the Central Limit Theorem, we provide an error bound for these estimations. We have implemented the technique in a tool called Abacus, which not only finds very fine-grained side-channel vulnerabilities but also estimates how many bits are leaked. Abacus outperforms existing dynamic side-channel detection tools in performance and accuracy. We evaluate Abacus on OpenSSL, mbedTLS, Libgcrypt, and Monocypher. Our results demonstrate that most reported vulnerabilities are difficult to exploit in practice and should be de-prioritized by developers. We also find several sensitive vulnerabilities that are missed by the existing tools. We confirm those vulnerabilities with manual checks and by contacting the developers.
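For intuition on what "number of leaked bits" means, consider a toy deterministic leak where the attacker observes only secret % 16. Because the observation is a function of the secret, the leakage equals the entropy of the observation distribution, and a Monte Carlo estimate recovers it (about 4 bits here). This is a deliberately simplified stand-in for the paper's constraint-based, symbolic-execution formulation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class LeakEstimate {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        int samples = 1_000_000;
        Map<Integer, Integer> counts = new HashMap<>();

        // Sample secrets and record what the attacker would observe.
        for (int i = 0; i < samples; i++) {
            int secret = rnd.nextInt(1 << 20);  // uniform 20-bit secret
            int observation = secret % 16;      // toy side-channel model
            counts.merge(observation, 1, Integer::sum);
        }

        // Leaked bits = Shannon entropy of the observation distribution.
        double bits = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / samples;
            bits -= p * (Math.log(p) / Math.log(2));
        }
        System.out.printf("estimated leakage: %.3f bits%n", bits);  // ~4.000
    }
}
```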
{"title":"Abacus: Precise Side-Channel Analysis","authors":"Qinkun Bao, Zihao Wang, Xiaoting Li, J. Larus, Dinghao Wu","doi":"10.1109/ICSE43902.2021.00078","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00078","url":null,"abstract":"Side-channel attacks allow adversaries to infer sensitive information from non-functional characteristics. Prior side-channel detection work is able to identify numerous potential vulnerabilities. However, in practice, many such vulnerabilities leak a negligible amount of sensitive information, and thus developers are often reluctant to address them. Existing tools do not provide information to evaluate a leak's severity, such as the number of leaked bits. To address this issue, we propose a new program analysis method to precisely quantify the leaked information in a single-trace attack through side-channels. It can identify covert information flows in programs that expose confidential information and can reason about security flaws that would otherwise be difficult, if not impossible, for a developer to find. We model an attacker's observation of each leakage site as a constraint. We use symbolic execution to generate these constraints and then run Monte Carlo sampling to estimate the number of leaked bits for each leakage site. By applying the Central Limit Theorem, we provide an error bound for these estimations. We have implemented the technique in a tool called Abacus, which not only finds very fine-grained side-channel vulnerabilities but also estimates how many bits are leaked. Abacus outperforms existing dynamic side-channel detection tools in performance and accuracy. We evaluate Abacus on OpenSSL, mbedTLS, Libgcrypt, and Monocypher. Our results demonstrate that most reported vulnerabilities are difficult to exploit in practice and should be de-prioritized by developers. We also find several sensitive vulnerabilities that are missed by the existing tools. We confirm those vulnerabilities with manual checks and by contacting the developers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131117761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesizing Object State Transformers for Dynamic Software Updates
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00103
Ze-Yi Zhao, Yanyan Jiang, Chang Xu, Tianxiao Gu, Xiaoxing Ma
There is an increasing demand for evolving software systems to deliver continuous service without restarts. Dynamic software update (DSU) aims to achieve this goal by patching the system state on the fly, but it is currently hindered in practice by non-trivial cross-version object state transformations. This paper revisits the problem through an in-depth empirical study of over 190 class changes from Tomcat 8. The study produced an important finding: most non-trivial object state transformers can be constructed by reassembling existing old/new-version code snippets. Based on this, the paper presents a domain-specific language and an efficient algorithm for synthesizing non-trivial object transformers through code reuse. We experimentally evaluated our tool implementation, PASTA, on real-world software systems; PASTA succeeds in 7.5x as many non-trivial object transformation tasks as the best existing DSU techniques.
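To make "object state transformer" concrete, here is a minimal hand-written example of the kind of artifact PASTA synthesizes, for an invented cache class whose internal representation changed between versions:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class Transformers {

    // Old version: remembers recently used keys in a plain list.
    static class CacheV1 {
        List<String> recentKeys;
        CacheV1(List<String> keys) { this.recentKeys = keys; }
    }

    // New version: a bounded deque plus an explicit capacity field.
    static class CacheV2 {
        Deque<String> recentKeys = new ArrayDeque<>();
        int capacity;
    }

    // State transformer applied at update time to every live CacheV1
    // object; the new field receives a default consistent with the old
    // behavior. Non-trivial transformers like this are what make DSU hard.
    static CacheV2 transform(CacheV1 old) {
        CacheV2 fresh = new CacheV2();
        fresh.recentKeys.addAll(old.recentKeys);
        fresh.capacity = Math.max(16, old.recentKeys.size());
        return fresh;
    }
}
```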
{"title":"Synthesizing Object State Transformers for Dynamic Software Updates","authors":"Ze-Yi Zhao, Yanyan Jiang, Chang Xu, Tianxiao Gu, Xiaoxing Ma","doi":"10.1109/ICSE43902.2021.00103","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00103","url":null,"abstract":"There is an increasing demand for evolving software systems to deliver continuous services of no restart. Dynamic software update (DSU) aims to achieve this goal by patching the system state on the fly but is currently hindered from practice due to non-trivial cross-version object state transformations. This paper revisits this problem through an in-depth empirical study of over 190 class changes from Tomcat 8. The study produced an important finding that most non-trivial object state transformers can be constructed by reassembling existing old/new version code snippets. This paper presents a domain-specific language and an efficient algorithm for synthesizing non-trivial object transformers over code reuse. We experimentally evaluated our tool implementation PASTA with real-world software systems, reporting PASTA's effectiveness in succeeding in 7.5X non-trivial object transformation tasks compared with the best existing DSU techniques.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128861869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation
Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00130
With the growth of software systems, logs have become an important source of data for aiding system maintenance. Log-based anomaly detection, which aims to detect system anomalies automatically via log analysis, is one of the most important methods for this purpose. However, existing log-based anomaly detection approaches still suffer from practical issues: they either depend on a large amount of manually labeled training data (supervised approaches) or perform unsatisfactorily without learning from historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose a novel, practical log-based anomaly detection approach, PLELog, which is semi-supervised to avoid time-consuming manual labeling and incorporates knowledge of historical anomalies via probabilistic label estimation to bring the superiority of supervised approaches into play. In addition, PLELog stays robust to unstable log data via semantic embedding and detects anomalies efficiently and effectively with an attention-based GRU neural network. We evaluated PLELog on the two most widely used public datasets, and the results demonstrate its effectiveness, significantly outperforming the compared approaches with an average improvement of 181.6% in terms of F1-score. In particular, PLELog has been applied to two real-world systems from our university and a large corporation, further demonstrating its practicability.
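A heavily simplified sketch of the probabilistic-label-estimation idea: unlabeled log sequences receive a soft anomaly label based on how far their embedding lies from known-normal data, and those soft labels then supervise the downstream classifier. The embedding, the distance-to-probability mapping, and all constants are invented here and differ from the paper's actual clustering pipeline:

```java
import java.util.List;

public class SoftLabels {

    // Euclidean distance between two embedded log sequences.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    static double[] centroid(List<double[]> vs) {
        double[] c = new double[vs.get(0).length];
        for (double[] v : vs)
            for (int i = 0; i < c.length; i++) c[i] += v[i] / vs.size();
        return c;
    }

    // p(anomalous) grows with distance from the normal centroid; these
    // soft labels would then train the attention-based GRU classifier.
    static double probAnomalous(double[] x, double[] normalCentroid, double scale) {
        return 1.0 - Math.exp(-dist(x, normalCentroid) / scale);
    }

    public static void main(String[] args) {
        List<double[]> knownNormal = List.of(
                new double[]{0.1, 0.0}, new double[]{0.0, 0.2});
        double[] c = centroid(knownNormal);
        System.out.println(probAnomalous(new double[]{3.0, 2.5}, c, 1.0)); // near 1
        System.out.println(probAnomalous(new double[]{0.05, 0.1}, c, 1.0)); // ~0
    }
}
```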
{"title":"Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation","authors":"Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang","doi":"10.1109/ICSE43902.2021.00130","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00130","url":null,"abstract":"With the growth of software systems, logs have become an important data to aid system maintenance. Log-based anomaly detection is one of the most important methods for such purpose, which aims to automatically detect system anomalies via log analysis. However, existing log-based anomaly detection approaches still suffer from practical issues due to either depending on a large amount of manually labeled training data (supervised approaches) or unsatisfactory performance without learning the knowledge on historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose a novel practical log-based anomaly detection approach, PLELog, which is semi-supervised to get rid of time-consuming manual labeling and incorporates the knowledge on historical anomalies via probabilistic label estimation to bring supervised approaches' superiority into play. In addition, PLELog is able to stay immune to unstable log data via semantic embedding and detect anomalies efficiently and effectively by designing an attention-based GRU neural network. We evaluated PLELog on two most widely-used public datasets, and the results demonstrate the effectiveness of PLELog, significantly outperforming the compared approaches with an average of 181.6% improvement in terms of F1-score. In particular, PLELog has been applied to two real-world systems from our university and a large corporation, further demonstrating its practicability","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125196900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Unit Testing Practices in R Packages
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00136
M. Vidoni
Testing Technical Debt (TTD) occurs due to shortcuts (non-optimal decisions) taken about testing; it is the test dimension of technical debt. R is a package-based programming ecosystem that provides an easy way to install third-party code, datasets, tests, documentation, and examples. This structure makes it especially vulnerable to TTD, because errors present in a package can transitively affect all packages and scripts that depend on it. Thus, TTD can effectively become a threat to the validity of all analyses written in R that rely on potentially faulty code. This two-part study provides the first analysis in this area. First, 177 systematically selected, open-source R packages were mined and analysed to assess the quality of testing and testing goals, and to identify potential TTD sources. Second, a survey addressed how R package developers perceive testing and face its challenges (response rate of 19.4%). Results show that testing in R packages is of low quality; the most common smells are inadequate and obscure unit testing, improper asserts, inexperienced testers, and improper test design. Furthermore, even skilled R developers face challenges such as time constraints, an emphasis on development rather than testing, poor tool documentation, and a steep learning curve.
{"title":"Evaluating Unit Testing Practices in R Packages","authors":"M. Vidoni","doi":"10.1109/ICSE43902.2021.00136","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00136","url":null,"abstract":"Testing Technical Debt (TTD) occurs due to shortcuts (non-optimal decisions) taken about testing; it is the test dimension of technical debt. R is a package-based programming ecosystem that provides an easy way to install third-party code, datasets, tests, documentation and examples. This structure makes it especially vulnerable to TTD because errors present in a package can transitively affect all packages and scripts that depend on it. Thus, TTD can effectively become a threat to the validity of all analysis written in R that rely on potentially faulty code. This two-part study provides the first analysis in this area. First, 177 systematically-selected, open-source R packages were mined and analysed to address quality of testing, testing goals, and identify potential TTD sources. Second, a survey addressed how R package developers perceive testing and face its challenges (response rate of 19.4%). Results show that testing in R packages is of low quality; the most common smells are inadequate and obscure unit testing, improper asserts, inexperienced testers and improper test design. Furthermore, skilled R developers still face challenges such as time constraints, emphasis on development rather than testing, poor tool documentation and a steep learning curve.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115264390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}