Pub Date: 2022-10-01. DOI: 10.1109/icsme55016.2022.00085
Emanuele Iannone, Fabio Palomba
Software security concerns the creation of secure software, i.e., software that can withstand malicious attacks, starting from its earliest development phases. To this end, several automated and manual solutions have been developed to support developers in identifying and assessing security issues, e.g., software vulnerabilities. However, most of these solutions were not designed to cooperate synergistically or to run continuously in the context of evolving software, i.e., software subject to frequent maintenance and evolution activities. In this scenario, developers struggle to set up an effective line of defense against the security issues arising in their projects. This research fills this gap by investigating how vulnerabilities affect evolving software projects and by proposing novel solutions that improve and simplify the security verification and validation process. The paper concludes by presenting the open challenges in software security that we framed while conducting our research.
Title: "The Phantom Menace: Unmasking Security Issues in Evolving Software", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00063
Stefano Dalla Palma, D. D. Nucci, D. Tamburri
We propose DEFUSE, a language-agnostic tool for software defect prediction. The tool automatically collects and classifies failure data, enables the correction of those classifications, and builds machine learning models that detect defects based on those data. We instantiated the tool in the scope of Infrastructure-as-Code, the DevOps practice that enables the management and provisioning of infrastructure through machine-readable definition files. We present its architecture and provide examples of its application. Demo video: https://youtu.be/37mmLdCX3jU.
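The collect-classify-correct-train loop described above can be sketched in a few lines. This is an illustrative toy, not DEFUSE's actual API: the record fields, function names, and the churn-threshold "model" standing in for the tool's machine learning step are all assumptions made for the example.

```python
# Hypothetical sketch of a DEFUSE-style pipeline: label files from collected
# failure data, allow manual correction of labels, then fit a simple model.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    churn: int               # lines changed in the observed period
    defective: bool = False  # label derived from collected failure data

def auto_label(records, failing_paths):
    """Automatic classification step: mark files implicated in failures."""
    for r in records:
        r.defective = r.path in failing_paths
    return records

def correct_label(records, path, defective):
    """Manual correction step for misclassified entries."""
    for r in records:
        if r.path == path:
            r.defective = defective
    return records

class ThresholdModel:
    """Toy stand-in for the ML step: flags files whose churn exceeds the
    mean churn of known-defective training files."""
    def fit(self, records):
        defective = [r.churn for r in records if r.defective]
        self.threshold = sum(defective) / len(defective) if defective else float("inf")
        return self
    def predict(self, churn):
        return churn >= self.threshold

records = [FileRecord("a.py", 120), FileRecord("b.py", 5), FileRecord("c.py", 90)]
records = auto_label(records, failing_paths={"a.py"})
records = correct_label(records, "c.py", defective=True)   # human fixes a label
model = ThresholdModel().fit(records)
print(model.predict(200))  # high-churn file flagged as likely defective
```

The design point the sketch mirrors is that the labeling, correction, and model-building stages are decoupled, which is what lets the tool stay language-agnostic.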
Title: "Defuse: A Data Annotator and Model Builder for Software Defect Prediction", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00064
Shriram Shanbhag, S. Chimalakonda, V. Sharma, Vikrant S. Kaulgud
Energy efficiency is an essential consideration in mobile application development, given that these apps run on battery-powered devices. This has led researchers to develop a set of energy design patterns that can help developers improve the energy efficiency of their applications. However, the adoption of these energy patterns in projects remains a challenge, given the lack of awareness about them among developers. To bridge this gap, we propose eTagger, a Google Chrome extension that tags GitHub issues from Android repositories with the associated energy patterns. eTagger works on embeddings generated by Sentence-BERT. We believe that labeling GitHub issues with energy patterns may foster their broader adoption, as GitHub is a prominent platform in collaborative software development. A preliminary evaluation of eTagger achieved an AUC-ROC of 0.73, with a precision of 0.58, a recall of 0.53, and an F1-score of 0.5. A demonstration of the tool is available at https://youtu.be/hP4pWJ4AKxE and related artifacts at https://rishalab.github.io/eTagger/.
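Embedding-based tagging of the kind eTagger performs can be illustrated with a toy nearest-pattern match. The real tool uses Sentence-BERT embeddings; the bag-of-words "embedding", the pattern descriptions, and the function names below are placeholders for this sketch only.

```python
# Illustrative sketch: tag an issue with the energy pattern whose description
# is closest in embedding space (cosine similarity over toy word-count vectors).
import math
from collections import Counter

def embed(text):
    # stand-in for a Sentence-BERT encoder; returns a sparse word-count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

PATTERNS = {  # paraphrased example descriptions, not the paper's catalog
    "Dark UI Colors": "use dark colors in the ui to save display energy",
    "Reduce Size": "reduce the size of data transferred over the network",
}

def tag_issue(issue_text):
    scores = {name: cosine(embed(issue_text), embed(desc))
              for name, desc in PATTERNS.items()}
    return max(scores, key=scores.get)

print(tag_issue("battery drain when screen uses bright white ui colors"))
```

With a learned sentence encoder in place of the word-count vectors, the same nearest-description scheme generalizes to issues that share no literal words with a pattern description.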
Title: "eTagger - An Energy Pattern Tagging Tool for GitHub Issues in Android Projects", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00068
Davide Corradini, Amedeo Zampieri, Michele Pasqua, M. Ceccato
Over the past few years, several novel black-box testing approaches targeting RESTful APIs have been proposed. In order to assess their effectiveness, such testing strategies had to be implemented as a prototype tool and validated on empirical data. However, developing a testing tool is a time-consuming task, and reimplementing from scratch the same common basic features represents a waste of resources that causes a remarkable overhead in the "time to market" of research results. In this paper, we present RestTestGen, an extensible framework for implementing new automated black-box testing strategies for RESTful APIs. The framework provides a collection of commonly used components, such as a robust OpenAPI specification parser, dictionaries, input value generators, mutation operators, oracles, and others. Many of the provided components are customizable and extensible, enabling researchers and practitioners to quickly prototype, deploy, and evaluate their novel ideas. Additionally, the framework facilitates the development of novel black-box testing strategies by guiding researchers, by means of abstract components that explicitly identify those parts of the framework requiring a concrete implementation. As an adoption example, we show how we can implement nominal and error black-box testing strategies for RESTful APIs by reusing primitives and features provided by the framework, and by concretely extending very few abstract components. RestTestGen is open-source, actively maintained, and publicly available on GitHub at https://github.com/SeUniVr/RestTestGen.
Title: "RestTestGen: An Extensible Framework for Automated Black-box Testing of RESTful APIs", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00032
Alexander Schultheiss, P. M. Bittner, Thomas Thüm, Timo Kehrer
In clone-and-own, the predominant paradigm for developing multi-variant software systems in practice, a new variant of a software system is created by copying and adapting an existing one. While clone-and-own is flexible, it causes high maintenance effort in the long run, as cloned variants evolve in parallel; certain changes, such as bug fixes, need to be propagated between variants manually. Building on the principle of cherry-picking, and by collecting lightweight domain knowledge about cloned variants and software changes, a recent line of research proposes to automate such synchronization tasks when migration to a software product line is not feasible. However, it is still unclear how far this synchronization can actually be pushed. We conduct an empirical study in which we quantify the potential to automate the synchronization of variants in clone-and-own. We simulate variant synchronization using the history of a real-world multi-variant software system as a case study. Our results indicate that existing patching techniques propagate changes with an accuracy of up to 85% if applied consistently from the start of a project. This can be further improved to 93% by exploiting lightweight domain knowledge about which features are affected by a change and which variants implement the affected features. Based on our findings, we conclude that there is potential to automate the synchronization of cloned variants through existing patching techniques.
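The feature-based targeting that lifts accuracy from 85% to 93% boils down to a set intersection: propagate a change only to variants implementing a feature the change touches. The function name, variant names, and feature mapping below are illustrative, assumed for this sketch rather than taken from the study's tooling.

```python
# Minimal sketch of knowledge-guided cherry-picking in clone-and-own: the
# feature map is the "lightweight domain knowledge" the paper exploits.
def variants_to_sync(change_features, variant_features):
    """Return the variants a change should be propagated to: those whose
    implemented features intersect the features the change affects."""
    return sorted(
        variant for variant, feats in variant_features.items()
        if change_features & feats
    )

variant_features = {
    "variant_a": {"login", "export"},
    "variant_b": {"login"},
    "variant_c": {"reports"},
}

# a bug fix touching the 'export' feature goes to variant_a only
print(variants_to_sync({"export"}, variant_features))
```

Without the feature map, a naive scheme would attempt the cherry-pick on every variant and rely on the patch failing to apply, which is where the accuracy gap comes from.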
Title: "Quantifying the Potential to Automate the Synchronization of Variants in Clone-and-Own", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00048
Victoria Bogachenkova, Linh Nguyen, Felipe Ebert, Alexander Serebrenik, Fernando Castor
Code review is a popular software engineering practice. The success of code reviews can be threatened by confusion experienced by code reviewers. Research has, on the one hand, studied the reasons for confusion in code reviews and, on the other hand, analyzed source code patterns, so-called "atoms of confusion", that have been shown to lead to misunderstanding in lab settings. However, to the best of our knowledge, no research has investigated a possible cause-and-effect relationship between atoms of confusion and confusion in code reviews. Another important aspect still unstudied is how those atoms of confusion evolve across pull requests. In this emerging results paper, we report an exploratory case study to provide a deeper understanding of atoms of confusion, more specifically, whether atoms of confusion are related to confusion in code reviews and how they persist across pull requests. With the help of an existing tool for the detection of atoms of confusion, and a manual analysis of code review comments, we observed that statistical analysis did not show any relationship between atoms of confusion and the presence of confusion comments in code reviews. Additionally, we found evidence that atoms of confusion are mostly not removed in pull requests. Based on these results, we formulate hypotheses on atoms of confusion in the code review context, to be confirmed or rejected by future studies.
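For readers unfamiliar with the term, an atom of confusion is a tiny, syntactically valid pattern that lab studies found people misread. The catalog was originally built for C; operator precedence is one atom from it, and the Python rendition below is our own illustrative analogue, not an example from this paper.

```python
# Operator-precedence atom of confusion, transposed to Python: unary minus
# binds more loosely than **, so readers often misjudge the value below.
confusing = -2 ** 2    # parsed as -(2 ** 2), i.e. -4, not (-2) ** 2 == 4
clarified = -(2 ** 2)  # same value, with the precedence made explicit

print(confusing, clarified)  # both print -4
```

A "removal" of this atom in a pull request, in the study's sense, would be a diff replacing the first form with the parenthesized one while leaving behavior unchanged.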
Title: "Evaluating Atoms of Confusion in the Context of Code Reviews", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00024
Jiamou Sun, Zhenchang Xing, Xiwei Xu, Liming Zhu, Qinghua Lu
Security databases describe the characteristics of discovered vulnerabilities in text for future study and patching. However, because different maintainers have different perspectives on vulnerabilities, they often describe the same vulnerability in different ways, creating obstacles to gathering comprehensive information about a vulnerability from different databases. To mitigate this problem, Common Vulnerabilities and Exposures (CVE) was established to identify each publicly disclosed vulnerability with a unique CVE ID, which vulnerability databases maintained by different vendors and organizations can reference in their vulnerability reports. Despite the wide adoption of CVE, traceability issues are still prevalent. Our empirical study of vulnerability traceability across four representative security databases (NVD, IBM X-Force, ExploitDB, Openwall) shows that the number of CVE records is growing rapidly, and that traceability delays and missing references have become severe problems for these databases. To address these issues, we develop an automatic traceability recovery method that recommends related external vulnerability reports for the reports in one database. As vulnerability reports from different databases differ in content detail and length, our approach does not match reports at the document level; instead, it extracts seven distinctive vulnerability key aspects that are widely present in vulnerability descriptions. As a proof of concept, we apply our method to recommend reports from IBM X-Force, ExploitDB, and Openwall for NVD reports. We use NVD as the target because it is a de-facto standard vulnerability database that contains the most comprehensive list of vulnerabilities. Our experiments with a wide range of NLP methods show that our aspect-level matching can achieve high MRR and accuracy for traceability recovery across heterogeneous vulnerability databases.
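The aspect-level matching idea can be sketched as scoring report pairs per aspect and aggregating, rather than comparing whole documents. The aspect names, the Jaccard similarity, and the plain-average aggregation below are simplified stand-ins for the paper's NLP-based extraction and matching.

```python
# Sketch of aspect-level report matching: compare extracted key aspects
# one by one and average the per-aspect similarities.
def aspect_similarity(a, b):
    """Token-overlap (Jaccard) similarity between two aspect values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_score(report_a, report_b):
    """Average similarity over the aspects both reports describe."""
    shared = report_a.keys() & report_b.keys()
    if not shared:
        return 0.0
    return sum(aspect_similarity(report_a[k], report_b[k]) for k in shared) / len(shared)

# hypothetical aspect extractions from two databases' reports
nvd = {"product": "foo server", "attack_type": "sql injection",
       "impact": "information disclosure"}
xforce = {"product": "foo server", "attack_type": "sql injection"}

print(match_score(nvd, xforce))
```

Scoring only the shared aspects is what makes the comparison robust to the length and detail differences between databases that the paper highlights.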
Title: "Heterogeneous Vulnerability Report Traceability Recovery by Vulnerability Aspect Matching", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00073
Neela Sawant, Srinivasan H. Sengamedu
Automatic identification of coding best practices can scale the development of code and application analyzers. We present Doc2BP, a deep learning tool that identifies coding best practices in software documentation. Natural language descriptions are mapped to an informative embedding space, optimized under the dual objectives of binary and few-shot classification. The binary objective powers general classification into known best practice categories using a deep learning classifier. The few-shot objective facilitates example-based classification into novel categories by matching embeddings with user-provided examples at run-time, without having to retrain the underlying model. We analyze the effects of manually and synthetically labeled examples, context, and cross-domain information. We have applied Doc2BP to Java, Python, AWS Java SDK, and AWS CloudFormation documentation. Compared to prior works that primarily leverage keyword heuristics, and to our own part-of-speech pattern baselines, we obtain a 3-5% F1-score improvement for Java and Python, and 15-20% for the AWS Java SDK and AWS CloudFormation. Experiments with four few-shot use cases show promising results (5-shot accuracy of 99%+ for Java NullPointerException and AWS Java metrics, 65% for AWS CloudFormation numerics, and 35% for Python best practices). Doc2BP has contributed new rules and improved specifications in Amazon's code and application analyzers: (a) 500+ new checks in cfn-lint, an open-source AWS CloudFormation linter; (b) over 97% automated coverage of metrics APIs and related practices in Amazon DevOps Guru; (c) support for nullable AWS APIs in Amazon CodeGuru's Java NullPointerException (NPE) detector; (d) 200+ new best practices for Java, Python, and the respective AWS SDKs in Amazon CodeGuru; and (e) a 2% reduction in false positives in Amazon CodeGuru's Java resource leak detector.
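The few-shot objective described above amounts to nearest-neighbor matching against a run-time support set, with no retraining. The toy word-overlap similarity, the category names, and the example sentences below are assumptions for this sketch; Doc2BP uses a learned embedding space instead.

```python
# Sketch of run-time few-shot classification: assign a documentation sentence
# to the category of its most similar user-provided support example.
def similarity(a, b):
    """Toy stand-in for embedding similarity (token Jaccard)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def few_shot_classify(sentence, support):
    """support: {category: [example sentences]} supplied at run-time,
    so novel categories need no model retraining."""
    best = max(
        ((cat, max(similarity(sentence, ex) for ex in examples))
         for cat, examples in support.items()),
        key=lambda pair: pair[1],
    )
    return best[0]

support = {  # hypothetical user-provided few-shot examples
    "null-check": ["this method may return null", "returns null if absent"],
    "resource-release": ["callers must close the stream"],
}

print(few_shot_classify("the call returns null when the key is absent", support))
```

The practical payoff is the one the paper claims: a user can stand up a new best-practice category by writing a handful of example sentences, without touching the underlying model.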
Title: "Learning-based Identification of Coding Best Practices from Software Documentation", in 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME).
Pub Date: 2022-10-01. DOI: 10.1109/ICSME55016.2022.00057
Olivier Nourry, Yutaro Kashiwa, B. Lin, G. Bavota, Michele Lanza, Yasutaka Kamei
Energy consumption in mobile applications is a key area of software engineering research, since any advance could affect billions of devices. Several software-based energy calculation tools can now provide close estimates of the energy consumed by mobile applications without relying on physical hardware, offering new opportunities to conduct large-scale energy studies on mobile devices. In these studies, one key data collection step is generating events, since doing so exercises specific parts of the code and, as a consequence, allows assessing their energy consumption. Because manually generating events by interacting with applications is time-consuming and not scalable, large-scale studies often use software-based tools that automate event generation to profile devices. Existing tools rely on randomly generated events, which undermines the reproducibility and generalizability of such studies. We present AIP (Android Instrumentation Profiler), an alternative to existing software-based event generation tools such as Monkey. AIP uses instrumented tests as the source of event generation, which enables targeting complex use cases for energy consumption estimation, as well as creating fully reproducible events and execution traces, while maintaining the scalability of other state-of-the-art tools. The tool and a demo video can be found at https://github.com/ONourry/AndroidInstrumentationProfiler.
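The reproducibility contrast between random and test-driven event generation can be shown in miniature. The event vocabulary, trace format, and function names below are invented for this sketch; AIP itself drives real Android instrumented tests rather than strings.

```python
# Illustrative contrast: Monkey-style random events vs. a scripted,
# fully reproducible event trace derived from an instrumented test.
import random

def monkey_events(n, seed=None):
    """Random stream: without a fixed seed, two runs exercise different
    parts of the app, so measured energy is hard to reproduce."""
    rng = random.Random(seed)
    actions = ["tap", "swipe", "back", "rotate"]
    return [rng.choice(actions) for _ in range(n)]

def instrumented_events(test_script):
    """Instrumented test: the script *is* the trace, so every run replays
    the same events and can be tied to the energy measured alongside it."""
    return list(test_script)

script = ["tap:login", "tap:compose", "swipe:inbox", "back"]

# two replays of the same instrumented test yield identical traces
print(instrumented_events(script) == instrumented_events(script))
```

Note that seeding Monkey also makes a single run repeatable, but only the scripted form lets a study name *which* use case (e.g., "compose a message") each energy figure belongs to.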
{"title":"AIP: Scalable and Reproducible Execution Traces in Energy Studies on Mobile Devices","authors":"Olivier Nourry, Yutaro Kashiwa, B. Lin, G. Bavota, Michele Lanza, Yasutaka Kamei","doi":"10.1109/ICSME55016.2022.00057","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00057","url":null,"abstract":"Energy consumption in mobile applications is a key area of software engineering studies, since any advance could affect billions of devices. Currently, several software-based energy calculation tools can provide close estimates of the energy consumed by mobile applications without relying on physical hardware, offering new opportunities to conduct large-scale energy studies in mobile devices. In these studies, one key step of data collection is generating events, since it allows exercising specific parts of the code and, as a consequence, assessing their energy consumption. Given the fact that manually generating events by interacting with applications is time-consuming and not scalable, large-scale studies often use software-based tools to automate event generation to profile devices. Existing tools rely on randomly generated events, which undermines the reproducibility and generalizability of such studies.We present AIP (Android Instrumentation Profiler), an alternative to existing software-based event generation tools such as Monkey. AIP uses instrumented tests as a source of event generation, which enables the targeting of complex use cases for energy consumption estimations, as well as the creation of fully reproducible events and execution traces, while maintaining the scaling abilities of other state-of-the-art tools. 
The tool and demo video can be found on https://github.com/ONourry/AndroidInstrumentationProfiler.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131116375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-10-01DOI: 10.1109/ICSME55016.2022.00070
Tim Vahlbrock, Martin Guddat, Tom Vierjahn
Modern software is subject to continuous change, and so are its interfaces to other software. Introducing breaking changes to an interface requires its consumers to adapt their own code base to compensate. Oftentimes, the sheer number of required changes makes manual migration a large effort. Additionally, the places that require changes may be hard to find using simple pattern matching. Both issues have led to the development of tools like ClangMR, which automatically find and adapt affected pieces of code. Such automatic tools, however, assume that the correctness of the applied changes will be verified by tests, which makes them risky to use on projects with low test coverage. In this paper we present VSCode Migrate, an IDE extension for performing semi-automatic migrations, enabling large refactorings in low-coverage projects. The locations that need to be refactored can be found using alternative matching strategies, including AST-based ones, and the changes to perform can be generated automatically. However, instead of applying the changes immediately, VSCode Migrate lets the developer verify and modify each adaptation. If a change is not sufficiently covered, additional tests can be added before the change is applied. These mechanisms provide the safeguarding necessary for projects with low test coverage.
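To illustrate the AST-based matching the abstract mentions (this is a generic sketch, not VSCode Migrate's actual implementation, and `old_api` is a hypothetical deprecated function), a minimal Python example using the standard `ast` module to locate call sites that simple text search could miss or over-match:

```python
import ast

# Hypothetical consumer code containing a call to a deprecated API.
SOURCE = """
from legacy import old_api

def handler(data):
    return old_api(data, verbose=True)  # needs migration
"""

class CallFinder(ast.NodeVisitor):
    """Collect (line, column) locations of calls to a function by name."""

    def __init__(self, target):
        self.target = target
        self.matches = []

    def visit_Call(self, node):
        # Match plain-name calls like old_api(...); attribute calls
        # such as module.old_api(...) would need an extra case.
        if isinstance(node.func, ast.Name) and node.func.id == self.target:
            self.matches.append((node.lineno, node.col_offset))
        self.generic_visit(node)

finder = CallFinder("old_api")
finder.visit(ast.parse(SOURCE))
print(finder.matches)  # one call site, inside handler()
```

Unlike pattern matching on raw text, this approach ignores comments, strings, and the import line itself, reporting only genuine call expressions.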
{"title":"VSCode Migrate: Semi-Automatic Migrations for Low Coverage Projects","authors":"Tim Vahlbrock, Martin Guddat, Tom Vierjahn","doi":"10.1109/ICSME55016.2022.00070","DOIUrl":"https://doi.org/10.1109/ICSME55016.2022.00070","url":null,"abstract":"Modern software is subject to continuous change and so are its interfaces to other software. Introducing breaking changes to an interface requires its consumers to make adaptions to their own code base in order to compensate. Oftentimes, the number of changes requires a large effort when performed manually. Additionally, the places that require changes may be hard to find using simple pattern matching. Both have lead to the development of tools like ClangMR, which automatically find and adapt affected pieces of code. Such automatic tools, however, assume that the correctness of the applied changes will be verified by tests. This makes them risky to use for projects with a low test coverage.In this paper we present VSCode Migrate, an IDE extension to perform semi-automatic migrations, enabling large refactorings in low coverage projects. The locations that need to be refactored can be found using alternative matching strategies, including AST based ones, and the changes to perform can be generated automatically. However, instead of applying the changes immediately, VSCode Migrate lets the developer verify and modify each adaption. If a change is not sufficiently covered, additional tests can be added before the change is applied. 
These mechanisms provide the safeguarding necessary for projects with low test coverage.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116915940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}