Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00043
Yusuke Shinyama, Yoshitaka Arahori, K. Gondow
We investigated how programmers express high-level concepts, such as path names and coordinates, using primitive data types. Although relying too heavily on primitive data types is sometimes criticized as a bad smell, it remains a common practice among programmers. We propose a novel way to accurately identify expressions for certain predefined concepts by examining API calls. We defined twelve conceptual types used in the Java Standard API and then obtained expressions for each conceptual type from 26 open-source projects. Based on the expressions obtained, we trained a decision tree-based classifier, which achieved an 83% F-score in predicting the correct conceptual type for a given expression. Our results indicate that a conceptual type can be inferred from source code reasonably well once enough examples are given. The resulting classifier can be used for potential bug detection, test case generation, and documentation.
Title: How Do Programmers Express High-Level Concepts using Primitive Data Types? · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
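The abstract describes identifying expressions for a concept by examining the API calls they reach. A minimal sketch of that labeling step, assuming call sites have already been extracted from Java source as (signature, argument expressions) pairs; the table entries, concept names, and the function `label_expressions` are hypothetical illustrations, not the authors' twelve types or their tool. Labeled expressions of this kind could then serve as training examples for a classifier such as the decision tree the authors describe.

```python
# Hypothetical lookup table: (constructor/method signature, argument index) -> conceptual type.
# Entries are illustrative only; the paper's twelve types are not listed in the abstract.
CONCEPT_TABLE = {
    ("java.io.File.<init>(java.lang.String)", 0): "path",
    ("java.awt.Point.<init>(int,int)", 0): "x-coordinate",
    ("java.awt.Point.<init>(int,int)", 1): "y-coordinate",
    ("java.net.URL.<init>(java.lang.String)", 0): "url",
}


def label_expressions(calls):
    """Label argument expressions with the conceptual type implied by the API they reach.

    `calls` is a list of (signature, [argument expression, ...]) pairs, e.g. as produced
    by a (hypothetical) Java call-site extractor.
    """
    labels: dict[str, set[str]] = {}
    for signature, args in calls:
        for index, expr in enumerate(args):
            concept = CONCEPT_TABLE.get((signature, index))
            if concept is not None:
                # An expression may reach several APIs, so collect every concept seen.
                labels.setdefault(expr, set()).add(concept)
    return labels
```

Keying the table on the full signature rather than only the class name avoids mislabeling overloads such as `Point(Point p)`.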
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00068
Scott Lupton, H. Washizaki, Nobukazu Yoshioka, Y. Fukazawa
The use of anomaly detection for log monitoring requires parsing model input features from raw, unstructured data. Log parsing methods come in many forms but are generally categorized as either offline or online. In this study, we perform a systematic literature review of anomaly detection approaches that use online parsing methods. We take an inventory of these approaches, explore research gaps, and present suggestions for future exploration and study.
Title: Literature Review on Log Anomaly Detection Approaches Utilizing Online Parsing Methodology · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
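Online parsing, as opposed to offline parsing, assigns each log line to a template as it streams in, instead of clustering a complete log file after the fact. The sketch below illustrates the idea in a heavily simplified form, loosely in the spirit of fixed-depth parsers such as Drain; the variable-token regex, the equal-length matching rule, and the 0.5 similarity threshold are assumptions chosen for illustration, not details taken from the reviewed approaches.

```python
import re

# Assumption: tokens that look like decimal numbers, dotted numbers (e.g. IPv4 addresses)
# or hex literals are masked as variables before matching.
VARIABLE_TOKEN = re.compile(r"0x[0-9a-fA-F]+|\d+(?:\.\d+)*")


class OnlineLogParser:
    """Assigns each incoming log line to a template as soon as it arrives."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # minimum fraction of identical token positions
        self.templates: list[list[str]] = []

    def parse(self, line: str) -> str:
        tokens = ["<*>" if VARIABLE_TOKEN.fullmatch(t) else t for t in line.split()]
        if not tokens:
            return ""
        best, best_score = None, 0.0
        for template in self.templates:
            if len(template) != len(tokens):
                continue  # only compare lines with the same number of tokens
            same = sum(a == b for a, b in zip(template, tokens))
            score = same / len(tokens)
            if score > best_score:
                best, best_score = template, score
        if best is not None and best_score >= self.threshold:
            # Merge: positions where the template and the new line disagree become wildcards.
            for i, (a, b) in enumerate(zip(best, tokens)):
                if a != b:
                    best[i] = "<*>"
            return " ".join(best)
        self.templates.append(tokens)  # no sufficiently similar template: start a new one
        return " ".join(tokens)
```

Feeding the lines `Disk sda1 failed` and then `Disk sdb2 failed` yields the template `Disk <*> failed`; template strings like this are what an anomaly detector would typically count or sequence as input features.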
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00025
Jingxuan Zhang, W. Zou, Zhiqiu Huang
Identifiers play an important role in helping developers comprehend and maintain source code. In practice, developers usually format identifiers in one of two widely used styles, snake case and camel case, to make them understandable and informative. Although researchers have empirically investigated the impact of identifier styles on code comprehension, the usage and evolution of identifier styles have not been fully explored. How are individual identifier styles formed in practice? How do identifier styles change and evolve? What are the potential impacts of identifier style changes? Such questions are important but have not yet been fully answered. In this paper, we conducted an empirical study on 9,792 GitHub projects to gain insight into these problems. Specifically, we first analyzed how different identifier styles were formed in real software projects. Next, we explored the change patterns of identifier styles as the projects evolved. Finally, we investigated the potential impacts and the categories of identifier style changes. Our empirical study yielded several interesting findings. For example, we are the first to report certain identifier style-change patterns (e.g., snake case → camel case → snake case), which could help developers resolve style-change problems in practice. Our study also offers hints for researchers and developers who use specific identifier styles in programs. For example, researchers exploring the impact of identifier styles on code comprehension should consider the imbalanced distribution of individual identifier styles. In addition, it would be worthwhile for developers to build an identifier style-change prediction and propagation tool to reduce the cost of style changes.
Title: An Empirical Study on the Usage and Evolution of Identifier Styles in Practice · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
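To make the two styles concrete, here is a minimal classifier for identifier styles plus a helper that flags when a rename switches between them. The regular expressions are simplified assumptions (for example, single-word names such as `count` fit both conventions and are reported separately as ambiguous); they are not the classification rules used in the study, and the function names are hypothetical.

```python
import re
from collections import Counter

# Simplified patterns (assumptions); leading/trailing underscores are stripped first.
SNAKE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")
CAMEL = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+")
SINGLE = re.compile(r"[a-z][a-z0-9]*")


def identifier_style(name: str) -> str:
    core = name.strip("_")
    if SNAKE.fullmatch(core):
        return "snake_case"
    if CAMEL.fullmatch(core):
        return "camelCase"
    if SINGLE.fullmatch(core):
        return "single_word"  # compatible with both styles, so ambiguous
    return "other"


def style_distribution(names) -> dict[str, int]:
    """Count how many identifiers fall into each style."""
    return dict(Counter(identifier_style(n) for n in names))


def style_change(old_name: str, new_name: str):
    """Return (old_style, new_style) if a rename switched between the two styles, else None."""
    old, new = identifier_style(old_name), identifier_style(new_name)
    styles = {"snake_case", "camelCase"}
    if old in styles and new in styles and old != new:
        return old, new
    return None
```

Applied to the rename pairs of successive commits, `style_change` would surface sequences like the snake case → camel case → snake case pattern mentioned above.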
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00017
Lingjia Li, Jian Cao, Qing Qi
Open source software (OSS) development is a highly collaborative process in which individuals, groups, and organizations interact to develop, operate, and maintain software and related artifacts. Developers' sentiment in this process can affect their willingness to work and their efficiency, so monitoring sentiment factors can help improve OSS development and management. However, no method has yet been proposed to dynamically monitor sentiment phenomena during the OSS development process. In this paper, we propose an approach to detect Negative Sentiment-related Events (NSEs). It consists of two steps: first, identifying burst intervals of negative comments in open source projects, each of which corresponds to an NSE; second, annotating each NSE with its event type. To support this approach, we define the types of NSEs in OSS projects through an empirical study and train classifiers to annotate event types automatically. We also employ conversation disentanglement techniques to make the extracted comments more complete. Finally, we study the factors that influence NSEs in OSS projects.
Title: Monitoring Negative Sentiment-Related Events in Open Source Software Projects · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
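The abstract does not specify how burst intervals are detected. As a stand-in, the sketch below uses a fixed daily threshold: days whose count of negative comments reaches the threshold are marked as hot, and maximal runs of consecutive hot days are reported as bursts. It assumes negative comments have already been identified by a sentiment classifier and mapped to integer day indices; the function name and threshold are hypothetical. Kleinberg's burst-detection algorithm is a common, more principled alternative.

```python
from collections import Counter


def burst_intervals(days: list[int], threshold: int) -> list[tuple[int, int]]:
    """Return maximal runs of consecutive days with at least `threshold` negative comments.

    `days` holds one integer day index per negative comment (hypothetical input format).
    """
    counts = Counter(days)
    hot_days = sorted(day for day, count in counts.items() if count >= threshold)
    bursts: list[tuple[int, int]] = []
    start = end = None
    for day in hot_days:
        if start is None:
            start = end = day            # open the first burst
        elif day == end + 1:
            end = day                    # extend the current burst
        else:
            bursts.append((start, end))  # close it and open a new one
            start = end = day
    if start is not None:
        bursts.append((start, end))
    return bursts
```

Each returned `(start, end)` interval would correspond to one candidate NSE, whose comments are then passed to the event-type classifier.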
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00074
Shaikh Mostafa, Xiaoyin Wang
Security vulnerabilities are major defects in software implementation that allow malicious users to undermine its integrity by triggering crashes, stealing information, or even taking control of the software and its underlying system. Despite extensive research on vulnerabilities themselves, few studies have examined the relations between security vulnerabilities and other bugs, a topic that has attracted attention because of several important recently discovered vulnerabilities. In this paper, we present an exploratory study of vulnerability-bug relations in two important software projects: Firefox, representing browsers, and Red Hat, representing operating systems. In the study, we automatically extracted dependencies among vulnerabilities and bugs and manually investigated the characteristics of these dependencies.
Title: An Exploration Study On the Dependency Among Vulnerabilities and Bugs · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
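Firefox and Red Hat both track issues in Bugzilla, where a bug can record other bugs it depends on. Assuming bug records have been exported with such dependency lists and a flag marking security vulnerabilities, the sketch below extracts the cross edges between vulnerabilities and ordinary bugs. The `Bug` class and its fields are hypothetical simplifications, not the Bugzilla API or the authors' extraction pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    """Hypothetical, simplified bug record."""
    id: int
    is_vulnerability: bool
    depends_on: list[int] = field(default_factory=list)


def vulnerability_bug_edges(bugs: list[Bug]) -> list[tuple[int, int, str]]:
    """Return dependency edges that connect a vulnerability with a non-vulnerability bug."""
    by_id = {bug.id: bug for bug in bugs}
    edges = []
    for bug in bugs:
        for dep_id in bug.depends_on:
            dep = by_id.get(dep_id)
            if dep is None:
                continue  # dependency points outside the collected data set
            if bug.is_vulnerability == dep.is_vulnerability:
                continue  # keep only mixed vulnerability/bug edges
            kind = "vulnerability->bug" if bug.is_vulnerability else "bug->vulnerability"
            edges.append((bug.id, dep_id, kind))
    return edges
```

The resulting edge list is the kind of input a manual investigation of dependency characteristics could start from.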
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00034
Bo Gao, Siyuan Shen, Ling Shi, Jiaying Li, Jun Sun, Lei Bu
Smart contracts are computerized transaction protocols built on top of blockchain networks. Users are charged fees (called gas in Ethereum) when they create, deploy, or execute smart contracts. Because smart contracts may contain vulnerabilities that can result in huge financial losses, developers and smart contract compilers often insert code for security checks. The trouble is that this code consumes gas every time it is executed, yet much of it is redundant. In this work, we present sOptimize, a tool that automatically reduces smart contract gas consumption without compromising functionality or security. sOptimize works on smart contract bytecode, statically identifies three kinds of code patterns, and removes them using verification-assisted techniques. The resulting code is guaranteed to be equivalent to the original and can be deployed directly on the blockchain. We evaluate sOptimize on a collection of 1,152 real-world smart contracts and show that it optimizes 43% of them, reducing gas consumption by about 2.0% in deployment and 1.2% in transactions; the savings can be as high as 954,201 gas units per contract.
Title: Verification Assisted Gas Reduction for Smart Contracts · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00016
Hongbing Wang, Qi Li
As open source software grows in scale and complexity, software quality has become a major concern for the developers who fix bugs. Because software inevitably contains known or unknown bugs, under certain conditions these bugs can cause runtime errors, producing abnormal results and incorrect program behavior that may lead to huge economic losses. Defect repair is therefore an important part of software evolution and quality assurance. Quickly and efficiently assigning defect reports to the right developer, so as to improve efficiency and reduce the cost of open source software development, is an important problem in improving software quality. In this study, we propose RCNN, a new algorithm for recommending developers to fix defect reports, which learns the features of a defect report and recommends an appropriate developer based on those features. The algorithm uses CNN convolution kernels to capture local information in the text and an RNN to capture its sequential information. An attention mechanism is introduced to learn how much each part of the text contributes to its overall semantics, which partly compensates for the RNN's inability to learn long-range information effectively. Experiments on the Eclipse and Mozilla datasets show that, compared with NB (naive Bayes), SVM (support vector machines), LeeCNN, and DBRNNA, the RCNN model effectively finds the appropriate developer among many candidates and achieves higher classification accuracy.
Title: Effective Bug Triage Based on a Hybrid Neural Network · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
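The attention step in the abstract can be illustrated with a common formulation: score each position's hidden vector against a learned context vector, normalize the scores with a softmax, and take the weighted sum as the report's representation. This pure-Python sketch assumes the hidden vectors have already been produced by the CNN/RNN layers and that the context vector is given; the exact scoring function used by RCNN is not stated in the abstract, and the function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax (subtracts the maximum before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: list[list[float]], context: list[float]):
    """Dot-product attention pooling over per-position hidden vectors.

    Returns the weighted-sum vector and the attention weights.
    """
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights
```

The pooled vector would then feed a classifier over candidate developers, and the weights indicate which parts of the report contributed most to the recommendation.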
Pub Date: 2021-12-01 · DOI: 10.1109/APSEC53868.2021.00030
Yusuke Shinyama, Yoshitaka Arahori, K. Gondow
Consistency is one of the keys to maintainable source code and hence to a successful software project. We propose a novel method that extracts programmers' intent from the source code of a large project (~300 kLOC) and checks the semantic consistency of its variable names. Our system learns a project-specific naming convention for variables based on their roles, solely from source code, and suggests alternatives when a name violates the project's internal consistency. The system can also explain why a certain variable should be named in a specific way, and it does not rely on any external knowledge. We applied our method to 12 open-source projects and evaluated its results with human reviewers. For 416 out of 1,080 instances (39%), our system proposed alternative variable names that were considered better than those originally used by the developers. Based on these results, we created patches to correct the inconsistent names and sent them to the developers; three open-source projects adopted them.
Title: Improving Semantic Consistency of Variable Names with Use-Flow Graph Analysis · Venue: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)