Pub Date: 2019-12-01 | DOI: 10.1109/apsec48747.2019.00006 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Message from the APSEC 2019 Program Chairs
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00060 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Automatic Identification of Assumptions from the Hibernate Developer Mailing List
Ruiyin Li, Peng Liang, Chen Yang, Georgios Digkas, A. Chatzigeorgiou, Zhuang Xiong
During the software development life cycle, assumptions are an important type of software development knowledge that can be extracted from textual artifacts. Analyzing assumptions can help, for example, to comprehend software design and thereby facilitate software maintenance. Manual identification of assumptions by stakeholders is time-consuming, especially when analyzing a large dataset of textual artifacts. One promising way to address this problem is to identify assumptions automatically. In this study, we conducted an experiment to evaluate the performance of existing machine learning classification algorithms for automatic assumption identification, using a dataset extracted from the Hibernate developer mailing list. The dataset consists of 400 "Assumption" sentences and 400 "Non-Assumption" sentences. We selected and evaluated seven classifiers based on different machine learning algorithms. The results show that the SVM algorithm achieved the best performance (precision 0.829, recall 0.812, F1-score 0.819). In addition, according to the ROC curves and the corresponding AUC values, the SVM-based classifier performed better than the other classifiers for the binary classification of assumptions.
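The metrics reported above can be made concrete with a short sketch. Assuming "Assumption" is the positive class, precision, recall, and F1 follow from confusion-matrix counts, and AUC can be computed as the probability that a randomly chosen positive sentence receives a higher classifier score than a randomly chosen negative one (the Mann-Whitney formulation). The function names and the toy counts and scores are illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive ("Assumption") class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy counts for illustration only (not the paper's confusion matrix).
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    # Toy classifier scores for three positive and three negative sentences.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Plugging the reported precision and recall into the F1 formula gives about 0.820 rather than 0.819; a small gap like this can arise when F1 is averaged over folds instead of being computed from averaged precision and recall, though the averaging procedure is not stated in the abstract.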
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00068 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Reinforcement Learning of Code Search Sessions
Wei Li, Shuhan Yan, Beijun Shen, Yuting Chen
Searching for and reusing online code is a common activity in software development. Like many general-purpose searches, code search faces the session search problem: in a code search session, the user iteratively searches for code snippets, exploring new snippets that meet his/her needs and/or pushing some results higher in the ranking. This paper presents Cosoch, a reinforcement learning approach to session search of code documents (code snippets with textual explanations). Cosoch aims to build a session that reveals user intentions and to search and rerank the resulting documents accordingly. More specifically, Cosoch casts a code search session as a Markov decision process, in which rewards measuring the relevance between the queries and the resulting code documents guide the whole session search. We built a dataset, CosoBe, from StackOverflow, containing 103 code search sessions with 378 pieces of user feedback, and evaluated Cosoch on it. The evaluation results show that Cosoch achieves an average NDCG@3 score of 0.7379, outperforming StackOverflow by 21.3%.
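NDCG@3, the metric used to evaluate Cosoch, rewards placing highly relevant documents near the top of a ranking. The sketch below assumes graded relevance labels and the common linear-gain form DCG@k = sum of rel_i / log2(i + 1) over positions i = 1..k; some work uses the exponential gain 2^rel − 1 instead, and the paper's exact variant is not stated here. The ideal DCG is computed from the same labels sorted in descending order, and the labels in the example are made up.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels of the documents returned for one query, in ranked order.
    ranked = [3, 2, 3, 0, 1, 2]
    print(round(ndcg_at_k(ranked, 3), 4))
```

A session-level score such as the reported average would then be the mean of per-query NDCG@3 values; a perfect ranking scores 1.0.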
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00052 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Towards a Formal Approach to Defining and Computing the Complexity of Component Based Software
Yongxin Zhao, Xiujuan Zhang, Ling Shi, Gan Zeng, Feng Sheng, Shuang Liu
With the rapid development of software engineering and the wide adoption of software systems in various domains, the requirements placed on software systems are becoming increasingly complex, which results in very complex software systems. Motivated by the principle of divide and conquer, component-based software development is an effective way of managing complexity in software development. In this paper, we propose a calculus to formally describe the functional and performance specifications of component-based software and provide formal semantics for the proposed calculus. We then provide a method, based on the calculus, to measure the dynamic complexity of software compositions. Finally, we define a set of algebraic laws that capture the complexity relations between different functionally equivalent components. We conducted a case study with a real software system; the results show that our method can calculate the dynamic complexity of component-based systems, and that the complexity can be reduced by applying our algebraic laws.
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00048 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Towards Non-Invasive Recognition of Developers' Flow States with Computer Interaction Traces
Zhiwen Zheng, Liang Wang, Yue Cao, Yuqian Zhuang, Xianping Tao
Flow is a holistic description of people's optimal experience during creative activities, characterized by being totally concentrated on and actively involved in the task, enjoying the process of creation, and achieving a balance between one's skill and the task's challenge. Understanding software developers' flow states has attracted increasing attention in both research and practice because of the strong link between being in flow and achieving good performance. In this paper, we study the problem of tracking and recognizing developers' flow states by tracing their computer interactions, including keyboard and mouse use, IDE function use, and switching between application windows. Compared with traditional approaches that rely on self-reports or wearable sensors, a major advantage of the proposed approach is that it is non-invasive: it requires no additional effort from developers once the training phase is completed. This matters because developers' flow states are easily interrupted by external interference. Based on the captured interaction traces, we represent developers' activities with an extensive set of features and address the flow state recognition problem with machine learning techniques. A hierarchical recognition model, which is both interpretable and effective, is built following the multi-dimensional construct of the flow concept. We developed a prototype system and conducted a 17-day field study in a medium-sized IT company in China to collect real-world data. The results show that our approach is effective, achieving a highest recognition accuracy of 92.6%, and efficient enough to perform real-time recognition.
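The paper represents developers' activities as features derived from interaction traces. The sketch below shows one generic way such features could be computed: bucket timestamped events (keystrokes, mouse actions, IDE commands, window switches) into fixed-length time windows and count each event type per window. The event kinds, the 60-second window, and the `Event` type are assumptions for illustration; the paper's actual feature set and its hierarchical model are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical event kinds; the paper's actual feature set is richer.
EVENT_KINDS = ("key", "mouse", "ide_command", "window_switch")


@dataclass
class Event:
    timestamp: float  # non-negative seconds since the start of the trace
    kind: str         # one of EVENT_KINDS


def window_features(events: List[Event], window_s: float = 60.0) -> List[Dict[str, int]]:
    """Count events of each kind in consecutive, non-overlapping time windows."""
    if not events:
        return []
    last = max(e.timestamp for e in events)
    n_windows = int(last // window_s) + 1
    counts = [Counter() for _ in range(n_windows)]
    for e in events:
        counts[int(e.timestamp // window_s)][e.kind] += 1
    # Emit the same keys for every window so each yields a same-shaped feature vector.
    return [{kind: c[kind] for kind in EVENT_KINDS} for c in counts]


if __name__ == "__main__":
    trace = [Event(1.0, "key"), Event(2.5, "key"), Event(30.0, "window_switch"), Event(75.0, "mouse")]
    for features in window_features(trace):
        print(features)
```

Windows with no activity still appear with all counts at zero, which keeps idle periods visible to whatever classifier consumes the feature vectors.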
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00073 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Class Name Recommendation Based on Graph Embedding of Program Elements
Shintaro Kurimoto, Yasuhiro Hayase, Hiroshi Yonai, Hiroyoshi Ito, H. Kitagawa
In software development, the quality of identifier names is important because it greatly affects developers' program comprehension. However, choosing identifier names that appropriately represent the nature or behavior of program elements such as classes and methods is a difficult task that requires rich development experience and software domain knowledge. Although several studies have proposed techniques for recommending identifier names, few target class names, and those that do have limited applicability. This paper proposes a novel class name recommendation approach that is widely applicable in software development. The key idea is to represent the nature or behavior of classes quantitatively by leveraging embedding techniques for heterogeneous graphs. This makes it possible to recommend class names even in cases where a previous approach cannot work. Experimental results suggest that the proposed approach produces more accurate class name recommendations regardless of whether the classes are used. In addition, a further experiment reveals a situation in which the proposed approach is particularly effective.
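Once classes are embedded as vectors, one simple way to turn the embeddings into name suggestions is to rank the names of already-named classes by cosine similarity to the unnamed class's vector. The sketch below assumes the embeddings are already given (in the paper they come from embedding a heterogeneous graph of program elements, which is not shown) and uses made-up three-dimensional vectors and class names; it is not necessarily how the paper derives its recommendations.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_names(query: Sequence[float], named: Dict[str, Sequence[float]], top_k: int = 3) -> List[str]:
    """Names of the top_k known classes whose embeddings are closest to the query embedding."""
    ranked = sorted(named, key=lambda name: cosine(query, named[name]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings of classes whose names are known.
    named = {
        "UserRepository": [0.9, 0.1, 0.0],
        "OrderRepository": [0.8, 0.2, 0.1],
        "HttpClient": [0.0, 0.1, 0.9],
    }
    # Hypothetical embedding of a class that still needs a name.
    print(recommend_names([0.88, 0.12, 0.02], named, top_k=2))
```

A real system would more likely predict name tokens (such as "Repository") than copy whole names, but nearest-neighbor ranking shows how a quantitative representation of class behavior makes names comparable.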
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00074 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Detecting Duplicate Questions in Stack Overflow via Deep Learning Approaches
Liting Wang, Li Zhang, Jing Jiang
Stack Overflow is a popular question-and-answer website for software programming. Different users often ask the same question in different ways, resulting in a large number of duplicate questions on Stack Overflow. Typically, users with high reputation analyze and mark duplicate questions manually, which is time-consuming and inefficient. An automatic duplicate question detection approach is therefore needed. We first investigate the application of deep learning models to software engineering tasks. We then apply three deep learning models (CNN, RNN, and LSTM) to determine whether they are effective for the duplicate question detection task on Stack Overflow. In this paper, we explore three deep learning approaches, DQ-CNN, DQ-RNN, and DQ-LSTM, based on CNN, RNN, and LSTM respectively, to detect duplicate questions. Their effectiveness is evaluated on six different question groups. The experimental results show that DQ-LSTM outperforms DupPredictor, Dupe, DupePredictorRep-T, and DupeRep in terms of recall-rate@5, recall-rate@10, and recall-rate@20, except for the Ruby question group.
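The approaches above are compared on recall-rate@k. Under the usual definition in duplicate-detection work, which this sketch assumes, recall-rate@k is the fraction of duplicate questions whose true original appears among the top k candidates returned by the detector. The question IDs are hypothetical.

```python
from typing import Dict, List


def recall_rate_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of duplicate questions whose known original is in the top-k candidate list.

    rankings maps each duplicate question ID to its ranked candidate IDs;
    gold maps the same question IDs to the ID of the question they duplicate.
    """
    if not gold:
        return 0.0
    hits = sum(1 for q, original in gold.items() if original in rankings.get(q, [])[:k])
    return hits / len(gold)


if __name__ == "__main__":
    rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"], "q3": ["g", "h", "i"]}
    gold = {"q1": "a", "q2": "f", "q3": "x"}
    for k in (1, 3):
        print(k, recall_rate_at_k(rankings, gold, k))
```

Because a larger k only adds candidates, recall-rate@20 is never lower than recall-rate@5 for the same rankings.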
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00034 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
An Algebraic Approach to Modeling and Verifying Policy-Driven Smart Devices in IoT Systems
X. Chi, Min Zhang, X. Xu
The Internet of Things (IoT) is being widely adopted to make living environments such as cities and homes smart. Devices in IoT systems can automatically adjust their behavior according to changes in the environment. This capability is usually driven by policies that are predefined inside the devices and can be customized by end users. Inconsistencies or conflicts among policies may cause systems to malfunction and therefore must be eliminated before deployment. In this paper, we propose a novel algebraic approach to modeling and verifying policy-driven smart devices in IoT systems, based on a domain-specific modeling language called PobSAM (Policy-based Self-Adaptive Model) and an efficient rewriting system called Maude. We formalize the operational semantics of PobSAM in Maude, which serves both as an executable specification and as a formal verification tool. The Maude formalization can be used to verify smart devices specified in PobSAM. We conducted a case study on a smart home setting to evaluate the effectiveness and efficiency of our approach.
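To illustrate the kind of inconsistency that must be eliminated before deployment, the sketch below checks a set of event-condition-action policies for one simple conflict pattern: two policies that react to the same event, can both be enabled in the same state, and set the same device attribute to different values. This is a plain Python stand-in, not PobSAM syntax or the paper's Maude formalization; the `Policy` fields and the smart-home policies are hypothetical, and only a given finite list of states is checked rather than all reachable states.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    name: str
    event: str                          # triggering event, e.g. "motion_detected"
    condition: Callable[[State], bool]  # guard evaluated on the current state
    device_attr: str                    # attribute the action writes, e.g. "light"
    value: object                       # value the action assigns


def find_conflicts(policies: List[Policy], states: List[State]) -> List[Tuple[str, str]]:
    """Pairs of policies enabled by the same event in some state that assign different values."""
    conflicts = []
    for p, q in combinations(policies, 2):
        if p.event != q.event or p.device_attr != q.device_attr or p.value == q.value:
            continue
        # Conflict only if some state enables both guards at once.
        if any(p.condition(s) and q.condition(s) for s in states):
            conflicts.append((p.name, q.name))
    return conflicts


if __name__ == "__main__":
    policies = [
        Policy("night_light_on", "motion_detected", lambda s: s["hour"] >= 22, "light", "on"),
        Policy("sleep_mode_off", "motion_detected", lambda s: s["mode"] == "sleep", "light", "off"),
        Policy("day_blinds", "sunrise", lambda s: True, "blinds", "open"),
    ]
    states = [{"hour": 23, "mode": "sleep"}, {"hour": 12, "mode": "home"}]
    print(find_conflicts(policies, states))
```

A rewriting-based tool such as Maude can instead explore the states reachable from an initial configuration, so conflicts are found without hand-picking the states to test.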
Pub Date: 2019-12-01 | DOI: 10.1109/APSEC48747.2019.00019 | 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
Multiple Program Analysis Techniques Enable Precise Check for SEI CERT C Coding Standard
Thu-Trang Nguyen, Toshiaki Aoki, Takashi Tomita, Iori Yamada
Static analysis tools have demonstrated their ability to find code that does not comply with coding standards. However, for industrial-sized systems, static analysis tools frequently report a large number of warnings, which include both true positives and false positives. In this research, to enable precise checking against the SEI CERT C Coding Standard, we combine static analysis with three different techniques. First, a static analysis tool is used to detect non-compliant code, i.e., positions that may violate an SEI CERT C rule or recommendation; each detected position is called a warning. Second, deductive verification, model checking, and pattern matching are used to verify whether each warning is a true positive or a false positive. Our experiments with two automotive applications show that this approach can help improve the accuracy of checking against the SEI CERT C Coding Standard. We verified nearly 60% of the warnings reported by Rosecheckers, a static analysis tool; of these verified warnings, our approach automatically classified 97% as true positives or false positives.
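The two-stage process described above (a static analyzer reports warnings, then several verification techniques try to confirm or refute each one) can be sketched as a triage loop that tries each technique in turn until one returns a definite verdict. The verifier functions in the example are hypothetical placeholders standing in for deductive verification, model checking, and pattern matching, and the rule IDs are used only as labels; the sketch shows how verdicts could be combined and how the share of automatically classified warnings would be computed, not how the paper orders or implements its techniques.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"


@dataclass
class AnalyzerWarning:
    rule: str  # CERT C rule or recommendation ID reported by the analyzer
    file: str
    line: int


# A verifier returns a definite verdict, or None when it cannot decide.
Verifier = Callable[[AnalyzerWarning], Optional[Verdict]]


def triage(warnings: List[AnalyzerWarning], verifiers: List[Verifier]) -> List[Optional[Verdict]]:
    """Apply verifiers in order; the first definite verdict wins, else None (manual review)."""
    results: List[Optional[Verdict]] = []
    for w in warnings:
        verdict: Optional[Verdict] = None
        for verify in verifiers:
            verdict = verify(w)
            if verdict is not None:
                break
        results.append(verdict)
    return results


def auto_classified_ratio(results: List[Optional[Verdict]]) -> float:
    """Share of warnings that received a definite verdict without manual review."""
    return sum(v is not None for v in results) / len(results) if results else 0.0


if __name__ == "__main__":
    warnings = [
        AnalyzerWarning("EXP34-C", "a.c", 10),
        AnalyzerWarning("INT31-C", "b.c", 5),
        AnalyzerWarning("ARR30-C", "c.c", 7),
    ]

    # Hypothetical verifiers that can only decide one rule each.
    def prover(w: AnalyzerWarning) -> Optional[Verdict]:
        return Verdict.TRUE_POSITIVE if w.rule == "EXP34-C" else None

    def matcher(w: AnalyzerWarning) -> Optional[Verdict]:
        return Verdict.FALSE_POSITIVE if w.rule == "INT31-C" else None

    results = triage(warnings, [prover, matcher])
    print(results, auto_classified_ratio(results))
```

Warnings left with `None` are the ones that still need a human reviewer, so the ratio corresponds to the kind of "automatically classified" percentage reported in the abstract.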