Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00056
Jamal El Hachem, Ali Sedaghatbaf, Elena Lisova, Aida Čaušević
Systems of Systems (SoS) are sets of independent Constituent Systems (CS) that collaborate to provide functionalities they cannot achieve independently. We consider an SoS as a set of connected services that need to be adequately protected. The integration of these independent, evolutionary and distributed systems intensifies SoS complexity and increases behavioral uncertainty, which makes SoS security analysis a critical challenge. One of the major priorities when designing an SoS is to analyze the unknown dependencies among CS services and the vulnerabilities that lead to potential cyberattacks. The aim of this work is to investigate how Software Engineering approaches could be leveraged to analyze the cyberattack propagation problem within an SoS. Such analysis is essential for an efficient SoS risk assessment performed early, at the SoS design phase, and is required to protect the SoS from potentially high-impact attacks affecting its safety and security. To achieve this objective, we present a model-driven analysis approach based on Bayesian Networks, sensitivity analysis and the Common Vulnerability Scoring System (CVSS), with the aim of discovering potential cyberattack propagation and estimating the probability of a security failure and its impact on SoS services. We illustrate this approach with an autonomous quarry example.
Title : Using Bayesian Networks for a Cyberattacks Propagation Analysis in Systems-of-Systems
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
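The abstract does not give the network structure or how CVSS scores enter it, so the following is only a minimal sketch of the general idea. It assumes a three-service network with hypothetical names, treats CVSS base score divided by 10 as an exploit probability (a crude assumption, not the authors' mapping), and computes exact marginals by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical services and CVSS base scores (0-10); none come from the paper.
CVSS = {"gateway": 7.5, "fleet_manager": 5.0, "excavator": 9.0}
# PARENTS[x] lists the services whose compromise exposes x to attack.
PARENTS = {
    "gateway": [],
    "fleet_manager": ["gateway"],
    "excavator": ["gateway", "fleet_manager"],
}
NODES = list(CVSS)  # already in topological order


def conditional(node, state):
    """P(node compromised | parent states): exploitable only if reachable."""
    parents = PARENTS[node]
    # Entry points (no parents) are assumed to be directly reachable.
    reachable = not parents or any(state[p] for p in parents)
    return CVSS[node] / 10.0 if reachable else 0.0


def marginals():
    """Exact marginal compromise probabilities by enumerating all joint states."""
    result = {n: 0.0 for n in NODES}
    for values in product([False, True], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        joint = 1.0
        for n in NODES:
            p = conditional(n, state)
            joint *= p if state[n] else 1.0 - p
        for n in NODES:
            if state[n]:
                result[n] += joint
    return result
```

Re-running `marginals()` after changing one CVSS score gives a rough one-at-a-time sensitivity check on how strongly that vulnerability drives downstream compromise probabilities.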
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00015
Shiheng Wang, Tong Li, Zhen Yang
Continuously maintaining software requirements traceability links is essential for managing and evolving software systems. Due to development pressure, traceability links are usually missing during the early development phase in practice, and thus many information retrieval-based approaches have been proposed to automatically recover the traceability links. However, such approaches typically calculate textual similarities among software artifacts without considering specific features of different software artifacts, leading to less accurate results. In this paper, we propose a hybrid approach to recover requirements traceability links, which combines machine learning and logical reasoning to explore features of use cases and code. On one hand, our approach engineers features of use cases and code by taking into account their semantics, based on which a classifier is trained by using supervised learning algorithms. On the other hand, we investigate and leverage the structural information of code to incrementally discover traceability links by defining a list of reasoning rules. We have carried out a series of experiments to compare our approach with state-of-the-art methods, the results of which show that our approach significantly outperforms others.
Title : Exploring Semantics of Software Artifacts to Improve Requirements Traceability Recovery: A Hybrid Approach
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
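For context, the information retrieval baseline that the abstract contrasts itself with can be sketched as plain textual similarity between a use case and a code artifact. The tokenizer and the bag-of-words cosine below are illustrative assumptions; they are not the features or classifier proposed in the paper.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0
```

Because this measure looks only at shared words, "resets" in a use case does not match `reset` in a method name, which illustrates why purely textual similarity can miss links that semantic features or structural reasoning might recover.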
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00061
Subhajit Datta
It is widely perceived that the egalitarian ecosystems of large-scale open source software development foster effective team outcomes. In this study, we question this conventional wisdom by examining whether and how the centralization of information and influence in a software development team relates to the quality of the team's work products. We analyze data from more than a hundred real-world projects, covering close to a decade of development activity and involving 2000+ developers who collectively resolved more than two hundred thousand defects through discussions spanning more than six hundred thousand comments. We arrive at statistically significant evidence that the concentration of information and influence in the projects' developer communication networks is associated with the quality of a team's work products, even after controlling for various factors related to levels of developer engagement. Our results suggest that merely facilitating easy interaction between team members may not be sufficient to enhance team outcomes. They can inform the design of efficient collaborative development environments and the devising of tools and processes for team assembly and governance.
Title : Influence, Information and Team Outcomes in Large Scale Software Development
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
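The abstract does not say which centralization measures were used. As one possible illustration, the sketch below computes Freeman degree centralization for an undirected communication network without self-loops: 1.0 for a star (one developer mediates everything) and 0.0 when every developer has the same number of contacts. The choice of this measure is an assumption.

```python
def degree_centralization(edges):
    """Freeman degree centralization of an undirected graph given as (a, b) pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 3:
        raise ValueError("need at least three developers")
    degrees = [len(contacts) for contacts in neighbours.values()]
    top = max(degrees)
    # Normalize by the value reached by a star graph on n nodes.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```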
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00045
Zheng Li, Xing-yao Zhang, Junxia Guo, Y. Shang
The imbalanced nature of classes in software defect data, which includes both intra-class and inter-class imbalance, increases the difficulty of learning an effective defect prediction model. Most sampling and example generation approaches focus only on inter-class imbalance and do not handle intra-class imbalance effectively. This paper proposes a distribution-based data generation approach for software defect prediction that deals with inter-class and intra-class imbalance simultaneously. First, the classified sub-regions are clustered according to the distribution in the sample feature space. Second, data are generated by strategies that correspond to the distribution in each sub-region: inter-class balance is achieved by increasing the number of defective samples, and intra-class balance is achieved by generating data at different densities in different sub-regions. Experimental results show that the proposed method reduces the impact of data imbalance on defect prediction and effectively improves the accuracy of the defect prediction model by generating inter-class and intra-class balanced defect data.
Title : Class Imbalance Data-Generation for Software Defect Prediction
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
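The generation strategies themselves are not spelled out in the abstract. A rough sketch of the second step, assuming the defective samples have already been clustered into sub-regions, is to top every sub-region up to a common size using SMOTE-style linear interpolation between two members of the same sub-region. Both the common target size and the interpolation rule are assumptions made for illustration.

```python
import random


def top_up(clusters, target, seed=0):
    """Generate synthetic points so each sub-region reaches `target` samples.

    `clusters` is a list of sub-regions, each a non-empty list of feature tuples.
    Sparser sub-regions receive more synthetic points than denser ones.
    """
    rng = random.Random(seed)
    synthetic = []
    for points in clusters:
        for _ in range(max(0, target - len(points))):
            a, b = rng.choice(points), rng.choice(points)
            t = rng.random()
            # Interpolate on the segment between two existing samples.
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic
```

Choosing `target` so that the total number of defective samples matches the non-defective class would address inter-class balance at the same time.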
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00014
Muhammad Abbas, Irum Inayat, Naila Jan, Mehrdad Saadatmand, Eduard Paul Enoiu, Daniel Sundmark
Requirements prioritization plays an important role in driving project success during software development. Literature reveals that existing requirements prioritization approaches ignore vital factors such as interdependency between requirements. Existing requirements prioritization approaches are also generally time-consuming and involve substantial manual effort. Besides, these approaches show substantial limitations in terms of the number of requirements under consideration. There is some evidence suggesting that models could have a useful role in the analysis of requirements interdependency and their visualization, contributing towards the improvement of the overall requirements prioritization process. However, to date, just a handful of studies are focused on model-based strategies for requirements prioritization, considering only conflict-free functional requirements. This paper uses a meta-model-based approach to help the requirements analyst to model the requirements, stakeholders, and inter-dependencies between requirements. The model instance is then processed by our modified PageRank algorithm to prioritize the given requirements. An experiment was conducted, comparing our modified PageRank algorithm's efficiency and accuracy with five existing requirements prioritization methods. Besides, we also compared our results with a baseline prioritized list of 104 requirements prepared by 28 graduate students. Our results show that our modified PageRank algorithm was able to prioritize the requirements more effectively and efficiently than the other prioritization methods.
Title : MBRP: Model-Based Requirements Prioritization Using PageRank Algorithm
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
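The authors' modification of PageRank is not described in the abstract, so the sketch below shows only standard PageRank by power iteration applied to a requirements dependency graph. The requirement identifiers and the edge direction ("R2 depends on R1" drawn as R2 → R1, so that depended-upon requirements rank higher) are assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Standard PageRank by power iteration; links[a] lists the nodes a points to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not links.get(n))
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for source in nodes:
            targets = links.get(source, [])
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical dependencies between four requirements.
depends_on = {"R2": ["R1"], "R3": ["R1"], "R4": ["R2"]}
ranks = pagerank(depends_on)
priority = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph R1 comes first because two requirements depend on it directly and a third depends on it indirectly through R2.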
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00037
Sang Hyun, Jiyoung Song, Seungchyul Shin, Doo-Hwan Bae
Platooning is a well-known technology for alleviating traffic congestion and increasing fuel efficiency by grouping vehicles. A platooning system has the major characteristics of Systems of Systems (SoS), such as uncertainty. Several internal and external factors of uncertainty exist in the platooning system, such as car accidents, network disconnections, and simultaneous requests from other platoons. These factors make it difficult to guarantee that the system operates correctly in unpredictable scenarios and environments. The existing techniques used to verify the platooning system have two limitations: 1) they do not consider uncertainty in scenarios and environments; 2) they apply exhaustive verification techniques, which are vulnerable to the state-explosion problem. We therefore propose a statistical verification framework for a platooning SoS that addresses these two limitations. The proposed framework automatically generates platooning configurations and scenarios that take internal and external uncertainty factors into account, and it bypasses the state-explosion problem by using a statistical verification technique. Experimental results show that the proposed approach generates 50% more valid scenarios than a purely random strategy. In addition, we found two types of previously undiscovered failures and their causes in the VENTOS platooning system. These results indicate that our approach enables deep analysis of the platooning management system.
Title : Statistical Verification Framework for Platooning System of Systems with Uncertainty
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
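To illustrate why a statistical technique sidesteps state explosion, the sketch below estimates the probability that a property holds by sampling simulated scenarios, with the sample count chosen from the Hoeffding bound. The scenario model, its failure probabilities, and the use of this particular bound are assumptions; the abstract does not specify the framework's actual statistical method, and the stand-in function does not call VENTOS.

```python
import math
import random


def required_samples(epsilon, delta):
    """Hoeffding bound: runs needed so the estimate is within epsilon
    of the true probability with confidence at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_scenario(rng):
    """Hypothetical stand-in for one simulator run; True if the operation succeeds."""
    network_down = rng.random() < 0.10          # assumed disconnection rate
    simultaneous_request = rng.random() < 0.05  # assumed conflicting-request rate
    return not (network_down or simultaneous_request)


def estimate_success(epsilon=0.02, delta=0.05, seed=0):
    """Monte Carlo estimate of the success probability and the number of runs used."""
    rng = random.Random(seed)
    runs = required_samples(epsilon, delta)
    successes = sum(simulate_scenario(rng) for _ in range(runs))
    return successes / runs, runs
```

The number of runs depends only on the requested precision and confidence, not on the size of the system's state space.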
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00072
Duo Wang, Jian Cao, Shiyou Qian, Qing Qi
Teamwork is very important to software development. Many studies focus on different aspects of teamwork in open source software projects, but they neglect the fact that most teams in open source projects are temporary and dependent on the context of one specific project. Whether the collaboration of such teams can extend to different projects is doubtful. In contrast, we are interested in long-lasting socially connected teams, whose members have steady social connections and have collaborated with each other on multiple projects. Therefore, we mine Cross-Repository Socially Connected (CRSC) teams on GitHub, the largest open-source project hosting platform. Community detection methods are used to mine CRSC teams from the developer network, and more than 20,000 CRSC teams are discovered on GitHub. We study the productivity of such teams and how the hosting repository may influence them, and we investigate their preferences for repositories. Moreover, we study the structures of these teams using complex network analysis methods. Our results indicate that CRSC teams are stable, highly productive and mature. Therefore, open-source project owners and recruiters can pay more attention to such teams.
Title : Investigating Cross-Repository Socially Connected Teams on GitHub
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
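The abstract does not name the community detection method. As a deliberately simpler stand-in, the sketch below links two developers when they have contributed to at least `min_shared` common repositories and reports the connected components of that graph as candidate teams. The threshold and the input shape are assumptions, and connected components are far coarser than a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def candidate_teams(contributors, min_shared=2):
    """contributors maps a repository name to the set of its developers."""
    shared = defaultdict(int)
    for developers in contributors.values():
        for a, b in combinations(sorted(developers), 2):
            shared[(a, b)] += 1

    # Keep only pairs that collaborated across enough repositories.
    graph = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)

    teams, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        stack, team = [start], set()
        while stack:  # depth-first walk over one connected component
            developer = stack.pop()
            if developer in team:
                continue
            team.add(developer)
            stack.extend(graph[developer] - team)
        seen |= team
        teams.append(team)
    return teams
```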
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00057
M. R. Setyautami, Rafiano R. Rubiantoro, A. Azurat
Software product line engineering (SPLE) is an approach in software development that produces various products based on commonality and variability. SPLE maintains the product variations within two main phases: domain engineering and application engineering. Lack of adequate technology and tool support is one of the problems in adopting SPLE. In this research, a model-driven approach based on delta-oriented programming is proposed for SPLE. The process starts with the domain analysis phase by defining a feature diagram and Unified Modeling Language (UML) models based on existing systems. While those models represent the problem domain, delta-oriented programming with the Abstract Behavioral Specification (ABS) language is used in the solution domain. This approach is supported by automated model transformations, which transform the feature diagram and UML models to ABS models. A code generator mechanism is also used to produce a running application based on ABS models. When the user selects features in this application, our tools generate the running application based on those selections. We provide a running example, a charity organization system, as a case study. This research thus proposes an entire SPLE process based on a model-driven approach that covers the problem and solution domains and produces a running application.
Title : Model-Driven Engineering for Delta-Oriented Software Product Lines
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
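The core idea of delta-oriented programming, deriving a product by applying deltas to a core according to the selected features, can be sketched in a few lines. This is Python rather than ABS, and the core module, deltas and feature names for the charity example are all hypothetical; it does not reproduce the authors' transformations or generator.

```python
# Hypothetical core product: operation name -> behaviour description.
CORE = {"donate": "one-off donation form"}

# Each delta applies when all features in `when` are selected.
DELTAS = [
    {"when": {"Recurring"}, "adds": {"schedule": "monthly donation schedule"}},
    {"when": {"Anonymous"}, "adds": {"donate": "donation form without donor name"}},
    {"when": {"Recurring", "Report"}, "adds": {"report": "yearly donor statement"}},
    {"when": {"ReadOnly"}, "removes": ["donate"]},
]


def derive_product(core, deltas, selected):
    """Apply, in order, every delta whose feature condition is satisfied."""
    product = dict(core)
    for delta in deltas:
        if delta["when"] <= selected:  # subset test on feature sets
            for name in delta.get("removes", []):
                product.pop(name, None)
            # `adds` also covers modification: an existing entry is overwritten.
            product.update(delta.get("adds", {}))
    return product
```

For instance, `derive_product(CORE, DELTAS, {"Recurring", "Report"})` yields a product with the `donate`, `schedule` and `report` operations, while selecting only `ReadOnly` yields an empty product.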
Pub Date : 2019-12-01 DOI: 10.1109/APSEC48747.2019.00041
Guisheng Fan, Xuyang Diao, Huiqun Yu, Kang Yang, Liqiong Chen
Software defect prediction, which locates defective code snippets, can assist developers in finding potential bugs and allocating their testing efforts. Traditional defect prediction features are static code metrics, which contain only statistical information about programs and fail to capture program semantics, leading to degraded defect prediction performance. To take full advantage of both the semantics and the static metrics of programs, we propose a framework called Defect Prediction via Attention Mechanism (DP-AM). Specifically, DP-AM first extracts vectors from the abstract syntax trees (ASTs) of programs and encodes them as numerical vectors through mapping and word embedding. It then feeds these numerical vectors into a recurrent neural network to automatically learn semantic features of programs. After that, it applies a self-attention mechanism to further build relationships among these features. Furthermore, it employs a global attention mechanism to generate significant features among them. Finally, we combine these semantic features with traditional static metrics for accurate software defect prediction. We evaluate our method in terms of F1-measure on seven open-source Java projects in Apache. Our experimental results show that DP-AM improves F1-measure by 11% on average compared with state-of-the-art methods.
Title : Deep Semantic Feature Learning with Embedded Static Metrics for Software Defect Prediction
Venue : 2019 26th Asia-Pacific Software Engineering Conference (APSEC)
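The exact attention formulation used by DP-AM is not given in the abstract. As a hedged illustration of the self-attention step, the sketch below implements scaled dot-product self-attention over a sequence of feature vectors, with identity query, key and value projections (a simplifying assumption; a trained model would learn these projections).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of equal-length vectors, e.g. one per AST token.
    Returns the attended vectors and the attention weight matrix.
    """
    dim = len(features[0])
    outputs, weights = [], []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(row[j] * features[j][i] for j in range(len(features))) for i in range(dim)]
        )
    return outputs, weights
```

Each output vector mixes information from every position in the sequence, which is how attention can relate features that are far apart in the token order.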