Empirical Evaluation of Minority Oversampling Techniques in the Context of Android Malware Detection
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00042
Lwin Khin Shar, T. Duong, D. Lo
In Android malware classification, the distribution of training data among classes is often imbalanced. This causes the learning algorithm to bias towards the dominant classes, resulting in misclassification of minority classes. One effective way to improve the performance of classifiers is the synthetic generation of minority instances. A pioneering technique in this area is the Synthetic Minority Oversampling Technique (SMOTE); since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection, and studies have shown that the performance of SMOTE and its variants can vary across application domains. In this paper, we conduct a large-scale empirical evaluation of SMOTE and its variants on six datasets that reflect six types of features commonly used in Android malware detection. The datasets are extracted from a benchmark of 4,572 benign and 2,399 malicious Android apps used in our previous study. Through extensive experiments, we set a new baseline in the field of Android malware detection and provide guidance to practitioners on applying different SMOTE variants to Android malware detection.
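As a rough illustration of the kind of comparison the paper performs, the following sketch contrasts SMOTE and two of its variants on a synthetic imbalanced binary dataset, assuming the scikit-learn and imbalanced-learn packages; the synthetic features merely stand in for the paper's Android feature sets.

```python
# Minimal sketch: comparing SMOTE variants on an imbalanced binary
# classification task. The synthetic data below is a stand-in for the
# paper's Android feature datasets, not the actual benchmark.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

X, y = make_classification(n_samples=7000, n_features=50, n_informative=10,
                           weights=[0.66, 0.34], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for sampler in (None, SMOTE(random_state=0),
                BorderlineSMOTE(random_state=0), ADASYN(random_state=0)):
    # Oversample only the training split, then evaluate on untouched data.
    X_bal, y_bal = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
    name = type(sampler).__name__ if sampler else "no oversampling"
    print(f"{name}: minority-class F1 = {f1_score(y_te, clf.predict(X_te)):.3f}")
```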
{"title":"Empirical Evaluation of Minority Oversampling Techniques in the Context of Android Malware Detection","authors":"Lwin Khin Shar, T. Duong, D. Lo","doi":"10.1109/APSEC53868.2021.00042","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00042","url":null,"abstract":"In Android malware classification, the distribution of training data among classes is often imbalanced. This causes the learning algorithm to bias towards the dominant classes, resulting in mis-classification of minority classes. One effective way to improve the performance of classifiers is the synthetic generation of minority instances. One pioneer technique in this area is Synthetic Minority Oversampling Technique (SMOTE) and since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection. Studies have shown that the performance of SMOTE and its variants can vary across different application domains. In this paper, we conduct a large scale empirical evaluation of SMOTE and its variants on six different datasets that reflect six types of features commonly used in Android malware detection. The datasets are extracted from a benchmark of 4,572 benign apps and 2,399 malicious Android apps, used in our previous study. Through extensive experiments, we set a new baseline in the field of Android malware detection, and provide guidance to practitioners on the application of different SMOTE variants to Android malware detection.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115945074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Multi-Objective Genetic Algorithm for Efficient Selection on Program Generation
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00060
Hiroto Watanabe, S. Matsumoto, Yoshiki Higo, S. Kusumoto, Toshiyuki Kurabayashi, Hiroyuki Kirinuki, Haruto Tanno
Automated program generation (APG) is the concept of automatically creating a computer program. One path toward this goal is to transfer automated program repair (APR) techniques to APG: APR modifies buggy input source code until it passes all test cases, while APG regards empty source code as initially failing all test cases, i.e., as containing multiple bugs. Search-based APR repeatedly generates program variants and evaluates them. Many traditional APR systems evaluate the fitness of variants by the number of passing test cases; however, when source code contains multiple bugs, this fitness function lacks the power to distinguish variants. In this paper, we propose applying a multi-objective genetic algorithm to APR in order to improve efficiency. We also propose a new crossover method that combines two variants with complementary test results, taking advantage of the finer-grained evaluation that multi-objective genetic algorithms provide. We tested the effectiveness of the proposed method on competitive programming tasks. The obtained results showed significant differences in the number of successful trials and the required generation time.
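The following hand-rolled Python sketch illustrates the two core ideas in miniature: treating the per-test-case pass/fail vector as a multi-objective fitness, and pairing variants with complementary test results for crossover. All names and data here are hypothetical stand-ins; the paper's system operates on real source code and test suites.

```python
# Toy model: a "variant" is just a tuple of per-test pass/fail booleans.
import random

random.seed(0)
N_TESTS = 8

def dominates(a, b):
    # Pareto dominance over per-test results: a is no worse on every test
    # and strictly better on at least one -- the multi-objective view.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def complementary_pair(population):
    # Prefer parents whose combined passing tests cover the most test
    # cases, i.e. each parent passes tests the other fails.
    best, best_cover = None, -1
    for i, a in enumerate(population):
        for b in population[i + 1:]:
            cover = sum(x or y for x, y in zip(a, b))
            if cover > best_cover:
                best, best_cover = (a, b), cover
    return best

population = [tuple(random.random() < 0.4 for _ in range(N_TESTS))
              for _ in range(6)]
a, b = complementary_pair(population)
child = tuple(x or y for x, y in zip(a, b))  # idealized best-case crossover
print("parent A passes:", a)
print("parent B passes:", b)
print("child dominates parent A:", dominates(child, a))
```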
{"title":"Applying Multi-Objective Genetic Algorithm for Efficient Selection on Program Generation","authors":"Hiroto Watanabe, S. Matsumoto, Yoshiki Higo, S. Kusumoto, Toshiyuki Kurabayashi, Hiroyuki Kirinuki, Haruto Tanno","doi":"10.1109/APSEC53868.2021.00060","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00060","url":null,"abstract":"Automated program generation (APG) is a concept of automatically making a computer program. Toward this goal, transferring automated program repair (APR) to APG can be considered. APR modifies the buggy input source code to pass all test cases. APG regards empty source code as initially failing all test cases, i.e., containing multiple bugs. Search-based APR repeatedly generates program variants and evaluates them. Many traditional APR systems evaluate the fitness of variants based on the number of passing test cases. However, when source code contains multiple bugs, this fitness function lacks the expressive power of variants. In this paper, we propose the application of a multi-objective genetic algorithm to APR in order to improve efficiency. We also propose a new crossover method that combines two variants with complementary test results, taking advantage of the high expressive power of multi-objective genetic algorithms for evaluation. We tested the effectiveness of the proposed method on competitive programming tasks. The obtained results showed significant differences in the number of successful trials and the required generation time.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130892994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00022
E. Akimova, A. Bersenev, Artem A. Deikov, Konstantin S. Kobylkin, A. Konygin, I. Mezentsev, V. Misilov
Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for robust and accurate training. In contrast to, say, Java, there is a lack of such datasets for Python. Most of the known datasets containing labeled Python source code are of relatively small size; they are suitable for testing trained deep learning models, but not for training them. Therefore, larger labeled datasets have to be created, based on sound algorithmic principles for selecting relevant source code from available public codebases. In this work, we create a large dataset of labeled Python source code named PyTraceBugs. It is intended for training, validating, and evaluating large deep learning models that identify a special class of low-level bugs in source code snippets, namely bugs that manifest by throwing error exceptions reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousand buggy snippets from public GitHub repositories. The most represented bugs are: absence of an attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation, and test samples. According to our estimates, confidence in the labeling of snippets as buggy or correct is about 85%; the labeling of the test sample is additionally manually validated to be almost 100% confident. To demonstrate the advantages of our dataset, we use it to train a binary classification model for distinguishing buggy from correct source code. This model employs pretrained BERT-like contextual embeddings. Its performance is as follows: precision on the test set is 96% for buggy source code and 61% for correct source code, whereas recall is 34% and 99%, respectively. The model's performance is also estimated on the known BugsInPy dataset, where it identifies approximately 14% of the buggy snippets.
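A minimal sketch of the classification setup described above, assuming the HuggingFace transformers library and the public microsoft/codebert-base checkpoint as a representative BERT-like code encoder; the classification head below is freshly initialized for illustration, not the paper's trained model.

```python
# Sketch: a pretrained BERT-like code encoder with a binary head over
# source snippets. The head is randomly initialized here, so the printed
# probability is meaningless until the model is fine-tuned on a dataset
# such as PyTraceBugs.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # label 1 = buggy (assumption)

snippet = "def head(xs):\n    return xs[0]  # IndexError on empty input\n"
inputs = tok(snippet, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
print("P(buggy) =", torch.softmax(logits, dim=-1)[0, 1].item())
```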
{"title":"PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction","authors":"E. Akimova, A. Bersenev, Artem A. Deikov, Konstantin S. Kobylkin, A. Konygin, I. Mezentsev, V. Misilov","doi":"10.1109/APSEC53868.2021.00022","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00022","url":null,"abstract":"Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for their robust and accurate training. In distinction to, say, Java, there is lack of such datasets for Python. Most of the known datasets containing the labeled Python source code are of relatively small size. Those datasets are suitable for testing built deep learning models, but not for their training. Therefore, larger labeled datasets have to be created based on some well-received algorithmic principles to select relevant source code from the available public codebases. In this work, a large dataset of the labeled Python source code is created named PyTraceBugs. It is intended for training, validating, and evaluating large deep learning models to identify a special class of low-level bugs in source code snippets manifested by throwing error exceptions, reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousands buggy snippets from the Github public repositories. Most represented bugs are: absence of attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation and test samples. Confidence in labeling of the snippets into buggy and correct is about 85% according to our estimates. Labeling of the snippets in the test sample is additionally manually validated to be almost 100% confident. To demonstrate advantages of our dataset, it is used to train a binary classification model for distinguishing the buggy and correct source code. This model employs the pretrained BERT-like contextual embeddings. Its performances are as follows: precision on the test set is 96 % for the buggy source code and 61 % for the correct source code whereas recall is 34 % and 99 % respectively. The model performance is also estimated on the known BugsInPy dataset: here, it reports approximately 14% of buggy snippets.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132352759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Runtime models and evolution graphs for the version management of microservice architectures
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00064
Yuwei Wang, D. Conan, S. Chabridon, Kavoos Bojnourdi, Jingxua Ma
Microservice architectures focus on developing modular and independent functional units that can be automatically deployed, enabling agile DevOps. One major challenge is to manage the rapid evolutionary changes in microservices and to perform continuous redeployment without interrupting the application's execution. Existing solutions provide limited capabilities to help software architects model, plan, and perform version management activities; architects lack a representation of a microservice architecture with version tracking. In this paper, we propose runtime models that distinguish the type model from the instance model, and we build an evolution graph of configuration snapshots of types and instances to allow the traceability of microservice versions and their deployment. We demonstrate our solution with an illustrative application that involves synchronous (RPC calls) and asynchronous (publish-subscribe) interaction within information systems.
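A hypothetical, minimal data model conveys the idea: service types (with versions) are kept separate from deployed instances, and configuration snapshots are linked into an evolution graph whose parent edges make version history traceable. The sketch assumes nothing beyond the Python standard library and is not the paper's actual model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceType:          # element of the type model
    name: str
    version: str

@dataclass(frozen=True)
class ServiceInstance:      # element of the instance model
    instance_id: str
    service_type: ServiceType

@dataclass
class Snapshot:             # node of the evolution graph
    label: str
    types: frozenset
    instances: frozenset
    parents: tuple = ()     # edges to predecessor snapshots

orders_v1 = ServiceType("orders", "1.0")
orders_v2 = ServiceType("orders", "2.0")

s1 = Snapshot("initial", frozenset({orders_v1}),
              frozenset({ServiceInstance("orders-a", orders_v1)}))
s2 = Snapshot("canary v2", frozenset({orders_v1, orders_v2}),
              frozenset({ServiceInstance("orders-a", orders_v1),
                         ServiceInstance("orders-b", orders_v2)}),
              parents=(s1,))

# Walking parent edges recovers the version history of 'orders'.
for snap in (s2, *s2.parents):
    print(snap.label, sorted(t.version for t in snap.types if t.name == "orders"))
```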
{"title":"Runtime models and evolution graphs for the version management of microservice architectures","authors":"Yuwei Wang, D. Conan, S. Chabridon, Kavoos Bojnourdi, Jingxua Ma","doi":"10.1109/APSEC53868.2021.00064","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00064","url":null,"abstract":"Microservice architectures focus on developing modular and independent functional units, which can be automatically deployed, enabling agile DevOps. One major challenge is to manage the rapid evolutionary changes in microservices and perform continuous redeployment without interrupting the application execution. The existing solutions provide limited capacities to help software architects model, plan, and perform version management activities. The architects lack a representation of a microservice architecture with versions tracking. In this paper, we propose runtime models that distinguishes the type model from the instance model, and we build up an evolution graph of configuration snapshots of types and instances to allow the traceability of microservice versions and their deployment. We demonstrate our solution with an illustrative application that involves synchronous (RPC calls) and asynchronous (publish-subscribe) interaction within information systems.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121052010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart Contract Vulnerability Detection Using Code Representation Fusion
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00069
Ben Wang, Hanting Chu, Pengcheng Zhang, Hai Dong
At present, most smart contract vulnerability detection approaches use manually defined patterns, which is time-consuming and far from satisfactory. To address this issue, researchers have attempted to deploy deep learning techniques for automatic vulnerability detection in smart contracts. Nevertheless, current work mostly relies on a single code representation, such as the AST (Abstract Syntax Tree) or code tokens, to learn vulnerability characteristics, which might lead to incomplete learned semantic information. In addition, the number of available vulnerability datasets is insufficient. To address these limitations, we first construct a dataset covering the most typical types of smart contract vulnerabilities, which can accurately indicate the specific line number where a vulnerability may exist. Second, for each single code representation, we propose a novel approach called AFS (AST Fuse program Slicing) to fuse code characteristic information. AFS fuses the structured information of the AST with program slicing information and detects vulnerabilities by learning the new vulnerability characteristic information.
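As a loose analogy in Python (the paper targets smart-contract code), the sketch below extracts structural features from an AST and a crude slice of statements mentioning a target variable, then concatenates the two. It illustrates only the shape of fusing a structural view with a slicing view, not AFS itself.

```python
import ast

src = """balance = 100
fee = 1
balance = balance - fee
print(balance)
"""

# Structural view: node types from the AST.
ast_features = [type(node).__name__ for node in ast.walk(ast.parse(src))]

# Crude "slice": source lines mentioning the variable of interest.
target = "balance"
slice_lines = [line for line in src.splitlines() if target in line]

# Fuse both views into a single feature sequence for a learner.
fused = ast_features + ["<SEP>"] + slice_lines
print(fused[:6], "...", fused[-3:])
```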
{"title":"Smart Contract Vulnerability Detection Using Code Representation Fusion","authors":"Ben Wang, Hanting Chu, Pengcheng Zhang, Hai Dong","doi":"10.1109/APSEC53868.2021.00069","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00069","url":null,"abstract":"At present, most smart contract vulnerability detection use manually-defined patterns, which is time-consuming and far from satisfactory. To address this issue, researchers attempt to deploy deep learning techniques for automatic vulnerability detection in smart contracts. Nevertheless, current work mostly relies on a single code representation such as AST (Abstract Syntax Tree) or code tokens to learn vulnerability characteristics, which might lead to incompleteness of learned semantics information. In addition, the number of available vulnerability datasets is also insufficient. To address these limitations, first, we construct a dataset covering most typical types of smart contract vulnerabilities, which can accurately indicate the specific row number where a vulnerability may exist. Second, for each single code representation, we propose a novel way called AFS (AST Fuse program Slicing) to fuse code characteristic information. AFS can fuse the structured information of AST with program slicing information and detect vulnerabilities by learning new vulnerability characteristic information.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114902453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Learning-to-Rank Based Approach for Improving Regression Test Case Prioritization
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00075
Chu-Ti Lin, Sheng-Hsiang Yuan, Jutarporn Intasara
Many prior studies that attempt to improve regression testing adopt test case prioritization (TCP). TCP generally arranges the execution of regression test cases according to specific rules, with the goal of revealing faults as early as possible. Different TCP algorithms adopt different metrics to evaluate test cases' priority, so each may be effective at revealing faults early only in certain faulty programs; adopting a single metric does not generally work well. In the last decade, learning-to-rank (LTR) strategies have been adopted to address several software engineering problems. This study uses a pairwise LTR strategy, XGBoost, to combine several existing metrics so as to improve TCP effectiveness. More specifically, we regard the metrics adopted by TCP techniques to evaluate test cases' priority as the features of the training data and adopt XGBoost to learn the weights of the combined metrics. Additionally, to avoid overfitting, we use a fuzzy inference system to generate additional features for data augmentation. The experimental results show that our approach achieves better effectiveness than the existing TCP techniques on the selected subject programs.
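A minimal sketch of pairwise learning-to-rank for TCP, assuming the xgboost package; the feature matrix and fault labels below are synthetic placeholders for the prioritization metrics and historical fault data the paper combines.

```python
import numpy as np
from xgboost import XGBRanker

rng = np.random.default_rng(0)
n_tests, n_metrics = 40, 5
X = rng.random((n_tests, n_metrics))                # one row per test case
y = (X @ rng.random(n_metrics) > 1.2).astype(int)   # 1 = exposed a fault

# Pairwise objective: the ranker learns from ordered pairs within a group.
ranker = XGBRanker(objective="rank:pairwise", n_estimators=50)
ranker.fit(X, y, group=[n_tests])   # all test cases form one query group

order = np.argsort(-ranker.predict(X))  # run the highest-scored tests first
print("suggested execution order:", order[:10])
```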
{"title":"A Learning-to-Rank Based Approach for Improving Regression Test Case Prioritization","authors":"Chu-Ti Lin, Sheng-Hsiang Yuan, Jutarporn Intasara","doi":"10.1109/APSEC53868.2021.00075","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00075","url":null,"abstract":"Many prior studies with attempt to improve regression testing adopt test case prioritization (TCP). TCP generally arranges the execution of regression test cases according to specific rules with the goal of revealing faults as early as possible. It is noted that different TCP algorithms adopt different metrics to evaluate test cases' priority so that they may be effect at revealing faults early in different faulty programs. Adopting a single metric may not generally work well. In this decade, learning-to-rank (LTR) strategies have been adopted to address some software engineering problems. This study also uses a pairwise LTR strategy XGBoost to combine several existing metrics so as to improve TCP effectiveness. More specifically, we regard the metrics adopted by TCP techniques to evaluate test cases' priority as the features of the training data and adopt XGBoost to learn the weights of the combined metrics. Additionally, in order to avoid overfitting, we use a fuzzy inference system to generate additional features for data augmentation. The experimental results show that our approach achieves more excellent effectiveness than the existing TCP techniques with respect to the selected subject programs.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115227806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NeoMycelia: A software reference architecture for big data systems
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00052
Pouya Ataei, A. Litchfield
The big data revolution began when the volume, velocity, and variety of data completely overwhelmed the systems used to store, manipulate, and analyze that data. As a result, a new class of software systems emerged, called big data systems. While many have attempted to harness the power of these new systems, it is estimated that approximately 75% of big data projects have failed within the last decade, and one of the root causes is the software engineering and architectural aspects of these systems. This paper aims to facilitate big data system development by introducing a software reference architecture (RA). The work provides an event-driven microservices architecture that addresses specific limitations of current big data RAs. The artefact development followed the principles of empirically grounded RAs, and the RA was evaluated by developing a prototype that solves a real-world problem in practice. Finally, a successful implementation of the reference architecture is presented. The results displayed a good degree of applicability with respect to quality factors.
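As a toy illustration of the event-driven microservice interaction style the reference architecture builds on, the sketch below implements an in-process publish-subscribe bus; a real deployment would use a message broker, and all topic and service names here are hypothetical.

```python
from collections import defaultdict

class EventBus:
    """Toy in-process stand-in for a message broker."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every service subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
# Two independent "services" react to the same ingestion event.
bus.subscribe("ingest.raw", lambda e: print("cleaning service got:", e))
bus.subscribe("ingest.raw", lambda e: print("metrics service got:", e))
bus.publish("ingest.raw", {"source": "sensor-17", "value": 42})
```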
{"title":"NeoMycelia: A software reference architecturefor big data systems","authors":"Pouya Ataei, A. Litchfield","doi":"10.1109/APSEC53868.2021.00052","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00052","url":null,"abstract":"The big data revolution began when the volume, velocity, and variety of data completely overwhelmed the systems used to store, manipulate and analyze that data. As a result, a new class of software systems emerged called big data systems. While many attempted to harness the power of these new systems, it is estimated that approximately 75% of the big data projects have failed within the last decade. One of the root causes of this is software engineering and architecture aspect of these systems. This paper aims to facilitate big data system development by introducing a software reference architecture. The work provides an event driven microservices architecture that addresses specific limitations in current big data reference architectures (RA). The artefact development has followed the principles of empirically grounded RAs. The RA has been evaluated by developing a prototype that solves a real-world problem in practice. At the end, succesful implementation of the reference architecture have been presented. The results displayed a good degree of applicability with respect to Quality factors.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124431735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interaction Modelling for IoT
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00020
Jessica Turner, Judy Bowen, Nikki van Zandwijk
Informal design artefacts allow end-users and non-experts to contribute to software design ideas and development. In contrast, software engineering techniques such as model-driven development support experts in ensuring quality properties of the software they propose and build. Each of these approaches has benefits that contribute to the development of robust, reliable, and usable software; however, it is not always obvious how best to combine the two. In this paper, we describe a novel technique that allows us to use informal design artefacts, in the form of ideation card designs, to generate formal models of IoT applications. To implement this technique, we created the Cards-to-Model (C2M) tool, which automates the model generation process. We demonstrate the technique with a case study of a safety-critical IoT application called “Medication Reminders”. By generating formal models directly from the design, we reduce the complexity of the modelling process. In addition, by incorporating easy-to-use informal design artefacts into the process, we allow non-experts to engage in the design and modelling of IoT applications.
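A hypothetical illustration of the cards-to-model idea: each card contributes a transition, the tool assembles a simple finite-state model, and a basic reachability check stands in for formal analysis. The card contents and state names below are invented for the medication-reminder scenario; the real C2M tool targets richer formal models than this sketch.

```python
# Each "card" contributes one transition of the model.
cards = [
    {"trigger": "medication due",    "from": "Idle",      "to": "Reminding"},
    {"trigger": "user acknowledges", "from": "Reminding", "to": "Idle"},
    {"trigger": "timeout",           "from": "Reminding", "to": "Escalated"},
]

states = {c["from"] for c in cards} | {c["to"] for c in cards}
transitions = {(c["from"], c["trigger"]): c["to"] for c in cards}

# Basic sanity check on the assembled model: every state should be
# reachable from the initial state 'Idle'.
reachable, frontier = {"Idle"}, ["Idle"]
while frontier:
    current = frontier.pop()
    for (src, _), dst in transitions.items():
        if src == current and dst not in reachable:
            reachable.add(dst)
            frontier.append(dst)
print("unreachable states:", states - reachable or "none")
```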
{"title":"Interaction Modelling for IoT","authors":"Jessica Turner, Judy Bowen, Nikki van Zandwijk","doi":"10.1109/APSEC53868.2021.00020","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00020","url":null,"abstract":"Informal design artefacts allow end-users and non-experts to contribute to software design ideas and development. In contrast, software engineering techniques such as model-driven development support experts in ensuring quality properties of the software they propose and build. Each of these approaches have benefits which contribute to the development of robust, reliable and usable software, however it is not always obvious how best to combine these two. In this paper we describe a novel technique which allows us to use informal design artefacts, in the form of ideation card designs, to generate formal models of IoT applications. To implement this technique, we created the Cards-to-Model (C2M) tool which allows us to automate the model generation process. We demonstrate this technique with a case study for a safety-critical IoT application called “Medication Reminders”. By generating formal models directly from the design we reduce the complexity of the modelling process. In addition, by incorporating easy-to-use informal design artefacts in the process we allow non-experts to engage in the design and modelling process of IoT applications.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131394893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Degree doesn't Matter: Identifying the Drivers of Interaction in Software Development Ecosystems
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00048
I. Bardhan, Subhajit Datta, S. Majumder
Large-scale software development ecosystems are among the most complex human enterprises. In such settings, developers are embedded in a web of shared concerns, responsibilities, and objectives at individual and collective levels. A deep understanding of the factors that influence developers to connect with one another is crucial for appreciating the challenges of such ecosystems and for formulating strategies to overcome those challenges. We use real-world data from multiple software development ecosystems to construct developer interaction networks, and we examine the mechanisms of network formation using statistical models to identify the developer attributes that have maximal influence on whether and how developers connect with one another. Our results challenge the conventional wisdom on the importance of particular developer attributes in interaction practices, and offer useful insights for individual developers, project managers, and organizational decision-makers.
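A schematic of this style of analysis: build an interaction network from developer activity, then fit a statistical model of tie formation over developer-pair attributes. The data, the single 'tenure' attribute, and the formation bias below are synthetic assumptions, not the paper's ecosystems or models; the sketch assumes the numpy, networkx, and scikit-learn packages.

```python
import itertools
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_devs = 30
tenure = rng.random(n_devs)  # hypothetical developer attribute

# Ties form with a (synthetic) bias toward similar tenure.
G = nx.Graph()
G.add_nodes_from(range(n_devs))
for u, v in itertools.combinations(range(n_devs), 2):
    if rng.random() < 0.05 + 0.25 * (1 - abs(tenure[u] - tenure[v])):
        G.add_edge(u, v)

# Model tie formation from pairwise attribute differences.
pairs = list(itertools.combinations(range(n_devs), 2))
X = np.array([[abs(tenure[u] - tenure[v])] for u, v in pairs])
y = np.array([int(G.has_edge(u, v)) for u, v in pairs])
model = LogisticRegression().fit(X, y)
print("tenure-difference coefficient:", model.coef_[0][0])  # negative => similarity attracts
```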
{"title":"Degree doesn't Matter: Identifying the Drivers of Interaction in Software Development Ecosystems","authors":"I. Bardhan, Subhajit Datta, S. Majumder","doi":"10.1109/APSEC53868.2021.00048","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00048","url":null,"abstract":"Large scale software development ecosystems represent one of the most complex human enterprises. In such settings, developers are embedded in a web of shared concerns, responsibilities, and objectives at individual and collective levels. A deep understanding of the factors that influence developers to connect with one another is crucial in appreciating the challenges of such ecosystems as well as formulating strategies to overcome those challenges. We use real world data from multiple software development ecosystems to construct developer interaction networks and examine the mechanisms of such network formation using statistical models to identify developer attributes that have maximal influence on whether and how developers connect with one another. Our results challenge the conventional wisdom on the importance of particular developer attributes in their interaction practices, and offer useful insights for individual developers, project managers, and organizational decision-makers.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114782373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TraceRefiner: An Automated Technique for Refining Coarse-Grained Requirement-to-Class Traces
Pub Date: 2021-12-01 | DOI: 10.1109/APSEC53868.2021.00009
Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed
Requirement-to-code traces reveal the code location(s) where a requirement is implemented. Traceability is essential for code evolution and understanding; however, creating and maintaining requirement-to-code traces is a tedious and costly process. In this paper, we introduce TraceRefiner, a novel technique for automatically refining coarse-grained requirement-to-class traces into fine-grained requirement-to-method traces. The inputs of TraceRefiner are (1) the set of requirement-to-class traces, which are easier to create as there are far fewer traces to capture, and (2) information about the code structure (i.e., method calls). The output of TraceRefiner is the set of requirement-to-method traces, providing additional, fine-grained information to the developer. We demonstrate the quality of TraceRefiner on four case study systems (7-72 KLOC) and evaluate it on over 230,000 requirement-to-method predictions. The evaluation demonstrates TraceRefiner's ability to refine traces even when many requirement-to-class traces are undefined (incomplete input). The obtained results show that the proposed technique is fully automated, tool-supported, and scalable.
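A simplified sketch of the refinement idea: starting from requirement-to-class traces and a method-call graph, promote individual methods to requirement-to-method traces. The heuristic used here (keep both endpoints of a call that crosses two classes traced to the same requirement) is an assumption for illustration, not the paper's actual algorithm, and all class and method names are hypothetical.

```python
# Coarse-grained input: which classes implement requirement R1.
req_to_classes = {"R1": {"Cart", "Checkout"}}

# Code structure input: method -> methods it calls ("Class.method" ids).
calls = {
    "Cart.add": {"Cart.total"},
    "Cart.total": set(),
    "Checkout.pay": {"Cart.total"},
    "Checkout.log": {"Logger.write"},
}

def refine(classes):
    owner = lambda m: m.split(".")[0]
    traced = set()
    for caller, callees in calls.items():
        for callee in callees:
            # A call crossing two classes traced to the same requirement
            # suggests both endpoints help implement that requirement.
            if (owner(caller) in classes and owner(callee) in classes
                    and owner(caller) != owner(callee)):
                traced.update({caller, callee})
    return traced

print("R1 ->", sorted(refine(req_to_classes["R1"])))
```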
{"title":"TraceRefiner: An Automated Technique for Refining Coarse-Grained Requirement-to-Class Traces","authors":"Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed","doi":"10.1109/APSEC53868.2021.00009","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00009","url":null,"abstract":"Requirement-to-code traces reveal the code location(s) where a requirement is implemented. Traceability is essential for code evolution and understanding. However, creating and maintaining requirement-to-code traces is a tedious and costly process. In this paper, we introduce TraceRefiner, a novel technique for automatically refining coarse-grained requirement-to-class traces to fine-grained requirement-to-method traces. The inputs of TraceRefiner are (1) the set of requirement-to-class traces, which are easier to create as there are far fewer traces to capture, and (2) information about the code structure (i.e., method calls). The output of TraceRefiner is the set of requirement-to-method traces (providing additional, fine-grained information to the developer). We demonstrate the quality of TraceRefiner on four case study systems (7-72KLOC) and evaluated it on over 230,000 requirement-to-method predictions. The evaluation demonstrates TraceRefiner's ability to refine traces even if many requirement-to-class traces are undefined (incomplete input). The obtained results show that the proposed technique is fully automated, tool-supported, and scalable.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133306477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}