
2021 28th Asia-Pacific Software Engineering Conference (APSEC): Latest Publications

Empirical Evaluation of Minority Oversampling Techniques in the Context of Android Malware Detection
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00042
Lwin Khin Shar, T. Duong, D. Lo
In Android malware classification, the distribution of training data among classes is often imbalanced. This causes the learning algorithm to be biased towards the dominant classes, resulting in misclassification of minority classes. One effective way to improve the performance of classifiers is the synthetic generation of minority instances. A pioneering technique in this area is the Synthetic Minority Oversampling Technique (SMOTE); since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection. Studies have shown that the performance of SMOTE and its variants can vary across different application domains. In this paper, we conduct a large-scale empirical evaluation of SMOTE and its variants on six different datasets that reflect six types of features commonly used in Android malware detection. The datasets are extracted from a benchmark of 4,572 benign and 2,399 malicious Android apps used in our previous study. Through extensive experiments, we set a new baseline in the field of Android malware detection and provide guidance to practitioners on the application of different SMOTE variants to Android malware detection.
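To make the setting concrete, a minimal sketch of SMOTE-style oversampling with the imbalanced-learn library is shown below; the synthetic feature matrix, class sizes, and downstream classifier are assumptions for illustration, not the paper's pipeline.

```python
# Hedged illustration (not the paper's setup): oversample a minority malware
# class with SMOTE before training a classifier. Feature values are synthetic.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 50))                 # e.g. permission / API-call feature vectors
y = np.array([0] * 950 + [1] * 50)         # 0 = benign (majority), 1 = malware (minority)

smote = SMOTE(random_state=0, k_neighbors=5)
X_res, y_res = smote.fit_resample(X, y)    # minority class is synthetically upsampled

print(np.bincount(y), "->", np.bincount(y_res))   # class counts before / after

# Variants such as BorderlineSMOTE or ADASYN (also in imbalanced-learn) can be
# swapped in above to compare their effect on the same features.
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
```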
Citations: 0
Applying Multi-Objective Genetic Algorithm for Efficient Selection on Program Generation
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00060
Hiroto Watanabe, S. Matsumoto, Yoshiki Higo, S. Kusumoto, Toshiyuki Kurabayashi, Hiroyuki Kirinuki, Haruto Tanno
Automated program generation (APG) is a concept of automatically making a computer program. Toward this goal, transferring automated program repair (APR) to APG can be considered. APR modifies the buggy input source code to pass all test cases. APG regards empty source code as initially failing all test cases, i.e., containing multiple bugs. Search-based APR repeatedly generates program variants and evaluates them. Many traditional APR systems evaluate the fitness of variants based on the number of passing test cases. However, when source code contains multiple bugs, this fitness function lacks the expressive power of variants. In this paper, we propose the application of a multi-objective genetic algorithm to APR in order to improve efficiency. We also propose a new crossover method that combines two variants with complementary test results, taking advantage of the high expressive power of multi-objective genetic algorithms for evaluation. We tested the effectiveness of the proposed method on competitive programming tasks. The obtained results showed significant differences in the number of successful trials and the required generation time.
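The flavour of the two ideas, multi-objective fitness and crossover of variants whose passing tests complement each other, can be sketched roughly as follows; the variant representation, objectives, and operators here are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch only: Pareto dominance over several objectives and a
# crossover that merges parents with complementary passing-test sets.
from dataclasses import dataclass, field

@dataclass
class Variant:
    edits: list                                 # edit operations applied to the empty program
    passed: set = field(default_factory=set)    # indices of test cases the variant passes

def dominates(a: Variant, b: Variant, objectives) -> bool:
    """Pareto dominance over several objective functions (higher is better)."""
    sa = [f(a) for f in objectives]
    sb = [f(b) for f in objectives]
    return all(x >= y for x, y in zip(sa, sb)) and sa != sb

def select_complementary_pair(population):
    """Pick the parent pair whose passing-test sets differ the most."""
    pairs = [(a, b) for i, a in enumerate(population) for b in population[i + 1:]]
    return max(pairs, key=lambda p: len(p[0].passed ^ p[1].passed))

def complementary_crossover(a: Variant, b: Variant) -> Variant:
    """Merge edits from both parents; the child is re-evaluated against the tests."""
    merged = a.edits + [e for e in b.edits if e not in a.edits]
    return Variant(edits=merged)

# Example objectives: tests passed in two disjoint groups of test cases.
group1, group2 = set(range(0, 5)), set(range(5, 10))
objectives = [lambda v: len(v.passed & group1), lambda v: len(v.passed & group2)]

pop = [Variant(["e1"], {0, 1, 2}), Variant(["e2"], {5, 6}), Variant(["e3"], {0, 5})]
a, b = select_complementary_pair(pop)
child = complementary_crossover(a, b)
print(dominates(pop[0], pop[2], objectives), child.edits)
```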
Citations: 0
PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00022
E. Akimova, A. Bersenev, Artem A. Deikov, Konstantin S. Kobylkin, A. Konygin, I. Mezentsev, V. Misilov
Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for robust and accurate training. In contrast to, say, Java, there is a lack of such datasets for Python. Most of the known datasets containing labeled Python source code are of relatively small size. Those datasets are suitable for testing built deep learning models, but not for training them. Therefore, larger labeled datasets have to be created, based on well-received algorithmic principles for selecting relevant source code from the available public codebases. In this work, a large dataset of labeled Python source code, named PyTraceBugs, is created. It is intended for training, validating, and evaluating large deep learning models that identify a special class of low-level bugs in source code snippets, manifested by thrown error exceptions reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousand buggy snippets from public GitHub repositories. The most represented bugs are: absence of an attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation, and test samples. According to our estimates, confidence in labeling the snippets as buggy or correct is about 85%. Labeling of the snippets in the test sample is additionally validated manually to be almost 100% confident. To demonstrate the advantages of our dataset, it is used to train a binary classification model for distinguishing buggy from correct source code. This model employs pretrained BERT-like contextual embeddings. Its performance is as follows: precision on the test set is 96% for buggy source code and 61% for correct source code, whereas recall is 34% and 99%, respectively. The model's performance is also estimated on the known BugsInPy dataset: there, it reports approximately 14% of the buggy snippets.
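A minimal sketch of the kind of BERT-like classification pipeline the abstract describes is shown below; the checkpoint name microsoft/codebert-base and the untrained linear head are assumptions chosen for illustration, since the paper's exact model is not specified here.

```python
# Hedged sketch: embed a code snippet with a pretrained BERT-like encoder and
# score it with a binary (buggy vs. correct) head. The head is untrained here.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "microsoft/codebert-base"            # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)   # correct / buggy

def score(snippet: str) -> torch.Tensor:
    inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state    # (1, tokens, hidden)
    pooled = hidden[:, 0, :]                            # first-token representation
    return torch.softmax(classifier(pooled), dim=-1)    # P(correct), P(buggy)

print(score("def get_name(user):\n    return user.nam"))   # attribute-typo example
```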
Citations: 4
Runtime models and evolution graphs for the version management of microservice architectures
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00064
Yuwei Wang, D. Conan, S. Chabridon, Kavoos Bojnourdi, Jingxua Ma
Microservice architectures focus on developing modular and independent functional units that can be deployed automatically, enabling agile DevOps. One major challenge is to manage the rapid evolutionary changes in microservices and perform continuous redeployment without interrupting application execution. Existing solutions provide limited capacities to help software architects model, plan, and perform version management activities. Architects lack a representation of a microservice architecture with version tracking. In this paper, we propose runtime models that distinguish the type model from the instance model, and we build an evolution graph of configuration snapshots of types and instances to allow traceability of microservice versions and their deployment. We demonstrate our solution with an illustrative application that involves synchronous (RPC calls) and asynchronous (publish-subscribe) interaction within information systems.
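One way to picture the separation of type model, instance model, and evolution graph is the following data-structure sketch; the class names and fields are illustrative assumptions rather than the paper's metamodel.

```python
# Illustrative data structures: a type model of microservice versions, an instance
# model of deployed instances, and an evolution graph linking configuration snapshots.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ServiceType:
    name: str
    version: str                                   # e.g. "orders" at "1.4.2"
    interfaces: List[str] = field(default_factory=list)

@dataclass
class ServiceInstance:
    type_ref: ServiceType
    node: str                                      # deployment location

@dataclass
class ConfigurationSnapshot:
    snapshot_id: str
    types: Dict[str, ServiceType] = field(default_factory=dict)
    instances: List[ServiceInstance] = field(default_factory=list)

@dataclass
class EvolutionGraph:
    snapshots: Dict[str, ConfigurationSnapshot] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)   # (from, to, change)

    def record(self, prev_id: str, new: ConfigurationSnapshot, change: str) -> None:
        self.snapshots[new.snapshot_id] = new
        if prev_id:
            self.edges.append((prev_id, new.snapshot_id, change))

# Example: redeploying "orders" from 1.4.2 to 1.5.0 yields a new traceable snapshot.
orders_v1 = ServiceType("orders", "1.4.2")
orders_v2 = ServiceType("orders", "1.5.0", ["POST /orders"])
s1 = ConfigurationSnapshot("s1", {"orders": orders_v1}, [ServiceInstance(orders_v1, "node-a")])
s2 = ConfigurationSnapshot("s2", {"orders": orders_v2}, [ServiceInstance(orders_v2, "node-a")])
graph = EvolutionGraph()
graph.record("", s1, "initial deployment")
graph.record("s1", s2, "upgrade orders 1.4.2 -> 1.5.0")
print(graph.edges)
```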
Citations: 1
Smart Contract Vulnerability Detection Using Code Representation Fusion
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00069
Ben Wang, Hanting Chu, Pengcheng Zhang, Hai Dong
At present, most smart contract vulnerability detection approaches use manually defined patterns, which is time-consuming and far from satisfactory. To address this issue, researchers attempt to deploy deep learning techniques for automatic vulnerability detection in smart contracts. Nevertheless, current work mostly relies on a single code representation, such as the AST (Abstract Syntax Tree) or code tokens, to learn vulnerability characteristics, which might lead to incomplete learned semantic information. In addition, the number of available vulnerability datasets is also insufficient. To address these limitations, we first construct a dataset covering the most typical types of smart contract vulnerabilities, which can accurately indicate the specific line number where a vulnerability may exist. Second, for each single code representation, we propose a novel way called AFS (AST Fuse program Slicing) to fuse code characteristic information. AFS can fuse the structured information of the AST with program slicing information and detect vulnerabilities by learning new vulnerability characteristic information.
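The general idea of combining an AST-oriented view with a slice-oriented view before classification can be conveyed by the very rough sketch below; it is not the AFS technique, and both feature extractors are crude placeholders invented for illustration.

```python
# Not AFS: a placeholder fusion of two code views by simple concatenation.
import numpy as np

def ast_node_histogram(source: str) -> np.ndarray:
    """Placeholder for AST features: counts of a few Solidity constructs."""
    keywords = ["function", "require", "call", "transfer", "delegatecall", "selfdestruct"]
    return np.array([source.count(k) for k in keywords], dtype=float)

def slice_token_histogram(source: str) -> np.ndarray:
    """Placeholder for slice features: statistics of lines around state-changing calls."""
    risky = [ln for ln in source.splitlines() if "call" in ln or "transfer" in ln]
    return np.array([len(risky), sum(len(ln.split()) for ln in risky)], dtype=float)

def fused_representation(source: str) -> np.ndarray:
    # Simple concatenation of the two views; a learned fusion would replace this.
    return np.concatenate([ast_node_histogram(source), slice_token_histogram(source)])

contract = 'function withdraw() public { msg.sender.call{value: balance}(""); }'
print(fused_representation(contract))
```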
Citations: 3
A Learning-to-Rank Based Approach for Improving Regression Test Case Prioritization
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00075
Chu-Ti Lin, Sheng-Hsiang Yuan, Jutarporn Intasara
Many prior studies that attempt to improve regression testing adopt test case prioritization (TCP). TCP generally arranges the execution of regression test cases according to specific rules, with the goal of revealing faults as early as possible. Different TCP algorithms adopt different metrics to evaluate test cases' priority, so each may be effective at revealing faults early only for certain faulty programs; adopting a single metric may not generally work well. In this decade, learning-to-rank (LTR) strategies have been adopted to address several software engineering problems. This study uses a pairwise LTR strategy, XGBoost, to combine several existing metrics so as to improve TCP effectiveness. More specifically, we regard the metrics adopted by TCP techniques to evaluate test cases' priority as the features of the training data and adopt XGBoost to learn the weights of the combined metrics. Additionally, in order to avoid overfitting, we use a fuzzy inference system to generate additional features for data augmentation. The experimental results show that our approach is more effective than the existing TCP techniques on the selected subject programs.
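A sketch of pairwise learning-to-rank with XGBoost for test case prioritization follows; the per-test-case features (coverage size, historical fault detections, last outcome) and relevance labels are invented for illustration, whereas the paper combines existing TCP metrics plus fuzzy-inference-generated features.

```python
# Hedged sketch: train an XGBoost pairwise ranker over grouped test cases and use
# its scores to order the tests of a new build.
import numpy as np
from xgboost import XGBRanker

# Rows are test cases, grouped by the build they were collected from.
X_train = np.array([
    [120, 3, 1], [45, 0, 0], [300, 5, 1],   # build A: three test cases
    [80, 1, 0], [210, 4, 1],                # build B: two test cases
], dtype=float)
y_train = np.array([2, 0, 3, 1, 3])         # relevance: how early the test should run
groups = [3, 2]                             # group sizes for pairwise ranking

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50, learning_rate=0.1)
ranker.fit(X_train, y_train, group=groups)

# Prioritize a new build: higher predicted score means run earlier.
X_new = np.array([[150, 2, 1], [60, 0, 0], [240, 6, 1]], dtype=float)
order = np.argsort(-ranker.predict(X_new))
print("suggested execution order (test indices):", order.tolist())
```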
Citations: 0
NeoMycelia: A software reference architecture for big data systems
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00052
Pouya Ataei, A. Litchfield
The big data revolution began when the volume, velocity, and variety of data completely overwhelmed the systems used to store, manipulate, and analyze that data. As a result, a new class of software systems emerged, called big data systems. While many have attempted to harness the power of these new systems, it is estimated that approximately 75% of big data projects have failed within the last decade. One of the root causes is the software engineering and architecture aspect of these systems. This paper aims to facilitate big data system development by introducing a software reference architecture. The work provides an event-driven microservices architecture that addresses specific limitations in current big data reference architectures (RAs). The artefact development has followed the principles of empirically grounded RAs. The RA has been evaluated by developing a prototype that solves a real-world problem in practice. In the end, a successful implementation of the reference architecture is presented. The results display a good degree of applicability with respect to quality factors.
Citations: 6
Interaction Modelling for IoT
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00020
Jessica Turner, Judy Bowen, Nikki van Zandwijk
Informal design artefacts allow end-users and non-experts to contribute to software design ideas and development. In contrast, software engineering techniques such as model-driven development support experts in ensuring quality properties of the software they propose and build. Each of these approaches has benefits that contribute to the development of robust, reliable, and usable software; however, it is not always obvious how best to combine the two. In this paper we describe a novel technique that allows us to use informal design artefacts, in the form of ideation card designs, to generate formal models of IoT applications. To implement this technique, we created the Cards-to-Model (C2M) tool, which allows us to automate the model generation process. We demonstrate the technique with a case study of a safety-critical IoT application called “Medication Reminders”. By generating formal models directly from the design we reduce the complexity of the modelling process. In addition, by incorporating easy-to-use informal design artefacts in the process we allow non-experts to engage in the design and modelling of IoT applications.
Citations: 2
Degree doesn't Matter: Identifying the Drivers of Interaction in Software Development Ecosystems
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00048
I. Bardhan, Subhajit Datta, S. Majumder
Large scale software development ecosystems represent one of the most complex human enterprises. In such settings, developers are embedded in a web of shared concerns, responsibilities, and objectives at individual and collective levels. A deep understanding of the factors that influence developers to connect with one another is crucial in appreciating the challenges of such ecosystems as well as formulating strategies to overcome those challenges. We use real world data from multiple software development ecosystems to construct developer interaction networks and examine the mechanisms of such network formation using statistical models to identify developer attributes that have maximal influence on whether and how developers connect with one another. Our results challenge the conventional wisdom on the importance of particular developer attributes in their interaction practices, and offer useful insights for individual developers, project managers, and organizational decision-makers.
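The kind of analysis the abstract describes can be pictured with the sketch below: a developer interaction network is built from shared discussion participation, and per-pair attributes (degree and non-degree candidates) are derived for an edge-formation model; the thread data and feature choices are invented for illustration, not the paper's models.

```python
# Illustrative sketch only: build an interaction network and derive dyad features.
import itertools
import networkx as nx

threads = {
    "issue-101": ["alice", "bob", "carol"],
    "issue-102": ["bob", "dave"],
    "issue-103": ["alice", "dave", "erin"],
}

G = nx.Graph()
for participants in threads.values():
    for a, b in itertools.combinations(participants, 2):
        weight = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=weight)

rows = []
for a, b in itertools.combinations(sorted(G.nodes), 2):
    rows.append({
        "pair": (a, b),
        "deg_a": G.degree(a),
        "deg_b": G.degree(b),
        "shared_neighbours": len(list(nx.common_neighbors(G, a, b))),
        "interacted": G.has_edge(a, b),   # outcome an edge-formation model would predict
    })
print(rows[:3])
```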
Citations: 0
TraceRefiner: An Automated Technique for Refining Coarse-Grained Requirement-to-Class Traces
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00009
Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed
Requirement-to-code traces reveal the code location(s) where a requirement is implemented. Traceability is essential for code evolution and understanding. However, creating and maintaining requirement-to-code traces is a tedious and costly process. In this paper, we introduce TraceRefiner, a novel technique for automatically refining coarse-grained requirement-to-class traces into fine-grained requirement-to-method traces. The inputs of TraceRefiner are (1) the set of requirement-to-class traces, which are easier to create as there are far fewer traces to capture, and (2) information about the code structure (i.e., method calls). The output of TraceRefiner is the set of requirement-to-method traces, providing additional, fine-grained information to the developer. We demonstrate the quality of TraceRefiner on four case study systems (7-72 KLOC) and evaluate it on over 230,000 requirement-to-method predictions. The evaluation demonstrates TraceRefiner's ability to refine traces even if many requirement-to-class traces are undefined (incomplete input). The obtained results show that the proposed technique is fully automated, tool-supported, and scalable.
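The input/output relationship can be made concrete with the conceptual sketch below: given requirement-to-class traces and method-call information, method-level traces are proposed for each requirement. The filtering heuristic, class names, and call data are invented for illustration and are not TraceRefiner's algorithm.

```python
# Conceptual sketch only: refine class-level traces to method-level candidates.
from collections import defaultdict

req_to_classes = {"REQ-1": {"Cart", "Order"}, "REQ-2": {"Auth"}}

# method -> classes whose methods it calls (a crude stand-in for the call structure)
calls = {
    "Cart.addItem":   {"Cart", "Order"},
    "Cart.toString":  {"Cart"},
    "Order.checkout": {"Order", "Payment"},
    "Auth.login":     {"Auth"},
}

def refine(req_to_classes, calls):
    req_to_methods = defaultdict(set)
    for req, classes in req_to_classes.items():
        for method, callees in calls.items():
            owner = method.split(".")[0]
            # keep methods of a traced class that interact mainly with traced classes
            if owner in classes and len(callees & classes) >= len(callees) / 2:
                req_to_methods[req].add(method)
    return dict(req_to_methods)

print(refine(req_to_classes, calls))
```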
Citations: 1