Automated Software Engineering: latest articles, page 10

DifFuzzAR: automatic repair of timing side-channel vulnerabilities via refactoring

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-10-18 DOI: 10.1007/s10515-023-00398-6

Rui Lima, João F. Ferreira, Alexandra Mendes, Carolina Carreira

Vulnerability detection and repair is a demanding and expensive part of the software development process. As such, there has been an effort to develop new and better ways to automatically detect and repair vulnerabilities. DifFuzz is a state-of-the-art tool for automatic detection of timing side-channel vulnerabilities, a type of vulnerability that is particularly difficult to detect and correct. Despite recent progress made with tools such as DifFuzz, work on tools capable of automatically repairing timing side-channel vulnerabilities is scarce. In this paper, we propose DifFuzzAR, a tool for automatic repair of timing side-channel vulnerabilities in Java code. The tool works in conjunction with DifFuzz and it is able to repair 56% of the vulnerabilities identified in DifFuzz’s dataset. The results show that the tool can automatically correct timing side-channel vulnerabilities, being more effective with those that are control-flow based. In addition, the results of a user study show that users generally trust the refactorings produced by DifFuzzAR and that they see value in such a tool, in particular for more critical code.
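
To make the notion of a control-flow based timing side channel concrete, here is a minimal sketch in Python (the paper itself targets Java). The function names and the step counter are illustrative assumptions, not DifFuzzAR's output: the first function returns as soon as a byte differs, so the amount of work reveals how much of the secret was guessed correctly, while the refactored version always visits every byte of the guess.

```python
def unsafe_equals(secret: bytes, guess: bytes):
    """Early-exit comparison: the number of loop steps depends on the secret."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps  # leaks the position of the first mismatch
    return True, steps


def repaired_equals(secret: bytes, guess: bytes):
    """Refactored comparison: the loop always runs over the whole guess."""
    steps = 0
    diff = len(secret) ^ len(guess)  # a length mismatch is recorded, not returned early
    for i in range(len(guess)):
        steps += 1
        a = secret[i] if i < len(secret) else 0  # sketch: the length is treated as public
        diff |= a ^ guess[i]
    return diff == 0, steps


# Hypothetical inputs used only for illustration.
SECRET = b"hunter2"
early = unsafe_equals(SECRET, b"xunter2")
late = unsafe_equals(SECRET, b"huntex2")
fixed_early = repaired_equals(SECRET, b"xunter2")
fixed_late = repaired_equals(SECRET, b"huntex2")
```

The step counter stands in for execution time. Interpreted Python gives no real constant-time guarantee, so production code would normally rely on a vetted primitive such as `hmac.compare_digest` from the standard library.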



Citations: 0

AI²: the next leap toward native language-based and explainable machine learning framework

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-09-24 DOI: 10.1007/s10515-023-00399-5

Jean-Sébastien Dessureault, Daniel Massicotte

Machine learning frameworks have flourished in recent decades, allowing artificial intelligence to move out of academic circles and into enterprise domains. The field has advanced significantly, but meaningful improvements are still needed to meet expectations. The proposed framework, named AI², uses a natural language interface that allows non-specialists to benefit from machine learning algorithms without necessarily knowing how to program. The primary contribution of the AI² framework is that it allows a user to call machine learning algorithms in English, making its interface easier to use. The second contribution is greenhouse gas (GHG) awareness: the framework has strategies to evaluate the GHG generated by the algorithm to be called and to propose alternatives that find a solution without executing the energy-intensive algorithm. Another contribution is a preprocessing module that helps to describe and load data properly. Using an English text-based chatbot, this module guides the user to define every dataset so that it can be described, normalized, loaded, and divided appropriately. The last contribution of this paper concerns explainability. The scientific community has known for decades that machine learning algorithms suffer from the famous black-box problem: traditional machine learning methods convert an input into an output without being able to justify the result. The proposed framework explains the algorithm’s process with appropriate texts, graphics, and tables. The results, presented through five cases, show usage applications from the user’s English command to the explained output. Ultimately, the AI² framework represents the next leap toward native language-based, human-oriented machine learning frameworks.



Citations: 0

Tips: towards automating patch suggestion for vulnerable smart contracts

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-09-13 DOI: 10.1007/s10515-023-00392-y

Qianguo Chen, Teng Zhou, Kui Liu, Li Li, Chunpeng Ge, Zhe Liu, Jacques Klein, Tegawendé F. Bissyandé

Smart contracts are slowly penetrating our society, where they are leveraged to support critical business transactions with high financial stakes. Smart contract programming is, however, in its infancy, and many failures due to programming defects exploited by malicious attackers have made the headlines. In recent years, there has been an increasing effort in the literature to identify such vulnerabilities early in smart contracts to reduce the threats to the security of the accounts. Automatically patching smart contracts, however, is a much less investigated research topic. Yet, it can provide tools that help developers fix known vulnerabilities more rapidly. In this paper, we propose to review smart contract vulnerabilities and specify templates that will serve to automate patch generation. We implement the TIPS pipeline with 12 fix templates and assess its effectiveness on established smart contract datasets such as SmartBugs and ContractDefects. In particular, we show that TIPS is competitive against the state-of-the-art automated repair approach (SCRepair) in the literature. Finally, we evaluate the impact of the code changes suggested by TIPS in terms of gas usage.
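
The idea of a fix template can be sketched as a pattern paired with a rewrite. The Python example below is only an illustration under stated assumptions: the `FixTemplate` class, the `tx-origin-auth` template, and the regex-based matching are hypothetical and are not claimed to be among the 12 templates in TIPS, which may well operate on a richer program representation than raw text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FixTemplate:
    """Hypothetical template: a vulnerable pattern and its suggested rewrite."""
    name: str
    pattern: re.Pattern
    replacement: str

    def apply(self, source: str):
        # Returns (patched_source, number_of_rewrites).
        return self.pattern.subn(self.replacement, source)


# Illustrative template: authorization through tx.origin is a well-known
# Solidity pitfall; comparing against msg.sender is the usual fix.
TEMPLATES = [
    FixTemplate("tx-origin-auth", re.compile(r"\btx\.origin\b"), "msg.sender"),
]


def suggest_patches(source: str):
    """Return (template_name, patched_source) for every template that matches."""
    suggestions = []
    for template in TEMPLATES:
        patched, hits = template.apply(source)
        if hits:
            suggestions.append((template.name, patched))
    return suggestions


# Hypothetical contract fragment used only for illustration.
VULNERABLE = (
    "function withdraw() public {\n"
    "    require(tx.origin == owner);\n"
    "    payable(owner).transfer(address(this).balance);\n"
    "}\n"
)
suggestions = suggest_patches(VULNERABLE)
```

A text-level rewrite like this can only suggest a patch. Any suggestion still needs review and re-testing, and, as the abstract notes, its effect on gas usage should be measured.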



Citations: 0

An automated approach to aspect-based sentiment analysis of apps reviews using machine and deep learning

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-09-09 DOI: 10.1007/s10515-023-00397-7

Nouf Alturayeif, Hamoud Aljamaan, Jameleddine Hassine

App reviews hold a huge amount of informative user feedback that may be used to help software practitioners better understand users’ needs, identify quality-related issues such as privacy concerns and low efficiency, and evaluate users’ perceived satisfaction with the app features. One way to efficiently extract this information is by using Aspect-Based Sentiment Analysis (ABSA). The role of ABSA of app reviews is to identify all aspects of the app being reviewed and assign a sentiment polarity to each aspect. This paper aims to build ABSA models using supervised Machine Learning (ML) and Deep Learning (DL) approaches. Our automated technique is intended to (1) identify the most useful and effective text-representation and task-specific features in both Aspect Category Detection (ACD) and Aspect Category Polarity, (2) empirically investigate the performance of conventional ML models when utilized for the ABSA task of app reviews, and (3) empirically compare the performance of ML models and DL models in the context of the ABSA task. We built the models using different algorithms/architectures and performed hyper-parameter tuning. In addition, we extracted a set of relevant features for the ML models and performed an ablation study to analyze their contribution to the performance. Our empirical study showed that the ML model trained using the Logistic Regression algorithm and BERT embeddings outperformed the other models. Although ML outperformed DL, DL models do not require hand-crafted features and they allow for better learning of features when trained with more data.



Citations: 0

Automated verification of concurrent go programs via bounded model checking

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-08-26 DOI: 10.1007/s10515-023-00391-z

Nicolas Dilley, Julien Lange

The Go programming language offers a wide range of primitives to coordinate lightweight threads, e.g., channels, waitgroups, and mutexes, all of which may cause concurrency bugs. Static checkers that guarantee the absence of bugs are essential to help programmers avoid these costly errors before their code is executed. However, existing tools either miss too many bugs or cannot handle large programs, and do not support programs that rely on statically unknown parameters that affect their concurrent structure (e.g., number of threads). To address these limitations, we propose a static checker for Go programs which relies on performing bounded model checking of their concurrent behaviours. In contrast to previous works, our approach deals with large codebases, supports programs that have statically unknown parameters, and is extensible to additional concurrency primitives. Our work includes a detailed presentation of the extraction algorithm from Go programs to models, an algorithm to automatically check programs with statically unknown parameters, and a large-scale evaluation of our approach. The latter shows that our approach outperforms the state-of-the-art on 220 synthetic programs and 78 buggy programs adapted from existing codebases.
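
Bounded model checking of concurrent behaviour can be illustrated with a toy explorer. The sketch below is written in Python rather than Go and is an assumption-laden illustration, not the paper's extraction or checking algorithm: each process is a hypothetical list of `("lock", m)` / `("unlock", m)` operations, and the search enumerates interleavings breadth-first up to `bound` steps, reporting the first state in which unfinished processes are all blocked.

```python
from collections import deque


def find_deadlock(procs, bound):
    """Explore interleavings up to `bound` steps; return a deadlocking trace or None."""
    # A state is (program counters, set of (mutex, owner) pairs currently held).
    start = (tuple(0 for _ in procs), frozenset())
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        (pcs, held), trace = queue.popleft()
        successors = []
        for pid, prog in enumerate(procs):
            if pcs[pid] == len(prog):
                continue  # this process has finished
            op, mutex = prog[pcs[pid]]
            if op == "lock":
                if any(m == mutex for m, _ in held):
                    continue  # blocked: the mutex is already held
                new_held = held | {(mutex, pid)}
            else:  # "unlock"
                new_held = held - {(mutex, pid)}
            new_pcs = pcs[:pid] + (pcs[pid] + 1,) + pcs[pid + 1:]
            successors.append(((new_pcs, new_held), trace + [(pid, op, mutex)]))
        finished = all(pc == len(prog) for pc, prog in zip(pcs, procs))
        if not successors and not finished:
            return trace  # nobody can move, yet work remains: deadlock
        if len(trace) < bound:
            for state, new_trace in successors:
                if state not in seen:
                    seen.add(state)
                    queue.append((state, new_trace))
    return None


# Hypothetical model: two processes take mutexes "a" and "b" in opposite order.
P0 = [("lock", "a"), ("lock", "b"), ("unlock", "b"), ("unlock", "a")]
P1 = [("lock", "b"), ("lock", "a"), ("unlock", "a"), ("unlock", "b")]
trace = find_deadlock([P0, P1], bound=8)
shallow = find_deadlock([P0, P1], bound=1)
ordered = find_deadlock([P0, P0], bound=8)
```

The `bound` parameter shows the trade-off inherent in bounded checking: with `bound=1` the two-step deadlock is missed, while a larger bound finds it at the cost of exploring more states. Returning `None` only means no deadlock exists within the bound.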



Citations: 0

ActGraph: prioritization of test cases based on deep neural network activation graph

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-08-22 DOI: 10.1007/s10515-023-00396-8

Jinyin Chen, Jie Ge, Haibin Zheng

Widespread applications of deep neural networks (DNNs) benefit from DNN testing to guarantee their quality. In DNN testing, numerous test cases are fed into the model to explore potential vulnerabilities, but checking their labels requires expensive manual effort. Therefore, test case prioritization is proposed to solve the problem of labeling cost, e.g., surprise adequacy-based, uncertainty quantifiers-based and mutation-based prioritization methods. However, most of them suffer from limited scenarios (i.e., high confidence adversarial or false positive cases) and high time complexity. To address these challenges, we propose the concept of the activation graph from the perspective of the spatial relationship of neurons. We observe that the activation graph of cases that trigger the model’s misbehavior differs significantly from that of normal cases. Motivated by this observation, we design a test case prioritization method based on the activation graph, ActGraph, which extracts the high-order node features of the activation graph for prioritization. ActGraph explains the difference between the test cases to solve the problem of scenario limitation. Without mutation operations, ActGraph is easy to implement, leading to lower time complexity. Extensive experiments on three datasets and four models demonstrate that ActGraph has the following key characteristics. (i) Effectiveness and generalizability: ActGraph shows competitive performance in all of the natural, adversarial and mixed scenarios, especially in RAUC-100 improvement (∼×1.40). (ii) Efficiency: ActGraph runs at less time cost (∼×1/50) than the state-of-the-art method. The code of ActGraph is open-sourced at https://github.com/Embed-Debuger/ActGraph.
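
For context, the uncertainty-based family of prioritization methods mentioned in the abstract can be sketched in a few lines of Python. This is not ActGraph: it ranks test inputs by the Gini impurity of the model's predicted class probabilities, a common uncertainty score, so that inputs the model is least sure about are labeled first. The function names and the probability vectors are assumptions made for illustration.

```python
def gini_impurity(probs):
    """Uncertainty score: 0 for a one-hot prediction, larger when classes are mixed."""
    return 1.0 - sum(p * p for p in probs)


def prioritize(prob_vectors):
    """Return test-case indices ordered from most to least uncertain."""
    return sorted(
        range(len(prob_vectors)),
        key=lambda i: gini_impurity(prob_vectors[i]),
        reverse=True,
    )


# Hypothetical softmax outputs for three test inputs of a binary classifier.
PREDICTIONS = [
    [0.9, 0.1],  # confident
    [0.5, 0.5],  # maximally uncertain
    [0.7, 0.3],  # in between
]
order = prioritize(PREDICTIONS)
```

A score of this kind depends only on the output layer, which is one reason it struggles with high-confidence adversarial inputs, the limitation the abstract attributes to existing methods.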



Citations: 1

HMPT: a human–machine cooperative program translation method

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-08-21 DOI: 10.1007/s10515-023-00395-9

Xin Zhang, Zhiwen Yu, Jiaqi Liu, Hui Wang, Liang Wang, Bin Guo

Program translation aims to translate a program from one programming language to another, e.g., from Python to Java. Because constructing translation rules with pure human effort (by software engineers) is inefficient and the results of purely automatic machine translation are of low quality, it is suggested to implement program translation in a human–machine cooperative way. However, existing human–machine program translation methods fail to utilize the human’s ability effectively, because they require the human to post-edit the results (i.e., to statically modify the model-generated code directly). To solve this problem, we propose HMPT (Human-Machine Program Translation), a novel method that achieves program translation based on human–machine cooperation. It can (1) reduce the human effort by introducing a prefix-based interactive protocol that feeds the human’s edit into the model as the prefix and regenerates better output code, and (2) reduce the interactive response time caused by excessive program length in the regeneration process from two aspects: avoiding duplicate prefix generation with cached attention information, and reducing invalid suffix generation by splicing the suffix of the results. The experiments are conducted on two real datasets. Results show that, compared to the baselines, our method reduces the human effort by up to 73.5% at the token level and reduces the response time by up to 76.1%.
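
The prefix-based interactive protocol can be sketched with a toy stand-in for the translation model. Everything below is a hypothetical illustration in Python, not HMPT's implementation: `toy_model` simply returns the first stored candidate consistent with the prefix, and the "human" corrects only the first wrong token, after which the accepted prefix is fed back so the rest is regenerated.

```python
def toy_model(prefix, candidates):
    """Hypothetical model: first candidate translation that starts with `prefix`."""
    for cand in candidates:
        if cand[:len(prefix)] == prefix:
            return list(cand)
    return list(prefix)  # nothing fits: emit the prefix and stop


def first_mismatch(output, reference):
    """Index of the first token where the output deviates from the intended code."""
    for k, (o, r) in enumerate(zip(output, reference)):
        if o != r:
            return k
    return min(len(output), len(reference))


def interactive_translate(reference, candidates):
    """Prefix-based loop: one human correction per round, then regenerate."""
    edits = 0
    output = toy_model([], candidates)
    while output != reference:
        k = first_mismatch(output, reference)
        edits += 1
        if k == len(reference):
            output = list(reference)  # human deletes surplus trailing tokens
        else:
            prefix = reference[:k + 1]  # accepted tokens plus one correction
            output = toy_model(prefix, candidates)
    return output, edits


def post_edit_cost(output, reference):
    """Crude proxy for post-editing: mismatched positions plus length difference."""
    mismatches = sum(o != r for o, r in zip(output, reference))
    return mismatches + abs(len(output) - len(reference))


# Hypothetical candidates: a Python-flavoured mistranslation and the intended Java.
CANDIDATES = [
    "for i in range ( n ) :".split(),
    "for ( int i = 0 ; i < n ; i ++ )".split(),
]
REFERENCE = CANDIDATES[1]
first_output = toy_model([], CANDIDATES)
final_output, edits = interactive_translate(REFERENCE, CANDIDATES)
baseline = post_edit_cost(first_output, REFERENCE)
```

In this contrived case a single corrected token steers the regeneration to the intended loop header, whereas post-editing the first output would touch most of its tokens. The caching and suffix-splicing optimizations described in the abstract are not modeled here.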



Citations: 0

Crowdsourced test case generation for android applications via static program analysis

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-08-02 DOI: 10.1007/s10515-023-00394-w

Yuying Li, Yang Feng, Chao Guo, Zhenyu Chen, Baowen Xu

The testing of Android applications (apps) is a challenging task due to serious fragmentation issues and diverse usage environments. To improve testing efficiency and collect feedback from real usage scenarios, crowdsourcing has been employed in the testing of Android apps. However, crowdsourced testing is a manual working paradigm, and the shortage of testing guidance for crowd workers, who often have limited software engineering knowledge, may result in many redundant or invalid test reports. To fill this gap, this paper presents an automated test case generation approach for the testing of Android apps. Our approach is built upon static program analysis and is capable of providing detailed testing steps to guide workers in performing testing. Furthermore, we use an automated testing tool for pre-testing, so crowd workers only need to test the cases it leaves uncovered. We evaluate the effectiveness and efficiency of our approach on six widely used apps. The experimental results show that our approach can detect 71.5% more bugs in diverse categories and achieve 21.8% higher path coverage in comparison with classic crowdsourced testing techniques. Also, in the experiment, we detect 44 unknown bugs in the six subjects, which indicates our approach is highly promising for assisting the testing of Android apps in practice.


Volume 30, Issue 2 PDF: https://link.springer.com/content/pdf/10.1007/s10515-023-00394-w.pdf

Citations: 1

QUARE: towards a question-answering model for requirements elicitation

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-07-29 DOI: 10.1007/s10515-023-00386-w

Johnathan Mauricio Calle Gallego, Carlos Mario Zapata Jaramillo

Requirements elicitation is a stakeholder-centered approach; therefore, natural language remains an effective way of documenting and validating requirements. As the scope of the software domain grows, software analysts process a higher number of requirements documents, which causes delays and errors while characterizing the software domain. Natural language processing is key in such a process, allowing software analysts to speed up requirements elicitation and to mitigate the impact of the ambiguity and misinterpretations coming from natural-language-based requirements documents. However, natural-language-processing-based proposals for requirements elicitation are mainly focused on specific domains and still fail to handle several requirements writing styles. In this paper, we present QUARE, a question-answering model for requirements elicitation. The QUARE model comprises a meta-ontology for requirements elicitation, easing the generation of requirements-elicitation-related questions and the initial structuring of any software domain. In addition, the QUARE model includes a named entity recognition and relation extraction system focused on requirements elicitation, allowing software analysts to process several requirements writing styles. Although software analysts address one software domain at a time, they use the same kind of questions for identifying and characterizing requirements abstractions such as actors, concepts, and actions from a software domain. Such a process may be framed within the QUARE model workflow. We validate our proposal by using an experimental process including real-world requirements documents coming from several software domains and requirements writing styles. The QUARE model is a novel proposal aimed at supporting software analysts in the requirements elicitation process.


Volume 30, Issue 2 PDF: https://link.springer.com/content/pdf/10.1007/s10515-023-00386-w.pdf

Citations: 0

A model-based DevOps process for development of mathematical database cost models

IF 3.4 Zone 2 (Computer Science) Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering

Pub Date : 2023-07-28 DOI: 10.1007/s10515-023-00390-0

Ahmed Chikhaoui, Abdelhafid Chadli, Abdelkader Ouared

The complexity of mathematical database cost models increases with the evolution of database technology brought by emerging hardware and new deployment platforms (e.g., the cloud). This finding raises questions about the reliability of past Cost Models (CMs). Indeed, redesigning a database CM to evaluate quality of service (QoS) attributes (e.g., response time, energy, sizing) is becoming a challenging task: first, because developers directly implement the CM by hard-coding it inside a DBMS without a prior design; second, because there is no stepwise development process to support an incremental CM design and continuous testing to diagnose errors that occur at each design stage. Moreover, reusing CMs for other purposes is a major issue that necessitates investigations to allow designers to reuse and adapt CMs according to their needs. To take up these challenges, we propose a model-based framework for incremental design and continuous testing of database CMs. Specifically, we propose an approach that aims at shifting CM design from an ad hoc design to a structured and shared design by using a set of design guidelines inspired by software engineering practices. Finally, we propose to use the DevOps reuse practices (Continuous Integration/Continuous Delivery: CI/CD) to store the CM under design in a repository after each upgrade so that it can be reused, improved, calibrated, and refined for other purposes. We evaluate our approach against common CM features, and we carry out a comparison with some analytical models from the literature. Findings show that our framework provides high CM prediction accuracy and identifies the right design components with a precision ranging from 85% to 100%.


Volume 30, Issue 2 PDF: https://link.springer.com/content/pdf/10.1007/s10515-023-00390-0.pdf

Citations: 0