
Latest Articles in Automated Software Engineering

Measuring the impact of predictive models on the software project: A cost, service time, and risk evaluation of a metric-based defect severity prediction model
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-19 | DOI: 10.1007/s10515-025-00519-3
Umamaheswara Sharma B, Ravichandra Sadam

In a critical software system, testers must spend an enormous amount of time and effort maintaining the software due to the continuous occurrence of defects. To reduce this effort, prior works in the literature are limited to using documented defect reports to automatically predict the severity of defective software modules. In contrast, in this work, we propose a metric-based software defect severity prediction (SDSP) model built using a decision-tree-incorporated self-training semi-supervised learning approach to classify the severity of defective software modules. Empirical analysis of the proposed model on the AEEEM datasets supports the proposed approach, as it successfully assigns suitable severity class labels to the unlabelled modules. On the other hand, numerous research studies have addressed the methodological aspects of SDSP models, but the gap in estimating the performance of a developed prediction model using suitable measures remains unaddressed. To this end, we propose the risk factor, per cent of the saved budget, loss in the saved budget, per cent of remaining edits, remaining service time, and gratuitous service time to interpret the predictions in terms of project objectives. Empirical analysis of the proposed approach shows the benefit of using the proposed measures in addition to the traditional measures.
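The self-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a base learner is fit on labelled modules, high-confidence predictions on unlabelled modules are promoted to pseudo-labels, and the learner is refit. A nearest-centroid learner stands in for the decision tree for brevity; the metric vectors and the 0.9 confidence threshold are illustrative assumptions.

```python
import math

def centroid_fit(X, y):
    """Return per-class centroids of the labelled metric vectors."""
    cents = {}
    for xi, yi in zip(X, y):
        cents.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in cents.items()}

def centroid_predict(cents, x):
    """Predict (label, confidence) for one module's metric vector."""
    d = {c: math.dist(v, x) for c, v in cents.items()}
    ranked = sorted(d, key=d.get)
    best = ranked[0]
    # Confidence: relative margin between the two nearest centroids.
    conf = 1.0 if len(ranked) == 1 else \
        (d[ranked[1]] - d[best]) / (d[ranked[1]] + 1e-9)
    return best, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_iter):
        cents = centroid_fit(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, conf = centroid_predict(cents, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:          # no new pseudo-labels; stop early
            break
        for x, label in confident:  # promote confident predictions
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    cents = centroid_fit(X_lab, y_lab)
    return [centroid_predict(cents, x)[0] for x in X_unlab]

# Toy defect metrics: two well-separated severity classes.
labelled = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["minor", "minor", "major", "major"]
unlabelled = [(0.15, 0.15), (0.85, 0.85)]
print(self_train(labelled, labels, unlabelled))  # expect: minor, major
```

The same loop applies unchanged if the centroid learner is swapped for a decision tree with class-probability outputs, as in the paper.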

Citations: 0
The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-16 | DOI: 10.1007/s10515-025-00524-6
Zexian Zhang, Lin Zhu, Shuang Yin, Wenhua Hu, Shan Gao, Haoxuan Chen, Fuyang Li

Code smell detection using machine/deep learning methods aims to classify code instances as smelly or non-smelly based on extracted features. Accurate detection relies on optimizing feature sets by focusing on relevant features while discarding those that are redundant or irrelevant. However, prior studies on feature selection and reduction techniques for code smell detection have yielded inconsistent results, possibly due to limited exploration of available techniques. To address this gap, we comprehensively analyze 33 feature selection and 6 feature reduction techniques across seven classification models and six code smell datasets, applying the Scott-Knott effect size difference test to compare performance and McNemar’s test to assess prediction diversity. The results show that (1) not all feature selection and reduction techniques significantly improve detection performance; (2) feature extraction techniques generally perform worse than feature selection techniques; (3) probabilistic significance is recommended as a “generic” feature selection technique due to its higher consistency in identifying smelly instances; and (4) the high-frequency features selected by the top feature selection techniques vary by dataset, highlighting their specific relevance for identifying the corresponding code smells. Based on these findings, we provide implications for further code smell detection research.
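A filter-style feature selection technique of the kind compared in the study can be sketched as below. This is an illustrative example, not one of the 33 techniques verbatim: features are ranked by absolute Pearson correlation with the smelly/non-smelly label and the top-k are kept; the toy metric matrix is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_top_k(X, y, k):
    """X: rows of feature vectors; y: 0/1 smell labels. Returns feature indices."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: feature 0 tracks the label, feature 1 is noise, feature 2 is constant.
X = [[1, 5, 7], [2, 1, 7], [8, 4, 7], [9, 2, 7]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, k=1))  # feature 0 correlates most with y
```

Wrapper-based and embedded techniques differ by scoring feature subsets with a trained model rather than a per-feature statistic, but the selection interface stays the same.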

Citations: 0
Structural contrastive learning based automatic bug triaging
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-16 | DOI: 10.1007/s10515-025-00517-5
Yi Tao, Jie Dai, Lingna Ma, Zhenhui Ren, Fei Wang

Bug triaging is crucial for software maintenance, as it matches developers with bug reports they are most qualified to handle. This task has gained importance with the growth of the open-source community. Traditionally, methods have emphasized semantic classification of bug reports, but recent approaches focus on the associations between bugs and developers. Leveraging latent patterns from bug-fixing records can enhance triaging predictions; however, the limited availability of these records presents a significant challenge. This scarcity highlights a broader issue in supervised learning: the inadequacy of labeled data and the underutilization of unlabeled data. To address these limitations, we propose a novel framework named SCL-BT (Structural Contrastive Learning-based Bug Triaging). This framework improves the utilization of labeled heterogeneous associations through edge perturbation and leverages unlabeled homogeneous associations via hypergraph sampling. These processes are integrated with a graph convolutional network backbone to enhance the prediction of associations and, consequently, bug triaging accuracy. Experimental results demonstrate that SCL-BT significantly outperforms existing models on public datasets. Specifically, on the Google Chromium dataset, SCL-BT surpasses the GRCNN method by 18.64% in terms of the Top-9 Hit Ratio metric. The innovative approach of SCL-BT offers valuable insights for research on automatic bug triaging.
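The edge-perturbation augmentation that SCL-BT applies to the labeled bug-developer graph can be sketched as below. This is an illustrative fragment only: the drop ratio and the toy edge list are assumptions, and the GCN encoder plus the contrastive objective that would pull the two views' node embeddings together are omitted.

```python
import random

def perturb_edges(edges, drop_ratio, rng):
    """Return an augmented graph view: each edge kept with prob (1 - drop_ratio)."""
    return [e for e in edges if rng.random() >= drop_ratio]

# Bug-fixing records as (bug_id, developer_id) edges of a bipartite graph.
edges = [(0, "dev_a"), (1, "dev_a"), (2, "dev_b"), (3, "dev_b"), (4, "dev_c")]
rng = random.Random(42)  # seeded for reproducibility

# Two independently perturbed views of the same graph; a contrastive loss
# would treat the same node across views as a positive pair.
view1 = perturb_edges(edges, drop_ratio=0.2, rng=rng)
view2 = perturb_edges(edges, drop_ratio=0.2, rng=rng)
print(len(view1), len(view2))  # both are subsets of the original edge set
```

Because only edges (never nodes) are dropped, every bug and developer keeps an embedding in both views, which is what makes the cross-view positive pairs well-defined.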

Citations: 0
An empirical study of test case prioritization on the Linux Kernel
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-13 | DOI: 10.1007/s10515-025-00522-8
Haichi Wang, Ruiguo Yu, Dong Wang, Yiheng Du, Yingquan Zhao, Junjie Chen, Zan Wang

The Linux kernel is a complex and constantly evolving system, where each code change can impact different components of the system. Regression testing ensures that new changes do not affect existing functionality or introduce new defects. However, due to the complexity of the Linux kernel, maintenance remains challenging. While practices like Continuous Integration (CI) facilitate rapid commits through automated regression testing, each CI process still incurs substantial costs due to the extensive number of test cases. Traditional software testing employs test case prioritization (TCP) techniques to prioritize test cases, thus enabling the early detection of defects. Due to the unique characteristics of the Linux kernel, it remains unclear whether the existing TCP techniques are suitable for its regression testing. In this paper, we present the first empirical study comparing various TCP techniques in the Linux kernel context. Specifically, we examined a total of 17 TCP techniques, including similarity-based, information-retrieval-based, and coverage-based techniques. The experimental results demonstrate that: (1) similarity-based TCP techniques perform best on the Linux kernel, achieving a mean APFD (Average Percentage of Faults Detected) value of 0.7583 and requiring significantly less time; (2) the majority of TCP techniques show relatively stable performance across multiple commits, with similarity-based TCP techniques the most stable, showing a maximum decrease of 3.03% and 3.92% in mean and median APFD values, respectively; (3) more than half of the studied techniques are significantly affected by flaky tests, with decreases in mean and median APFD values ranging from 29.9% to 63.5%. This work takes a first look at the adoption of TCP techniques in the Linux kernel, confirming their potential for effective and efficient prioritization.
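The APFD metric reported throughout the study is defined as APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TFi the 1-based position of the first test exposing fault i. A minimal sketch, with an illustrative fault-exposure matrix of my own construction:

```python
def apfd(order, fault_sets):
    """order: prioritized test ids; fault_sets: fault id -> tests exposing it."""
    n, m = len(order), len(fault_sets)
    pos = {t: i + 1 for i, t in enumerate(order)}  # 1-based rank of each test
    # TFi: earliest position of any test exposing fault i.
    tf = [min(pos[t] for t in tests if t in pos) for tests in fault_sets.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)

faults = {"f1": {"t3"}, "f2": {"t1", "t4"}}    # which tests expose which faults
good = apfd(["t3", "t1", "t2", "t4"], faults)  # faults exposed early
bad = apfd(["t2", "t4", "t1", "t3"], faults)   # faults exposed late
print(good, bad)  # 0.75 vs 0.375: earlier fault exposure scores higher
```

Higher APFD means faults are, on average, exposed earlier in the prioritized order, which is exactly what a TCP technique optimizes for.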

Citations: 0
iALBMAD: an improved agile-based layered approach for mobile app development
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-10 | DOI: 10.1007/s10515-025-00520-w
Anil Patidar, Ugrasen Suman

The demand for improved efficiency, agility, and adaptability has led to rapid evolution in mobile app development (MAD). Agile approaches are recognized for being cooperative and iterative, but issues remain in handling the full range of MAD necessities. The objective here is to blend the best practices of several prominent agile and non-agile approaches into an innovative and improved MAD approach, which we refer to as the improved Agile and Lean-based MAD Approach (iALBMAD); this approach improves upon our previous work, ALBMAD. We pursue three improvements: discovering suitable app attributes, identifying best practices for the various MAD activities, and strengthening requirement-gathering activities. To accomplish this, we first determined from the accessible literature the app attributes that affect MAD, agile and non-agile best practices, and the role of machine learning (ML) in MAD. We then equipped ALBMAD with these aspects according to their applicability and offered it to 18 MAD experts to obtain suggestions for its improvement. Considering the experts’ opinions, a three-layered approach, iALBMAD, was developed. In iALBMAD, automation and an iterative cycle are established to meet final requirements; these revisions may boost the quality of requirements and minimize time. Specific, expert-validated best practices and app attributes suitable for each iALBMAD activity are offered, which will assist less-skilled developers. Thirteen users verified the usability of six teams’ apps created using three different approaches, and the results show that iALBMAD performs better than the other approaches. The suggested approach and findings will provide insightful information for individuals and firms aiming to improve their MAD practice.

Citations: 0
Knowledge-guided large language models are trustworthy API recommenders
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-07 | DOI: 10.1007/s10515-025-00518-4
Hongwei Wei, Xiaohong Su, Weining Zheng, Wenxing Tao, Hailong Yu, Yuqian Kuang

Application Programming Interface (API) recommendation aims to recommend APIs that meet developers’ functional requirements, compensating for developers’ lack of API knowledge. In team-based software development, developers often need to implement functionality based on specific interface parameter types predefined by the software architect. Therefore, we propose API Recommendation under specific Interface Parameter Types (APIRIP), a special variant of the API recommendation task that requires the recommended APIs to conform to the interface parameter types. To realize APIRIP, we enlist the support of Large Language Models (LLMs). However, LLMs are susceptible to the phenomenon known as hallucination, wherein they may recommend untrustworthy API sequences. Instances of this include recommending fictitious APIs, APIs whose calling conditions cannot be satisfied, or API sequences that fail to conform to the interface parameter types. To mitigate these issues, we propose a Knowledge-guided framework for LLM-based API Recommendation (KG4LLM), which incorporates knowledge-guided data augmentation and beam search. The core idea of KG4LLM is to leverage API knowledge derived from the Java Development Kit (JDK) documentation to enhance the trustworthiness of LLM-generated recommendations. Experimental results demonstrate that KG4LLM improves the trustworthiness of recommendation results and outperforms advanced LLMs in the APIRIP task.
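The knowledge-guided beam search idea can be sketched as below. This is an illustrative toy, not KG4LLM itself: the per-step candidate probabilities and the documentation-derived valid-API set are assumptions, and the hypothetical `File.checkIt` stands in for a hallucinated API. Because the beam only extends with APIs present in the knowledge set, fictitious APIs can never enter a recommended sequence.

```python
import math

def beam_search(step_candidates, known_apis, beam_width=2):
    """step_candidates: per step, a dict api -> probability from the LLM."""
    beams = [([], 0.0)]                         # (sequence, cumulative log-prob)
    for cands in step_candidates:
        nxt = []
        for seq, lp in beams:
            for api, p in cands.items():
                if api not in known_apis:       # knowledge filter: drop unknown APIs
                    continue
                nxt.append((seq + [api], lp + math.log(p)))
        beams = sorted(nxt, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]                          # best surviving sequence

known = {"File.exists", "File.createNewFile", "FileWriter.write", "FileWriter.close"}
steps = [
    {"File.exists": 0.6, "File.checkIt": 0.4},  # hallucinated API is filtered out
    {"File.createNewFile": 0.7, "FileWriter.write": 0.3},
    {"FileWriter.close": 0.9, "FileWriter.write": 0.1},
]
print(beam_search(steps, known))
```

The same filter could additionally check interface parameter types at each step, which is the APIRIP-specific constraint the paper targets.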

Citations: 0
A comparative study between Android phone and TV apps
IF 2.0 | CAS Region 2 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Pub Date: 2025-05-05 | DOI: 10.1007/s10515-025-00514-8
Yonghui Liu, Xiao Chen, Yue Liu, Pingfan Kong, Tegawendé F. Bissyandé, Jacques Klein, Xiaoyu Sun, Li Li, Chunyang Chen, John Grundy

Smart TVs have surged in popularity, leading developers to create TV versions of mobile apps. Understanding the relationship between TV and mobile apps is key to building consistent, secure, and optimized cross-platform experiences while addressing TV-specific SDK challenges. Despite extensive research on mobile apps, TV apps have been given little attention, leaving the relationship between phone and TV apps unexplored. Our study addresses this gap by compiling an extensive collection of 3445 Android phone/TV app pairs from the Google Play Store, launching the first comparative analysis of its kind. We examined these pairs across multiple dimensions, including non-code elements, code structure, security, and privacy aspects. Our findings reveal that while these app pairs are identified by the same package names, they deploy different artifacts with varying functionality across platforms. TV apps generally exhibit less complexity in terms of hardware-dependent features and code volume but share significant resource files and components with their phone versions. Interestingly, some categories of TV apps show similar or even more severe security and privacy concerns compared to their mobile counterparts. This research aims to assist developers and researchers in understanding phone-TV app relationships, highlight domain-specific concerns necessitating TV-specific tools, and provide insights for migrating apps from mobile to TV platforms.

引用次数: 0
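The pairing idea in this study can be illustrated with a short sketch (not the authors' tooling): match phone and TV builds by package name, then diff their declared hardware/software features. The manifest dictionaries and package name below are hypothetical stand-ins for parsed AndroidManifest.xml data.

```python
# Sketch: pair phone/TV builds of an app by package name and diff the
# <uses-feature> declarations each build ships with.

def pair_and_diff(phone_manifests, tv_manifests):
    """Return {package: (phone_only_features, tv_only_features)} for shared packages."""
    diffs = {}
    for pkg in phone_manifests.keys() & tv_manifests.keys():
        phone_feats = set(phone_manifests[pkg]["uses_features"])
        tv_feats = set(tv_manifests[pkg]["uses_features"])
        diffs[pkg] = (sorted(phone_feats - tv_feats), sorted(tv_feats - phone_feats))
    return diffs

# Hypothetical manifest data for one app pair.
phone = {"com.example.app": {"uses_features": [
    "android.hardware.camera", "android.hardware.touchscreen"]}}
tv = {"com.example.app": {"uses_features": ["android.software.leanback"]}}

print(pair_and_diff(phone, tv))
```

In this toy pair, the phone build declares camera and touchscreen features absent from the TV build, mirroring the paper's observation that TV apps depend on fewer hardware features.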
Improving prompt tuning-based software vulnerability assessment by fusing source code and vulnerability description 通过融合源代码和漏洞描述,改进基于提示调优的软件漏洞评估
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-05-03 DOI: 10.1007/s10515-025-00525-5
Jiyu Wang, Xiang Chen, Wenlong Pei, Shaoyu Yang

To effectively allocate resources for vulnerability remediation, it is crucial to prioritize vulnerability fixes based on vulnerability severity. With the increasing number of vulnerabilities in recent years, there is an urgent need for automated methods for software vulnerability assessment (SVA). Most previous SVA studies mainly rely on traditional machine learning methods. Recently, fine-tuning pre-trained language models has emerged as an intuitive method for improving performance. However, there is a gap between pre-training and fine-tuning, and their performance heavily depends on the quality of the downstream task’s dataset. Therefore, we propose a prompt tuning-based method, PT-SVA. Different from the fine-tuning paradigm, the prompt-tuning paradigm adds prompts to make the training process similar to pre-training, thereby better adapting to downstream tasks. Moreover, previous research aimed to automatically predict severity by analyzing only either the vulnerability descriptions or the source code of the vulnerability. Therefore, we further consider both types of vulnerability information for designing hybrid prompts (i.e., a combination of hard and soft prompts). To evaluate PT-SVA, we construct an SVA dataset based on the CVSS V3 standard, while previous SVA studies only consider the CVSS V2 standard. Experimental results show that PT-SVA outperforms ten state-of-the-art SVA baselines, for example by 13.7% to 42.1% in terms of MCC. Finally, our ablation experiments confirm the effectiveness of PT-SVA’s design, specifically in replacing fine-tuning with prompt tuning, incorporating both types of vulnerability information, and adopting hybrid prompts. Our promising results indicate that prompt tuning-based SVA is a promising direction and needs more follow-up studies.

为了有效地为漏洞修复分配资源，根据漏洞严重程度对漏洞修复进行优先排序是至关重要的。随着近年来软件漏洞数量的不断增加，对软件漏洞自动化评估方法的需求日益迫切。以往的SVA研究大多依赖于传统的机器学习方法。最近，微调预训练语言模型已经成为提高性能的一种直观方法。然而，预训练和微调之间存在差距，它们的性能在很大程度上取决于下游任务的数据集质量。因此，我们提出了一种基于提示调优的方法PT-SVA。与微调范式不同，提示调优范式包括添加提示，使训练过程类似于预训练，从而更好地适应下游任务。而且，以往的研究主要是通过分析漏洞描述或漏洞源代码来自动预测漏洞的严重程度。因此，我们进一步考虑这两种类型的漏洞信息来设计混合提示(即硬提示和软提示的组合)。为了评估PT-SVA，我们基于CVSS V3标准构建了SVA数据集，而以往的SVA研究只考虑CVSS V2标准。实验结果表明，PT-SVA优于10个最先进的SVA基线，例如在MCC方面高出13.7%至42.1%。最后，我们的消融实验证实了PT-SVA设计的有效性，特别是在用提示调优取代微调、结合两种类型的漏洞信息以及采用混合提示方面。我们的研究结果表明，基于提示调优的SVA是一个有希望的方向，需要更多的后续研究。
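The hybrid-prompt idea (hard template text plus learnable soft tokens, fusing both vulnerability inputs) can be sketched as below. The `[SOFT]` placeholder, template wording, and function name are our own illustration, not PT-SVA's actual implementation; in a real prompt-tuning setup the soft tokens would be replaced by trainable embeddings and `[MASK]` filled in by a pre-trained masked language model.

```python
# Sketch: assemble a hybrid prompt from a vulnerability description and its
# source code; hard prompt = fixed template text, soft prompt = [SOFT] slots.

SOFT = "[SOFT]"  # placeholder a prompt-tuning model would swap for a trainable embedding

def build_hybrid_prompt(description, source_code, n_soft=4):
    """Fuse both kinds of vulnerability information into one cloze-style prompt."""
    soft_span = " ".join([SOFT] * n_soft)
    return (f"{soft_span} Vulnerability description: {description} "
            f"Vulnerable code: {source_code} "
            f"The severity of this vulnerability is [MASK].")

print(build_hybrid_prompt("buffer overflow in the parser", "memcpy(dst, src, len);"))
```

Because the cloze template mirrors the masked-language-modeling objective, severity prediction becomes a fill-in-the-blank task, which is what narrows the pre-training/fine-tuning gap the abstract describes.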
引用次数: 0
A systematic mapping study on automated negotiation for autonomous intelligent systems 自主智能系统自动协商的系统映射研究
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-05-02 DOI: 10.1007/s10515-025-00515-7
Mashal Afzal Memon, Gian Luca Scoccia, Marco Autili

Autonomous intelligent systems are artificial intelligence software entities that can act on their own and make decisions without any human intervention. The communication between such systems to reach an agreement for problem-solving is known as automated negotiation. This study aims to systematically identify and analyze the literature on automated negotiation from four distinct viewpoints: (1) the existing literature on negotiation with a focus on automation, (2) the specific purpose and application domain of the studies published in the domain of automated negotiation, (3) the inputs and techniques used to model the negotiation process, and (4) the limitations of the state of the art and future research directions. For this purpose, we performed a systematic mapping study (SMS) starting from 73,760 potentially relevant studies belonging to 24 conference proceedings and 22 journal issues. Through a precise selection procedure, we identified 50 primary studies, published from the year 2000 onward, which were analyzed by applying a classification framework. As a result, we provide: (a) a classification framework to analyze the automated negotiation literature according to several parameters (e.g., the focus of the paper, the inputs required to carry out the negotiation process, the techniques applied, and the types of agents involved in the negotiation), (b) an up-to-date map of the literature specifying the purpose and application domain of each study, (c) a list of techniques used to automate the negotiation process and a list of inputs to carry out the negotiation, and (d) a discussion of promising challenges and their consequences for future research. We also provide a replication package to help researchers replicate and verify our systematic mapping study. The results and findings will benefit researchers and practitioners in identifying the research gap and conducting further research to bring dedicated solutions for automated negotiation.

自主智能系统被称为人工智能软件实体，可以自行行动，可以在没有任何人为干预的情况下做出决定。这些系统之间为达成解决问题的协议而进行的通信称为自动协商。本研究旨在从四个不同的角度系统地识别和分析自动化谈判的文献：(1)现有的以自动化为重点的谈判文献；(2)在自动化谈判领域发表的研究的具体目的和应用领域；(3)用于谈判过程建模的输入和技术；(4)技术现状的局限性和未来的研究方向。为此，我们进行了一项系统映射研究(SMS)，从24个会议论文集和22个期刊的73,760项潜在相关研究开始。通过精确的选择程序，我们确定了自2000年以来发表的50项主要研究，并通过应用分类框架对其进行分析。因此，我们提供：(a)根据几个参数(例如，论文的焦点、进行谈判过程所需的输入、应用的技术和谈判中涉及的代理类型)分析自动化谈判文献的分类框架；(b)详细说明每项研究的目的和应用领域的最新文献地图；(c)用于自动化谈判过程的技术清单和执行谈判的输入清单；(d)讨论有希望的挑战及其对未来研究的影响。我们还提供了一个复制包，以帮助研究人员复制和验证我们的系统映射研究。研究结果和发现将有利于研究人员和从业人员确定研究差距，并开展进一步的研究，为自动谈判提供专用解决方案。
引用次数: 0
ExtRep: a GUI test repair method for mobile applications based on test-extension 基于test-extension的移动应用GUI测试修复方法
IF 2 2区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2025-04-25 DOI: 10.1007/s10515-025-00513-9
Yonghao Long, Yuanyuan Chen, Chu Zeng, Xiangping Chen, Xing Chen, Xiaocong Zhou, Jingru Yang, Gang Huang, Zibin Zheng

GUI testing ensures software quality and user experience in ever-changing mobile application development. Using test scripts is one of the main GUI testing approaches, but scripts may become obsolete when the GUI changes as the app evolves. Current studies often rely on textual or visual similarity to perform test repair, but may be less effective when the interacted event sequence changes dramatically. In interaction design, practitioners often provide multiple entry points to access the same function to gain higher openness and flexibility, which indicates that there may be multiple routes for reference in test repair. To evaluate the feasibility, we first conducted an exploratory study on 37 tests from 18 apps. The result showed that over 81% of the tests could be represented with alternative event paths, and using the extended paths could help enhance the test replay rate. Based on this finding, we propose a test-extension-based test repair algorithm named ExtRep. The method first uses test-extension to find alternative paths with similar test objectives based on feature coverage, and then finds the repaired result with the help of the sequence transduction probability proposed in the NLP area. Experiments conducted on 40 popular applications demonstrate that ExtRep can achieve a success rate of 73.68% in repairing 97 tests, which significantly outperforms the current approaches Water, Meter, and Guider. Moreover, the test-extension approach displays immense potential for optimizing test repairs. A tool that implements ExtRep is available for practical use and future research.

在不断变化的移动应用开发中，GUI测试确保了软件质量和用户体验。使用测试脚本是主要的GUI测试方式之一，但当GUI随着应用程序的发展而变化时，它可能已经过时了。目前的研究往往依赖于文本或视觉相似性来进行测试修复，但当交互的事件序列发生显著变化时，可能效果较差。在交互设计中，从业者通常会提供多个入口点来访问同一个功能，以获得更高的开放性和灵活性，这表明在测试修复中可能会有多个路径可供参考。为了评估可行性，我们首先对18个应用程序的37个测试进行了探索性研究。结果表明，81%以上的测试可以用备选事件路径表示，使用扩展路径可以提高测试重放率。基于这一发现，我们提出了一种基于测试扩展的测试修复算法，命名为ExtRep。该方法首先利用基于特征覆盖率的测试扩展来寻找具有相似测试目标的备选路径，然后利用NLP领域中提出的序列转导概率来寻找修复结果。在40个流行的应用中进行的实验表明，在修复97个测试中，ExtRep的成功率为73.68%，显著优于目前的方法Water、Meter和Guider。此外，测试扩展方法显示了优化测试修复的巨大潜力。实现ExtRep的工具可供实际使用和未来研究。
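The path-selection step described above can be sketched in miniature: among alternative event paths that reach the same test objective, prefer the one whose covered features best overlap the original (now-broken) test. A plain Jaccard similarity stands in here for the paper's feature-coverage and sequence-transduction scoring, and all path/feature names are hypothetical.

```python
# Sketch: rank candidate alternative event paths by feature-coverage overlap
# with the original test, and keep the closest one as the repair.

def jaccard(a, b):
    """Overlap between two feature sets (1.0 = identical coverage)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def pick_repair_path(original_features, candidates):
    """candidates: {path_name: covered feature set}; return the best-matching path."""
    return max(candidates, key=lambda p: jaccard(original_features, candidates[p]))

original = {"login", "open_settings", "toggle_dark_mode"}
candidates = {
    "via_sidebar": {"login", "open_sidebar", "open_settings", "toggle_dark_mode"},
    "via_search": {"login", "search_settings"},
}
print(pick_repair_path(original, candidates))  # → via_sidebar
```

The sidebar route wins because it still covers all three original features despite inserting an extra event, which is exactly the kind of alternative entry point the exploratory study found for over 81% of tests.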
引用次数: 0