
Automated Software Engineering: Latest Articles

Distilled GPT for source code summarization
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-03-01 · DOI: 10.1007/s10515-024-00421-4
Chia-Yi Su, Collin McMillan

A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, yet they form the backbone of developer documentation. A short description such as “changes all visible polygons to the color blue” can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350M parameters) to be run on a single 16 GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.
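
As a rough illustration of why a model with 350 million parameters can fit on a single 16 GB GPU, the following back-of-envelope sketch estimates memory for the weights alone. The bytes-per-parameter figures are common rules of thumb and are assumptions, not numbers reported by the authors; activations, batch size, and framework overhead are ignored.

```python
# Back-of-envelope memory estimate for a 350M-parameter model.
# Assumptions (not from the paper):
#   - fp16 inference stores 2 bytes per parameter, fp32 stores 4
#   - mixed-precision Adam training needs roughly 16 bytes per parameter
#     (fp16 weights + fp16 gradients + fp32 master weights + two fp32 moments)
#   - activation memory and framework overhead are ignored

PARAMS = 350_000_000
GIB = 1024 ** 3


def param_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory in GiB needed to hold per-parameter state."""
    return num_params * bytes_per_param / GIB


inference_fp16 = param_memory_gib(PARAMS, 2)
inference_fp32 = param_memory_gib(PARAMS, 4)
training_adam = param_memory_gib(PARAMS, 16)

print(f"fp16 inference weights: {inference_fp16:.2f} GiB")
print(f"fp32 inference weights: {inference_fp32:.2f} GiB")
print(f"Adam training state:    {training_adam:.2f} GiB")
```

Under these assumptions the per-parameter state stays well below 16 GB in every case, which leaves the remaining memory for activations.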

Citations: 0
GenerativeGI: creating generative art with genetic improvement
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-03-01 · DOI: 10.1007/s10515-024-00414-3
Erik M. Fredericks, Jared M. Moore, Abigail C. Diller

Generative art is a domain in which artistic output is created via a procedure or heuristic that may result in digital and/or physical results. A generative artist will typically act as a domain expert by specifying the algorithms that will form the basis of the piece as well as defining and refining parameters that can impact the results; however, such efforts can require a significant amount of time to generate the final output. This article presents and extends GenerativeGI, an evolutionary computation-based technique for creating generative art by automatically searching through combinations of artistic techniques and their accompanying parameters to produce outputs that the designer finds desirable. Generative art techniques and their respective parameters are encoded within a grammar that is then the target for genetic improvement. This grammar-based approach, combined with a many-objective evolutionary algorithm, enables the designer to efficiently search through a massive number of possible outputs that reflect their aesthetic preferences. We included a total of 15 generative art techniques and performed three separate empirical evaluations, each of which targets different aesthetic preferences and varying aspects of the search heuristic. Experimental results suggest that GenerativeGI can produce outputs that are significantly more novel than those generated by random or single-objective search. Furthermore, GenerativeGI produces individuals with a larger number of relevant techniques used to generate their overall composition.

Citations: 0
DL4SC: a novel deep learning-based vulnerability detection framework for smart contracts
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-03-01 · DOI: 10.1007/s10515-024-00418-z
Yang Liu, Chao Wang, Yan Ma

Smart contracts are a new paradigm for decentralized software systems and play a key role in blockchain-based applications. Vulnerabilities in smart contracts are unacceptable, and some have caused significant economic losses. Machine learning, especially deep learning, is a very promising approach to vulnerability detection for smart contracts. At present, deep learning-based vulnerability detection methods suffer from low accuracy, long running times, and a narrow range of application. To address these problems, we propose DL4SC, a novel deep learning-based vulnerability detection framework for smart contracts at the opcode level. It is the first to orthogonally combine the Transformer encoder and a CNN (convolutional neural network) to detect vulnerabilities in smart contracts, and the first to exploit SSA (sparrow search algorithm) to automatically search model hyperparameters for vulnerability detection. We implement DL4SC in Python on the PyTorch deep learning platform and compare it with existing work on three public datasets and one dataset we collected. The experimental results show that DL4SC can accurately detect vulnerabilities in smart contracts and outperforms state-of-the-art approaches. The accuracy and F1-score of DL4SC are 95.29% and 95.68%, respectively.

Citations: 0
Sound analysis and migration of data from Ethereum smart contracts
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-02-29 · DOI: 10.1007/s10515-024-00422-3
Maha Ayub, Muhammad Waiz Khan, Muhammad Umar Janjua

With the addition of multiple blockchain platforms to the ecosystem, DApp owners need to migrate their smart contracts from one platform to another to remain competitive, cost-effective, and secure. A smart contract is a piece of code that contains logic and data. To migrate a smart contract, whether to the same blockchain platform or a different one, we need both its source code, which represents the logic, and the data that indicate the state of the contract. The source code can be easily set up, but to complete the migration, we have to extract the current state of the contract. In this paper, we have developed an advanced state extraction technique that uses static analysis to analyze the smart contract’s call graph and events, and extracts the entire storage state from the storage trie, along with the proper associations across function calls, enabling users to visualize, manage, and transform the state as desired for migration. The soundness of the extracted state was confirmed using the method of abstract interpretation. Further, the migration adapter is designed to transform the extracted state into slot-value pairs and migrate it to the target blockchain. Using our new approach, we were able to completely analyze 14% more smart contracts and extract 53% more data by analyzing function calls and event logs from 67,993 contracts, and we also migrated some contracts to multiple popular EVM-compatible blockchains.

Citations: 0
Using model-driven engineering to automate software language translation
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-02-28 · DOI: 10.1007/s10515-024-00419-y
Kevin Lano, Hanan Siala

The porting or translation of software applications from one programming language to another is a common requirement of organisations that utilise software, and the increasing number and diversity of programming languages make this capability as relevant today as in previous decades. Several approaches have been used to address this challenge, including machine learning and the manual definition of direct language-to-language translation rules; however, the accuracy of these approaches remains unsatisfactory. In this paper we describe a new approach to program translation using model-driven engineering techniques: reverse-engineering source programs into specifications in the UML and OCL formalisms, and then forward-engineering the specifications to the required target language. This approach can provide assurance of semantic preservation, and additionally has the advantage of extracting precise specifications of software from code. We provide an evaluation based on a comprehensive dataset of examples, including industrial cases, and compare our results to those of other approaches and tools. Our specific contributions are: (1) Reverse-engineering source programs to detailed semantic models of software behaviour, to enable semantically correct translations and reduce re-testing costs; (2) Program abstraction processes defined by precise and explicit rules, which can be edited and configured by users; (3) A set of reusable OCL library components appropriate for representing program semantics, which can also be used for OCL specification of new applications; (4) A systematic procedure for building program abstractors based on language grammars and semantics.

Open access PDF: https://link.springer.com/content/pdf/10.1007/s10515-024-00419-y.pdf
Citations: 0
Software defect prediction: future directions and challenges
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-02-27 · DOI: 10.1007/s10515-024-00424-1
Zhiqiang Li, Jingwen Niu, Xiao-Yuan Jing

Software defect prediction is one of the most popular research topics in software engineering. The objective of defect prediction is to identify defective instances before software defects occur, and thus to help prioritize software quality assurance efforts more effectively. In this article, we delve into various prospective research directions and potential challenges in the field of defect prediction. The aim of this article is to propose a range of defect prediction techniques and methodologies for the future. These ideas are intended to enhance the practicality, explainability, and actionability of the predictions of defect models.

Citations: 0
ReBack: recommending backports in social coding environments
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-02-23 · DOI: 10.1007/s10515-024-00416-1
Debasish Chakroborti, Kevin A. Schneider, Chanchal K. Roy

Pull-based development is widely used in popular social coding environments like GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to also port those changes to other stable branches. This process is referred to as backporting, and pull-requests in the process are known as backports. Backports are typically determined after extensive discussion with collaborators, and it may take many days to identify backports, which commonly results in tags and references to the original pull-requests (i.e., pull-requests for the main branch) being missed. To help software development teams better identify and manage backports, we propose ReBack (Recommending Backports), a tool based on a deep-learning model for automatically identifying backports from pull-requests and related reviews, discussions, metadata, and committed code. ReBack predicted backports with 90.98% precision and 91.81% recall from 80,000 pull-requests in 17 GitHub projects. Although the results are promising, more research is required to further support backporting, including research into automatically porting a pull-request to further reduce costs when managing software versions and branches.
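
To put the reported precision and recall on a single scale, the sketch below computes their harmonic mean (the F1-score). This value is derived here purely for illustration and is not reported in the abstract; the authors' own F1, if they report one, could differ depending on how results are averaged across the 17 projects.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract (90.98% and 91.81%).
reback_f1 = f1_score(0.9098, 0.9181)
print(f"Derived F1: {reback_f1:.4f}")  # prints "Derived F1: 0.9139"
```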

Citations: 0
Using data mining techniques to generate test cases from graph transformation systems specifications
IF 2 · Zone 2 (Computer Science) · Q3 (Computer Science, Software Engineering) · Pub Date: 2024-02-21 · DOI: 10.1007/s10515-024-00417-0
Maryam Asgari Araghi, Vahid Rafe, Ferhat Khendek

Software testing plays a crucial role in enhancing software quality. A significant portion of the time and cost in software development is dedicated to testing. Automation, particularly in generating test cases, can greatly reduce the cost. Model-based testing aims to generate test cases automatically from models. Several model-based approaches use model checking tools to automate test case generation. However, this technique faces challenges such as state space explosion and duplication of test cases. This paper introduces a novel solution based on data mining algorithms for systems specified using graph transformation systems. To overcome the aforementioned challenges, the proposed method selectively explores only a portion of the state space based on test objectives. The proposed method is implemented using the GROOVE tool set for model-checking graph transformation systems specifications. Empirical results on widely used case studies in service-oriented architecture, as well as a comparison with related state-of-the-art techniques, demonstrate the efficiency and superiority of the proposed approach in terms of coverage and test suite size.

Citations: 0
Leveraging privacy profiles to empower users in the digital society
IF 2; Zone 2, Computer Science; Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2024-02-18; DOI: 10.1007/s10515-024-00415-2
Davide Di Ruscio, Paola Inverardi, Patrizio Migliarini, Phuong T. Nguyen

Protecting citizens’ privacy and ethics is among the core concerns raised by an increasingly digital society. Profiling users is common practice for software applications, which creates the need, also enforced by law, for users to manage their privacy settings properly in order to protect personally identifiable information and express personal ethical preferences. This has proven to be very difficult for several concurrent reasons. However, profiling technologies can also empower users in their interaction with the digital world by reflecting personal ethical preferences and by automating or assisting the configuration of privacy settings. In this way, if it properly reflects users’ preferences, privacy profiling can become a key enabler for a trustworthy digital society. We focus on characterizing and collecting users’ privacy preferences and contribute a step in this direction through an empirical study on an existing dataset collected from the fitness domain. We aim to understand which set of questions is more appropriate for differentiating users according to their privacy preferences. The results reveal that a compact set of semantic-driven questions (about domain-independent privacy preferences) distinguishes users better than a complex domain-dependent one. Based on this outcome, we implement a recommender system to provide users with suitable recommendations related to privacy choices. We then show that the proposed recommender system provides relevant settings to users, obtaining high accuracy.

PDF: https://link.springer.com/content/pdf/10.1007/s10515-024-00415-2.pdf
Citations: 0
An extensive study of the effects of different deep learning models on code vulnerability detection in Python code
IF 2; Zone 2, Computer Science; Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING; Pub Date: 2024-01-31; DOI: 10.1007/s10515-024-00413-4
Rongcun Wang, Senlei Xu, Xingyu Ji, Yuan Tian, Lina Gong, Ke Wang

Deep learning has achieved great progress in automated code vulnerability detection, and several deep-learning-based detection approaches have been proposed. However, few studies have empirically examined the impact of different deep learning models on code vulnerability detection in Python. For this reason, we strive to cover many more code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study evaluating the effects on code vulnerability detection of eighteen deep learning architectures derived from combinations of three representation learning models, i.e., Word2Vec, fastText, and CodeBERT, and six classification models, i.e., random forest, XGBoost, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Additionally, two machine learning strategies, i.e., the attention and bi-directional mechanisms, are empirically compared. Statistical significance and effect size analyses between different models are also conducted. In terms of precision, recall, and F-score, Word2Vec is better than CodeBERT and fastText. Likewise, LSTM and GRU are superior to the other classification models we studied. The bi-directional LSTM and GRU with attention using Word2Vec are the two best-performing models for code vulnerability detection in Python code. Moreover, they show medium or large effect sizes relative to LSTM and GRU using only a single mechanism. Both the representation learning models and the classification models have an important influence on vulnerability detection in Python code. Likewise, the bi-directional and attention mechanisms can impact the performance of code vulnerability detection.
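
To make the size of the experimental grid concrete, the following is a minimal Python sketch, not the authors' code. It enumerates the eighteen pairings of the three representation models and six classifiers named in the abstract, and includes a helper for the metrics the abstract compares on. The names `architectures` and `precision_recall_f1` are hypothetical, and treating "F-score" as the balanced F1 score is an assumption, since the abstract does not specify the variant.

```python
from itertools import product

# Model names are taken from the abstract; everything else is illustrative.
REPRESENTATIONS = ["Word2Vec", "fastText", "CodeBERT"]
CLASSIFIERS = ["random forest", "XGBoost", "MLP", "CNN", "LSTM", "GRU"]

# Every (representation, classifier) pairing: 3 x 6 = 18 configurations.
architectures = list(product(REPRESENTATIONS, CLASSIFIERS))


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions from true/false positives and false negatives.

    Assumes F-score means F1; returns 0.0 wherever a denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(len(architectures))
    for representation, classifier in architectures:
        print(f"{representation} + {classifier}")
```

In a study of this shape, each pairing would be trained and scored separately, and the per-pairing precision, recall, and F-score values are what the significance and effect-size analyses compare.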

Citations: 0