Selecting the appropriate communication protocol is crucial for optimizing the performance, scalability, and user experience of web applications. In the diverse ecosystem of web technologies, protocols such as RESTful APIs, gRPC, and WebSockets serve distinct purposes. RESTful APIs are widely favored for their simplicity and stateless nature, making them ideal for standard CRUD operations. They offer a straightforward approach to interacting with resources over HTTP/1.1, providing broad compatibility and ease of integration across platforms. In scenarios that demand high efficiency and real-time communication, however, gRPC and WebSockets emerge as powerful alternatives. Each protocol comes with strengths and limitations that influence ease of implementation, performance under load, and support for complex data structures. RESTful APIs, while easy to use and widely supported, may introduce overhead due to their stateless nature and reliance on multiple HTTP/1.1 requests. gRPC's advanced features, while powerful, come with a steeper learning curve and require more sophisticated infrastructure. Similarly, WebSockets, while excellent for real-time applications, require careful management of persistent connections and security considerations. This paper explores the key considerations in choosing the right communication protocol, emphasizing the need to align technical choices with application requirements and user expectations. By understanding the unique attributes of each protocol, developers can make informed decisions that enhance the responsiveness and reliability of their web applications. The choice of protocol significantly affects the user experience, scalability, and maintainability of the application, making it a critical decision in the web development process.
{"title":"Choosing the Right Communication Protocol for your Web Application","authors":"Mohamed Hassan","doi":"arxiv-2409.07360","DOIUrl":"https://doi.org/arxiv-2409.07360","url":null,"abstract":"Selecting the appropriate communication protocol is crucial for optimizing\u0000the performance, scalability, and user experience of web applications. In the\u0000diverse ecosystem of web technologies, various protocols like RESTful APIs,\u0000gRPC, WebSockets, and others serve distinct purposes. RESTful APIs are widely\u0000favored for their simplicity and stateless nature, making them ideal for\u0000standard CRUD operations. They offer a straightforward approach to interacting\u0000with resources over HTTP/1.1, providing broad compatibility and ease of\u0000integration across different platforms. However, in scenarios where\u0000applications require high efficiency and real-time communication, gRPC and\u0000WebSockets emerge as powerful alternatives. Each protocol comes with its\u0000strengths and limitations, influencing factors such as ease of implementation,\u0000performance under load, and support for complex data structures. RESTful APIs,\u0000while easy to use and widely supported, may introduce overhead due to their\u0000stateless nature and reliance on multiple HTTP/1.1 requests. In contrast, gRPC\u0000advanced features, while powerful, require a steeper learning curve and more\u0000sophisticated infrastructure. Similarly, WebSockets, while excellent for\u0000real-time applications, require careful management of persistent connections\u0000and security considerations. This paper explores the key considerations in\u0000choosing the right communication protocol, emphasizing the need to align\u0000technical choices with application requirements and user expectations. By\u0000understanding the unique attributes of each protocol, developers can make\u0000informed decisions that enhance the responsiveness and reliability of their web\u0000applications. The choice of protocol can significantly impact the user\u0000experience, scalability, and maintainability of the application, making it a\u0000critical decision in the web development process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explored the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also proposed and demonstrated implementations of two multi-agent architectures that can predict which test cases require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.
{"title":"Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes","authors":"Ludvig Lemner, Linnea Wahlgren, Gregory Gay, Nasser Mohammadiha, Jingxiong Liu, Joakim Wennerberg","doi":"arxiv-2409.06416","DOIUrl":"https://doi.org/arxiv-2409.06416","url":null,"abstract":"Much of the cost and effort required during the software testing process is\u0000invested in performing test maintenance - the addition, removal, or\u0000modification of test cases to keep the test suite in sync with the\u0000system-under-test or to otherwise improve its quality. Tool support could\u0000reduce the cost - and improve the quality - of test maintenance by automating\u0000aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language\u0000models (LLMs) - complex machine learning models adapted to textual analysis -\u0000to support test maintenance. We conducted a case study at Ericsson AB where we\u0000explored the triggers that indicate the need for test maintenance, the actions\u0000that LLMs can take, and the considerations that must be made when deploying\u0000LLMs in an industrial setting. We also proposed and demonstrated\u0000implementations of two multi-agent architectures that can predict which test\u0000cases require maintenance following a change to the source code. Collectively,\u0000these contributions advance our theoretical and practical understanding of how\u0000LLMs can be deployed to benefit industrial test maintenance processes.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developers must select a high-performance fault localization (FL) technique from the many available ones. A conventional approach is to select, before the debugging activity begins, a single FL technique that is expected to attain high performance. In contrast, we propose a new approach that dynamically selects better-performing FL techniques during the debugging activity.
{"title":"On Applying Bandit Algorithm to Fault Localization Techniques","authors":"Masato Nakao, Kensei Hamamoto, Masateru Tsunoda, Amjed Tahir, Koji Toda, Akito Monden, Keitaro Nakasai, Kenichi Matsumoto","doi":"arxiv-2409.06268","DOIUrl":"https://doi.org/arxiv-2409.06268","url":null,"abstract":"Developers must select a high-performance fault localization (FL) technique\u0000from available ones. A conventional approach is to try to select only one FL\u0000technique that is expected to attain high performance before debugging\u0000activity. In contrast, we propose a new approach that dynamically selects\u0000better FL techniques during debugging activity.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, extending most existing clone detectors is challenging because the source code handler needs modifications, which require specialist-level knowledge of the targeted language and are time-consuming. Multilingual code clone detectors make it easier to add support for a new language by requiring only syntax information of the target language. To address the shortcomings of existing multilingual detectors in language scalability and detection performance, we propose a multilingual code block extraction method based on ANTLR parser generation and implement a multilingual code clone detector (MSCCD), which supports the largest number of languages among currently available tools and is able to detect Type-3 code clones. We follow the methodology of previous studies to evaluate detection performance on Java. Compared to ten state-of-the-art detectors, MSCCD performs at an average level while supporting a significantly larger number of languages. Furthermore, we propose the first multilingual syntactic code clone evaluation benchmark, based on the CodeNet database. Our results reveal that even when the same detection approach is applied, performance can vary markedly depending on the language of the source code under investigation. Overall, MSCCD is the most balanced of the evaluated tools when considering both detection performance and language extensibility.
{"title":"Development and Benchmarking of Multilingual Code Clone Detector","authors":"Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada","doi":"arxiv-2409.06176","DOIUrl":"https://doi.org/arxiv-2409.06176","url":null,"abstract":"The diversity of programming languages is growing, making the language\u0000extensibility of code clone detectors crucial. However, this is challenging for\u0000most existing clone detection detectors because the source code handler needs\u0000modifications, which require specialist-level knowledge of the targeted\u0000language and is time-consuming. Multilingual code clone detectors make it\u0000easier to add new language support by providing syntax information of the\u0000target language only. To address the shortcomings of existing multilingual\u0000detectors for language scalability and detection performance, we propose a\u0000multilingual code block extraction method based on ANTLR parser generation, and\u0000implement a multilingual code clone detector (MSCCD), which supports the most\u0000significant number of languages currently available and has the ability to\u0000detect Type-3 code clones. We follow the methodology of previous studies to\u0000evaluate the detection performance of the Java language. Compared to ten\u0000state-of-the-art detectors, MSCCD performs at an average level while it also\u0000supports a significantly larger number of languages. Furthermore, we propose\u0000the first multilingual syntactic code clone evaluation benchmark based on the\u0000CodeNet database. Our results reveal that even when applying the same detection\u0000approach, performance can vary markedly depending on the language of the source\u0000code under investigation. Overall, MSCCD is the most balanced one among the\u0000evaluated tools when considering detection performance and language\u0000extensibility.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hossein Hajipour, Lea Schönherr, Thorsten Holz, Mario Fritz
Large language models (LLMs) have shown great potential for automatic code generation and form the basis of tools such as GitHub Copilot. However, recent studies highlight that much LLM-generated code contains serious security vulnerabilities. While previous work tries to address this by training models that generate secure code, these attempts remain constrained by limited access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach that enhances the ability of LLMs to generate secure code by automatically synthesizing secure code, reducing the effort of finding suitable training data. HexaCoder comprises two key components: an oracle-guided data synthesis pipeline and a two-step process for secure code generation. The data synthesis pipeline generates pairs of vulnerable and fixed code for specific Common Weakness Enumeration (CWE) types by utilizing a state-of-the-art LLM to repair vulnerable code. A security oracle identifies vulnerabilities, and a state-of-the-art LLM repairs them by extending and/or editing the code, creating data pairs for fine-tuning with the Low-Rank Adaptation (LoRA) method. Each example in our fine-tuning dataset includes the necessary security-related libraries and code, which form the basis of our novel two-step generation approach. This allows the model to integrate security-relevant libraries before generating the main code, reducing the amount of generated vulnerable code by up to 85% compared to baseline methods. We perform extensive evaluations on three benchmarks for four LLMs, demonstrating that HexaCoder not only improves the security of the generated code but also maintains a high level of functional correctness.
{"title":"HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data","authors":"Hossein Hajipour, Lea Schönherr, Thorsten Holz, Mario Fritz","doi":"arxiv-2409.06446","DOIUrl":"https://doi.org/arxiv-2409.06446","url":null,"abstract":"Large language models (LLMs) have shown great potential for automatic code\u0000generation and form the basis for various tools such as GitHub Copilot.\u0000However, recent studies highlight that many LLM-generated code contains serious\u0000security vulnerabilities. While previous work tries to address this by training\u0000models that generate secure code, these attempts remain constrained by limited\u0000access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach to enhance the\u0000ability of LLMs to generate secure codes by automatically synthesizing secure\u0000codes, which reduces the effort of finding suitable training data. HexaCoder\u0000comprises two key components: an oracle-guided data synthesis pipeline and a\u0000two-step process for secure code generation. The data synthesis pipeline\u0000generates pairs of vulnerable and fixed codes for specific Common Weakness\u0000Enumeration (CWE) types by utilizing a state-of-the-art LLM for repairing\u0000vulnerable code. A security oracle identifies vulnerabilities, and a\u0000state-of-the-art LLM repairs them by extending and/or editing the codes,\u0000creating data pairs for fine-tuning using the Low-Rank Adaptation (LoRA)\u0000method. Each example of our fine-tuning dataset includes the necessary\u0000security-related libraries and code that form the basis of our novel two-step\u0000generation approach. This allows the model to integrate security-relevant\u0000libraries before generating the main code, significantly reducing the number of\u0000generated vulnerable codes by up to 85% compared to the baseline methods. We\u0000perform extensive evaluations on three different benchmarks for four LLMs,\u0000demonstrating that HexaCoder not only improves the security of the generated\u0000code but also maintains a high level of functional correctness.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensemble learning methods have been used to enhance the reliability of defect prediction models. However, no single method consistently attains the highest accuracy across different software projects. This work aims to improve the performance of ensemble-learning defect prediction across such projects by helping select the highest-accuracy ensemble method. We employ bandit algorithms (BA), an online optimization method, to select the highest-accuracy ensemble method. Each software module is tested sequentially, and the bandit algorithm uses the test outcomes of the modules to evaluate the performance of the ensemble learning methods. The test strategy followed can affect both the testing effort and the prediction accuracy when applying online optimization. Hence, we analyzed the influence of the test order on BA's performance. In our experiment, we used six popular defect prediction datasets, four ensemble learning methods such as bagging, and three test strategies such as testing positive-prediction modules first (PF). Our results show that when BA is applied with PF, prediction accuracy improved on average, and the number of found defects increased by 7% on at least five of the six datasets (with a slight increase of about 4% in testing effort compared to ordinary ensemble learning). Hence, BA with the PF strategy is the most effective way to attain the highest prediction accuracy with ensemble methods across various projects.
{"title":"An Empirical Study of the Impact of Test Strategies on Online Optimization for Ensemble-Learning Defect Prediction","authors":"Kensei Hamamoto, Masateru Tsunoda, Amjed Tahir, Kwabena Ebo Bennin, Akito Monden, Koji Toda, Keitaro Nakasai, Kenichi Matsumoto","doi":"arxiv-2409.06264","DOIUrl":"https://doi.org/arxiv-2409.06264","url":null,"abstract":"Ensemble learning methods have been used to enhance the reliability of defect\u0000prediction models. However, there is an inconclusive stability of a single\u0000method attaining the highest accuracy among various software projects. This\u0000work aims to improve the performance of ensemble-learning defect prediction\u0000among such projects by helping select the highest accuracy ensemble methods. We\u0000employ bandit algorithms (BA), an online optimization method, to select the\u0000highest-accuracy ensemble method. Each software module is tested sequentially,\u0000and bandit algorithms utilize the test outcomes of the modules to evaluate the\u0000performance of the ensemble learning methods. The test strategy followed might\u0000impact the testing effort and prediction accuracy when applying online\u0000optimization. Hence, we analyzed the test order's influence on BA's\u0000performance. In our experiment, we used six popular defect prediction datasets,\u0000four ensemble learning methods such as bagging, and three test strategies such\u0000as testing positive-prediction modules first (PF). Our results show that when\u0000BA is applied with PF, the prediction accuracy improved on average, and the\u0000number of found defects increased by 7% on a minimum of five out of six\u0000datasets (although with a slight increase in the testing effort by about 4%\u0000from ordinal ensemble learning). Hence, BA with PF strategy is the most\u0000effective to attain the highest prediction accuracy using ensemble methods on\u0000various projects.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haowei Cheng, Jati H. Husen, Sien Reeve Peralta, Bowen Jiang, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki
Context: Generative AI (GenAI) has emerged as a transformative tool in software engineering, with requirements engineering (RE) actively exploring its potential to revolutionize processes and outcomes. The integration of GenAI into RE presents both promising opportunities and significant challenges that necessitate systematic analysis and evaluation. Objective: This paper presents a comprehensive systematic literature review (SLR) analyzing state-of-the-art applications and innovative proposals leveraging GenAI in RE. It surveys studies focusing on the utilization of GenAI to enhance RE processes while identifying key challenges and opportunities in this rapidly evolving field. Method: A rigorous SLR methodology was used to analyze 27 carefully selected primary studies in-depth. The review examined research questions pertaining to the application of GenAI across various RE phases, the models and techniques used, and the challenges encountered in implementation and adoption. Results: The most salient findings include i) a predominant focus on the early stages of RE, particularly the elicitation and analysis of requirements, indicating potential for expansion into later phases; ii) the dominance of large language models, especially the GPT series, highlighting the need for diverse AI approaches; and iii) persistent challenges in domain-specific applications and the interpretability of AI-generated outputs, underscoring areas requiring further research and development. Conclusions: The results highlight the critical need for comprehensive evaluation frameworks, improved human-AI collaboration models, and thorough consideration of ethical implications in GenAI-assisted RE. Future research should prioritize extending GenAI applications across the entire RE lifecycle, enhancing domain-specific capabilities, and developing strategies for responsible AI integration in RE practices.
{"title":"Generative AI for Requirements Engineering: A Systematic Literature Review","authors":"Haowei Cheng, Jati H. Husen, Sien Reeve Peralta, Bowen Jiang, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki","doi":"arxiv-2409.06741","DOIUrl":"https://doi.org/arxiv-2409.06741","url":null,"abstract":"Context: Generative AI (GenAI) has emerged as a transformative tool in\u0000software engineering, with requirements engineering (RE) actively exploring its\u0000potential to revolutionize processes and outcomes. The integration of GenAI\u0000into RE presents both promising opportunities and significant challenges that\u0000necessitate systematic analysis and evaluation. Objective: This paper presents\u0000a comprehensive systematic literature review (SLR) analyzing state-of-the-art\u0000applications and innovative proposals leveraging GenAI in RE. It surveys\u0000studies focusing on the utilization of GenAI to enhance RE processes while\u0000identifying key challenges and opportunities in this rapidly evolving field.\u0000Method: A rigorous SLR methodology was used to analyze 27 carefully selected\u0000primary studies in-depth. The review examined research questions pertaining to\u0000the application of GenAI across various RE phases, the models and techniques\u0000used, and the challenges encountered in implementation and adoption. Results:\u0000The most salient findings include i) a predominant focus on the early stages of\u0000RE, particularly the elicitation and analysis of requirements, indicating\u0000potential for expansion into later phases; ii) the dominance of large language\u0000models, especially the GPT series, highlighting the need for diverse AI\u0000approaches; and iii) persistent challenges in domain-specific applications and\u0000the interpretability of AI-generated outputs, underscoring areas requiring\u0000further research and development. Conclusions: The results highlight the\u0000critical need for comprehensive evaluation frameworks, improved human-AI\u0000collaboration models, and thorough consideration of ethical implications in\u0000GenAI-assisted RE. Future research should prioritize extending GenAI\u0000applications across the entire RE lifecycle, enhancing domain-specific\u0000capabilities, and developing strategies for responsible AI integration in RE\u0000practices.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"214 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software development is a collaborative endeavor that requires individuals from different departments to work together to collectively develop a high-quality software system. In this context, researchers have begun to explore methods that leverage LLM-based multi-agent systems to carry out software development. However, existing research tends to rigidly fix the software development process in a code-level framework, and thus fails to adjust the process dynamically in real time to suit a more flexible and variable software environment. In this paper, we propose a dynamic process generation framework named ToP (Think-on-Process). The core idea of ToP is to leverage experiential knowledge (i.e., process models) to guide LLMs in generating software development processes (i.e., instances). These instances guide the multi-agent system during software development, and a compiler provides feedback on the development outcomes. Subsequently, we use heuristic algorithms to filter the instances and apply process mining algorithms to derive a process model. Finally, the process model is converted into text, formatted as prompts, to enhance the ability of LLMs to generate further instances. Experiments demonstrate that our framework ToP significantly enhances the dynamic process generation capability of GPT-3.5 and GPT-4 across five categories of software development tasks.
{"title":"Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System","authors":"Leilei Lin, Yingming Zhou, Wenlong Chen, Chen Qian","doi":"arxiv-2409.06568","DOIUrl":"https://doi.org/arxiv-2409.06568","url":null,"abstract":"Software development is a collaborative endeavor that requires individuals\u0000from different departments to work together in order to collectively develop a\u0000high-quality software system. In this context, people have begun to explore a\u0000method that leverages multi-agent systems based on LLMs to carry out software\u0000development. However, existing research tends to rigidly fix the software\u0000development process in a framework in code form, thus failing to dynamically\u0000adjust the software development process in real-time to meet the more flexible\u0000and variable software environment. In this paper, we propose a dynamic process\u0000generation framework, named ToP (Think-on-Process). The core idea of ToP is to\u0000leverage experiential knowledge (i.e., process models) to guide LLMs in\u0000generating software development processes (i.e., instances). These instances\u0000will guide multi-agent in software development and employ a compiler to provide\u0000feedback on the development outcomes. Subsequently, we utilize heuristic\u0000algorithms to filter the instances and apply process mining algorithms to\u0000derive process model. Finally, the process model will be converted into text,\u0000formatted as prompts, to enhance the ability of LLMs to generate other\u0000instances. Experiments demonstrate that our framework ToP significantly\u0000enhances the dynamic process generation capability of the GPT-3.5 and GPT-4 for\u0000five categories of software development tasks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"95 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang
We present a comprehensive dataset of Java vulnerability-fixing commits (VFCs) to advance research in Java vulnerability analysis. Our dataset, derived from thousands of open-source Java projects on GitHub, comprises two variants: JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous process involving heuristic rules and multiple rounds of manual labeling. We initially used keywords to filter candidate VFCs based on commit messages, then refined this keyword set through iterative manual labeling. The final labeling round achieved a precision score of 0.7 among three annotators. We applied the refined keyword set to 34,321 open-source Java repositories with over 50 GitHub stars, resulting in JavaVFC with 784 manually verified VFCs and JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are presented in a standardized JSONL format for easy access and analysis. This dataset supports various research endeavors, including VFC identification, fine-grained vulnerability detection, and automated vulnerability repair. The JavaVFC and JavaVFC-extended are publicly available at https://zenodo.org/records/13731781.
{"title":"JavaVFC: Java Vulnerability Fixing Commits from Open-source Software","authors":"Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang","doi":"arxiv-2409.05576","DOIUrl":"https://doi.org/arxiv-2409.05576","url":null,"abstract":"We present a comprehensive dataset of Java vulnerability-fixing commits\u0000(VFCs) to advance research in Java vulnerability analysis. Our dataset, derived\u0000from thousands of open-source Java projects on GitHub, comprises two variants:\u0000JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous\u0000process involving heuristic rules and multiple rounds of manual labeling. We\u0000initially used keywords to filter candidate VFCs based on commit messages, then\u0000refined this keyword set through iterative manual labeling. The final labeling\u0000round achieved a precision score of 0.7 among three annotators. We applied the\u0000refined keyword set to 34,321 open-source Java repositories with over 50 GitHub\u0000stars, resulting in JavaVFC with 784 manually verified VFCs and\u0000JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are\u0000presented in a standardized JSONL format for easy access and analysis. This\u0000dataset supports various research endeavors, including VFC identification,\u0000fine-grained vulnerability detection, and automated vulnerability repair. The\u0000JavaVFC and JavaVFC-extended are publicly available at\u0000https://zenodo.org/records/13731781.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao
Large language models (LLMs) have shown remarkable capabilities in code generation. However, the effects of hallucinations (e.g., output noise) make it particularly challenging for LLMs to generate high-quality code in one pass. In this work, we propose a simple and effective uncertainty-aware selective contrastive decoding (USCD) mechanism that improves the quality of one-pass code generation in LLMs and reduces the impact of output noise. Specifically, we first carefully design a negative prompt (namely, a lame prompt) that induces output noise by removing the input-output examples from the standard few-shot prompt. Our preliminary study shows that the Jensen-Shannon divergence (JS divergence) between token-distribution uncertainty and the output noise is relatively low (approximately 0.25), indicating their high relevance. We then selectively eliminate the output noise induced by the lame prompt based on the uncertainty of the prediction distribution from the standard prompt. Notably, our proposed plug-and-play mechanism is an inference-only method with appealing flexibility. Extensive experiments on widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, with several LLMs (i.e., InCoder-6B, CodeLlama-7B, WizardCoder-15B, StarCoder, and Llama2-7B), demonstrate that our proposed USCD significantly improves one-pass code generation, with an average pass@1 score increase of 16.59%. We will release code and data on GitHub.
{"title":"$mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding","authors":"Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao","doi":"arxiv-2409.05923","DOIUrl":"https://doi.org/arxiv-2409.05923","url":null,"abstract":"Large language models (LLMs) have shown remarkable capabilities in code\u0000generation. However, the effects of hallucinations (e.g., output noise) make it\u0000particularly challenging for LLMs to generate high-quality code in one pass. In\u0000this work, we propose a simple and effective textbf{u}ncertainty-aware\u0000textbf{s}elective textbf{c}ontrastive textbf{d}ecoding ($mathbb{USCD}$)\u0000mechanism to improve the quality of one-pass code generation in LLMs and reduce\u0000the impact of output noise. To be specific, we first elaborately designed a\u0000negative prompt (namely lame prompt) to output noise by removing input-output\u0000examples from the standard few-shot prompt. Our preliminary study shows that\u0000the Jensen-Shannon divergence (JS divergence) between token distribution\u0000uncertainty and the output noise is relatively low (approximately $0.25$),\u0000indicating their high relevance. Then, we selectively eliminate output noise\u0000induced by lame prompts based on the uncertainty of the prediction distribution\u0000from the standard prompt. Notably, our proposed plug-and-play mechanism is an\u0000inference-only method, enjoying appealing flexibility. Extensive experiments on\u0000widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, upon several LLMs\u0000(i.e., Inocder-6b, CodeLlama-7b, WizardCoder-15b, StarCoder, and Llama2-7b),\u0000demonstrate that our proposed USCD significantly improves one-pass code\u0000generation, with an average textit{pass@$1$} scores increase of 16.59%. We\u0000will release code and data on GitHub.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}