Selecting the appropriate communication protocol is crucial for optimizing the performance, scalability, and user experience of web applications. In the diverse ecosystem of web technologies, protocols such as RESTful APIs, gRPC, and WebSockets serve distinct purposes. RESTful APIs are widely favored for their simplicity and stateless nature, making them ideal for standard CRUD operations. They offer a straightforward approach to interacting with resources over HTTP/1.1, providing broad compatibility and ease of integration across platforms. In scenarios that demand high efficiency and real-time communication, however, gRPC and WebSockets emerge as powerful alternatives. Each protocol comes with strengths and limitations that influence ease of implementation, performance under load, and support for complex data structures. RESTful APIs, while easy to use and widely supported, may introduce overhead due to their stateless nature and reliance on multiple HTTP/1.1 requests. gRPC's advanced features, while powerful, come with a steeper learning curve and require more sophisticated infrastructure. Similarly, WebSockets, while excellent for real-time applications, require careful management of persistent connections and security considerations. This paper explores the key considerations in choosing the right communication protocol, emphasizing the need to align technical choices with application requirements and user expectations. By understanding the unique attributes of each protocol, developers can make informed decisions that enhance the responsiveness and reliability of their web applications. The choice of protocol significantly affects the user experience, scalability, and maintainability of the application, making it a critical decision in the web development process.
{"title":"Choosing the Right Communication Protocol for your Web Application","authors":"Mohamed Hassan","doi":"arxiv-2409.07360","DOIUrl":"https://doi.org/arxiv-2409.07360","url":null,"abstract":"Selecting the appropriate communication protocol is crucial for optimizing\u0000the performance, scalability, and user experience of web applications. In the\u0000diverse ecosystem of web technologies, various protocols like RESTful APIs,\u0000gRPC, WebSockets, and others serve distinct purposes. RESTful APIs are widely\u0000favored for their simplicity and stateless nature, making them ideal for\u0000standard CRUD operations. They offer a straightforward approach to interacting\u0000with resources over HTTP/1.1, providing broad compatibility and ease of\u0000integration across different platforms. However, in scenarios where\u0000applications require high efficiency and real-time communication, gRPC and\u0000WebSockets emerge as powerful alternatives. Each protocol comes with its\u0000strengths and limitations, influencing factors such as ease of implementation,\u0000performance under load, and support for complex data structures. RESTful APIs,\u0000while easy to use and widely supported, may introduce overhead due to their\u0000stateless nature and reliance on multiple HTTP/1.1 requests. In contrast, gRPC\u0000advanced features, while powerful, require a steeper learning curve and more\u0000sophisticated infrastructure. Similarly, WebSockets, while excellent for\u0000real-time applications, require careful management of persistent connections\u0000and security considerations. This paper explores the key considerations in\u0000choosing the right communication protocol, emphasizing the need to align\u0000technical choices with application requirements and user expectations. By\u0000understanding the unique attributes of each protocol, developers can make\u0000informed decisions that enhance the responsiveness and reliability of their web\u0000applications. The choice of protocol can significantly impact the user\u0000experience, scalability, and maintainability of the application, making it a\u0000critical decision in the web development process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Much of the cost and effort required during the software testing process is invested in performing test maintenance - the addition, removal, or modification of test cases to keep the test suite in sync with the system-under-test or to otherwise improve its quality. Tool support could reduce the cost - and improve the quality - of test maintenance by automating aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language models (LLMs) - complex machine learning models adapted to textual analysis - to support test maintenance. We conducted a case study at Ericsson AB where we explored the triggers that indicate the need for test maintenance, the actions that LLMs can take, and the considerations that must be made when deploying LLMs in an industrial setting. We also proposed and demonstrated implementations of two multi-agent architectures that can predict which test cases require maintenance following a change to the source code. Collectively, these contributions advance our theoretical and practical understanding of how LLMs can be deployed to benefit industrial test maintenance processes.
{"title":"Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes","authors":"Ludvig Lemner, Linnea Wahlgren, Gregory Gay, Nasser Mohammadiha, Jingxiong Liu, Joakim Wennerberg","doi":"arxiv-2409.06416","DOIUrl":"https://doi.org/arxiv-2409.06416","url":null,"abstract":"Much of the cost and effort required during the software testing process is\u0000invested in performing test maintenance - the addition, removal, or\u0000modification of test cases to keep the test suite in sync with the\u0000system-under-test or to otherwise improve its quality. Tool support could\u0000reduce the cost - and improve the quality - of test maintenance by automating\u0000aspects of the process or by providing guidance and support to developers. In this study, we explore the capabilities and applications of large language\u0000models (LLMs) - complex machine learning models adapted to textual analysis -\u0000to support test maintenance. We conducted a case study at Ericsson AB where we\u0000explored the triggers that indicate the need for test maintenance, the actions\u0000that LLMs can take, and the considerations that must be made when deploying\u0000LLMs in an industrial setting. We also proposed and demonstrated\u0000implementations of two multi-agent architectures that can predict which test\u0000cases require maintenance following a change to the source code. Collectively,\u0000these contributions advance our theoretical and practical understanding of how\u0000LLMs can be deployed to benefit industrial test maintenance processes.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developers must select a high-performance fault localization (FL) technique from the many available ones. A conventional approach is to select, before the debugging activity begins, a single FL technique that is expected to attain high performance. In contrast, we propose a new approach that dynamically selects better-performing FL techniques during the debugging activity.
{"title":"On Applying Bandit Algorithm to Fault Localization Techniques","authors":"Masato Nakao, Kensei Hamamoto, Masateru Tsunoda, Amjed Tahir, Koji Toda, Akito Monden, Keitaro Nakasai, Kenichi Matsumoto","doi":"arxiv-2409.06268","DOIUrl":"https://doi.org/arxiv-2409.06268","url":null,"abstract":"Developers must select a high-performance fault localization (FL) technique\u0000from available ones. A conventional approach is to try to select only one FL\u0000technique that is expected to attain high performance before debugging\u0000activity. In contrast, we propose a new approach that dynamically selects\u0000better FL techniques during debugging activity.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, extending most existing clone detectors is challenging because the source code handler needs modifications, which require specialist-level knowledge of the targeted language and are time-consuming. Multilingual code clone detectors make it easier to add support for a new language by requiring only syntax information of the target language. To address the shortcomings of existing multilingual detectors in language scalability and detection performance, we propose a multilingual code block extraction method based on ANTLR parser generation and implement a multilingual code clone detector (MSCCD), which supports the largest number of languages among currently available tools and is able to detect Type-3 code clones. We follow the methodology of previous studies to evaluate detection performance on Java. Compared to ten state-of-the-art detectors, MSCCD performs at an average level while supporting a significantly larger number of languages. Furthermore, we propose the first multilingual syntactic code clone evaluation benchmark, based on the CodeNet database. Our results reveal that even when the same detection approach is applied, performance can vary markedly depending on the language of the source code under investigation. Overall, MSCCD is the most balanced of the evaluated tools when considering both detection performance and language extensibility.
{"title":"Development and Benchmarking of Multilingual Code Clone Detector","authors":"Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada","doi":"arxiv-2409.06176","DOIUrl":"https://doi.org/arxiv-2409.06176","url":null,"abstract":"The diversity of programming languages is growing, making the language\u0000extensibility of code clone detectors crucial. However, this is challenging for\u0000most existing clone detection detectors because the source code handler needs\u0000modifications, which require specialist-level knowledge of the targeted\u0000language and is time-consuming. Multilingual code clone detectors make it\u0000easier to add new language support by providing syntax information of the\u0000target language only. To address the shortcomings of existing multilingual\u0000detectors for language scalability and detection performance, we propose a\u0000multilingual code block extraction method based on ANTLR parser generation, and\u0000implement a multilingual code clone detector (MSCCD), which supports the most\u0000significant number of languages currently available and has the ability to\u0000detect Type-3 code clones. We follow the methodology of previous studies to\u0000evaluate the detection performance of the Java language. Compared to ten\u0000state-of-the-art detectors, MSCCD performs at an average level while it also\u0000supports a significantly larger number of languages. Furthermore, we propose\u0000the first multilingual syntactic code clone evaluation benchmark based on the\u0000CodeNet database. Our results reveal that even when applying the same detection\u0000approach, performance can vary markedly depending on the language of the source\u0000code under investigation. Overall, MSCCD is the most balanced one among the\u0000evaluated tools when considering detection performance and language\u0000extensibility.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hossein Hajipour, Lea Schönherr, Thorsten Holz, Mario Fritz
Large language models (LLMs) have shown great potential for automatic code generation and form the basis of tools such as GitHub Copilot. However, recent studies highlight that much LLM-generated code contains serious security vulnerabilities. While previous work tries to address this by training models that generate secure code, these attempts remain constrained by limited access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach that enhances the ability of LLMs to generate secure code by automatically synthesizing secure code, reducing the effort of finding suitable training data. HexaCoder comprises two key components: an oracle-guided data synthesis pipeline and a two-step process for secure code generation. The data synthesis pipeline generates pairs of vulnerable and fixed code for specific Common Weakness Enumeration (CWE) types by utilizing a state-of-the-art LLM to repair vulnerable code. A security oracle identifies vulnerabilities, and a state-of-the-art LLM repairs them by extending and/or editing the code, creating data pairs for fine-tuning with the Low-Rank Adaptation (LoRA) method. Each example in our fine-tuning dataset includes the necessary security-related libraries and code, which form the basis of our novel two-step generation approach. This allows the model to integrate security-relevant libraries before generating the main code, reducing the amount of generated vulnerable code by up to 85% compared to baseline methods. We perform extensive evaluations on three benchmarks for four LLMs, demonstrating that HexaCoder not only improves the security of the generated code but also maintains a high level of functional correctness.
{"title":"HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data","authors":"Hossein Hajipour, Lea Schönherr, Thorsten Holz, Mario Fritz","doi":"arxiv-2409.06446","DOIUrl":"https://doi.org/arxiv-2409.06446","url":null,"abstract":"Large language models (LLMs) have shown great potential for automatic code\u0000generation and form the basis for various tools such as GitHub Copilot.\u0000However, recent studies highlight that many LLM-generated code contains serious\u0000security vulnerabilities. While previous work tries to address this by training\u0000models that generate secure code, these attempts remain constrained by limited\u0000access to training data and labor-intensive data preparation. In this paper, we introduce HexaCoder, a novel approach to enhance the\u0000ability of LLMs to generate secure codes by automatically synthesizing secure\u0000codes, which reduces the effort of finding suitable training data. HexaCoder\u0000comprises two key components: an oracle-guided data synthesis pipeline and a\u0000two-step process for secure code generation. The data synthesis pipeline\u0000generates pairs of vulnerable and fixed codes for specific Common Weakness\u0000Enumeration (CWE) types by utilizing a state-of-the-art LLM for repairing\u0000vulnerable code. A security oracle identifies vulnerabilities, and a\u0000state-of-the-art LLM repairs them by extending and/or editing the codes,\u0000creating data pairs for fine-tuning using the Low-Rank Adaptation (LoRA)\u0000method. Each example of our fine-tuning dataset includes the necessary\u0000security-related libraries and code that form the basis of our novel two-step\u0000generation approach. This allows the model to integrate security-relevant\u0000libraries before generating the main code, significantly reducing the number of\u0000generated vulnerable codes by up to 85% compared to the baseline methods. We\u0000perform extensive evaluations on three different benchmarks for four LLMs,\u0000demonstrating that HexaCoder not only improves the security of the generated\u0000code but also maintains a high level of functional correctness.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensemble learning methods have been used to enhance the reliability of defect prediction models. However, no single method consistently attains the highest accuracy across different software projects. This work aims to improve the performance of ensemble-learning defect prediction across such projects by helping select the highest-accuracy ensemble method. We employ bandit algorithms (BA), an online optimization method, to select the highest-accuracy ensemble method. Each software module is tested sequentially, and the bandit algorithm uses the test outcomes of the modules to evaluate the performance of the ensemble learning methods. The test strategy followed can affect both the testing effort and the prediction accuracy when applying online optimization. Hence, we analyzed the influence of the test order on BA's performance. In our experiment, we used six popular defect prediction datasets, four ensemble learning methods such as bagging, and three test strategies such as testing positive-prediction modules first (PF). Our results show that when BA is applied with PF, prediction accuracy improved on average, and the number of found defects increased by 7% on at least five of the six datasets (with a slight increase of about 4% in testing effort compared to ordinary ensemble learning). Hence, BA with the PF strategy is the most effective way to attain the highest prediction accuracy with ensemble methods across various projects.
{"title":"An Empirical Study of the Impact of Test Strategies on Online Optimization for Ensemble-Learning Defect Prediction","authors":"Kensei Hamamoto, Masateru Tsunoda, Amjed Tahir, Kwabena Ebo Bennin, Akito Monden, Koji Toda, Keitaro Nakasai, Kenichi Matsumoto","doi":"arxiv-2409.06264","DOIUrl":"https://doi.org/arxiv-2409.06264","url":null,"abstract":"Ensemble learning methods have been used to enhance the reliability of defect\u0000prediction models. However, there is an inconclusive stability of a single\u0000method attaining the highest accuracy among various software projects. This\u0000work aims to improve the performance of ensemble-learning defect prediction\u0000among such projects by helping select the highest accuracy ensemble methods. We\u0000employ bandit algorithms (BA), an online optimization method, to select the\u0000highest-accuracy ensemble method. Each software module is tested sequentially,\u0000and bandit algorithms utilize the test outcomes of the modules to evaluate the\u0000performance of the ensemble learning methods. The test strategy followed might\u0000impact the testing effort and prediction accuracy when applying online\u0000optimization. Hence, we analyzed the test order's influence on BA's\u0000performance. In our experiment, we used six popular defect prediction datasets,\u0000four ensemble learning methods such as bagging, and three test strategies such\u0000as testing positive-prediction modules first (PF). Our results show that when\u0000BA is applied with PF, the prediction accuracy improved on average, and the\u0000number of found defects increased by 7% on a minimum of five out of six\u0000datasets (although with a slight increase in the testing effort by about 4%\u0000from ordinal ensemble learning). Hence, BA with PF strategy is the most\u0000effective to attain the highest prediction accuracy using ensemble methods on\u0000various projects.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haowei Cheng, Jati H. Husen, Sien Reeve Peralta, Bowen Jiang, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki
Context: Generative AI (GenAI) has emerged as a transformative tool in software engineering, with requirements engineering (RE) actively exploring its potential to revolutionize processes and outcomes. The integration of GenAI into RE presents both promising opportunities and significant challenges that necessitate systematic analysis and evaluation. Objective: This paper presents a comprehensive systematic literature review (SLR) analyzing state-of-the-art applications and innovative proposals leveraging GenAI in RE. It surveys studies focusing on the utilization of GenAI to enhance RE processes while identifying key challenges and opportunities in this rapidly evolving field. Method: A rigorous SLR methodology was used to analyze 27 carefully selected primary studies in-depth. The review examined research questions pertaining to the application of GenAI across various RE phases, the models and techniques used, and the challenges encountered in implementation and adoption. Results: The most salient findings include i) a predominant focus on the early stages of RE, particularly the elicitation and analysis of requirements, indicating potential for expansion into later phases; ii) the dominance of large language models, especially the GPT series, highlighting the need for diverse AI approaches; and iii) persistent challenges in domain-specific applications and the interpretability of AI-generated outputs, underscoring areas requiring further research and development. Conclusions: The results highlight the critical need for comprehensive evaluation frameworks, improved human-AI collaboration models, and thorough consideration of ethical implications in GenAI-assisted RE. Future research should prioritize extending GenAI applications across the entire RE lifecycle, enhancing domain-specific capabilities, and developing strategies for responsible AI integration in RE practices.
{"title":"Generative AI for Requirements Engineering: A Systematic Literature Review","authors":"Haowei Cheng, Jati H. Husen, Sien Reeve Peralta, Bowen Jiang, Nobukazu Yoshioka, Naoyasu Ubayashi, Hironori Washizaki","doi":"arxiv-2409.06741","DOIUrl":"https://doi.org/arxiv-2409.06741","url":null,"abstract":"Context: Generative AI (GenAI) has emerged as a transformative tool in\u0000software engineering, with requirements engineering (RE) actively exploring its\u0000potential to revolutionize processes and outcomes. The integration of GenAI\u0000into RE presents both promising opportunities and significant challenges that\u0000necessitate systematic analysis and evaluation. Objective: This paper presents\u0000a comprehensive systematic literature review (SLR) analyzing state-of-the-art\u0000applications and innovative proposals leveraging GenAI in RE. It surveys\u0000studies focusing on the utilization of GenAI to enhance RE processes while\u0000identifying key challenges and opportunities in this rapidly evolving field.\u0000Method: A rigorous SLR methodology was used to analyze 27 carefully selected\u0000primary studies in-depth. The review examined research questions pertaining to\u0000the application of GenAI across various RE phases, the models and techniques\u0000used, and the challenges encountered in implementation and adoption. Results:\u0000The most salient findings include i) a predominant focus on the early stages of\u0000RE, particularly the elicitation and analysis of requirements, indicating\u0000potential for expansion into later phases; ii) the dominance of large language\u0000models, especially the GPT series, highlighting the need for diverse AI\u0000approaches; and iii) persistent challenges in domain-specific applications and\u0000the interpretability of AI-generated outputs, underscoring areas requiring\u0000further research and development. Conclusions: The results highlight the\u0000critical need for comprehensive evaluation frameworks, improved human-AI\u0000collaboration models, and thorough consideration of ethical implications in\u0000GenAI-assisted RE. Future research should prioritize extending GenAI\u0000applications across the entire RE lifecycle, enhancing domain-specific\u0000capabilities, and developing strategies for responsible AI integration in RE\u0000practices.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"214 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software development is a collaborative endeavor that requires individuals from different departments to work together to collectively develop a high-quality software system. In this context, researchers have begun to explore methods that leverage LLM-based multi-agent systems to carry out software development. However, existing research tends to rigidly fix the software development process in a code-level framework, and thus fails to adjust the process dynamically in real time to suit a more flexible and variable software environment. In this paper, we propose a dynamic process generation framework named ToP (Think-on-Process). The core idea of ToP is to leverage experiential knowledge (i.e., process models) to guide LLMs in generating software development processes (i.e., instances). These instances guide the multi-agent system during software development, and a compiler provides feedback on the development outcomes. Subsequently, we use heuristic algorithms to filter the instances and apply process mining algorithms to derive a process model. Finally, the process model is converted into text, formatted as prompts, to enhance the ability of LLMs to generate further instances. Experiments demonstrate that our framework ToP significantly enhances the dynamic process generation capability of GPT-3.5 and GPT-4 across five categories of software development tasks.
{"title":"Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System","authors":"Leilei Lin, Yingming Zhou, Wenlong Chen, Chen Qian","doi":"arxiv-2409.06568","DOIUrl":"https://doi.org/arxiv-2409.06568","url":null,"abstract":"Software development is a collaborative endeavor that requires individuals\u0000from different departments to work together in order to collectively develop a\u0000high-quality software system. In this context, people have begun to explore a\u0000method that leverages multi-agent systems based on LLMs to carry out software\u0000development. However, existing research tends to rigidly fix the software\u0000development process in a framework in code form, thus failing to dynamically\u0000adjust the software development process in real-time to meet the more flexible\u0000and variable software environment. In this paper, we propose a dynamic process\u0000generation framework, named ToP (Think-on-Process). The core idea of ToP is to\u0000leverage experiential knowledge (i.e., process models) to guide LLMs in\u0000generating software development processes (i.e., instances). These instances\u0000will guide multi-agent in software development and employ a compiler to provide\u0000feedback on the development outcomes. Subsequently, we utilize heuristic\u0000algorithms to filter the instances and apply process mining algorithms to\u0000derive process model. Finally, the process model will be converted into text,\u0000formatted as prompts, to enhance the ability of LLMs to generate other\u0000instances. Experiments demonstrate that our framework ToP significantly\u0000enhances the dynamic process generation capability of the GPT-3.5 and GPT-4 for\u0000five categories of software development tasks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"95 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang
We present a comprehensive dataset of Java vulnerability-fixing commits (VFCs) to advance research in Java vulnerability analysis. Our dataset, derived from thousands of open-source Java projects on GitHub, comprises two variants: JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous process involving heuristic rules and multiple rounds of manual labeling. We initially used keywords to filter candidate VFCs based on commit messages, then refined this keyword set through iterative manual labeling. The final labeling round achieved a precision score of 0.7 among three annotators. We applied the refined keyword set to 34,321 open-source Java repositories with over 50 GitHub stars, resulting in JavaVFC with 784 manually verified VFCs and JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are presented in a standardized JSONL format for easy access and analysis. This dataset supports various research endeavors, including VFC identification, fine-grained vulnerability detection, and automated vulnerability repair. The JavaVFC and JavaVFC-extended are publicly available at https://zenodo.org/records/13731781.
{"title":"JavaVFC: Java Vulnerability Fixing Commits from Open-source Software","authors":"Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang","doi":"arxiv-2409.05576","DOIUrl":"https://doi.org/arxiv-2409.05576","url":null,"abstract":"We present a comprehensive dataset of Java vulnerability-fixing commits\u0000(VFCs) to advance research in Java vulnerability analysis. Our dataset, derived\u0000from thousands of open-source Java projects on GitHub, comprises two variants:\u0000JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous\u0000process involving heuristic rules and multiple rounds of manual labeling. We\u0000initially used keywords to filter candidate VFCs based on commit messages, then\u0000refined this keyword set through iterative manual labeling. The final labeling\u0000round achieved a precision score of 0.7 among three annotators. We applied the\u0000refined keyword set to 34,321 open-source Java repositories with over 50 GitHub\u0000stars, resulting in JavaVFC with 784 manually verified VFCs and\u0000JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are\u0000presented in a standardized JSONL format for easy access and analysis. This\u0000dataset supports various research endeavors, including VFC identification,\u0000fine-grained vulnerability detection, and automated vulnerability repair. The\u0000JavaVFC and JavaVFC-extended are publicly available at\u0000https://zenodo.org/records/13731781.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao
Large language models (LLMs) have shown remarkable capabilities in code generation. However, the effects of hallucinations (e.g., output noise) make it particularly challenging for LLMs to generate high-quality code in one pass. In this work, we propose a simple and effective uncertainty-aware selective contrastive decoding (USCD) mechanism that improves the quality of one-pass code generation in LLMs and reduces the impact of output noise. Specifically, we first carefully design a negative prompt (namely, a lame prompt) that induces output noise by removing the input-output examples from the standard few-shot prompt. Our preliminary study shows that the Jensen-Shannon divergence (JS divergence) between token-distribution uncertainty and the output noise is relatively low (approximately 0.25), indicating their high relevance. We then selectively eliminate the output noise induced by the lame prompt based on the uncertainty of the prediction distribution from the standard prompt. Notably, our proposed plug-and-play mechanism is an inference-only method with appealing flexibility. Extensive experiments on widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, with several LLMs (i.e., InCoder-6B, CodeLlama-7B, WizardCoder-15B, StarCoder, and Llama2-7B), demonstrate that our proposed USCD significantly improves one-pass code generation, with an average pass@1 score increase of 16.59%. We will release code and data on GitHub.
{"title":"$mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding","authors":"Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao","doi":"arxiv-2409.05923","DOIUrl":"https://doi.org/arxiv-2409.05923","url":null,"abstract":"Large language models (LLMs) have shown remarkable capabilities in code\u0000generation. However, the effects of hallucinations (e.g., output noise) make it\u0000particularly challenging for LLMs to generate high-quality code in one pass. In\u0000this work, we propose a simple and effective textbf{u}ncertainty-aware\u0000textbf{s}elective textbf{c}ontrastive textbf{d}ecoding ($mathbb{USCD}$)\u0000mechanism to improve the quality of one-pass code generation in LLMs and reduce\u0000the impact of output noise. To be specific, we first elaborately designed a\u0000negative prompt (namely lame prompt) to output noise by removing input-output\u0000examples from the standard few-shot prompt. Our preliminary study shows that\u0000the Jensen-Shannon divergence (JS divergence) between token distribution\u0000uncertainty and the output noise is relatively low (approximately $0.25$),\u0000indicating their high relevance. Then, we selectively eliminate output noise\u0000induced by lame prompts based on the uncertainty of the prediction distribution\u0000from the standard prompt. Notably, our proposed plug-and-play mechanism is an\u0000inference-only method, enjoying appealing flexibility. Extensive experiments on\u0000widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, upon several LLMs\u0000(i.e., Inocder-6b, CodeLlama-7b, WizardCoder-15b, StarCoder, and Llama2-7b),\u0000demonstrate that our proposed USCD significantly improves one-pass code\u0000generation, with an average textit{pass@$1$} scores increase of 16.59%. We\u0000will release code and data on GitHub.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}