Context-Dependent Interactable Graphical User Interface Element Detection for VR Applications
Shuqing Li, Binchang Li, Yepang Liu, Cuiyun Gao, Jianping Zhang, Shing-Chi Cheung, Michael R. Lyu
In recent years, Virtual Reality (VR) has emerged as a transformative technology, offering users immersive and interactive experiences across diverse virtual environments. Users interact with VR apps through interactable GUI elements (IGEs) on the stereoscopic three-dimensional (3D) graphical user interface (GUI). Accurate recognition of these IGEs is instrumental, serving as the foundation of many software engineering tasks, including automated testing and effective GUI search. The most recent IGE detection approaches for 2D mobile apps typically train a supervised object detection model on a large-scale, manually labeled GUI dataset, usually with a pre-defined set of clickable GUI element categories such as buttons and spinners. Such approaches can hardly be applied to IGE detection in VR apps, due to a multitude of challenges: the complexities posed by open-vocabulary and heterogeneous IGE categories, the intricacies of context-sensitive interactability, and the necessity of precise spatial perception and visual-semantic alignment for accurate detection results. It is therefore necessary to pursue IGE research tailored to VR apps. In this paper, we propose Orienter, the first zero-shot cOntext-sensitive inteRactable GUI ElemeNT dEtection framework for virtual Reality apps. Imitating human behavior, Orienter first observes and understands the semantic context of a VR app scene before performing detection, and the detection process iterates within a feedback-directed validation and reflection loop. Specifically, Orienter contains three components: (1) semantic context comprehension, (2) reflection-directed IGE candidate detection, and (3) context-sensitive interactability classification. Extensive experiments on our dataset demonstrate that Orienter is more effective than state-of-the-art GUI element detection approaches.
{"title":"Context-Dependent Interactable Graphical User Interface Element Detection for VR Applications","authors":"Shuqing Li, Binchang Li, Yepang Liu, Cuiyun Gao, Jianping Zhang, Shing-Chi Cheung, Michael R. Lyu","doi":"arxiv-2409.10811","DOIUrl":"https://doi.org/arxiv-2409.10811","url":null,"abstract":"In recent years, Virtual Reality (VR) has emerged as a transformative\u0000technology, offering users immersive and interactive experiences across\u0000diversified virtual environments. Users can interact with VR apps through\u0000interactable GUI elements (IGEs) on the stereoscopic three-dimensional (3D)\u0000graphical user interface (GUI). The accurate recognition of these IGEs is\u0000instrumental, serving as the foundation of many software engineering tasks,\u0000including automated testing and effective GUI search. The most recent IGE\u0000detection approaches for 2D mobile apps typically train a supervised object\u0000detection model based on a large-scale manually-labeled GUI dataset, usually\u0000with a pre-defined set of clickable GUI element categories like buttons and\u0000spinners. Such approaches can hardly be applied to IGE detection in VR apps,\u0000due to a multitude of challenges including complexities posed by\u0000open-vocabulary and heterogeneous IGE categories, intricacies of\u0000context-sensitive interactability, and the necessities of precise spatial\u0000perception and visual-semantic alignment for accurate IGE detection results.\u0000Thus, it is necessary to embark on the IGE research tailored to VR apps. In\u0000this paper, we propose the first zero-shot cOntext-sensitive inteRactable GUI\u0000ElemeNT dEtection framework for virtual Reality apps, named Orienter. By\u0000imitating human behaviors, Orienter observes and understands the semantic\u0000contexts of VR app scenes first, before performing the detection. The detection\u0000process is iterated within a feedback-directed validation and reflection loop.\u0000Specifically, Orienter contains three components, including (1) Semantic\u0000context comprehension, (2) Reflection-directed IGE candidate detection, and (3)\u0000Context-sensitive interactability classification. Extensive experiments on the\u0000dataset demonstrate that Orienter is more effective than the state-of-the-art\u0000GUI element detection approaches.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Reviewer Experience in Code Review Comment Generation
Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W. Godfrey, Chunhua Liu, Wachiraphan Charoenwet
Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. Despite its effectiveness, the process demands a large amount of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training. This is in part because the data comes from open-source projects, where code reviews are conducted in public forums and reviewers possess varying levels of software development experience, potentially affecting the quality of their feedback. To account for this variation, we propose a suite of experience-aware training methods that utilise reviewers' past authoring and reviewing experience as a signal for review quality. Specifically, we propose experience-aware loss functions (ELF), which use reviewers' authoring and reviewing ownership of a project as weights in the model's loss function. Through this method, experienced reviewers' code reviews exert a larger influence over the model's behaviour. Compared to the SOTA model, ELF was able to generate higher-quality reviews in terms of accuracy, informativeness, and the comment types generated. The key contribution of this work is demonstrating how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models.
{"title":"Leveraging Reviewer Experience in Code Review Comment Generation","authors":"Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W. Godfrey, Chunhua Liu, Wachiraphan Charoenwet","doi":"arxiv-2409.10959","DOIUrl":"https://doi.org/arxiv-2409.10959","url":null,"abstract":"Modern code review is a ubiquitous software quality assurance process aimed\u0000at identifying potential issues within newly written code. Despite its\u0000effectiveness, the process demands large amounts of effort from the human\u0000reviewers involved. To help alleviate this workload, researchers have trained\u0000deep learning models to imitate human reviewers in providing natural language\u0000code reviews. Formally, this task is known as code review comment generation.\u0000Prior work has demonstrated improvements in this task by leveraging machine\u0000learning techniques and neural models, such as transfer learning and the\u0000transformer architecture. However, the quality of the model generated reviews\u0000remain sub-optimal due to the quality of the open-source code review data used\u0000in model training. This is in part due to the data obtained from open-source\u0000projects where code reviews are conducted in a public forum, and reviewers\u0000possess varying levels of software development experience, potentially\u0000affecting the quality of their feedback. To accommodate for this variation, we\u0000propose a suite of experience-aware training methods that utilise the\u0000reviewers' past authoring and reviewing experiences as signals for review\u0000quality. Specifically, we propose experience-aware loss functions (ELF), which\u0000use the reviewers' authoring and reviewing ownership of a project as weights in\u0000the model's loss function. Through this method, experienced reviewers' code\u0000reviews yield larger influence over the model's behaviour. Compared to the SOTA\u0000model, ELF was able to generate higher quality reviews in terms of accuracy,\u0000informativeness, and comment types generated. The key contribution of this work\u0000is the demonstration of how traditional software engineering concepts such as\u0000reviewer experience can be integrated into the design of AI-based automated\u0000code review models.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler
Nazim Bendib, Iheb Nassim Aouadj, Riyadh Baghdadi
Code optimization is a crucial task aimed at enhancing code performance. However, this process is often tedious and complex, highlighting the necessity for automatic code optimization techniques. Reinforcement Learning (RL), a machine learning technique, has emerged as a promising approach for tackling such complex optimization problems. In this project, we introduce the first RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research and enabling automatic code optimization using Multi-Action Reinforcement Learning. We also propose a novel formulation of the action space as a Cartesian product of simpler action subspaces, enabling more efficient and effective optimizations. Experimental results demonstrate that our proposed environment allows for effective optimization of MLIR operations and yields performance comparable to TensorFlow, surpassing it in multiple cases, highlighting the potential of RL-based optimization in compiler frameworks.
{"title":"A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler","authors":"Nazim Bendib, Iheb Nassim Aouadj, Riyadh Baghdadi","doi":"arxiv-2409.11068","DOIUrl":"https://doi.org/arxiv-2409.11068","url":null,"abstract":"Code optimization is a crucial task aimed at enhancing code performance.\u0000However, this process is often tedious and complex, highlighting the necessity\u0000for automatic code optimization techniques. Reinforcement Learning (RL), a\u0000machine learning technique, has emerged as a promising approach for tackling\u0000such complex optimization problems. In this project, we introduce the first RL\u0000environment for the MLIR compiler, dedicated to facilitating MLIR compiler\u0000research, and enabling automatic code optimization using Multi-Action\u0000Reinforcement Learning. We also propose a novel formulation of the action space\u0000as a Cartesian product of simpler action subspaces, enabling more efficient and\u0000effective optimizations. Experimental results demonstrate that our proposed\u0000environment allows for an effective optimization of MLIR operations, and yields\u0000comparable performance to TensorFlow, surpassing it in multiple cases,\u0000highlighting the potential of RL-based optimization in compiler frameworks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Amine B. Hassouna, Hana Chaari, Ines Belhaj
The integration of tools into LLM-based agents overcame the limited capabilities of standalone LLMs and traditional agents. However, the combination of these technologies and the enhancements proposed in several state-of-the-art works followed a non-unified software architecture, resulting in a lack of modularity. Indeed, these works focused mainly on functionalities and overlooked the definition of component boundaries within the agent. This caused terminological and architectural ambiguities among researchers, which we address in this paper by proposing a unified framework that establishes a clear foundation for LLM-based agent development from both functional and software-architectural perspectives. Our framework, LLM-Agent-UMF (LLM-based Agent Unified Modeling Framework), clearly distinguishes the different components of an agent, setting LLMs and tools apart from a newly introduced element: the core-agent, which plays the role of the agent's central coordinator. The core-agent comprises five modules: planning, memory, profile, action, and security, the last of which is often neglected in previous works. Differences in the internal structure of core-agents led us to classify them into a taxonomy of passive and active types. Based on this, we propose different multi-core agent architectures combining the unique characteristics of various individual agents. For evaluation purposes, we applied this framework to a selection of state-of-the-art agents, thereby demonstrating its alignment with their functionalities and clarifying overlooked architectural aspects. Moreover, we thoroughly assessed four of our proposed architectures by integrating distinctive agents into hybrid active/passive core-agent systems. This analysis provided clear insights into potential improvements and highlighted the challenges involved in combining specific agents.
{"title":"LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents","authors":"Amine B. Hassouna, Hana Chaari, Ines Belhaj","doi":"arxiv-2409.11393","DOIUrl":"https://doi.org/arxiv-2409.11393","url":null,"abstract":"The integration of tools in LLM-based agents overcame the difficulties of\u0000standalone LLMs and traditional agents' limited capabilities. However, the\u0000conjunction of these technologies and the proposed enhancements in several\u0000state-of-the-art works followed a non-unified software architecture resulting\u0000in a lack of modularity. Indeed, they focused mainly on functionalities and\u0000overlooked the definition of the component's boundaries within the agent. This\u0000caused terminological and architectural ambiguities between researchers which\u0000we addressed in this paper by proposing a unified framework that establishes a\u0000clear foundation for LLM-based agents' development from both functional and\u0000software architectural perspectives. Our framework, LLM-Agent-UMF (LLM-based Agent Unified Modeling Framework),\u0000clearly distinguishes between the different components of an agent, setting\u0000LLMs, and tools apart from a newly introduced element: the core-agent, playing\u0000the role of the central coordinator of the agent which comprises five modules:\u0000planning, memory, profile, action, and security, the latter often neglected in\u0000previous works. Differences in the internal structure of core-agents led us to\u0000classify them into a taxonomy of passive and active types. Based on this, we\u0000proposed different multi-core agent architectures combining unique\u0000characteristics of various individual agents. For evaluation purposes, we applied this framework to a selection of\u0000state-of-the-art agents, thereby demonstrating its alignment with their\u0000functionalities and clarifying the overlooked architectural aspects. Moreover,\u0000we thoroughly assessed four of our proposed architectures by integrating\u0000distinctive agents into hybrid active/passive core-agents' systems. This\u0000analysis provided clear insights into potential improvements and highlighted\u0000the challenges involved in the combination of specific agents.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer
Anmol Gautam, Kishore Kumar, Adarsh Jha, Mukunda NS, Ishaan Bhola
We present SuperCoder2.0, an advanced autonomous system designed to enhance software development through artificial intelligence. The system combines an AI-native development approach with intelligent agents to enable fully autonomous coding. Key focus areas include a retry mechanism with error-output traceback, comprehensive code rewriting and replacement using Abstract Syntax Tree (AST) parsing to minimize linting issues, code embedding techniques for retrieval-augmented generation, and a focus on localizing methods for problem-solving rather than identifying specific line numbers. The methodology employs a three-step hierarchical search-space reduction approach for codebase navigation and bug localization: (1) utilizing Retrieval-Augmented Generation (RAG) and a Repository File Level Map to identify candidate files, (2) narrowing down to the most relevant files using a File Level Schematic Map, and (3) extracting 'relevant locations' within these files. Code editing is performed through a two-part module comprising CodeGeneration and CodeEditing, which generates multiple solutions at different temperature values and replaces entire methods or classes to maintain code integrity. A feedback loop executes repository-level test cases to validate and refine solutions. Experiments conducted on the SWE-bench Lite dataset demonstrate SuperCoder2.0's effectiveness, achieving correct file localization in 84.33% of cases within the top 5 candidates and successfully resolving 34% of test instances. This performance places SuperCoder2.0 fourth globally on the SWE-bench leaderboard. The system's ability to handle diverse repositories and problem types highlights its potential as a versatile tool for autonomous software development. Future work will focus on refining the code-editing process and exploring advanced embedding models for improved natural-language-to-code mapping.
{"title":"SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer","authors":"Anmol Gautam, Kishore Kumar, Adarsh Jha, Mukunda NS, Ishaan Bhola","doi":"arxiv-2409.11190","DOIUrl":"https://doi.org/arxiv-2409.11190","url":null,"abstract":"We present SuperCoder2.0, an advanced autonomous system designed to enhance\u0000software development through artificial intelligence. The system combines an\u0000AI-native development approach with intelligent agents to enable fully\u0000autonomous coding. Key focus areas include a retry mechanism with error output\u0000traceback, comprehensive code rewriting and replacement using Abstract Syntax\u0000Tree (ast) parsing to minimize linting issues, code embedding technique for\u0000retrieval-augmented generation, and a focus on localizing methods for\u0000problem-solving rather than identifying specific line numbers. The methodology\u0000employs a three-step hierarchical search space reduction approach for code base\u0000navigation and bug localization:utilizing Retrieval Augmented Generation (RAG)\u0000and a Repository File Level Map to identify candidate files, (2) narrowing down\u0000to the most relevant files using a File Level Schematic Map, and (3) extracting\u0000'relevant locations' within these files. Code editing is performed through a\u0000two-part module comprising CodeGeneration and CodeEditing, which generates\u0000multiple solutions at different temperature values and replaces entire methods\u0000or classes to maintain code integrity. A feedback loop executes\u0000repository-level test cases to validate and refine solutions. Experiments\u0000conducted on the SWE-bench Lite dataset demonstrate SuperCoder2.0's\u0000effectiveness, achieving correct file localization in 84.33% of cases within\u0000the top 5 candidates and successfully resolving 34% of test instances. This\u0000performance places SuperCoder2.0 fourth globally on the SWE-bench leaderboard.\u0000The system's ability to handle diverse repositories and problem types\u0000highlights its potential as a versatile tool for autonomous software\u0000development. Future work will focus on refining the code editing process and\u0000exploring advanced embedding models for improved natural language to code\u0000mapping.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Control-flow Reconstruction Attacks on Business Process Models
Henrik Kirchmann, Stephan A. Fahrenkrog-Petersen, Felix Mannhardt, Matthias Weidlich
Process models may be automatically generated from event logs that contain as-is data of a business process. While such models generalize over the control-flow of specific, recorded process executions, they are often also annotated with behavioural statistics, such as execution frequencies. Based thereon, once a model is published, certain insights about the original process executions may be reconstructed, so that an external party may extract confidential information about the business process. This work is the first to empirically investigate such reconstruction attempts based on process models. To this end, we propose different play-out strategies that reconstruct the control-flow from process trees, potentially exploiting frequency annotations. To assess the potential success of such reconstruction attacks on process models, and hence the risks imposed by publishing them, we compare the reconstructed process executions with those of the original log for several real-world datasets.
{"title":"Control-flow Reconstruction Attacks on Business Process Models","authors":"Henrik Kirchmann, Stephan A. Fahrenkrog-Petersen, Felix Mannhardt, Matthias Weidlich","doi":"arxiv-2409.10986","DOIUrl":"https://doi.org/arxiv-2409.10986","url":null,"abstract":"Process models may be automatically generated from event logs that contain\u0000as-is data of a business process. While such models generalize over the\u0000control-flow of specific, recorded process executions, they are often also\u0000annotated with behavioural statistics, such as execution frequencies.Based\u0000thereon, once a model is published, certain insights about the original process\u0000executions may be reconstructed, so that an external party may extract\u0000confidential information about the business process. This work is the first to\u0000empirically investigate such reconstruction attempts based on process models.\u0000To this end, we propose different play-out strategies that reconstruct the\u0000control-flow from process trees, potentially exploiting frequency annotations.\u0000To assess the potential success of such reconstruction attacks on process\u0000models, and hence the risks imposed by publishing them, we compare the\u0000reconstructed process executions with those of the original log for several\u0000real-world datasets.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Study of Sensitive Information in Logs
Roozbeh Aghili, Heng Li, Foutse Khomh
Software logs, generated during the runtime of software systems, are essential for various development and analysis activities, such as anomaly detection and failure diagnosis. However, the presence of sensitive information in these logs poses significant privacy concerns, particularly regarding Personally Identifiable Information (PII) and quasi-identifiers that could lead to re-identification risks. While general data privacy has been extensively studied, the specific domain of privacy in software logs remains underexplored, with inconsistent definitions of sensitivity and a lack of standardized guidelines for anonymization. To mitigate this gap, this study offers a comprehensive analysis of privacy in software logs from multiple perspectives. We start by analyzing 25 publicly available log datasets to identify potentially sensitive attributes. Based on the results of this step, we focus on three perspectives: privacy regulations, research literature, and industry practices. We first analyze key data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), to understand the legal requirements concerning sensitive information in logs. Second, we conduct a systematic literature review to identify common privacy attributes and practices in log anonymization, revealing gaps in existing approaches. Finally, we survey 45 industry professionals to capture practical insights on log anonymization practices. Our findings shed light on various perspectives of log privacy and reveal industry challenges, such as technical and efficiency issues, while highlighting the need for standardized guidelines. By combining insights from regulatory, academic, and industry perspectives, our study aims to provide a clearer framework for identifying and protecting sensitive information in software logs.
{"title":"An Empirical Study of Sensitive Information in Logs","authors":"Roozbeh Aghili, Heng Li, Foutse Khomh","doi":"arxiv-2409.11313","DOIUrl":"https://doi.org/arxiv-2409.11313","url":null,"abstract":"Software logs, generated during the runtime of software systems, are\u0000essential for various development and analysis activities, such as anomaly\u0000detection and failure diagnosis. However, the presence of sensitive information\u0000in these logs poses significant privacy concerns, particularly regarding\u0000Personally Identifiable Information (PII) and quasi-identifiers that could lead\u0000to re-identification risks. While general data privacy has been extensively\u0000studied, the specific domain of privacy in software logs remains underexplored,\u0000with inconsistent definitions of sensitivity and a lack of standardized\u0000guidelines for anonymization. To mitigate this gap, this study offers a\u0000comprehensive analysis of privacy in software logs from multiple perspectives.\u0000We start by performing an analysis of 25 publicly available log datasets to\u0000identify potentially sensitive attributes. Based on the result of this step, we\u0000focus on three perspectives: privacy regulations, research literature, and\u0000industry practices. We first analyze key data privacy regulations, such as the\u0000General Data Protection Regulation (GDPR) and the California Consumer Privacy\u0000Act (CCPA), to understand the legal requirements concerning sensitive\u0000information in logs. Second, we conduct a systematic literature review to\u0000identify common privacy attributes and practices in log anonymization,\u0000revealing gaps in existing approaches. Finally, we survey 45 industry\u0000professionals to capture practical insights on log anonymization practices. Our\u0000findings shed light on various perspectives of log privacy and reveal industry\u0000challenges, such as technical and efficiency issues while highlighting the need\u0000for standardized guidelines. By combining insights from regulatory, academic,\u0000and industry perspectives, our study aims to provide a clearer framework for\u0000identifying and protecting sensitive information in software logs.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching
Arastoo Zibaeirad, Marco Vieira
Large Language Models (LLMs) have shown promise in tasks like code translation, prompting interest in their potential for automating software vulnerability detection (SVD) and patching (SVP). To further research in this area, establishing a benchmark is essential for evaluating the strengths and limitations of LLMs in these tasks. Despite their capabilities, questions remain regarding whether LLMs can accurately analyze complex vulnerabilities and generate appropriate patches. This paper introduces VulnLLMEval, a framework designed to assess the performance of LLMs in identifying and patching vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel, creating a well-curated dataset that includes both vulnerable and patched code. This dataset, based on real-world code, provides a diverse and representative testbed for evaluating LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous assessment. Our results reveal that LLMs often struggle with distinguishing between vulnerable and patched code. Furthermore, in SVP tasks, these models tend to oversimplify the code, producing solutions that may not be directly usable without further refinement.
{"title":"VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching","authors":"Arastoo Zibaeirad, Marco Vieira","doi":"arxiv-2409.10756","DOIUrl":"https://doi.org/arxiv-2409.10756","url":null,"abstract":"Large Language Models (LLMs) have shown promise in tasks like code\u0000translation, prompting interest in their potential for automating software\u0000vulnerability detection (SVD) and patching (SVP). To further research in this\u0000area, establishing a benchmark is essential for evaluating the strengths and\u0000limitations of LLMs in these tasks. Despite their capabilities, questions\u0000remain regarding whether LLMs can accurately analyze complex vulnerabilities\u0000and generate appropriate patches. This paper introduces VulnLLMEval, a\u0000framework designed to assess the performance of LLMs in identifying and\u0000patching vulnerabilities in C code. Our study includes 307 real-world\u0000vulnerabilities extracted from the Linux kernel, creating a well-curated\u0000dataset that includes both vulnerable and patched code. This dataset, based on\u0000real-world code, provides a diverse and representative testbed for evaluating\u0000LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous\u0000assessment. Our results reveal that LLMs often struggle with distinguishing\u0000between vulnerable and patched code. Furthermore, in SVP tasks, these models\u0000tend to oversimplify the code, producing solutions that may not be directly\u0000usable without further refinement.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
The growing trend of vulnerability issues in software development, resulting from heavy dependence on open-source projects, has received considerable attention recently. This paper investigates the effectiveness of Large Language Models (LLMs) in identifying vulnerabilities within codebases, with a focus on the latest advancements in LLM technology. Through a comparative analysis, we assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma, and CodeGemma, alongside established state-of-the-art models such as BERT, RoBERTa, and GPT-3. Our study aims to shed light on the capabilities of LLMs in vulnerability detection, contributing to the enhancement of software security practices across diverse open-source repositories. We observe that CodeGemma achieves the highest F1-score of 58 and a recall of 87 among the recently introduced large language models for detecting software security vulnerabilities.
{"title":"Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models","authors":"Shaznin Sultana, Sadia Afreen, Nasir U. Eisty","doi":"arxiv-2409.10490","DOIUrl":"https://doi.org/arxiv-2409.10490","url":null,"abstract":"The growing trend of vulnerability issues in software development as a result\u0000of a large dependence on open-source projects has received considerable\u0000attention recently. This paper investigates the effectiveness of Large Language\u0000Models (LLMs) in identifying vulnerabilities within codebases, with a focus on\u0000the latest advancements in LLM technology. Through a comparative analysis, we\u0000assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma,\u0000and CodeGemma, alongside established state-of-the-art models such as BERT,\u0000RoBERTa, and GPT-3. Our study aims to shed light on the capabilities of LLMs in\u0000vulnerability detection, contributing to the enhancement of software security\u0000practices across diverse open-source repositories. We observe that CodeGemma\u0000achieves the highest F1-score of 58 and a Recall of 87, amongst the recent\u0000additions of large language models to detect software security vulnerabilities.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence in Assurance 2.0 Cases
Robin Bloomfield, John Rushby
An assurance case should provide justifiable confidence in the truth of a claim about some critical property of a system or procedure, such as safety or security. We consider how confidence can be assessed in the rigorous approach we call Assurance 2.0. Our goal is indefeasible confidence, and we approach it from four different perspectives: logical soundness, probabilistic assessment, dialectical examination, and residual risks.
{"title":"Confidence in Assurance 2.0 Cases","authors":"Robin Bloomfield, John Rushby","doi":"arxiv-2409.10665","DOIUrl":"https://doi.org/arxiv-2409.10665","url":null,"abstract":"An assurance case should provide justifiable confidence in the truth of a\u0000claim about some critical property of a system or procedure, such as safety or\u0000security. We consider how confidence can be assessed in the rigorous approach\u0000we call Assurance 2.0. Our goal is indefeasible confidence and we approach it from four different\u0000perspectives: logical soundness, probabilistic assessment, dialectical\u0000examination, and residual risks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}