Context-Dependent Interactable Graphical User Interface Element Detection for VR Applications
Shuqing Li, Binchang Li, Yepang Liu, Cuiyun Gao, Jianping Zhang, Shing-Chi Cheung, Michael R. Lyu
In recent years, Virtual Reality (VR) has emerged as a transformative technology, offering users immersive and interactive experiences across diverse virtual environments. Users interact with VR apps through interactable GUI elements (IGEs) on the stereoscopic three-dimensional (3D) graphical user interface (GUI). Accurate recognition of these IGEs is instrumental, serving as the foundation of many software engineering tasks, including automated testing and effective GUI search. The most recent IGE detection approaches for 2D mobile apps typically train a supervised object detection model on a large-scale, manually labeled GUI dataset, usually with a pre-defined set of clickable GUI element categories such as buttons and spinners. Such approaches can hardly be applied to IGE detection in VR apps, due to a multitude of challenges: the complexities posed by open-vocabulary and heterogeneous IGE categories, the intricacies of context-sensitive interactability, and the necessity of precise spatial perception and visual-semantic alignment for accurate detection results. It is therefore necessary to pursue IGE research tailored to VR apps. In this paper, we propose Orienter, the first zero-shot cOntext-sensitive inteRactable GUI ElemeNT dEtection framework for virtual Reality apps. Imitating human behavior, Orienter first observes and understands the semantic context of a VR app scene before performing detection, and the detection process iterates within a feedback-directed validation and reflection loop. Specifically, Orienter contains three components: (1) semantic context comprehension, (2) reflection-directed IGE candidate detection, and (3) context-sensitive interactability classification. Extensive experiments on our dataset demonstrate that Orienter is more effective than state-of-the-art GUI element detection approaches.
{"title":"Context-Dependent Interactable Graphical User Interface Element Detection for VR Applications","authors":"Shuqing Li, Binchang Li, Yepang Liu, Cuiyun Gao, Jianping Zhang, Shing-Chi Cheung, Michael R. Lyu","doi":"arxiv-2409.10811","DOIUrl":"https://doi.org/arxiv-2409.10811","url":null,"abstract":"In recent years, Virtual Reality (VR) has emerged as a transformative\u0000technology, offering users immersive and interactive experiences across\u0000diversified virtual environments. Users can interact with VR apps through\u0000interactable GUI elements (IGEs) on the stereoscopic three-dimensional (3D)\u0000graphical user interface (GUI). The accurate recognition of these IGEs is\u0000instrumental, serving as the foundation of many software engineering tasks,\u0000including automated testing and effective GUI search. The most recent IGE\u0000detection approaches for 2D mobile apps typically train a supervised object\u0000detection model based on a large-scale manually-labeled GUI dataset, usually\u0000with a pre-defined set of clickable GUI element categories like buttons and\u0000spinners. Such approaches can hardly be applied to IGE detection in VR apps,\u0000due to a multitude of challenges including complexities posed by\u0000open-vocabulary and heterogeneous IGE categories, intricacies of\u0000context-sensitive interactability, and the necessities of precise spatial\u0000perception and visual-semantic alignment for accurate IGE detection results.\u0000Thus, it is necessary to embark on the IGE research tailored to VR apps. In\u0000this paper, we propose the first zero-shot cOntext-sensitive inteRactable GUI\u0000ElemeNT dEtection framework for virtual Reality apps, named Orienter. By\u0000imitating human behaviors, Orienter observes and understands the semantic\u0000contexts of VR app scenes first, before performing the detection. The detection\u0000process is iterated within a feedback-directed validation and reflection loop.\u0000Specifically, Orienter contains three components, including (1) Semantic\u0000context comprehension, (2) Reflection-directed IGE candidate detection, and (3)\u0000Context-sensitive interactability classification. Extensive experiments on the\u0000dataset demonstrate that Orienter is more effective than the state-of-the-art\u0000GUI element detection approaches.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Reviewer Experience in Code Review Comment Generation
Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W. Godfrey, Chunhua Liu, Wachiraphan Charoenwet
Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. Despite its effectiveness, the process demands a large amount of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training. This is in part because the data comes from open-source projects, where code reviews are conducted in public forums and reviewers possess varying levels of software development experience, potentially affecting the quality of their feedback. To account for this variation, we propose a suite of experience-aware training methods that utilise reviewers' past authoring and reviewing experience as a signal for review quality. Specifically, we propose experience-aware loss functions (ELF), which use reviewers' authoring and reviewing ownership of a project as weights in the model's loss function. Through this method, experienced reviewers' code reviews exert a larger influence over the model's behaviour. Compared to the SOTA model, ELF was able to generate higher-quality reviews in terms of accuracy, informativeness, and the comment types generated. The key contribution of this work is demonstrating how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models.
{"title":"Leveraging Reviewer Experience in Code Review Comment Generation","authors":"Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W. Godfrey, Chunhua Liu, Wachiraphan Charoenwet","doi":"arxiv-2409.10959","DOIUrl":"https://doi.org/arxiv-2409.10959","url":null,"abstract":"Modern code review is a ubiquitous software quality assurance process aimed\u0000at identifying potential issues within newly written code. Despite its\u0000effectiveness, the process demands large amounts of effort from the human\u0000reviewers involved. To help alleviate this workload, researchers have trained\u0000deep learning models to imitate human reviewers in providing natural language\u0000code reviews. Formally, this task is known as code review comment generation.\u0000Prior work has demonstrated improvements in this task by leveraging machine\u0000learning techniques and neural models, such as transfer learning and the\u0000transformer architecture. However, the quality of the model generated reviews\u0000remain sub-optimal due to the quality of the open-source code review data used\u0000in model training. This is in part due to the data obtained from open-source\u0000projects where code reviews are conducted in a public forum, and reviewers\u0000possess varying levels of software development experience, potentially\u0000affecting the quality of their feedback. To accommodate for this variation, we\u0000propose a suite of experience-aware training methods that utilise the\u0000reviewers' past authoring and reviewing experiences as signals for review\u0000quality. Specifically, we propose experience-aware loss functions (ELF), which\u0000use the reviewers' authoring and reviewing ownership of a project as weights in\u0000the model's loss function. Through this method, experienced reviewers' code\u0000reviews yield larger influence over the model's behaviour. Compared to the SOTA\u0000model, ELF was able to generate higher quality reviews in terms of accuracy,\u0000informativeness, and comment types generated. The key contribution of this work\u0000is the demonstration of how traditional software engineering concepts such as\u0000reviewer experience can be integrated into the design of AI-based automated\u0000code review models.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler
Nazim Bendib, Iheb Nassim Aouadj, Riyadh Baghdadi
Code optimization is a crucial task aimed at enhancing code performance. However, this process is often tedious and complex, highlighting the necessity for automatic code optimization techniques. Reinforcement Learning (RL), a machine learning technique, has emerged as a promising approach for tackling such complex optimization problems. In this project, we introduce the first RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research and enabling automatic code optimization using Multi-Action Reinforcement Learning. We also propose a novel formulation of the action space as a Cartesian product of simpler action subspaces, enabling more efficient and effective optimizations. Experimental results demonstrate that our proposed environment allows for effective optimization of MLIR operations and yields performance comparable to TensorFlow, surpassing it in multiple cases, highlighting the potential of RL-based optimization in compiler frameworks.
{"title":"A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler","authors":"Nazim Bendib, Iheb Nassim Aouadj, Riyadh Baghdadi","doi":"arxiv-2409.11068","DOIUrl":"https://doi.org/arxiv-2409.11068","url":null,"abstract":"Code optimization is a crucial task aimed at enhancing code performance.\u0000However, this process is often tedious and complex, highlighting the necessity\u0000for automatic code optimization techniques. Reinforcement Learning (RL), a\u0000machine learning technique, has emerged as a promising approach for tackling\u0000such complex optimization problems. In this project, we introduce the first RL\u0000environment for the MLIR compiler, dedicated to facilitating MLIR compiler\u0000research, and enabling automatic code optimization using Multi-Action\u0000Reinforcement Learning. We also propose a novel formulation of the action space\u0000as a Cartesian product of simpler action subspaces, enabling more efficient and\u0000effective optimizations. Experimental results demonstrate that our proposed\u0000environment allows for an effective optimization of MLIR operations, and yields\u0000comparable performance to TensorFlow, surpassing it in multiple cases,\u0000highlighting the potential of RL-based optimization in compiler frameworks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Amine B. Hassouna, Hana Chaari, Ines Belhaj
The integration of tools into LLM-based agents overcame the limited capabilities of standalone LLMs and traditional agents. However, the combination of these technologies and the enhancements proposed in several state-of-the-art works followed a non-unified software architecture, resulting in a lack of modularity. Indeed, these works focused mainly on functionalities and overlooked the definition of component boundaries within the agent. This caused terminological and architectural ambiguities among researchers, which we address in this paper by proposing a unified framework that establishes a clear foundation for LLM-based agent development from both functional and software-architectural perspectives. Our framework, LLM-Agent-UMF (LLM-based Agent Unified Modeling Framework), clearly distinguishes the different components of an agent, setting LLMs and tools apart from a newly introduced element: the core-agent, which plays the role of the agent's central coordinator. The core-agent comprises five modules: planning, memory, profile, action, and security, the last of which is often neglected in previous works. Differences in the internal structure of core-agents led us to classify them into a taxonomy of passive and active types. Based on this, we propose different multi-core agent architectures combining the unique characteristics of various individual agents. For evaluation purposes, we applied this framework to a selection of state-of-the-art agents, thereby demonstrating its alignment with their functionalities and clarifying overlooked architectural aspects. Moreover, we thoroughly assessed four of our proposed architectures by integrating distinctive agents into hybrid active/passive core-agent systems. This analysis provided clear insights into potential improvements and highlighted the challenges involved in combining specific agents.
{"title":"LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents","authors":"Amine B. Hassouna, Hana Chaari, Ines Belhaj","doi":"arxiv-2409.11393","DOIUrl":"https://doi.org/arxiv-2409.11393","url":null,"abstract":"The integration of tools in LLM-based agents overcame the difficulties of\u0000standalone LLMs and traditional agents' limited capabilities. However, the\u0000conjunction of these technologies and the proposed enhancements in several\u0000state-of-the-art works followed a non-unified software architecture resulting\u0000in a lack of modularity. Indeed, they focused mainly on functionalities and\u0000overlooked the definition of the component's boundaries within the agent. This\u0000caused terminological and architectural ambiguities between researchers which\u0000we addressed in this paper by proposing a unified framework that establishes a\u0000clear foundation for LLM-based agents' development from both functional and\u0000software architectural perspectives. Our framework, LLM-Agent-UMF (LLM-based Agent Unified Modeling Framework),\u0000clearly distinguishes between the different components of an agent, setting\u0000LLMs, and tools apart from a newly introduced element: the core-agent, playing\u0000the role of the central coordinator of the agent which comprises five modules:\u0000planning, memory, profile, action, and security, the latter often neglected in\u0000previous works. Differences in the internal structure of core-agents led us to\u0000classify them into a taxonomy of passive and active types. Based on this, we\u0000proposed different multi-core agent architectures combining unique\u0000characteristics of various individual agents. For evaluation purposes, we applied this framework to a selection of\u0000state-of-the-art agents, thereby demonstrating its alignment with their\u0000functionalities and clarifying the overlooked architectural aspects. Moreover,\u0000we thoroughly assessed four of our proposed architectures by integrating\u0000distinctive agents into hybrid active/passive core-agents' systems. This\u0000analysis provided clear insights into potential improvements and highlighted\u0000the challenges involved in the combination of specific agents.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer
Anmol Gautam, Kishore Kumar, Adarsh Jha, Mukunda NS, Ishaan Bhola
We present SuperCoder2.0, an advanced autonomous system designed to enhance software development through artificial intelligence. The system combines an AI-native development approach with intelligent agents to enable fully autonomous coding. Key focus areas include a retry mechanism with error-output traceback, comprehensive code rewriting and replacement using Abstract Syntax Tree (AST) parsing to minimize linting issues, code embedding techniques for retrieval-augmented generation, and a focus on localizing methods for problem-solving rather than identifying specific line numbers. The methodology employs a three-step hierarchical search-space reduction approach for codebase navigation and bug localization: (1) utilizing Retrieval-Augmented Generation (RAG) and a Repository File Level Map to identify candidate files, (2) narrowing down to the most relevant files using a File Level Schematic Map, and (3) extracting 'relevant locations' within these files. Code editing is performed through a two-part module comprising CodeGeneration and CodeEditing, which generates multiple solutions at different temperature values and replaces entire methods or classes to maintain code integrity. A feedback loop executes repository-level test cases to validate and refine solutions. Experiments conducted on the SWE-bench Lite dataset demonstrate SuperCoder2.0's effectiveness, achieving correct file localization in 84.33% of cases within the top 5 candidates and successfully resolving 34% of test instances. This performance places SuperCoder2.0 fourth globally on the SWE-bench leaderboard. The system's ability to handle diverse repositories and problem types highlights its potential as a versatile tool for autonomous software development. Future work will focus on refining the code-editing process and exploring advanced embedding models for improved natural-language-to-code mapping.
{"title":"SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer","authors":"Anmol Gautam, Kishore Kumar, Adarsh Jha, Mukunda NS, Ishaan Bhola","doi":"arxiv-2409.11190","DOIUrl":"https://doi.org/arxiv-2409.11190","url":null,"abstract":"We present SuperCoder2.0, an advanced autonomous system designed to enhance\u0000software development through artificial intelligence. The system combines an\u0000AI-native development approach with intelligent agents to enable fully\u0000autonomous coding. Key focus areas include a retry mechanism with error output\u0000traceback, comprehensive code rewriting and replacement using Abstract Syntax\u0000Tree (ast) parsing to minimize linting issues, code embedding technique for\u0000retrieval-augmented generation, and a focus on localizing methods for\u0000problem-solving rather than identifying specific line numbers. The methodology\u0000employs a three-step hierarchical search space reduction approach for code base\u0000navigation and bug localization:utilizing Retrieval Augmented Generation (RAG)\u0000and a Repository File Level Map to identify candidate files, (2) narrowing down\u0000to the most relevant files using a File Level Schematic Map, and (3) extracting\u0000'relevant locations' within these files. Code editing is performed through a\u0000two-part module comprising CodeGeneration and CodeEditing, which generates\u0000multiple solutions at different temperature values and replaces entire methods\u0000or classes to maintain code integrity. A feedback loop executes\u0000repository-level test cases to validate and refine solutions. Experiments\u0000conducted on the SWE-bench Lite dataset demonstrate SuperCoder2.0's\u0000effectiveness, achieving correct file localization in 84.33% of cases within\u0000the top 5 candidates and successfully resolving 34% of test instances. This\u0000performance places SuperCoder2.0 fourth globally on the SWE-bench leaderboard.\u0000The system's ability to handle diverse repositories and problem types\u0000highlights its potential as a versatile tool for autonomous software\u0000development. Future work will focus on refining the code editing process and\u0000exploring advanced embedding models for improved natural language to code\u0000mapping.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Control-flow Reconstruction Attacks on Business Process Models
Henrik Kirchmann, Stephan A. Fahrenkrog-Petersen, Felix Mannhardt, Matthias Weidlich
Process models may be automatically generated from event logs that contain as-is data of a business process. While such models generalize over the control-flow of specific, recorded process executions, they are often also annotated with behavioural statistics, such as execution frequencies. Based thereon, once a model is published, certain insights about the original process executions may be reconstructed, so that an external party may extract confidential information about the business process. This work is the first to empirically investigate such reconstruction attempts based on process models. To this end, we propose different play-out strategies that reconstruct the control-flow from process trees, potentially exploiting frequency annotations. To assess the potential success of such reconstruction attacks on process models, and hence the risks imposed by publishing them, we compare the reconstructed process executions with those of the original log for several real-world datasets.
{"title":"Control-flow Reconstruction Attacks on Business Process Models","authors":"Henrik Kirchmann, Stephan A. Fahrenkrog-Petersen, Felix Mannhardt, Matthias Weidlich","doi":"arxiv-2409.10986","DOIUrl":"https://doi.org/arxiv-2409.10986","url":null,"abstract":"Process models may be automatically generated from event logs that contain\u0000as-is data of a business process. While such models generalize over the\u0000control-flow of specific, recorded process executions, they are often also\u0000annotated with behavioural statistics, such as execution frequencies.Based\u0000thereon, once a model is published, certain insights about the original process\u0000executions may be reconstructed, so that an external party may extract\u0000confidential information about the business process. This work is the first to\u0000empirically investigate such reconstruction attempts based on process models.\u0000To this end, we propose different play-out strategies that reconstruct the\u0000control-flow from process trees, potentially exploiting frequency annotations.\u0000To assess the potential success of such reconstruction attacks on process\u0000models, and hence the risks imposed by publishing them, we compare the\u0000reconstructed process executions with those of the original log for several\u0000real-world datasets.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Study of Sensitive Information in Logs
Roozbeh Aghili, Heng Li, Foutse Khomh
Software logs, generated during the runtime of software systems, are essential for various development and analysis activities, such as anomaly detection and failure diagnosis. However, the presence of sensitive information in these logs poses significant privacy concerns, particularly regarding Personally Identifiable Information (PII) and quasi-identifiers that could lead to re-identification risks. While general data privacy has been extensively studied, the specific domain of privacy in software logs remains underexplored, with inconsistent definitions of sensitivity and a lack of standardized guidelines for anonymization. To mitigate this gap, this study offers a comprehensive analysis of privacy in software logs from multiple perspectives. We start by analyzing 25 publicly available log datasets to identify potentially sensitive attributes. Based on the results of this step, we focus on three perspectives: privacy regulations, research literature, and industry practices. We first analyze key data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), to understand the legal requirements concerning sensitive information in logs. Second, we conduct a systematic literature review to identify common privacy attributes and practices in log anonymization, revealing gaps in existing approaches. Finally, we survey 45 industry professionals to capture practical insights on log anonymization practices. Our findings shed light on various perspectives of log privacy and reveal industry challenges, such as technical and efficiency issues, while highlighting the need for standardized guidelines. By combining insights from regulatory, academic, and industry perspectives, our study aims to provide a clearer framework for identifying and protecting sensitive information in software logs.
{"title":"An Empirical Study of Sensitive Information in Logs","authors":"Roozbeh Aghili, Heng Li, Foutse Khomh","doi":"arxiv-2409.11313","DOIUrl":"https://doi.org/arxiv-2409.11313","url":null,"abstract":"Software logs, generated during the runtime of software systems, are\u0000essential for various development and analysis activities, such as anomaly\u0000detection and failure diagnosis. However, the presence of sensitive information\u0000in these logs poses significant privacy concerns, particularly regarding\u0000Personally Identifiable Information (PII) and quasi-identifiers that could lead\u0000to re-identification risks. While general data privacy has been extensively\u0000studied, the specific domain of privacy in software logs remains underexplored,\u0000with inconsistent definitions of sensitivity and a lack of standardized\u0000guidelines for anonymization. To mitigate this gap, this study offers a\u0000comprehensive analysis of privacy in software logs from multiple perspectives.\u0000We start by performing an analysis of 25 publicly available log datasets to\u0000identify potentially sensitive attributes. Based on the result of this step, we\u0000focus on three perspectives: privacy regulations, research literature, and\u0000industry practices. We first analyze key data privacy regulations, such as the\u0000General Data Protection Regulation (GDPR) and the California Consumer Privacy\u0000Act (CCPA), to understand the legal requirements concerning sensitive\u0000information in logs. Second, we conduct a systematic literature review to\u0000identify common privacy attributes and practices in log anonymization,\u0000revealing gaps in existing approaches. Finally, we survey 45 industry\u0000professionals to capture practical insights on log anonymization practices. Our\u0000findings shed light on various perspectives of log privacy and reveal industry\u0000challenges, such as technical and efficiency issues while highlighting the need\u0000for standardized guidelines. By combining insights from regulatory, academic,\u0000and industry perspectives, our study aims to provide a clearer framework for\u0000identifying and protecting sensitive information in software logs.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching
Arastoo Zibaeirad, Marco Vieira
Large Language Models (LLMs) have shown promise in tasks like code translation, prompting interest in their potential for automating software vulnerability detection (SVD) and patching (SVP). To further research in this area, establishing a benchmark is essential for evaluating the strengths and limitations of LLMs in these tasks. Despite their capabilities, questions remain regarding whether LLMs can accurately analyze complex vulnerabilities and generate appropriate patches. This paper introduces VulnLLMEval, a framework designed to assess the performance of LLMs in identifying and patching vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel, creating a well-curated dataset that includes both vulnerable and patched code. This dataset, based on real-world code, provides a diverse and representative testbed for evaluating LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous assessment. Our results reveal that LLMs often struggle with distinguishing between vulnerable and patched code. Furthermore, in SVP tasks, these models tend to oversimplify the code, producing solutions that may not be directly usable without further refinement.
{"title":"VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching","authors":"Arastoo Zibaeirad, Marco Vieira","doi":"arxiv-2409.10756","DOIUrl":"https://doi.org/arxiv-2409.10756","url":null,"abstract":"Large Language Models (LLMs) have shown promise in tasks like code\u0000translation, prompting interest in their potential for automating software\u0000vulnerability detection (SVD) and patching (SVP). To further research in this\u0000area, establishing a benchmark is essential for evaluating the strengths and\u0000limitations of LLMs in these tasks. Despite their capabilities, questions\u0000remain regarding whether LLMs can accurately analyze complex vulnerabilities\u0000and generate appropriate patches. This paper introduces VulnLLMEval, a\u0000framework designed to assess the performance of LLMs in identifying and\u0000patching vulnerabilities in C code. Our study includes 307 real-world\u0000vulnerabilities extracted from the Linux kernel, creating a well-curated\u0000dataset that includes both vulnerable and patched code. This dataset, based on\u0000real-world code, provides a diverse and representative testbed for evaluating\u0000LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous\u0000assessment. Our results reveal that LLMs often struggle with distinguishing\u0000between vulnerable and patched code. Furthermore, in SVP tasks, these models\u0000tend to oversimplify the code, producing solutions that may not be directly\u0000usable without further refinement.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models
Shaznin Sultana, Sadia Afreen, Nasir U. Eisty
The growing trend of vulnerability issues in software development, resulting from heavy dependence on open-source projects, has received considerable attention recently. This paper investigates the effectiveness of Large Language Models (LLMs) in identifying vulnerabilities within codebases, with a focus on the latest advancements in LLM technology. Through a comparative analysis, we assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma, and CodeGemma, alongside established state-of-the-art models such as BERT, RoBERTa, and GPT-3. Our study aims to shed light on the capabilities of LLMs in vulnerability detection, contributing to the enhancement of software security practices across diverse open-source repositories. We observe that CodeGemma achieves the highest F1-score of 58 and a recall of 87 among the recently introduced large language models for detecting software security vulnerabilities.
{"title":"Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models","authors":"Shaznin Sultana, Sadia Afreen, Nasir U. Eisty","doi":"arxiv-2409.10490","DOIUrl":"https://doi.org/arxiv-2409.10490","url":null,"abstract":"The growing trend of vulnerability issues in software development as a result\u0000of a large dependence on open-source projects has received considerable\u0000attention recently. This paper investigates the effectiveness of Large Language\u0000Models (LLMs) in identifying vulnerabilities within codebases, with a focus on\u0000the latest advancements in LLM technology. Through a comparative analysis, we\u0000assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma,\u0000and CodeGemma, alongside established state-of-the-art models such as BERT,\u0000RoBERTa, and GPT-3. Our study aims to shed light on the capabilities of LLMs in\u0000vulnerability detection, contributing to the enhancement of software security\u0000practices across diverse open-source repositories. We observe that CodeGemma\u0000achieves the highest F1-score of 58 and a Recall of 87, amongst the recent\u0000additions of large language models to detect software security vulnerabilities.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence in Assurance 2.0 Cases
Robin Bloomfield, John Rushby
An assurance case should provide justifiable confidence in the truth of a claim about some critical property of a system or procedure, such as safety or security. We consider how confidence can be assessed in the rigorous approach we call Assurance 2.0. Our goal is indefeasible confidence, and we approach it from four different perspectives: logical soundness, probabilistic assessment, dialectical examination, and residual risks.
{"title":"Confidence in Assurance 2.0 Cases","authors":"Robin Bloomfield, John Rushby","doi":"arxiv-2409.10665","DOIUrl":"https://doi.org/arxiv-2409.10665","url":null,"abstract":"An assurance case should provide justifiable confidence in the truth of a\u0000claim about some critical property of a system or procedure, such as safety or\u0000security. We consider how confidence can be assessed in the rigorous approach\u0000we call Assurance 2.0. Our goal is indefeasible confidence and we approach it from four different\u0000perspectives: logical soundness, probabilistic assessment, dialectical\u0000examination, and residual risks.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}