Journal of Systems and Software: Latest Articles (Page 7)

MicroIRC: Instance-level Root Cause Localization for Microservice Systems

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-22 DOI: 10.1016/j.jss.2024.112145

Yuhan Zhu, Jian Wang, Bing Li, Yuqi Zhao, Zekun Zhang, Yiming Xiong, Shiping Chen

Microservice architecture is increasingly popular for developing web applications. However, identifying the root cause of a failure is challenging because of the complex interconnections among microservices, long service invocation chains, dynamically changing service states, and the large number of deployment nodes. Furthermore, because each microservice may run multiple instances, it is difficult to identify instance-level failures promptly and effectively when the microservice topology and failure types change dynamically. To address this issue, we propose MicroIRC (Instance-level Root Cause Localization for Microservice Systems), a novel metrics-based approach that localizes root causes at the instance level while remaining robust to dynamic topology changes and new types of anomalies. We first train a graph neural network to fit different root cause types using time-series features extracted from microservice system metrics. Next, we construct a heterogeneous weighted topology (HWT) of the microservice system and run a personalized random walk over it to identify root cause candidates. These candidates, together with real-time metrics from the anomalous time window, are then fed into the trained graph neural network to produce a ranked list of root causes. Experiments on five real-world datasets show that MicroIRC accurately localizes root causes at the instance level, achieving a precision of 93.1% for the top five results. Compared with state-of-the-art methods, MicroIRC improves root cause localization accuracy by more than 17% at the service level and by more than 11.5% at the instance level. It also remains robust to new failure types, achieving a top-1 accuracy of 84.2% under dynamic topology changes.
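The personalized random walk step can be illustrated with a random walk with restart (personalized PageRank) over a weighted call topology. This is a minimal sketch under stated assumptions, not MicroIRC's implementation: the graph here contains only instances (the paper's HWT is heterogeneous), edges are assumed to point from a caller instance to the instances it depends on, edge weights are assumed to reflect metric correlation, and the restart (personalization) vector is assumed to be a per-instance anomaly score. The function name and the `alpha`/`iterations` defaults are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: caller instance -> {dependency instance: edge weight}
Graph = Dict[str, Dict[str, float]]


def personalized_random_walk(
    edges: Graph,
    preference: Dict[str, float],
    alpha: float = 0.85,
    iterations: int = 100,
) -> List[Tuple[str, float]]:
    """Rank nodes by the stationary probability of a random walk with restart.

    preference: node -> non-negative restart weight (assumed: an anomaly score)
    alpha: probability of following an edge instead of restarting
    """
    nodes = set(edges) | set(preference)
    for nbrs in edges.values():
        nodes |= set(nbrs)
    total = sum(preference.values())
    if total <= 0:
        raise ValueError("preference weights must sum to a positive value")
    restart = {n: preference.get(n, 0.0) / total for n in nodes}

    score = dict(restart)
    for _ in range(iterations):
        # With probability 1 - alpha the walker jumps back to the restart distribution.
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for src in nodes:
            nbrs = edges.get(src, {})
            out_weight = sum(nbrs.values())
            if out_weight == 0:
                # Dangling node: redistribute its mass according to the restart vector.
                for n in nodes:
                    nxt[n] += alpha * score[src] * restart[n]
            else:
                # Follow an outgoing edge with probability proportional to its weight.
                for dst, w in nbrs.items():
                    nxt[dst] += alpha * score[src] * w / out_weight
        score = nxt
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical topology: frontend-1 calls two cart instances; cart-2 calls redis-1.
    edges = {"frontend-1": {"cart-1": 0.2, "cart-2": 0.8}, "cart-2": {"redis-1": 1.0}}
    anomaly_scores = {"frontend-1": 1.0, "cart-2": 1.0}
    for node, s in personalized_random_walk(edges, anomaly_scores):
        print(f"{node}: {s:.3f}")
```

In this toy topology, cart-2 ranks first because it has its own anomaly score and also receives most of frontend-1's outgoing weight. In MicroIRC, the top-ranked candidates are then passed, together with the anomalous-window metrics, to the trained graph neural network.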

Citations: 0

Runtime verification on abstract finite state models

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-22 DOI: 10.1016/j.jss.2024.112138

K.P. Jevitha, Bharat Jayaraman, M. Sethumadhavan

Finite-state models are ubiquitous in the study of concurrent systems, especially controllers and servers that operate in a repetitive cycle. In this paper, we show how to extract finite-state models from a run of a multi-threaded Java program and perform runtime verification of correctness properties. These properties include data-oriented and control-oriented properties: the former express correctness conditions over the data fields of objects, while the latter concern the correct flow of control among the modules of larger software. Because the extracted models can become very large for long runs, this paper focuses on constructing reduced models using user-defined abstraction functions that map a larger domain to a smaller one. The abstraction functions should be chosen so that the resulting model is property-preserving, i.e., a property proven on the abstract model also holds on the concrete model. The main contribution of this paper is to show how runtime verification can be made efficient through online property checking on property-preserving abstract models. The property specification language resembles propositional linear temporal logic augmented with simple datatypes and operators. We present classic concurrency examples and larger case studies (Multi-rotor Drone Controller, OAuth Protocol) to demonstrate the usefulness of the proposed techniques, which are incorporated into an Eclipse plug-in for runtime visualization and verification of Java programs.
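To make the abstraction idea concrete, here is a minimal Python sketch. It is not the paper's tool, which targets Java through an Eclipse plug-in. It assumes a hypothetical bounded buffer whose concrete state is an item count, and a user-defined abstraction function that collapses counts into EMPTY, PARTIAL, FULL, and OVERFLOW. The checker builds the abstract state/transition model incrementally and checks only an invariant ("always p", the G operator of LTL) online. The property "count never exceeds CAPACITY" holds for a concrete state exactly when its abstract state is not OVERFLOW, so this abstraction preserves that property. All class and function names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

ConcreteState = Dict[str, int]
AbstractState = Hashable


class OnlineAbstractChecker:
    """Build an abstract finite-state model on the fly and check an invariant (G p)."""

    def __init__(
        self,
        abstraction: Callable[[ConcreteState], AbstractState],
        invariant: Callable[[AbstractState], bool],
    ) -> None:
        self.abstraction = abstraction
        self.invariant = invariant
        self.states: Set[AbstractState] = set()
        self.transitions: Set[Tuple[AbstractState, AbstractState]] = set()
        self.violation: Optional[Tuple[int, AbstractState]] = None
        self._prev: Optional[AbstractState] = None
        self._step = 0

    def observe(self, concrete: ConcreteState) -> None:
        """Consume one concrete state from the running program."""
        abstract = self.abstraction(concrete)
        self.states.add(abstract)
        if self._prev is not None:
            self.transitions.add((self._prev, abstract))
        # Check the invariant online, recording only the first violation.
        if self.violation is None and not self.invariant(abstract):
            self.violation = (self._step, abstract)
        self._prev = abstract
        self._step += 1


CAPACITY = 3  # hypothetical bounded buffer size


def buffer_abstraction(state: ConcreteState) -> str:
    """Collapse the exact item count into four abstract states."""
    count = state["count"]
    if count == 0:
        return "EMPTY"
    if count < CAPACITY:
        return "PARTIAL"
    if count == CAPACITY:
        return "FULL"
    return "OVERFLOW"


def no_overflow(abstract: str) -> bool:
    """Invariant expressed purely over abstract states."""
    return abstract != "OVERFLOW"
```

Feeding the counts 0, 1, 2, 3, 2, 1, 0 yields a three-state abstract model with five distinct transitions and no violation; the abstract model never exceeds four states, however long the run. Feeding 0, 1, 2, 3, 4 records a violation at step 4.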

Citations: 0

SynthoMinds: Bridging human programming intuition with retrieval, analogy, and reasoning in program synthesis

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-21 DOI: 10.1016/j.jss.2024.112140

Qianwen Gou, Yunwei Dong, Qiao Ke

Program synthesis can transform software development by automatically generating executable programs from given specifications. An emerging trend is to augment generative models with external memory before generating programs, and better memory generally leads to better results. However, existing models tend to degenerate into a copy mechanism, in which retrieved memories are copied directly into the generative model, leading to misinformation or confusion. When the retrieved memories are irrelevant or incorrect, performance drops sharply.

Inspired by the way humans program, sketching a solution before writing code, we propose SynthoMinds, a novel framework that decomposes program synthesis into retrieval, analogy, and reasoning, so that programs are generated by leveraging knowledge learned from previously solved problems. Specifically, given a natural language (NL) description, SynthoMinds first retrieves similar programs via a retrieval module and then, via an analogy module, mines the retrieved memories for insightful revelations. A revelation acts as a bird’s-eye view of a program without delving into implementation details. The reasoning module then combines these revelations with the NL description to generate programs. Experimental results demonstrate that mining revelations from retrieved memories significantly outperforms existing baselines.
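The retrieval step can be sketched as nearest-neighbour search over previously solved (description, program) pairs. The abstract does not specify SynthoMinds' retriever, so this sketch substitutes simple token-overlap (Jaccard) similarity; the `min_score` cutoff is a hypothetical guard reflecting the observation that irrelevant memories hurt generation. The analogy and reasoning modules are not sketched, and all names are hypothetical.

```python
import re
from typing import List, Sequence, Set, Tuple

Memory = Sequence[Tuple[str, str]]  # hypothetical: (NL description, program) pairs


def tokenize(text: str) -> Set[str]:
    """Lower-cased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(
    query: str, memory: Memory, k: int = 2, min_score: float = 0.3
) -> List[Tuple[str, str]]:
    """Return up to k (description, program) pairs most similar to the query.

    Pairs scoring below min_score are dropped so that clearly irrelevant
    memories are not passed on to later stages.
    """
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(desc)), desc, prog) for desc, prog in memory]
    # list.sort is stable (also with reverse=True), so ties keep memory order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(desc, prog) for score, desc, prog in scored[:k] if score >= min_score]
```

For the query "return the product of a list of integers", the sum and maximum examples in a small memory each score 0.75 and are returned, while "reverse a string" scores about 0.11 and is filtered out.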

Citations: 0

End-to-end log statement generation at block-level

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-21 DOI: 10.1016/j.jss.2024.112146

Ying Fu, Meng Yan, Pinjia He, Chao Liu, Xiaohong Zhang, Dan Yang

Logging is crucial in software development for diagnosing runtime issues, but deciding how to log is challenging. Logging encompasses four essential sub-tasks: whether to log (Whether), where to log (Position), which log level to use (Level), and what information to log (Message). Although existing approaches perform well, they have two limitations. First, they address only a subset of these sub-tasks. Second, most of them generate a single log statement at the class or method level, potentially overlooking multiple log statements within those scopes.

To address these issues, we propose ELogger, which performs end-to-end log statement generation at the block level. Because it works at the block level, ELogger can handle multiple log statements in different code blocks of the same method. Evaluation results show that ELogger correctly predicts all four sub-tasks in 19.55% of cases. Compared with baselines that combine existing approaches for end-to-end log statement generation, ELogger achieves a significant average improvement of 50.85% to 78.21%. Additionally, ELogger correctly predicts whether to log in 71.68% of cases, two sub-tasks (Whether and Position) in 58.29% of cases, and three sub-tasks (Whether, Position, and Level) in 41.97% of cases, all of which outperform the baselines.
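The four sub-tasks and the cumulative way the results are reported (Whether; Whether and Position; Whether, Position, and Level; all four) can be captured by a small evaluation helper. This is a minimal sketch assuming a hypothetical `LogDecision` record per code block and exact-match comparison of every field, including the message; the paper's actual representation and message metric may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

SUBTASKS = ("whether", "position", "level", "message")


@dataclass(frozen=True)
class LogDecision:
    """One block-level logging decision (hypothetical representation)."""
    whether: bool
    position: Optional[int] = None  # e.g., statement index inside the block
    level: Optional[str] = None     # e.g., "info", "warn", "error"
    message: Optional[str] = None


def cumulative_accuracy(
    preds: Sequence[LogDecision], golds: Sequence[LogDecision]
) -> List[float]:
    """Accuracy for Whether, Whether+Position, Whether+Position+Level, and all four.

    A case counts at depth d only if every sub-task up to d matches exactly.
    When both sides decide not to log, the remaining fields are None and match.
    """
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equally long, non-empty prediction and gold lists")
    hits = [0] * len(SUBTASKS)
    for pred, gold in zip(preds, golds):
        for depth, name in enumerate(SUBTASKS):
            if getattr(pred, name) != getattr(gold, name):
                break  # later sub-tasks cannot count once an earlier one fails
            hits[depth] += 1
    return [h / len(golds) for h in hits]
```

A prediction that gets Whether and Position right but picks the wrong level contributes to the first two accuracies only, mirroring how the abstract reports 71.68%, 58.29%, 41.97%, and 19.55%.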

Citations: 0

GDPR compliance via software evolution: Weaving security controls in software design

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-21 DOI: 10.1016/j.jss.2024.112144

Vanessa Ayala-Rivera, A. Omar Portillo-Dominguez, Liliana Pasquale

Software should comply with international privacy laws such as the General Data Protection Regulation (GDPR). However, implementing appropriate technical controls is often an error-prone and time-consuming process, partly because software engineers have limited knowledge of privacy and security. This paper proposes SoCo, a semi-automated approach that supports organizations in making their software compliant with the GDPR data protection principles. To do so, SoCo helps engineers identify and integrate appropriate technical controls in sequence diagrams during the design phase. SoCo includes (i) a technique that helps engineers identify data processing activities, in software applications modeled as sequence diagrams, that may need to comply with the GDPR; (ii) a catalog of privacy and security controls that engineers can use to fix non-compliant activities; and (iii) a technique to implement such controls in the non-compliant sequence diagrams. Our evaluation shows that SoCo can help software engineers identify and design appropriate security controls to address GDPR violations, and that it required moderate manual effort when applied to a substantial open-source application.

Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
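The three SoCo ingredients (detecting activities that may need GDPR compliance, looking up controls in a catalog, and weaving them into the diagram) can be sketched over a toy sequence-diagram model. Everything here is hypothetical: the `Message` type, the set of personal-data categories, the action-to-control catalog, and the `PrivacyControls` lifeline are illustrative stand-ins, not SoCo's catalog or its mapping of GDPR principles to controls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical data categories and control catalog, for illustration only.
PERSONAL_DATA: Set[str] = {"name", "email", "address", "birth_date"}
CONTROL_CATALOG: Dict[str, Set[str]] = {
    "collect": {"obtain_consent"},
    "store": {"encrypt_at_rest"},
    "transmit": {"encrypt_in_transit"},
}


@dataclass
class Message:
    """One message (processing activity) in a sequence diagram."""
    sender: str
    receiver: str
    action: str
    data: Set[str]
    controls: Set[str] = field(default_factory=set)


def required_controls(msg: Message) -> Set[str]:
    """Controls the catalog demands if the message touches personal data."""
    if msg.data & PERSONAL_DATA:
        return CONTROL_CATALOG.get(msg.action, set())
    return set()


def find_non_compliant(diagram: List[Message]) -> List[Message]:
    """Messages that lack at least one required control."""
    return [m for m in diagram if required_controls(m) - m.controls]


def weave_controls(diagram: List[Message]) -> List[Message]:
    """Return a new diagram with a control step inserted before each gap."""
    woven: List[Message] = []
    for msg in diagram:
        missing = required_controls(msg) - msg.controls
        for control in sorted(missing):
            # Hypothetical lifeline that implements the privacy control.
            woven.append(Message(msg.sender, "PrivacyControls", control, msg.data & PERSONAL_DATA))
        woven.append(Message(msg.sender, msg.receiver, msg.action, set(msg.data), msg.controls | missing))
    return woven
```

On a three-message diagram where a UI collects an email address, a service stores it with encryption already in place, and the service later transmits it, `find_non_compliant` flags the collect and transmit messages, and `weave_controls` inserts consent and in-transit encryption steps before them.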

Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0164121224001894/pdfft?md5=5fd4ce6238c3011c648651420115965a&pid=1-s2.0-S0164121224001894-main.pdf

Citations: 0

On the effectiveness of hybrid pooling in mixup-based graph learning for language processing

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-17 DOI: 10.1016/j.jss.2024.112139

Zeming Dong, Qiang Hu, Zhenya Zhang, Yuejun Guo, Maxime Cordy, Mike Papadakis, Yves Le Traon, Jianjun Zhao

Graph neural network (GNN)-based graph learning has become popular in natural language and programming language processing, particularly for text and source code classification. Typically, GNNs alternate layers that learn transformations of graph node features with graph pooling layers, which use graph pooling operators (e.g., Max-pooling) to reduce the number of nodes while preserving the semantic information of the graph. Recently, Manifold-Mixup, a data augmentation technique that produces synthetic graph data by linearly mixing a pair of graph data and their labels, has been widely adopted to enhance GNNs in graph learning tasks. However, the performance of Manifold-Mixup can be strongly affected by the choice of graph pooling operator, and few studies have examined this effect. To bridge this gap, we take an early step toward understanding how graph pooling operators affect the performance of Mixup-based graph learning. To that end, we conduct a comprehensive empirical study that applies Manifold-Mixup to a formal characterization of graph pooling based on 11 graph pooling operators (9 hybrid and 2 non-hybrid). Experimental results on both natural language datasets (Gossipcop, Politifact) and programming language datasets (JAVA250, Python800) show that hybrid pooling operators are more effective for Manifold-Mixup than standard Max-pooling and state-of-the-art graph multiset transformer (GMT) pooling, producing more accurate and robust GNN models.

Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
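A minimal sketch of how a pooling operator and Manifold-Mixup fit together, assuming mixing is applied to the pooled graph-level embeddings and one-hot labels. `hybrid_pool` shows just one hybrid variant (concatenating mean- and max-pooled readouts); the study's nine hybrid operators are not reproduced here. Drawing the mixing coefficient from a Beta distribution is a common Mixup convention, assumed rather than taken from the paper. Pure Python keeps the example dependency-free.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_pool(nodes: Sequence[Sequence[float]]) -> Vector:
    """Average each feature over all nodes of a graph."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def max_pool(nodes: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over all nodes (standard Max-pooling)."""
    return [max(col) for col in zip(*nodes)]


def hybrid_pool(nodes: Sequence[Sequence[float]]) -> Vector:
    """One possible hybrid readout: concatenate mean- and max-pooled vectors."""
    return mean_pool(nodes) + max_pool(nodes)


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]


def manifold_mixup(
    emb_a: Sequence[float],
    label_a: Sequence[float],
    emb_b: Sequence[float],
    label_b: Sequence[float],
    lam: Optional[float] = None,
    alpha: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Linearly mix two graph-level embeddings and their one-hot labels."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)  # assumed: lam ~ Beta(alpha, alpha)
    return mix(emb_a, emb_b, lam), mix(label_a, label_b, lam)
```

Because the pooled vector is what gets mixed, swapping `hybrid_pool` for `max_pool` changes the synthetic training points directly, which is the kind of interaction the study measures.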

Citations: 0

Impermanent identifiers: Enhanced source code comprehension and refactoring

IF 3.7 | Zone 2 (Computer Science) | Q1 Computer Science

Journal of Systems and Software

Pub Date : 2024-06-17 DOI: 10.1016/j.jss.2024.112137

Eduardo Martins Guerra, André A.S. Ivo, Fernando O. Pereira, Romain Robbes, Andrea Janes, Fábio Fagundes Silveira

In response to prevailing challenges in contemporary software development, this article introduces an innovative approach to code augmentation centered on Impermanent Identifiers. The primary goal is to improve the software development experience through dynamic identifiers that adapt to changing contexts, enabling more efficient interaction between developers and source code and, ultimately, advancing comprehension, maintenance, and collaboration in software development. This study also rigorously evaluates the adoption and acceptance of Impermanent Identifiers in software development. Through a comprehensive empirical examination, we investigate how developers perceive this approach and integrate it into their daily programming practices, exploring perceived benefits, potential barriers, and factors that influence its adoption. In summary, this article charts a new course for code augmentation, proposing Impermanent Identifiers as its cornerstone and assessing their feasibility and acceptance among developers. This interdisciplinary research seeks to contribute to the continuous improvement of software development practices and to the progress of code augmentation technology.

Citations: 0

BugOss: A benchmark of real-world regression bugs for empirical investigation of regression fuzzing techniques

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-15 DOI: 10.1016/j.jss.2024.112119

Jeewoong Kim, Shin Hong

This paper presents the design and constitution of BugOss, a benchmark of real-world regression bugs for the empirical study of regression fuzzing techniques. To reproduce the actual project context in which a regression bug was introduced, each bug case in BugOss pinpoints the exact bug-inducing commit and provides a bug-specific test oracle that accounts for other bugs co-existing in the same version. BugOss currently comprises 20 real-world bug cases from 20 open-source C/C++ projects; each bug was reported through OSS-Fuzz and confirmed by the project maintainers. An empirical investigation with two regression fuzzing techniques shows that, on the BugOss bug cases, the techniques perform differently depending on the project context. In addition, the experiments suggest that BugOss covers a variety of real-world regression bugs, so its bug cases should be useful for empirically investigating regression fuzzing techniques.
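
Finding the commit that introduced a regression is commonly done by bisecting the history, as `git bisect` does; the abstract does not say how BugOss located its bug-inducing commits. The minimal sketch below illustrates why a bug-specific oracle matters when other bugs co-exist: bisecting on "any crash" can blame the wrong commit. Everything in it is hypothetical, including the `CrashReport` and `BugCase` records, the signature-based oracle (sanitizer report type plus top stack frame), the toy six-commit history, and the assumption that the bug stays present once introduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class CrashReport:
    """Hypothetical summary of one crash observed when replaying an input."""
    kind: str       # sanitizer report type, e.g. "heap-buffer-overflow"
    top_frame: str  # function at the top of the stack trace


@dataclass(frozen=True)
class BugCase:
    """Hypothetical benchmark entry: one regression bug plus its oracle."""
    project: str
    inducing_commit: str
    target_kind: str
    target_frame: str

    def oracle(self, crash: CrashReport) -> bool:
        # Only a crash matching this bug's signature counts, so other bugs
        # that co-exist in the same revision are not mistaken for it.
        return crash.kind == self.target_kind and crash.top_frame == self.target_frame


def first_bad_commit(commits: List[str], reproduces: Callable[[str], bool]) -> Optional[str]:
    """Binary-search a history ordered oldest-first for the first commit where
    `reproduces` is True, assuming the bug stays present once introduced."""
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo] if lo < len(commits) else None


# Toy usage: a six-commit history containing two unrelated bugs.
HISTORY = ["c1", "c2", "c3", "c4", "c5", "c6"]
CASE = BugCase("toy-project", "c4", "heap-buffer-overflow", "decode_chunk")


def crashes_at(commit: str) -> List[CrashReport]:
    """Stand-in for building `commit` and replaying a reproducer input."""
    i = HISTORY.index(commit)
    found: List[CrashReport] = []
    if i >= 1:  # an unrelated bug is present from c2 onward
        found.append(CrashReport("null-deref", "parse_header"))
    if i >= 3:  # the target regression is introduced at c4
        found.append(CrashReport("heap-buffer-overflow", "decode_chunk"))
    return found


def target_reproduces(commit: str) -> bool:
    return any(CASE.oracle(r) for r in crashes_at(commit))


def any_crash(commit: str) -> bool:
    return bool(crashes_at(commit))


print(first_bad_commit(HISTORY, target_reproduces))  # c4
print(first_bad_commit(HISTORY, any_crash))          # c2 (blames the wrong commit)
```

Matching on report type plus top frame is only one possible oracle; a real benchmark may need a hand-written check per bug, which is what makes bug-specific oracles valuable.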

Citations: 0

Understanding Virtual Onboarding Dynamics and Developer Turnover Intention in the Era of Pandemic

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-13 DOI: 10.1016/j.jss.2024.112136

Gorkem Akdur , Mehmet N. Aydin , Gizdem Akdur

This study examines the dynamics of virtual onboarding (VO) for Salesforce Commerce Cloud developers at a multinational software company during the COVID-19 pandemic. Its central concept is the newly developed Virtual Integration and Retention Framework (VIRF), which is tailored to the opportunities and challenges presented by the pandemic and provides an improved understanding of VO.

The study used a two-stage, higher-order construct (HOC) quantitative research approach, which revealed a negative relationship between VO success and the challenges brought on by the pandemic. This highlights how difficult the transition to remote work settings can be, especially in how operational effectiveness and employee well-being interact.

Furthermore, the study demonstrates a positive relationship between VO success and the delivery of technology and equipment during the pandemic, underscoring how important logistical support is to effective remote work arrangements. The key findings show that successful VO has a positive impact on developers' job satisfaction and workplace relationship quality (WRQ). Both factors are inversely correlated with turnover intentions, indicating that strong VO practices are essential for improving employee retention. Using mediation analysis, with job satisfaction and WRQ as mediators, the study further clarifies how VO success influences turnover intentions.
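
The idea behind mediation analysis can be illustrated with the classic product-of-coefficients decomposition: path a is the effect of VO success on a mediator, path b is the mediator's effect on turnover intention while controlling for VO success, and a·b is the indirect effect. The following is a minimal single-mediator sketch in pure Python using ordinary least squares on made-up numbers; it is not the study's estimation procedure, whose model involves higher-order constructs and two mediators (job satisfaction and WRQ). The function names and data are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def _centered(v: Sequence[float]) -> List[float]:
    """Subtract the mean so slopes can be computed without an explicit intercept."""
    mu = fmean(v)
    return [x - mu for x in v]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def simple_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y regressed on x (intercept handled by centering)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def two_predictor_slopes(
    x: Sequence[float], m: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """OLS slopes of y regressed on x and m jointly, via the 2x2 normal equations."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm * sxm
    if det == 0:
        raise ValueError("x and m are perfectly collinear")
    slope_x = (smm * sxy - sxm * smy) / det  # direct effect c'
    slope_m = (sxx * smy - sxm * sxy) / det  # path b
    return slope_x, slope_m


def mediation(x, m, y) -> Dict[str, float]:
    """Product-of-coefficients decomposition of the total effect of x on y."""
    a = simple_slope(x, m)                     # path a: X -> M
    direct, b = two_predictor_slopes(x, m, y)  # c' and path b: M -> Y given X
    total = simple_slope(x, y)                 # path c: X -> Y
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Made-up scores: X = VO success, M = job satisfaction, Y = turnover intention.
X = [1, 2, 3, 4, 5]
M = [3, 3, 6, 7, 11]
Y = [15.0, 14.5, 9.5, 7.5, 1.0]
print(mediation(X, M, Y))
# {'a': 2.0, 'b': -1.5, 'direct': -0.5, 'indirect': -3.0, 'total': -3.5}
```

With OLS on a single sample, total = direct + indirect holds exactly, which is a convenient sanity check. Significance testing of the indirect effect (for example, bootstrapped confidence intervals) is omitted here.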

This study offers an in-depth understanding of VO practices during the pandemic. Beyond addressing the immediate difficulties caused by the outbreak, it discusses the future of remote work and onboarding procedures. It emphasizes the importance of VO for improving WRQ and job satisfaction and for decreasing developers' turnover intentions within the software company. These insights can help organizations improve developer integration and retention in changing work environments and refine their remote work strategies.

Citations: 0

Data preparation for Deep Learning based Code Smell Detection: A systematic literature review

IF 3.7 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software

Pub Date : 2024-06-12 DOI: 10.1016/j.jss.2024.112131

Fengji Zhang , Zexian Zhang , Jacky Wai Keung , Xiangru Tang , Zhen Yang , Xiao Yu , Wenhua Hu

Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability, and Deep Learning (DL) techniques have emerged as a promising approach to CSD owing to their superior performance. However, the effectiveness of DL-based CSD methods relies heavily on the quality of the training data. Despite this importance, the data preparation process has received little analytical attention. This systematic literature review analyzes the data preparation techniques used in DL-based CSD methods. We identify 36 relevant papers published up to December 2023 and provide a thorough analysis of the critical considerations in constructing CSD datasets, including data requirements, collection, labeling, and cleaning. We also summarize seven primary challenges reported in the literature and their corresponding solutions. Finally, we offer actionable recommendations for preparing and accessing high-quality CSD data, emphasizing data diversity, standardization, and accessibility. This survey provides insights that can help researchers and practitioners harness the full potential of DL techniques in CSD.
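
The abstract names labeling and cleaning among the considerations in building CSD datasets. As a rough illustration of what a cleaning step might involve, the sketch below merges duplicate samples and resolves conflicting labels. It is a minimal sketch under stated assumptions, not a procedure taken from the review: samples are (code, label) pairs contributed by several annotators or detection tools, duplicates are found by collapsing whitespace only, and conflicts are settled by taking the most common label and dropping ties. The names `normalize` and `clean_samples` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def normalize(code: str) -> str:
    """Collapse every run of whitespace to one space, so snippets that differ
    only in indentation or line breaks map to the same key."""
    return " ".join(code.split())


def clean_samples(raw: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge duplicate snippets and keep one label per snippet.

    `raw` holds (code, label) pairs, one per annotator or detection tool.
    The most common label wins; snippets whose top labels tie are dropped.
    """
    votes: DefaultDict[str, Counter] = defaultdict(Counter)
    first_seen: Dict[str, str] = {}
    for code, label in raw:
        key = normalize(code)
        votes[key][label] += 1
        first_seen.setdefault(key, code)  # keep the original text of the first copy

    cleaned: List[Tuple[str, str]] = []
    for key, counter in votes.items():
        ranked = counter.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie between the top labels: too ambiguous to keep
        cleaned.append((first_seen[key], ranked[0][0]))
    return cleaned


samples = [
    ("int f() {\n    return 1;\n}", "clean"),
    ("int f() { return 1; }", "clean"),   # whitespace-only duplicate
    ("int f() { return 1; }", "smelly"),  # conflicting label, outvoted
    ("void g() { /* 300 lines */ }", "smelly"),
    ("void h() {}", "clean"),
    ("void h() {}", "smelly"),            # 1-1 tie, dropped
]
for code, label in clean_samples(samples):
    print(label, repr(normalize(code)))
# clean 'int f() { return 1; }'
# smelly 'void g() { /* 300 lines */ }'
```

Whitespace collapsing is deliberately crude; a real pipeline might compare token sequences or ASTs to catch renamed clones, and might weight annotators by reliability instead of counting votes equally.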

Citations: 0