
Information and Software Technology: Latest Publications

Navigating ASD barriers: a role-contextual framework for enhancing ASD implementation in self-organizing teams
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-14; DOI: 10.1016/j.infsof.2025.107997
Soumya Prakash Rath, Gunjan Tomer

Context

Agile Software Development (ASD) emphasizes adaptability, collaboration, and rapid, iterative delivery to address evolving client needs and manage interdependent roles effectively. Although the extant literature has explored ASD and its various dimensions, the intersections and dependencies among well-defined ASD job roles remain underexplored.

Objective

This study examines the implementation barriers faced by ASD teams and identifies the best practices adopted across key ASD roles—Scrum Masters, Product Owners, and Developers—as well as senior leadership, to overcome them.

Method

This study employs a Grounded Theory (GT) methodology to explore the complex, role-based challenges in ASD inductively. Drawing on qualitative insights from 23 industry practitioners representing multiple ASD roles, the research delves into the ground realities of ASD projects to surface emergent patterns and contextual nuances. Through iterative coding and constant comparison, the study develops a role-contextual understanding of barriers to ASD implementation and synthesizes practitioner-validated best practices for overcoming them.

Results

The study identifies a spectrum of barriers—role-specific, inter-role, and leadership-related—spanning strategic, operational, and interpersonal dimensions. It proposes a Role-Contextual Framework that categorizes these barriers at triadic, dyadic, and intra-role levels and links them to practitioner-validated best practices.

Conclusion

The findings contribute to ASD research by highlighting the contextual and systemic nature of ASD barriers and underscoring the need for adaptive, role-based strategies, such as strategic alignment and leadership involvement, to facilitate the successful implementation of ASD.
Balancing organizational alignment, team autonomy, and control in large-scale agile organizations
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-16; DOI: 10.1016/j.infsof.2025.107998
Bart Daemen, Konstantinos Tsilionis, Oktay Turetken

Context

Agile software development has become widespread, particularly in small team settings, because of its emphasis on user value and adaptability. Scaling agile practices in large organizations, however, presents challenges. As organizations grow, projects become complex, requiring control mechanisms to ensure quality deliverables, align outputs across teams, and connect them to organizational goals. Scaling agile frameworks provide structured approaches to coordinate teams and maintain alignment while preserving team autonomy. Yet, applying these frameworks rigidly can reduce flexibility, overlook organizational context, and disrupt the balance between autonomy and alignment.

Objective

This study examines how different control mechanisms, both formal controls such as structured processes, rules, and performance metrics, and informal controls such as trust, social norms, and mutual adjustment, can foster organizational alignment while preserving team autonomy in large organizations that implement scaling agile frameworks.

Methods

A multi-method approach was adopted, combining a systematic literature review with an interpretive case study. The literature review synthesized prior research and produced eight theoretical propositions on the dynamic interaction of control, alignment, and autonomy. The case study was conducted in the IT department of a multinational organization transitioning from waterfall to agile, offering an empirical setting to explore how these propositions unfold in practice.

Results

Our findings show that formal controls, such as adherence to architectural standards, provide structure and support alignment. Informal mechanisms, including communities of practice, peer coaching, and visibility-based incentives, enhance accountability and autonomy. However, rigid separation of formal and informal controls is restrictive. More integrated hybrid approaches that combine agile methods with structured processes, supported by parallel decision-making bodies, address inter-team dependencies while maintaining flexibility.

Conclusion

Large organizations should adopt context-sensitive strategies when applying scaling agile frameworks. Control mechanisms need to be tailored to organizational needs, maturity, and conditions, supported by clear communication, cultural commitment, and adaptive governance.
Compositional security analysis of dynamic component-based systems
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-26; DOI: 10.1016/j.infsof.2025.108002
Charilaos Skandylas, Narges Khakpour

Context:

To reason about and enforce security in dynamic software systems, automated analysis and verification approaches are required. However, such approaches often encounter scalability issues, particularly when employed for runtime analysis, which is necessary in software systems with dynamically changing architectures, such as self-adaptive systems.

Objective:

In this work, we propose an automated formal approach for security analysis of component-based systems with dynamic architectures.

Methods:

This approach leverages formal abstraction and incremental analysis techniques to reduce the complexity of runtime analysis. We have implemented and evaluated our approach against ZNN, a widely known self-adaptive system exemplar.
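
The abstract does not detail the incremental-analysis mechanism, so the following is only a minimal sketch of the general idea it names, assuming each component carries a cached verification result and a dependency graph is available. The component names, the `analyze` callback, and the caching rule are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

def affected_components(dependents, changed):
    """Return the changed components plus everything that transitively depends on them."""
    affected = set(changed)
    queue = deque(changed)
    while queue:
        comp = queue.popleft()
        for dep in dependents.get(comp, set()):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

def incremental_analysis(components, dependents, changed, analyze, cache):
    """Re-run the expensive security analysis only for components affected by a runtime change."""
    for comp in affected_components(dependents, changed):
        cache[comp] = analyze(comp)            # full analysis only where needed
    return all(cache[c] for c in components)   # system check holds iff every component check holds

# Toy example: C depends on B, B depends on A; only A changed at runtime,
# so A, B, and C are re-analyzed while the unrelated component D keeps its cached result.
dependents = {"A": {"B"}, "B": {"C"}, "C": set()}
cache = {"A": True, "B": True, "C": True, "D": True}
print(incremental_analysis(["A", "B", "C", "D"], dependents, {"A"},
                           analyze=lambda c: True, cache=cache))
```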

Results:

Compared to the state of the art, our results demonstrate an improvement both in the size of systems that can be analyzed and in the time required to complete the analysis. In particular, our incremental analysis is well suited for systems that alter their architectures at runtime.

Conclusion:

Therefore, this approach is suitable for analyzing the security of dynamic component-based systems both statically and at runtime.
Towards green game software engineering: A comparative analysis of energy consumption between the widespread unity and unreal video game engines
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-03; DOI: 10.1016/j.infsof.2025.107991
Carlos Pérez, Javier Verón, Francisca Pérez, Mª Ángeles Moraga, Coral Calero, Carlos Cetina

Context:

The total energy cost of computing activities is steadily increasing, and projections indicate that it will be one of the dominant global energy consumers in the coming decades. However, the video game sector has not yet developed the same level of environmental awareness as other computing technologies despite the estimated three billion regular video game players in the world.

Objective:

This work evaluates the energy consumption of the most widely used industry-scale video game engines: Unity and Unreal Engine.

Method:

Specifically, our work uses three scenarios representing relevant aspects of video games (Physics, Static Meshes, and Dynamic Meshes) to compare the energy consumption of the engines. The aim is to determine the influence of using each engine on energy consumption.

Results:

Our research has confirmed notable differences in energy consumption: 351% in Physics in favor of Unity, 17% in Static Meshes in favor of Unity, and 26% in Dynamic Meshes in favor of Unreal Engine.
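
As a reading aid for the percentages above, here is a minimal sketch of how such relative differences are typically derived from per-scenario energy measurements in joules. The joule values below are hypothetical placeholders chosen only to reproduce the reported ratios; they are not the paper's measurements.

```python
def relative_difference(energy_a: float, energy_b: float) -> float:
    """Percentage by which the higher-consuming engine exceeds the lower-consuming one."""
    worse, better = max(energy_a, energy_b), min(energy_a, energy_b)
    return (worse - better) / better * 100.0

# scenario -> (Unity joules, Unreal Engine joules); hypothetical values
measurements = {
    "Physics":        (100.0, 451.0),
    "Static Meshes":  (100.0, 117.0),
    "Dynamic Meshes": (126.0, 100.0),
}
for scenario, (unity_j, unreal_j) in measurements.items():
    winner = "Unity" if unity_j < unreal_j else "Unreal Engine"
    print(f"{scenario}: {relative_difference(unity_j, unreal_j):.0f}% in favor of {winner}")
```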

Conclusion:

Considering the estimated three billion regular video game players worldwide and the high computational requirements of the sector, the magnitude of potential savings is a relevant issue for the research community. This might encourage a new branch of research on energy efficient video game engines.
Quality assessment of software requirements using artificial intelligence methods: A systematic literature review
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-25; DOI: 10.1016/j.infsof.2025.107979
Elise Wolf, Adam Trendowicz, Julien Siebert

Context:

The quality of requirements specifications is a critical success factor in software development. Assuring high-quality requirements, specifically in an automated way, poses a significant challenge due to their unstructured and multi-modal character. With the rise of deep learning and large language models (LLMs), new opportunities have developed to assess the quality of requirements automatically, particularly user stories in the context of agile software engineering, where short development cycles require efficient tool support.

Objective:

This study aims to systematically review and investigate the current landscape of approaches based on artificial intelligence techniques such as natural language processing and deep learning for assessing the quality of software requirements. The investigation focuses on the artificial intelligence techniques adopted, quality aspects considered, datasets used to tune and evaluate the proposed approaches, and their performance.

Method:

We conducted a systematic literature review of 26 peer-reviewed papers published between 2019 and 2025. We selected the papers after a title and abstract review of 353 papers identified through a literature database query and forward–backward snowballing.

Results:

The results reveal significant overlap among considered quality aspects, which can be mapped onto the higher-order requirements quality model INVEST. Most studies focus on assessing requirement quality rather than improving requirements and rely heavily on synthetic and public datasets. LLMs have rapidly gained popularity since 2023, though model evaluation strategies remain inconsistent. Metrics such as accuracy, precision, recall, and F1-Score are common, yet a few studies use semantic or expert-based evaluations.
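
For readers unfamiliar with the evaluation metrics named above, a short sketch of how accuracy, precision, recall, and F1-score are computed from the confusion counts of a hypothetical binary requirements-quality classifier; the counts are invented for illustration and do not come from any reviewed study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics from the confusion counts of a binary classifier."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented confusion counts for a classifier flagging defective user stories.
print(classification_metrics(tp=42, fp=8, fn=6, tn=44))
```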

Conclusion:

The field is evolving toward LLM-driven, semantically rich models, yet lacks methodological standardization, reproducible datasets for evaluating the models, and integration of the approaches with real-world requirements engineering processes. Future work should address these limitations by developing benchmark datasets, standardizing evaluation metrics, and exploring hybrid systems that combine AI-based and traditional requirements quality assurance approaches.
ReEPM: A Reliability Estimation Framework for CNNs based on Error Probability Matrix modeling
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-28; DOI: 10.1016/j.infsof.2025.107981
Jie Xiao, Aizhu Liu, Yujian Yang, Yuhao Huang, Zhezhao Yang, Jungang Lou

Context:

The deployment of Convolutional Neural Networks (CNNs) in safety-critical applications faces significant challenges from soft errors. While accurate reliability assessment is vital, existing methods typically suffer from prohibitively high computational overheads, creating a critical trade-off between precision and efficiency that severely limits their practical applicability.

Objective:

To overcome this critical precision-efficiency dilemma, we propose a novel framework for CNN reliability assessment that enables accurate and highly efficient reliability evaluation across diverse error conditions.

Method:

We propose ReEPM (Reliability Estimation Framework based on Error Probability Matrix). ReEPM constructs an Error Probability Matrix (EPM) that precisely models bit-flip error impact on CNN weights, fundamentally enabling parallel, accurate error injection without brute-force simulation. Moreover, we integrate an adaptive iterative process driven by Kalman filtering, which intelligently converges on reliability estimates with a drastically reduced number of input samples. This combination offers superior analytical rigor and computational efficiency.
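
The abstract does not spell out how the EPM is constructed, so the sketch below illustrates only the underlying fault model such analyses share: flipping a single bit of a float32 CNN weight and observing the perturbed value. The Error Probability Matrix itself and the Kalman-filter-driven iteration are not reproduced here.

```python
import struct

def flip_bit(weight: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of the IEEE-754 float32 encoding of a weight."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", weight))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.125
for bit in (0, 23, 30, 31):   # mantissa LSB, exponent LSB, exponent MSB, sign bit
    print(f"bit {bit:2d}: {w} -> {flip_bit(w, bit)}")
```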

Result:

Experimental results show that ReEPM achieves an average accuracy of 0.9017 (single-error) and 0.9984 (multiple-error), while being 69.53× and 1989.27× faster, respectively, than widely adopted Monte Carlo fault injection. Furthermore, ReEPM significantly outperforms probability-based methods like SERN in accuracy (0.9017 vs. 0.7192 in single-error) and boasts broader applicability, covering entire networks and complex multiple-error scenarios.

Conclusion:

ReEPM establishes a new paradigm for CNN reliability assessment by effectively overcoming the critical accuracy-overhead trade-off. It offers an accurate and rapid evaluation tool for designing resilient CNNs in next-generation safety-critical intelligent systems.
Representation learning for coincidental correctness in fault localization
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-11-24; DOI: 10.1016/j.infsof.2025.107978
Jian Hu

Context:

Fault localization (FL) is a critical phase in the software debugging process, which employs the execution coverage matrix to identify the exact locations of faults or bugs in a program's source code. However, researchers have shown that coincidental correctness test cases (CCTC), which execute the faulty statements but produce the correct output, are prevalent in test suites and can negatively affect the effectiveness of fault localization.

Objective:

To address this problem, we propose ER4FL: a representation learning based CCTC detection method for fault localization. Our method first detects the CCTCs in the coverage matrix, then relabels them, and finally uses the optimized coverage matrix for fault localization.

Method:

ER4FL leverages autoencoder-based representation learning to refine the coverage matrix, capturing its most important features in a compressed form. Based on the enhanced representation (i.e., the compact coverage matrix), ER4FL adopts a Gaussian Mixture Model (GMM) as a probabilistic model to identify and manipulate CCTCs. Finally, ER4FL feeds the coverage matrix without CCTCs into the FL pipeline.
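
A simplified end-to-end sketch of the pipeline described above, run on synthetic data: a small autoencoder compresses the coverage matrix, a two-component GMM is fit on the encodings, and passing tests that fall into the component dominated by failing tests are flagged as CCTC candidates. The network sizes, number of mixture components, training budget, and flagging rule are illustrative assumptions rather than ER4FL's exact configuration, which would also relabel the flagged tests before fault localization.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
coverage = rng.integers(0, 2, size=(200, 50)).astype(np.float32)   # rows: test cases, cols: statements
passing = np.ones(200, dtype=bool)
passing[:30] = False                                               # first 30 tests are failing tests

class AutoEncoder(nn.Module):
    def __init__(self, n_statements: int, n_latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_statements, n_latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_statements), nn.Sigmoid())
    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.from_numpy(coverage)
model = AutoEncoder(coverage.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                                               # learn to reconstruct the coverage matrix
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy(model(x), x)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    latent = model.encoder(x).numpy()                              # compact coverage representation

gmm = GaussianMixture(n_components=2, random_state=0).fit(latent)
labels = gmm.predict(latent)
failing_component = np.bincount(labels[~passing], minlength=2).argmax()  # component dominated by failing tests
cctc_candidates = passing & (labels == failing_component)          # passing tests that behave like failing ones
print(f"Flagged {int(cctc_candidates.sum())} candidate coincidentally correct tests")
```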

Results:

Our experimental results demonstrate that ER4FL reduces the Mean First Rank (MFR) of Ochiai from 333.18 to 258.26, achieving a relative improvement of 22.49%. In addition, ER4FL decreases the number of checked statements in Convolutional Neural Network (CNN) FL from 859.20 to 579.65, corresponding to a relative reduction of 48.23%.

Conclusion:

The experimental results demonstrate that our method is statistically more effective than the six FL baselines, as well as the two CCTC detection methods.
Maximizing quantum hardware utilization via multiprogramming circuits and shot-wise distribution
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-20; DOI: 10.1016/j.infsof.2025.108005
Giuseppe Bisicchia, Jaime Alvarado-Valiente, Javier Romero-Álvarez, Jose Garcia-Alonso, Juan M. Murillo, Antonio Brogi

Context:

Quantum computing is rapidly evolving, offering new opportunities for solving problems in optimization, cryptography, and simulation. However, the limited availability of quantum resources makes efficient utilization of quantum hardware a current challenge. Today’s paradigms often lead to under-utilization of qubits, increased costs, and execution delays, especially in the NISQ era.

Objective:

This work aims to improve the utilization of quantum hardware by introducing an execution model that integrates multiprogramming at circuit level with quantum shot-wise distribution in a single policy-driven pipeline.

Methods:

An architecture has been implemented that combines circuit scheduling and shot distribution techniques to aggregate multiple circuits and distribute their shots across heterogeneous QPUs. The approach was empirically validated on actual IBM Quantum devices using a diverse set of reference circuits.
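
A minimal sketch of the shot-wise distribution idea: split a circuit's total shot budget across several QPUs according to a policy weight, then merge the per-QPU measurement histograms back into a single result. The QPU names, weights, and counts are hypothetical, and the scheduling/multiprogramming of aggregated circuits is not shown.

```python
from collections import Counter

def split_shots(total_shots: int, policy: dict) -> dict:
    """Allocate shots proportionally to the policy weights (any remainder goes to the first QPU)."""
    weight_sum = sum(policy.values())
    allocation = {qpu: int(total_shots * w / weight_sum) for qpu, w in policy.items()}
    first = next(iter(allocation))
    allocation[first] += total_shots - sum(allocation.values())
    return allocation

def merge_counts(per_qpu_counts: list) -> dict:
    """Aggregate the bitstring counts returned by each QPU into one histogram."""
    merged = Counter()
    for counts in per_qpu_counts:
        merged.update(counts)
    return dict(merged)

policy = {"qpu_a": 0.5, "qpu_b": 0.3, "qpu_c": 0.2}     # hypothetical dispatch policy
print(split_shots(1000, policy))                         # {'qpu_a': 500, 'qpu_b': 300, 'qpu_c': 200}
print(merge_counts([{"00": 260, "11": 240}, {"00": 150, "11": 150}, {"00": 95, "11": 105}]))
```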

Results:

The proposal achieved a 95% reduction in cost and a 92% reduction in tasks. Moreover, the fidelity analysis of the results showed an increase in noise, with an average increase of approximately 20% across different statistical distances.

Conclusions:

This research provides a usable and extensible solution to increase the efficiency, cost effectiveness, and resilience of quantum workload execution in heterogeneous and dynamic cloud environments. These results suggest that users should weigh the implications of fidelity versus cost (and time) savings based on their application requirements and goals.
DeepRegion: Black-box efficient testing for DNNs based on region analysis
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-08; DOI: 10.1016/j.infsof.2025.107995
Qing Sheng, Zhiyi Zhang, Shuxian Chen, Ziyuan Wang, Zhiqiu Huang

Context:

Deep Neural Networks (DNNs) have achieved remarkable success in many safety-critical domains. However, their vulnerability to small input perturbations poses serious security risks. White-box testing methods are often impractical in sensitive domains where access to model internals is restricted. Moreover, some recent studies have shown that neuron coverage does not have a strong correlation with the ability to detect faults, making these methods relatively inefficient. Black-box testing offers a viable alternative by evaluating models solely through input–output behavior. But current approaches typically apply uniform perturbations across the input space, overlooking the fact that DNN decisions often rely heavily on a small set of critical input regions, making them suffer from low efficiency and high query costs.

Objective:

To improve query efficiency in black-box testing of DNNs and reduce computational costs while enhancing defect detection within limited query budgets, this paper introduces DeepRegion, a novel black-box testing method based on region analysis.

Method:

First, we design a partition policy to initialize the partitioning of the original image. In each iteration, we apply a well-designed discriminant function to guide the localization process of the region, and apply the dynamic adjustment process in the partition policy to refine the partition for further analysis. After the regions containing important features are successfully located, we then implement a feedback-based approach to strategically perturb these regions, generating error-inducing inputs.
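
A simplified sketch of the two-stage idea: an occlusion-based discriminant (black-box queries only) localizes the most decision-relevant grid cell, and a feedback loop then perturbs only that cell until the prediction flips or the query budget is exhausted. The toy model, grid size, noise magnitude, and acceptance rule are stand-ins for illustration; they are not DeepRegion's partition policy or discriminant function.

```python
import numpy as np

def region_guided_attack(image, predict, grid=4, budget=200, eps=0.15, rng=None):
    """Black-box sketch: locate the most decision-relevant grid cell, then perturb only that cell."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    base_probs = predict(image)
    label = int(base_probs.argmax())

    # Discriminant step: score each cell by the confidence drop when it is occluded (one query per cell).
    scores, cells = [], []
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * h // grid, (i + 1) * h // grid)
            xs = slice(j * w // grid, (j + 1) * w // grid)
            occluded = image.copy()
            occluded[ys, xs] = 0.0
            scores.append(base_probs[label] - predict(occluded)[label])
            cells.append((ys, xs))
    ys, xs = cells[int(np.argmax(scores))]

    # Feedback step: keep only perturbations that reduce confidence in the original label.
    perturbed, confidence = image.copy(), base_probs[label]
    for _ in range(budget):
        candidate = perturbed.copy()
        noise = rng.uniform(-eps, eps, candidate[ys, xs].shape)
        candidate[ys, xs] = np.clip(candidate[ys, xs] + noise, 0.0, 1.0)
        probs = predict(candidate)
        if int(probs.argmax()) != label:
            return candidate                      # error-inducing input found
        if probs[label] < confidence:
            perturbed, confidence = candidate, probs[label]
    return None

# Toy black-box "model": the predicted class depends only on the mean of the top-left quadrant.
def toy_predict(img):
    s = img[:16, :16].mean()
    return np.array([1.0 - s, s])

image = np.full((32, 32), 0.55)
print(region_guided_attack(image, toy_predict) is not None)
```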

Results:

Our experimental evaluation on four widely used datasets and seven well-established DNN architectures shows that DeepRegion can not only improve testing efficiency but also enhance defect detection within limited query budgets.

Conclusion:

DeepRegion can automatically generate high-quality test inputs that expose inconsistencies in target systems under limited resources, and the error-inducing inputs it finds can be used to fine-tune target models to improve accuracy.
Toward an automated cross-multimodal verification of mobile app bug fixes integrating user feedback, developer responses, changelogs, and UI visual analysis
IF 4.3, CAS Tier 2 (Computer Science), Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2026-03-01; Epub Date: 2025-12-08; DOI: 10.1016/j.infsof.2025.107996
Rhodes Massenon, Ishaya Gambo, Javed Ali Khan

Context

Verifying claimed bug fixes in mobile applications is crucial, yet the "fixed but not resolved" phenomenon remains a persistent challenge. Existing bug analysis tools focus on pre-fix tasks like detection and reproduction, but lack mechanisms to holistically verify a fix post-deployment by cross-referencing developer claims, visual UI changes, and subsequent user feedback. This gap leads to persistent bugs, wasted developer effort, and user dissatisfaction.

Objective

This paper introduces BUGFixChecker, the first framework for automated, multimodal cross-verification of mobile app bug fixes. Our primary goal is to determine if a claimed fix has truly resolved a user-reported issue.

Methods

BUGFixChecker integrates five data sources: the original user bug report, the developer's fix claim, "before" and "after" UI screenshots, and post-fix user reviews. The core methodology employs a Multimodal Large Language Model (MLLM) guided by a Chain-of-Thought prompt to perform a comparative reasoning task. We evaluated the framework on a curated dataset of 53 real-world bug fix cases from Android applications.
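
As an illustration of how the five evidence sources might be bundled into a single chain-of-thought verification request, here is a minimal sketch. The field names, prompt wording, verdict labels, and file paths are assumptions made for illustration; the paper's exact prompt and the provider-specific MLLM call are not reproduced.

```python
import json

COT_INSTRUCTION = (
    "Reason step by step: (1) restate the user-reported bug, (2) check whether the "
    "developer's changelog claims to fix it, (3) compare the 'before' and 'after' "
    "screenshots for the relevant UI change, (4) check post-fix reviews for recurrence, "
    "then output one verdict: RESOLVED, UNRESOLVED_VISUAL_MISMATCH, or UNRESOLVED_USER_REPORTS."
)

def build_verification_prompt(bug_report, fix_claim, post_fix_reviews,
                              before_screenshot_path, after_screenshot_path):
    """Bundle the text evidence and screenshot references into one multimodal request payload."""
    return {
        "instruction": COT_INSTRUCTION,
        "text_evidence": {
            "user_bug_report": bug_report,
            "developer_fix_claim": fix_claim,
            "post_fix_reviews": post_fix_reviews,
        },
        "images": [before_screenshot_path, after_screenshot_path],
    }

prompt = build_verification_prompt(
    bug_report="App crashes when rotating the screen on the checkout page.",
    fix_claim="v2.3.1 changelog: fixed crash on orientation change.",
    post_fix_reviews=["Still crashes when I rotate my phone at checkout (v2.3.1)."],
    before_screenshot_path="screens/checkout_before.png",   # hypothetical paths
    after_screenshot_path="screens/checkout_after.png",
)
print(json.dumps(prompt, indent=2))   # the actual MLLM invocation is left as a provider-specific step
```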

Results

BUGFixChecker achieved a high overall accuracy of 83.0 % and a macro F1-score of 0.805 in correctly verifying the status of bug fixes. It proved particularly effective at identifying discrepancies with strong evidentiary signals, such as "Unresolved Visual Mismatch" (F1-score = 0.865). Most significantly, a rigorous ablation study demonstrated the critical contribution of the visual modality: the full multimodal framework outperformed a text-only baseline by over 19 percentage points in F1-score (0.805 vs. 0.610), proving that visual evidence is indispensable for this task.

Conclusion

BUGFixChecker offers a novel and pragmatic approach to automated bug fix verification. By moving beyond pre-fix analysis to the critical post-fix verification stage, our multimodal framework provides a scalable solution to enhance the integrity of bug tracking systems, reduce developer workload, and ensure higher software quality in rapidly evolving mobile ecosystems.