Francesco Altiero, Anna Corazza, Sergio Di Martino, Adriano Peron, Luigi Libero Lucio Starace
Regression test prioritization (RTP) is an active research field, aiming at re-ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change-based solutions focus on simple text-level comparisons among versions. We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree-structured data. In particular, we apply TKs to abstract syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real-world Java projects, also used in a number of RTP-related papers. We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. We compared the proposed prioritization approaches against well-known prioritization techniques, evaluating both their effectiveness and their execution times. 
Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution.
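As a rough illustration of the idea, the sketch below scores structural change with a simple subtree kernel over toy syntax trees and ranks tests by the change scores of the methods they cover. It is not the authors' technique: the abstract does not say which tree kernels are used, and the tree encoding (nested tuples of node types with identifier names abstracted away), the method names, and the coverage map are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def subtree_counts(tree, counts=None):
    """Count every complete subtree of a nested-tuple tree: (label, child, ...)."""
    if counts is None:
        counts = Counter()
    counts[tree] += 1
    for child in tree[1:]:
        subtree_counts(child, counts)
    return counts


def subtree_kernel(a, b):
    """Number of pairs of identical complete subtrees shared by a and b."""
    counts_a, counts_b = subtree_counts(a), subtree_counts(b)
    return sum(n * counts_b[t] for t, n in counts_a.items())


def change_score(old, new):
    """1 - normalized kernel: 0.0 for structurally identical trees."""
    norm = sqrt(subtree_kernel(old, old) * subtree_kernel(new, new))
    return 1.0 - subtree_kernel(old, new) / norm


def prioritize(coverage, scores):
    """Order tests by the total change score of the methods they cover."""
    def total(test):
        return sum(scores.get(m, 0.0) for m in coverage[test])
    # Ties are broken by test name so the order is deterministic.
    return sorted(coverage, key=lambda t: (-total(t), t))


# Hypothetical ASTs; node labels are node types, not identifier names.
OLD = ("Method", ("Assign", ("Name",), ("Const",)), ("Return", ("Name",)))
RENAMED = OLD  # renaming a local variable leaves the node types unchanged
BRANCHED = ("Method",
            ("Assign", ("Name",), ("Const",)),
            ("If", ("Name",), ("Return", ("Name",))),
            ("Return", ("Const",)))

# Hypothetical methods "parse" (renamed only) and "eval" (new control flow).
scores = {"parse": change_score(OLD, RENAMED),
          "eval": change_score(OLD, BRANCHED)}
coverage = {"test_parse": {"parse"},
            "test_eval": {"eval"},
            "test_both": {"parse", "eval"}}
order = prioritize(coverage, scores)
```

In this toy setup the rename scores zero while the added branch scores above zero, so tests touching the restructured method move to the front. A text-level diff would flag both changes alike.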
Title: Regression test prioritization leveraging source code similarity with tree kernels. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-15. DOI: 10.1002/smr.2653.
Luca Negrini, Vincenzo Arceri, Agostino Cortesi, Pietro Ferrara
In this paper, we introduce Tarsis, a new abstract domain based on abstract interpretation theory that approximates string values through finite state automata. The main novelty of Tarsis is that it works over an alphabet of strings instead of single characters. On the one hand, such an approach requires a more complex and refined definition of the lattice operators and of the abstract semantics of string operators. On the other hand, it is in a position to obtain strictly more precise results than state-of-the-art approaches. We compare Tarsis both with simpler domains and with the standard automata model, targeting case studies containing standard yet challenging string manipulations. We also assess the performance gain with respect to the standard automata model by measuring the speed-up achieved by Tarsis. Experiments confirm that Tarsis can obtain precise results without incurring excessive computational costs.
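To make the "alphabet of strings" idea concrete, here is a minimal sketch of an automaton whose transitions are labeled with whole strings rather than single characters, together with a membership check. It only illustrates the representation; it does not implement the lattice operators or abstract semantics of Tarsis, and the function name and example automaton are assumptions made for this illustration.

```python
def accepts(transitions, start, finals, text):
    """Return True if `text` is accepted by an automaton with string-labeled edges.

    `transitions` maps a state to a list of (label, target) pairs, where each
    label is a string that must match the input at the current position.
    """
    stack = [(start, 0)]
    seen = set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue  # also guards against cycles of empty labels
        seen.add((state, pos))
        if pos == len(text) and state in finals:
            return True
        for label, target in transitions.get(state, []):
            if text.startswith(label, pos):
                stack.append((target, pos + len(label)))
    return False


# Hypothetical abstraction of the expression: "id=" + ("a" if cond else "b")
TRANSITIONS = {0: [("id=", 1)], 1: [("a", 2), ("b", 2)]}
FINALS = {2}

ok_a = accepts(TRANSITIONS, 0, FINALS, "id=a")
ok_b = accepts(TRANSITIONS, 0, FINALS, "id=b")
bad = accepts(TRANSITIONS, 0, FINALS, "id=c")
```

Here the prefix `"id="` is a single transition, whereas a character-level automaton would need three states and edges to represent it.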
Title: Tarsis: An effective automata-based abstract domain for string analysis. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-14. DOI: 10.1002/smr.2647. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/smr.2647
Saad Shafiq, Christoph Mayr-Dorn, Atif Mashkoor, Alexander Egyed
In software development, developer turnover is among the primary reasons for project failures, leading to a great void of knowledge and strain for newcomers. Unfortunately, no established methods exist to measure how problem domain knowledge is distributed among developers. Awareness of how this knowledge evolves and is owned by key developers in a project helps stakeholders reduce risks caused by turnover. To this end, this paper introduces a novel, realistic representation of problem domain knowledge distribution: the ConceptRealm. To construct the ConceptRealm, we employ a latent Dirichlet allocation model to represent textual features obtained from 300 K issues and 1.3 M comments from 518 open-source projects. We analyze whether newly emerged issues and developers share similar concepts and how aligned the individual developers' concepts are with the team over time. We also investigate the impact of departing developers on the frequency of concepts. Finally, we evaluate the soundness of our approach on a closed-source software project, thus allowing the validation of the results from a practical standpoint. We find that the ConceptRealm can represent the problem domain knowledge within a project and can be utilized to predict the alignment of developers with issues. We also observe that projects exhibit many keepers independent of project maturity and that the abrupt departure of keepers correlates with a decline of their core concepts, as the remaining developers cannot quickly familiarize themselves with those concepts.
Title: Balanced knowledge distribution among software development teams—Observations from open- and closed-source software development. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-13. DOI: 10.1002/smr.2655. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/smr.2655
Roxane Koitz-Hristov, Thomas Sterner, Lukas Stracke, Franz Wotawa
As software projects evolve and grow in size and complexity, so do their test suites. Test suite reduction (TSR) aims at reducing the size of a test suite by removing redundant and obsolete test cases based on a coverage metric while preserving its fault detection capabilities. The contributions of this paper are twofold: (1) We examine a lesser-known coverage criterion, that is, checked coverage. Checked coverage not only investigates if a part of the code was executed but also if it was checked by a test oracle. In an empirical evaluation, we performed TSR based on different reduction algorithms, coverage metrics, and open-source Java projects with our own TSR tool to determine the most effective and efficient combination of metric and method. (2) Given the results of the first evaluation, we further investigate the potential of parameter optimization in regard to a genetic reduction algorithm. In particular, we focus on finding a general setting for the parameters crossover rate and mutation rate such that test suites can be reduced in a reasonable time while maintaining a high fault detection power.
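A coverage-based reduction step can be sketched as a greedy set cover: repeatedly keep the test that covers the most still-uncovered elements. The abstract names only a genetic algorithm among its reduction methods, so the greedy strategy, the function name, and the toy coverage data below are assumptions for illustration. Under checked coverage, the covered elements would be statements that were both executed and checked by a test oracle.

```python
def greedy_reduce(coverage):
    """Pick tests until every element covered by the full suite is covered.

    `coverage` maps a test name to the set of elements (e.g., statements)
    it covers. Ties are broken by test name for determinism.
    """
    remaining = set().union(*coverage.values())
    selected = []
    while remaining:
        # sorted() fixes the candidate order; max() returns the first best one.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected


# Hypothetical coverage data for a four-test suite.
COVERAGE = {
    "t1": {"a", "b"},
    "t2": {"b", "c"},
    "t3": {"a", "b", "c"},
    "t4": {"d"},
}
reduced = greedy_reduce(COVERAGE)
```

In this toy suite `t1` and `t2` are redundant with respect to `t3`, so only `t3` and `t4` are kept. Whether fault detection is preserved depends on how well the chosen coverage metric reflects what the tests actually verify, which is the question the paper examines.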
Title: On the suitability of checked coverage and genetic parameter tuning in test suite reduction. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-11. DOI: 10.1002/smr.2656. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/smr.2656
Data protection is a primary concern, as the theft of personal or organizational data by cyber criminals may give rise to issues ranging from privacy disclosure to system hacking. Blockchain is considered a leading technology with the potential to reshape future industries, but like other revolutionary technologies, it has both merits and demerits. Owing to its nascent stage, blockchain technology poses numerous challenges that hinder the effective development of blockchain applications. To uncover these potential hindrances and group the relevant barriers, a total of 12 challenges that may hinder the progress of blockchain application development have been identified from a sample of 52 primary studies. Based on the results of our research, six challenges are considered critical and account for more than 30% occurrence. The critical challenges are “Lack of Proper Development Tools and Technology,” “Security Precaution Measures,” “Lack of Governance and Standards,” “Lack of Professional Expertise with Essential Skills and Knowledge,” “Lack of Organizational Support,” and “Interoperability Integration.” Using a comprehensive systematic literature review (SLR) and a questionnaire survey, a list of 65 solutions/practices has been identified to address these challenges. These solutions/practices will help blockchain developers to address the identified challenges and develop benign blockchain applications in the future. The results of our questionnaire survey largely align with the findings of the SLR, although the ranking of the challenges varies between the two datasets. The findings of this paper provide insights that can assist in streamlining and optimizing the development process of blockchain applications.
Title: Challenges and solutions in the development of blockchain applications: Extraction from SLR and empirical study. Authors: Maria Nabi, Muhammad Ilyas, Jamil Ahmad. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-08. DOI: 10.1002/smr.2651.
The software development life cycle relies heavily on the software release note, a crucial document. Various practitioners, including project managers and clients, benefit from release notes as they provide an overview of the latest software release. However, the manual generation of release notes is a time-consuming and stressful task. Researchers have recently proposed automated techniques to generate release notes, saving developers' time and enhancing their understanding of software projects. Unfortunately, the adoption of these tools in practice remains limited. To address this gap, we have taken steps to understand the expectations and requirements of practitioners regarding release note generation techniques before implementing new automated approaches. Consequently, our approach involves two main stages: First, we conduct a comprehensive review of the relevant literature and analyze existing release notes from GitHub repositories to gain insights into the current practices. Second, we conduct an online survey study to gather input from practitioners and understand their expectations regarding release notes. We have reviewed 16 papers related to release notes and explored 3347 release notes from 21 GitHub repositories. Our analysis revealed key artifacts present in release note contents, including issues (29%), pull requests (32%), commits (19%), and common vulnerabilities and exposures (CVE) issues (6%). Additionally, we conducted a survey study involving 32 professionals to understand the essential information that should be included in release notes based on users' roles. For instance, project managers were more interested in learning about new features rather than less critical bug fixes. Furthermore, we identified gaps in existing systems and essential factors to consider when implementing release notes techniques in software engineering. 
The insights gained from our study can guide future research directions and assist practitioners in generating release notes with relevant content, thus improving the overall quality of documentation in software development.
Title: Practitioners' expectations on automated release note generation techniques. Authors: Sristy Sumana Nath, Banani Roy. Journal of Software-Evolution and Process, vol. 36, no. 8. Published 2024-02-07. DOI: 10.1002/smr.2657.
Edna Dias Canedo, Emille Catarine Rodrigues Cançado, Alana Paula Barbosa Mota, Ian Nery Bandeira, Pedro Henrique Teixeira Costa, Fernanda Lima, Luis Amaral, Rodrigo Bonifácio
Design Thinking techniques have been widely used in software requirements elicitation to understand the necessities of stakeholders and end-users. However, there is a lack of evidence of their effectiveness when applied to guide the development process of a system targeting vulnerable populations. What are the implications of using Design Thinking techniques to elicit requirements in a community of former inmates—and what would be the benefits of and challenges in this deployment? In this paper, we report our experience of using Design Thinking for requirements elicitation of a mobile application customized for the former inmates of the Brazilian prison system and their families. We applied techniques such as Brainstorming, Stakeholder Mapping, Personas Creation, Rapid Ethnography, and Interviews to obtain relevant data and create several prototypes. These techniques contribute to the development of an uncommon application that aims to help the reintegration process of former inmates into society. Our results validate the initial hypothesis that such techniques, when applied to a sensitive context, assist product development that meets the end-users' needs by creating a higher quality product. The main limitation of the research was the lack of access to low-literacy end-users and/or former inmates without previous experience using mobile devices.
Title: Using Design Thinking to break social barriers: An experience report with former inmates. Journal of Software-Evolution and Process, vol. 36, no. 7. Published 2024-02-07. DOI: 10.1002/smr.2648.
Xinjie Wei, Jie Wang, Chang-ai Sun, Dave Towey, Shoufeng Zhang, Wanqing Zuo, Yiming Yu, Ruoyi Ruan, Guyang Song
Distributed systems have been widely used in many safety-critical areas. Any abnormalities (e.g., service interruption or service quality degradation) could lead to application crashes or decreased user satisfaction, which may cause serious economic losses. Among the various quality assurance approaches for distributed systems, log-based anomaly detection (LAD) has become a popular research topic, because system logs record and reveal important run-time information. This paper presents a general LAD framework for distributed systems. Log grouping and feature-pattern mining are two crucial LAD components that affect anomaly-detection effectiveness. We also present a systematic survey of techniques in these two directions; propose classification frameworks for log grouping and feature patterns; and summarize four log-grouping techniques and five feature patterns (which refer to invariant relationships among logs that can be used for anomaly detection). To evaluate their applicability, we report on the findings when applying existing techniques to Ray, a popular industrial distributed system. Based on these findings, several open issues are identified, which provide potential guidance for future research and development.
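The two components can be sketched as follows: logs are grouped into fixed time windows, each group is turned into an event-count vector, and a simple invariant between event counts is checked. The abstract does not name its four grouping techniques or five feature patterns, so fixed-window grouping, the count-equality invariant, and the toy event names here are assumptions chosen as familiar examples rather than the paper's taxonomy.

```python
from collections import Counter, defaultdict


def group_by_window(events, window_seconds):
    """Group (timestamp, event_template) pairs into fixed-size time windows."""
    groups = defaultdict(list)
    for timestamp, template in events:
        groups[timestamp // window_seconds].append(template)
    return dict(groups)


def count_vectors(groups):
    """Turn each group of event templates into an event-count vector."""
    return {window: Counter(templates) for window, templates in groups.items()}


def violates_invariant(vector, left, right):
    """Check a hypothetical invariant: count(left) should equal count(right)."""
    return vector[left] != vector[right]


# Hypothetical parsed log events: (seconds since start, event template).
EVENTS = [(0, "open"), (3, "read"), (9, "close"), (12, "open"), (14, "open")]

groups = group_by_window(EVENTS, window_seconds=10)
vectors = count_vectors(groups)
anomalous = sorted(w for w, v in vectors.items()
                   if violates_invariant(v, "open", "close"))
```

The second window contains two `open` events and no `close`, so it is flagged. The choice of grouping matters: a different window size could split a matching `open`/`close` pair across groups and raise a false alarm.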
Xinjie Wei, Jie Wang, Chang-ai Sun, Dave Towey, Shoufeng Zhang, Wanqing Zuo, Yiming Yu, Ruoyi Ruan, Guyang Song. Log-based anomaly detection for distributed systems: State of the art, industry experience, and open issues. Journal of Software: Evolution and Process 36(8), published 2024-02-07. DOI: 10.1002/smr.2650
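To make the two LAD components named in the abstract above concrete, here is a minimal Python sketch of log grouping followed by a feature-pattern check. It is an illustration under assumptions, not the framework from the paper: the session-identifier grouping, the `open`/`close` event names, and the count-equality invariant are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical parsed log records: (session_id, event_template).
LOGS = [
    ("task-1", "open"), ("task-1", "write"), ("task-1", "close"),
    ("task-2", "open"), ("task-2", "write"),
]


def group_by_session(logs):
    """Log grouping: collect event templates per session id, keeping order."""
    groups = defaultdict(list)
    for session_id, event in logs:
        groups[session_id].append(event)
    return dict(groups)


def violates_invariant(events, left="open", right="close"):
    """Feature pattern: return True when count(left) != count(right)."""
    counts = Counter(events)
    return counts[left] != counts[right]


def detect_anomalies(logs):
    """Return the sorted session ids whose events break the invariant."""
    return sorted(
        session_id
        for session_id, events in group_by_session(logs).items()
        if violates_invariant(events)
    )


print(detect_anomalies(LOGS))
```

In this toy input, `task-2` opens a resource without closing it, so it is the only group flagged. A real pipeline would first parse raw log lines into templates and would mine the invariants from normal runs rather than hard-coding them.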
Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing works formulate code clone detection as a binary classification problem, predicting whether a code pair is a clone based on a pre-defined threshold. In practice, there are several types of code clone, distinguished by how similar the two code fragments are to each other. To investigate how different formulations of code clone detection affect the detection result, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a statement-level tree sequence derived from the whole abstract syntax tree (AST). Then, a gated recurrent neural network with residual connections extracts the semantics of the individual statement trees together with their dependency relationships across the input statement sequence. Finally, the representations of code fragments produced by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for code clone detection and clone type classification. Experiments show that GRRLN achieves promising results while consuming significantly less time and memory than state-of-the-art methods.
Xiangping Zhang, Jianxun Liu, Min Shi. GRRLN: Gated Recurrent Residual Learning Networks for code clone detection. Journal of Software: Evolution and Process 36(7), published 2024-02-07. DOI: 10.1002/smr.2649
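The first and last steps described in the GRRLN abstract above (splitting an AST into a statement-level tree sequence, then comparing fragment representations) can be sketched in Python. This is not GRRLN. The gated recurrent network with residual connections is replaced here by a plain bag of AST node types, and Python's standard `ast` module stands in for whatever parser the authors used. The function names and the two snippets are hypothetical.

```python
import ast
import math
from collections import Counter

SNIPPET_A = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
SNIPPET_B = (
    "def g(items):\n    total = 0\n    for item in items:\n"
    "        total += item\n    return total\n"
)


def statement_trees(source):
    """Return the statement nodes of a fragment in preorder (source order)."""
    statements = []

    def visit(node):
        if isinstance(node, ast.stmt):
            statements.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return statements


def node_types(stmt):
    """Node-type names of one statement, skipping nested statements."""
    names = [type(stmt).__name__]
    stack = list(ast.iter_child_nodes(stmt))
    while stack:
        node = stack.pop()
        if isinstance(node, ast.stmt):
            continue  # nested statements are separate entries in the sequence
        names.append(type(node).__name__)
        stack.extend(ast.iter_child_nodes(node))
    return names


def fragment_vector(source):
    """Stand-in representation: node-type counts over all statement trees."""
    counts = Counter()
    for stmt in statement_trees(source):
        counts.update(node_types(stmt))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print([type(s).__name__ for s in statement_trees(SNIPPET_A)])
print(cosine(fragment_vector(SNIPPET_A), fragment_vector(SNIPPET_B)))
```

The two snippets differ only in identifier names, so their node-type vectors coincide and the similarity is approximately 1. A count-based vector ignores statement order, which is exactly the information a recurrent model over the statement sequence is meant to capture.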
Mohammad Shameem, Chiranjeev Kumar, Bibhas Chandra, Arif Ali Khan, Md. Nadeem Ahmed, J. M. Verner, Mohammad Nadeem, Muhammad Azeem Akbar
Software development projects depend on collaborative teams. Over the past 50 years, various studies have explored the effect of software engineers' personality traits and cultural values on team performance. These studies have improved our understanding of these relationships; however, how personality traits and cultural values influence team effectiveness remains largely unaddressed in the literature. This research investigates the relationships between social and psychological complexities (including personality traits and cultural values), team coordination, team motivation, and team success (which comprises team effectiveness and team climate) to explore the impact of social and psychological issues on software team success. Data were collected through an online survey targeting software development professionals and through unstructured interviews. We received 112 responses from software developers working in different countries. Findings indicate that some personality traits and cultural values, namely conscientiousness, openness, harmony, and autonomy, have a positive relationship with team coordination effectiveness, while others, such as neuroticism, embeddedness, hierarchy, and mastery, are negatively related to it. These negative relationships can be mitigated by motivating team members appropriately. Based on our findings, we conclude that the negative impact of different personality and cultural traits could be reduced by improving team coordination effectiveness through effective motivation.
Mohammad Shameem, Chiranjeev Kumar, Bibhas Chandra, Arif Ali Khan, Md. Nadeem Ahmed, J. M. Verner, Mohammad Nadeem, Muhammad Azeem Akbar. The impact of personality traits and cultural values on coordination effectiveness: A study of software development teams effectiveness. Journal of Software: Evolution and Process 36(7), published 2024-02-04. DOI: 10.1002/smr.2652