Pub Date : 2024-06-08DOI: 10.1007/s10664-024-10454-8
Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz
This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the learning approach, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.
{"title":"The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects","authors":"Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz","doi":"10.1007/s10664-024-10454-8","DOIUrl":"https://doi.org/10.1007/s10664-024-10454-8","url":null,"abstract":"<p>This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the <i>learning approach</i>, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"65 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141521632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The COVID-19 pandemic changed the way we live, work and the way we conduct research. With the restrictions of lockdowns and social distancing, various impacts were experienced by many software engineering researchers, especially whose studies depend on human participants. We conducted a mixed methods study to understand the extent of this impact. Through a detailed survey with 89 software engineering researchers working with human participants around the world and a further nine follow-up interviews, we identified the key challenges faced, the adaptations made, and the surprising fringe benefits of conducting research involving human participants during the pandemic. Our findings also revealed that in retrospect, many researchers did not wish to revert to the old ways of conducting human-orienfted research. Based on our analysis and insights, we share recommendations on how to conduct remote studies with human participants effectively in an increasingly hybrid world when face-to-face engagement is not possible or where remote participation is preferred.
{"title":"Challenges, adaptations, and fringe benefits of conducting software engineering research with human participants during the COVID-19 pandemic","authors":"Anuradha Madugalla, Tanjila Kanij, Rashina Hoda, Dulaji Hidellaarachchi, Aastha Pant, Samia Ferdousi, John Grundy","doi":"10.1007/s10664-024-10490-4","DOIUrl":"https://doi.org/10.1007/s10664-024-10490-4","url":null,"abstract":"<p>The COVID-19 pandemic changed the way we live, work and the way we conduct research. With the restrictions of lockdowns and social distancing, various impacts were experienced by many software engineering researchers, especially whose studies depend on human participants. We conducted a mixed methods study to understand the extent of this impact. Through a detailed survey with 89 software engineering researchers working with human participants around the world and a further nine follow-up interviews, we identified the key challenges faced, the adaptations made, and the surprising fringe benefits of conducting research involving human participants during the pandemic. Our findings also revealed that in retrospect, many researchers did not wish to revert to the old ways of conducting human-orienfted research. Based on our analysis and insights, we share recommendations on how to conduct remote studies with human participants effectively in an increasingly hybrid world when face-to-face engagement is not possible or where remote participation is preferred.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"238 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141521635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10487-z
Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo
With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses difficulties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. Most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author’s intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums.
随着开发人员社区的迅速发展,在线技术论坛上的帖子数量也在快速增长,这给用户筛选有用帖子和查找重要信息带来了困难。标签为用户提供了一个简明的功能维度,便于他们查找感兴趣的帖子,也便于搜索引擎根据查询结果索引最相关的帖子。大多数标签只侧重于技术角度(如程序语言、平台、工具)。在大多数情况下,在线开发者社区的论坛帖子会显示作者解决问题、寻求建议、分享信息等的意图。对帖子意图的建模可以为当前的标签分类法提供一个额外的维度。通过参考以前的研究并借鉴行业观点,我们为技术论坛帖子的意图创建了一个完善的分类标准。通过对从在线论坛中提取的帖子数据集进行手动标记和分析,我们了解了帖子的构成(代码、错误信息)与其意图之间的相关性。此外,受人工研究的启发,我们设计了一个基于转换器的预训练模型来自动预测帖子的意图。我们的意图预测框架的最佳变体取得了 0.589 的 Micro F1 分数、62.6% 到 87.8% 的 Top 1-3 准确率和 0.787 的平均 AUC,优于最先进的基线方法。我们对论坛帖子意图的表征和自动分类可以帮助论坛维护者或第三方工具开发人员改进技术论坛帖子的组织和检索。
{"title":"Characterizing and classifying developer forum posts with their intentions","authors":"Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo","doi":"10.1007/s10664-024-10487-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10487-z","url":null,"abstract":"<p>With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses difficulties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. Most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author’s intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"25 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10448-6
Zeyang Ma, Shouvick Mondal, Tse-Hsun (Peter) Chen, Haoxiang Zhang, Ahmed E. Hassan
Developers rely on software ecosystems such as Maven to manage and reuse external libraries (i.e., dependencies). Due to the complexity of the used dependencies, developers may face challenges in choosing which library to use and whether they should upgrade or downgrade a library. One important factor that affects this decision is the number of potential vulnerabilities in a library and its dependencies. Therefore, state-of-the-art platforms such as Maven Repository (MVN) and Open Source Insights (OSI) help developers in making such a decision by presenting vulnerability information associated with every dependency. In this paper, we first conduct an empirical study to understand how the two platforms, MVN and OSI, present and categorize vulnerability information. We found that these two platforms may either overestimate or underestimate the number of associated vulnerabilities in a dependency, and they lack prioritization mechanisms on which dependencies are more likely to cause an issue. Hence, we propose a tool named VulNet to address the limitations we found in MVN and OSI. Through an evaluation of 19,886 versions of the top 200 popular libraries, we find VulNet includes 90.5% and 65.8% of the dependencies that were omitted by MVN and OSI, respectively. VulNet also helps reduce 27% of potentially unreachable or less impactful vulnerabilities listed by OSI in test dependencies. Finally, our user study with 24 participants gave VulNet an average rating of 4.5/5 in presenting and prioritizing vulnerable dependencies, compared to 2.83 (MVN) and 3.14 (OSI).
开发人员依赖 Maven 等软件生态系统来管理和重用外部库(即依赖库)。由于所用依赖库的复杂性,开发人员在选择使用哪个库以及是否应升级或降级某个库时可能会面临挑战。影响这一决定的一个重要因素是库及其依赖关系中潜在漏洞的数量。因此,最先进的平台,如 Maven Repository (MVN) 和 Open Source Insights (OSI),通过提供与每个依赖关系相关的漏洞信息,帮助开发人员做出这样的决定。在本文中,我们首先进行了一项实证研究,以了解 MVN 和 OSI 这两个平台是如何呈现和分类漏洞信息的。我们发现,这两个平台可能会高估或低估依赖关系中相关漏洞的数量,而且它们缺乏优先排序机制,无法确定哪些依赖关系更有可能导致问题。因此,我们提出了一个名为 VulNet 的工具,以解决我们在 MVN 和 OSI 中发现的局限性。通过对前 200 个流行库的 19886 个版本进行评估,我们发现 VulNet 分别包含了 MVN 和 OSI 遗漏的 90.5% 和 65.8% 的依赖关系。VulNet 还帮助减少了 OSI 在测试依赖项中列出的 27% 可能无法访问或影响较小的漏洞。最后,我们对 24 位参与者进行的用户研究显示,VulNet 在呈现和优先处理易受攻击的依赖性方面的平均评分为 4.5/5,而 MVN 和 OSI 的评分分别为 2.83 和 3.14。
{"title":"VulNet: Towards improving vulnerability management in the Maven ecosystem","authors":"Zeyang Ma, Shouvick Mondal, Tse-Hsun (Peter) Chen, Haoxiang Zhang, Ahmed E. Hassan","doi":"10.1007/s10664-024-10448-6","DOIUrl":"https://doi.org/10.1007/s10664-024-10448-6","url":null,"abstract":"<p>Developers rely on software ecosystems such as Maven to manage and reuse external libraries (i.e., dependencies). Due to the complexity of the used dependencies, developers may face challenges in choosing which library to use and whether they should upgrade or downgrade a library. One important factor that affects this decision is the number of potential vulnerabilities in a library and its dependencies. Therefore, state-of-the-art platforms such as Maven Repository (MVN) and Open Source Insights (OSI) help developers in making such a decision by presenting vulnerability information associated with every dependency. In this paper, we first conduct an empirical study to understand how the two platforms, MVN and OSI, present and categorize vulnerability information. We found that these two platforms may either overestimate or underestimate the number of associated vulnerabilities in a dependency, and they lack prioritization mechanisms on which dependencies are more likely to cause an issue. Hence, we propose a tool named VulNet to address the limitations we found in MVN and OSI. Through an evaluation of 19,886 versions of the top 200 popular libraries, we find VulNet includes 90.5% and 65.8% of the dependencies that were omitted by MVN and OSI, respectively. VulNet also helps reduce 27% of potentially unreachable or less impactful vulnerabilities listed by OSI in test dependencies. Finally, our user study with 24 participants gave VulNet an average rating of 4.5/5 in presenting and prioritizing vulnerable dependencies, compared to 2.83 (MVN) and 3.14 (OSI).</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"38 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10494-0
Antônio José A. da Silva, Renan G. Vieira, Diego P. P. Mesquita, João Paulo P. Gomes, Lincoln S. Rocha
<h3 data-test="abstract-sub-heading">Context</h3><p>Exception handling (EH) bugs stem from incorrect usage of exception handling mechanisms (EHMs) and often incur severe consequences (e.g., system downtime, data loss, and security risk). Tracking EH bugs is particularly relevant for contemporary systems (e.g., cloud- and AI-based systems), in which the software’s sophisticated logic is an additional threat to the correct use of the EHM. On top of that, bug reporters seldom can tag EH bugs — since it may require an encompassing knowledge of the software’s EH strategy. Surprisingly, to the best of our knowledge, there is no automated procedure to identify EH bugs from report descriptions.</p><h3 data-test="abstract-sub-heading">Objective</h3><p>First, we aim to evaluate the extent to which Natural Language Processing (NLP) and Machine Learning (ML) can be used to reliably label EH bugs using the text fields from bug reports (e.g., summary, description, and comments). Second, we aim to provide a reliably labeled dataset that the community can use in future endeavors. Overall, we expect our work to raise the community’s awareness regarding the importance of EH bugs.</p><h3 data-test="abstract-sub-heading">Method</h3><p>We manually analyzed 4,516 bug reports from the four main components of Apache’s Hadoop project, out of which we labeled <span>(approx 20%)</span> (943) as EH bugs. We also labeled 2,584 non-EH bugs analyzing their bug-fixing code and creating a dataset composed of 7,100 bug reports. Then, we used word embedding techniques (Bag-of-Words and TF-IDF) to summarize the textual fields of bug reports. Subsequently, we used these embeddings to fit five classes of ML methods and evaluate them on unseen data. We also evaluated a pre-trained transformer-based model using the complete textual fields. We have also evaluated whether considering only EH keywords is enough to achieve high predictive performance.</p><h3 data-test="abstract-sub-heading">Results</h3><p>Our results show that using a pre-trained DistilBERT with a linear layer trained with our proposed dataset can reasonably label EH bugs, achieving ROC-AUC scores of up to 0.88. The combination of NLP and ML traditional techniques achieved ROC-AUC scores of up to 0.74 and recall up to 0.56. As a sanity check, we also evaluate methods using embeddings extracted solely from keywords. Considering ROC-AUC as the primary concern, for the majority of ML methods tested, the analysis suggests that keywords alone are not sufficient to characterize reports of EH bugs, although this can change based on other metrics (such as recall and precision) or ML methods (e.g., Random Forest).</p><h3 data-test="abstract-sub-heading">Conclusions</h3><p>To the best of our knowledge, this is the first study addressing the problem of automatic labeling of EH bugs. Based on our results, we can conclude that the use of ML techniques, specially transformer-base models, sounds promising to automate the task of labeling
{"title":"Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop","authors":"Antônio José A. da Silva, Renan G. Vieira, Diego P. P. Mesquita, João Paulo P. Gomes, Lincoln S. Rocha","doi":"10.1007/s10664-024-10494-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10494-0","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Exception handling (EH) bugs stem from incorrect usage of exception handling mechanisms (EHMs) and often incur severe consequences (e.g., system downtime, data loss, and security risk). Tracking EH bugs is particularly relevant for contemporary systems (e.g., cloud- and AI-based systems), in which the software’s sophisticated logic is an additional threat to the correct use of the EHM. On top of that, bug reporters seldom can tag EH bugs — since it may require an encompassing knowledge of the software’s EH strategy. Surprisingly, to the best of our knowledge, there is no automated procedure to identify EH bugs from report descriptions.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>First, we aim to evaluate the extent to which Natural Language Processing (NLP) and Machine Learning (ML) can be used to reliably label EH bugs using the text fields from bug reports (e.g., summary, description, and comments). Second, we aim to provide a reliably labeled dataset that the community can use in future endeavors. Overall, we expect our work to raise the community’s awareness regarding the importance of EH bugs.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We manually analyzed 4,516 bug reports from the four main components of Apache’s Hadoop project, out of which we labeled <span>(approx 20%)</span> (943) as EH bugs. We also labeled 2,584 non-EH bugs analyzing their bug-fixing code and creating a dataset composed of 7,100 bug reports. Then, we used word embedding techniques (Bag-of-Words and TF-IDF) to summarize the textual fields of bug reports. Subsequently, we used these embeddings to fit five classes of ML methods and evaluate them on unseen data. We also evaluated a pre-trained transformer-based model using the complete textual fields. We have also evaluated whether considering only EH keywords is enough to achieve high predictive performance.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Our results show that using a pre-trained DistilBERT with a linear layer trained with our proposed dataset can reasonably label EH bugs, achieving ROC-AUC scores of up to 0.88. The combination of NLP and ML traditional techniques achieved ROC-AUC scores of up to 0.74 and recall up to 0.56. As a sanity check, we also evaluate methods using embeddings extracted solely from keywords. Considering ROC-AUC as the primary concern, for the majority of ML methods tested, the analysis suggests that keywords alone are not sufficient to characterize reports of EH bugs, although this can change based on other metrics (such as recall and precision) or ML methods (e.g., Random Forest).</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>To the best of our knowledge, this is the first study addressing the problem of automatic labeling of EH bugs. Based on our results, we can conclude that the use of ML techniques, specially transformer-base models, sounds promising to automate the task of labeling","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10493-1
Tavian Barnes, Ken Jen Lee, Cristina Tavares, Gema Rodríguez-Pérez, Meiyappan Nagappan
The traditional path to a software engineering career usually involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many individuals working as software engineers take a non-traditional path to their careers, starting from other industries or fields of study. This paper explores the barriers that individuals with non-traditional educational and occupational backgrounds face when pursuing a software engineering career and potential strategies to overcome those barriers. A two-stage methodology was used, consisting of an exploratory study followed by a follow-up survey. The exploratory study consisted of a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings were then supplemented through a follow-up survey. Understanding these barriers and what strategies could be effective is an important step towards making software engineering more accessible to individuals with non-traditional backgrounds. In addition to fostering functional diversity, this might also serve to tackle labor shortages within the software engineering industry.
{"title":"Towards understanding barriers and mitigation strategies of software engineers with non-traditional educational and occupational backgrounds","authors":"Tavian Barnes, Ken Jen Lee, Cristina Tavares, Gema Rodríguez-Pérez, Meiyappan Nagappan","doi":"10.1007/s10664-024-10493-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10493-1","url":null,"abstract":"<p>The traditional path to a software engineering career usually involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many individuals working as software engineers take a non-traditional path to their careers, starting from other industries or fields of study. This paper explores the barriers that individuals with non-traditional educational and occupational backgrounds face when pursuing a software engineering career and potential strategies to overcome those barriers. A two-stage methodology was used, consisting of an exploratory study followed by a follow-up survey. The exploratory study consisted of a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings were then supplemented through a follow-up survey. Understanding these barriers and what strategies could be effective is an important step towards making software engineering more accessible to individuals with non-traditional backgrounds. In addition to fostering functional diversity, this might also serve to tackle labor shortages within the software engineering industry.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"34 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10480-6
Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin, Chen Yang, Zengyang Li
Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, there is a lack of clarity on what literature on mining architectural information is available. Consequently, this may create difficulty for practitioners to understand and adopt the state-of-the-art research results, such as what approaches should be adopted to mine what architectural information in order to support architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and December 2022. Of the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 11 categories of sources have been leveraged for mining architectural information, among which version control system (e.g., GitHub) is the most popular source; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.
{"title":"Mining architectural information: A systematic mapping study","authors":"Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin, Chen Yang, Zengyang Li","doi":"10.1007/s10664-024-10480-6","DOIUrl":"https://doi.org/10.1007/s10664-024-10480-6","url":null,"abstract":"<p>Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, there is a lack of clarity on what literature on mining architectural information is available. Consequently, this may create difficulty for practitioners to understand and adopt the state-of-the-art research results, such as what approaches should be adopted to mine what architectural information in order to support architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and December 2022. Of the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 11 categories of sources have been leveraged for mining architectural information, among which version control system (e.g., GitHub) is the most popular source; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10485-1
Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan
The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry (e.g., network applications, microservices, and IoT). As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. A contract that implements the proxy pattern (proxy contract) acts as a layer between the clients and the target contract, enabling greater flexibility (e.g., data validation checks) and upgradeability (e.g., online smart contract replacement with zero downtime) in DApp development. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. In this paper, we present a large-scale exploratory study on the use of the proxy pattern in Ethereum. We analyze a dataset of all Ethereum smart contracts as of Sep. 2022 containing 50M smart contracts and 1.6B transactions, and apply both quantitative and qualitative methods in order to (i) determine the prevalence of proxy contracts, (ii) understand the ways they are deployed and integrated into applications, and (iii) uncover the prevalence of different types of proxy contracts. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and latter being employed in 39.1% and 60.9% of identified usage contexts in turn. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enables upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts’ functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.
{"title":"A large-scale exploratory study on the proxy pattern in Ethereum","authors":"Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan","doi":"10.1007/s10664-024-10485-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10485-1","url":null,"abstract":"<p>The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry (e.g., network applications, microservices, and IoT). As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. A contract that implements the proxy pattern (proxy contract) acts as a layer between the clients and the target contract, enabling greater flexibility (e.g., data validation checks) and upgradeability (e.g., online smart contract replacement with zero downtime) in DApp development. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. In this paper, we present a large-scale exploratory study on the use of the proxy pattern in Ethereum. We analyze a dataset of all Ethereum smart contracts as of Sep. 2022 containing 50M smart contracts and 1.6B transactions, and apply both quantitative and qualitative methods in order to (i) determine the prevalence of proxy contracts, (ii) understand the ways they are deployed and integrated into applications, and (iii) uncover the prevalence of different types of proxy contracts. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and latter being employed in 39.1% and 60.9% of identified usage contexts in turn. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enables upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts’ functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"51 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10495-z
Siyu Jiang, Yuwen Chen, Zhenhang He, Yunpeng Shang, Le Ma
Cross-Project Defect Prediction (CPDP) is a promising research field that focuses on detecting defects in projects with limited labeled data by utilizing prediction models trained on projects with abundant data. However, previous CPDP approaches based on Abstract Syntax Tree (AST) have often encountered challenges in effectively acquiring semantic and syntactic information, resulting in their limited ability to combine both productively. This issue arises primarily from the practice of flattening the AST into a linear sequence in many AST-based methods, leading to the loss of hierarchical syntactic structure and structural information within the code. Besides, other AST-based methods use a recursive way to traverse the tree-structured AST, which is susceptible to gradient vanishing. To alleviate these concerns, we introduce a novel CPDP method named defect prediction via Semantic and Syntactic Encoding (SSE) that enhances Zhang’s approach by encoding semantic and syntactic information while retaining and considering AST structure. Specifically, we perform pre-training on a large corpus using a language model to learn semantic information. Next, we present a new rule for splitting AST into subtrees to avoid vanishing gradients. Then, the absolute paths originating from the root node and leading to the leaf nodes are encoded as hierarchical syntactic information. Finally, we design an encoder to integrate syntactic information into semantic information and leverage Bi-directional Long-Short Term Memory to learn the entire tree representation for prediction. Experimental results on 12 benchmark projects illustrate that the SSE method we proposed surpasses current state-of-the-art methods.
{"title":"Cross-project defect prediction via semantic and syntactic encoding","authors":"Siyu Jiang, Yuwen Chen, Zhenhang He, Yunpeng Shang, Le Ma","doi":"10.1007/s10664-024-10495-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10495-z","url":null,"abstract":"<p>Cross-Project Defect Prediction (CPDP) is a promising research field that focuses on detecting defects in projects with limited labeled data by utilizing prediction models trained on projects with abundant data. However, previous CPDP approaches based on Abstract Syntax Tree (AST) have often encountered challenges in effectively acquiring semantic and syntactic information, resulting in their limited ability to combine both productively. This issue arises primarily from the practice of flattening the AST into a linear sequence in many AST-based methods, leading to the loss of hierarchical syntactic structure and structural information within the code. Besides, other AST-based methods use a recursive way to traverse the tree-structured AST, which is susceptible to gradient vanishing. To alleviate these concerns, we introduce a novel CPDP method named defect prediction via Semantic and Syntactic Encoding (SSE) that enhances Zhang’s approach by encoding semantic and syntactic information while retaining and considering AST structure. Specifically, we perform pre-training on a large corpus using a language model to learn semantic information. Next, we present a new rule for splitting AST into subtrees to avoid vanishing gradients. Then, the absolute paths originating from the root node and leading to the leaf nodes are encoded as hierarchical syntactic information. Finally, we design an encoder to integrate syntactic information into semantic information and leverage Bi-directional Long-Short Term Memory to learn the entire tree representation for prediction. Experimental results on 12 benchmark projects illustrate that the SSE method we proposed surpasses current state-of-the-art methods.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"26 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-03DOI: 10.1007/s10664-024-10484-2
Beiqi Zhang, Liming Fu, Peng Liang, Jiaxin Yu, Chong Wang
<p>Code review is widely known as one of the best practices for software quality assurance in software development. In a typical code review process, reviewers check the code committed by developers to ensure the quality of the code, during which reviewers and developers would communicate with each other in review comments to exchange necessary information. As a result, understanding the information in review comments is a prerequisite for reviewers and developers to conduct an effective code review. Code snippet, as a special form of code, can be used to convey necessary information in code reviews. For example, reviewers can use code snippets to make suggestions or elaborate their ideas to meet developers’ information needs in code reviews. However, little research has focused on the practices of providing code snippets in code reviews. To bridge this gap, we conduct a mixed-methods study to mine information and knowledge related to code snippets in code reviews, which can help practitioners and researchers get a better understanding about using code snippets in code review. Specifically, our study includes two phases: mining code review data and conducting practitioners’ survey. In Phase 1, we conducted an exploratory study to mine code review data from two popular developer communities (i.e., OpenStack and Qt). We manually labelled 69,604 review comments and finally identified 3,213 review comments that contain code snippets. Based on the code review data collected, we analyzed the extent of using code snippets, the reviewers’ purposes of providing code snippets, the developers’ acceptance of code snippet suggestions, and the reasons that developers do not accept code snippet suggestions in code reviews. In Phase 2, we used an online questionnaire to survey practitioners from industry. By analyzing the 63 valid responses we received, we explored the scenarios reviewers provide code snippets, the developers’ attitudes towards code snippets, and the characteristics of code snippets developers expect reviewers to provide in code reviews. Our results show that: (1) code snippets are not frequently used in code reviews, and most of the code snippets are provided by reviewers rather than developers; (2) the purposes of reviewers providing code snippets in code reviews are <i>Suggestion</i> and <i>Citation</i>, in which <i>Suggestion</i> is the main purpose; (3) most developers would accept reviewers’ code snippet suggestions; (4) the most common reasons that developers do not accept reviewers’ code snippet suggestions in code reviews are <i>difference in the opinions between developers and reviewers</i> and <i>reviewer’s suggestion is flawed</i>; (5) reviewers often provide code snippets in code reviews <i>when code is more illustrate than words</i>; (6) most developers hold positive attitudes towards code snippet comments; and (7) most developers expect that code snippets in review comments are <i>understandable</i> and <i>fitting into existing co
{"title":"Demystifying code snippets in code reviews: a study of the OpenStack and Qt communities and a practitioner survey","authors":"Beiqi Zhang, Liming Fu, Peng Liang, Jiaxin Yu, Chong Wang","doi":"10.1007/s10664-024-10484-2","DOIUrl":"https://doi.org/10.1007/s10664-024-10484-2","url":null,"abstract":"<p>Code review is widely known as one of the best practices for software quality assurance in software development. In a typical code review process, reviewers check the code committed by developers to ensure the quality of the code, during which reviewers and developers would communicate with each other in review comments to exchange necessary information. As a result, understanding the information in review comments is a prerequisite for reviewers and developers to conduct an effective code review. Code snippet, as a special form of code, can be used to convey necessary information in code reviews. For example, reviewers can use code snippets to make suggestions or elaborate their ideas to meet developers’ information needs in code reviews. However, little research has focused on the practices of providing code snippets in code reviews. To bridge this gap, we conduct a mixed-methods study to mine information and knowledge related to code snippets in code reviews, which can help practitioners and researchers get a better understanding about using code snippets in code review. Specifically, our study includes two phases: mining code review data and conducting practitioners’ survey. In Phase 1, we conducted an exploratory study to mine code review data from two popular developer communities (i.e., OpenStack and Qt). We manually labelled 69,604 review comments and finally identified 3,213 review comments that contain code snippets. Based on the code review data collected, we analyzed the extent of using code snippets, the reviewers’ purposes of providing code snippets, the developers’ acceptance of code snippet suggestions, and the reasons that developers do not accept code snippet suggestions in code reviews. In Phase 2, we used an online questionnaire to survey practitioners from industry. By analyzing the 63 valid responses we received, we explored the scenarios reviewers provide code snippets, the developers’ attitudes towards code snippets, and the characteristics of code snippets developers expect reviewers to provide in code reviews. Our results show that: (1) code snippets are not frequently used in code reviews, and most of the code snippets are provided by reviewers rather than developers; (2) the purposes of reviewers providing code snippets in code reviews are <i>Suggestion</i> and <i>Citation</i>, in which <i>Suggestion</i> is the main purpose; (3) most developers would accept reviewers’ code snippet suggestions; (4) the most common reasons that developers do not accept reviewers’ code snippet suggestions in code reviews are <i>difference in the opinions between developers and reviewers</i> and <i>reviewer’s suggestion is flawed</i>; (5) reviewers often provide code snippets in code reviews <i>when code is more illustrate than words</i>; (6) most developers hold positive attitudes towards code snippet comments; and (7) most developers expect that code snippets in review comments are <i>understandable</i> and <i>fitting into existing co","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"6 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}