Issue reports are a pivotal interface between developers and users for receiving information about bugs in their products. In practice, reproducing those bugs is challenging, since issue reports often contain incorrect information or lack sufficient information. Furthermore, the poor quality of issue reports would have the effect of delaying the entire bug-fixing process. To enhance bug comprehension and facilitate bug reproduction, GitHub Issue allows users to embed visuals such as images and videos to complement the textual description. Hence, we conduct an empirical study on 34 active GitHub repositories to quantitatively analyze the difference between visual issue reports and non-visual ones, and qualitatively analyze the characteristics of visuals and the usage of visuals in bug types. Our results show that visual issue reports have a significantly higher probability of reporting bugs. Visual reports also tend to receive the first comment and complete the conversation in a relatively shorter time. Visuals are frequently used to present the program behavior and the user interface, with the major purpose of introducing problems in reports. Additionally, we observe that visuals are commonly used to report GUI-related bugs, but they are rarely used to report configuration bugs in comparison to non-visual issue reports. To summarize, our work highlights the role of visual play in the bug-fixing process and lays the foundation for future research to support bug comprehension by exploiting visuals.
{"title":"Understanding the characteristics and the role of visual issue reports","authors":"Hiroki Kuramoto, Dong Wang, Masanari Kondo, Yutaro Kashiwa, Yasutaka Kamei, Naoyasu Ubayashi","doi":"10.1007/s10664-024-10459-3","DOIUrl":"https://doi.org/10.1007/s10664-024-10459-3","url":null,"abstract":"<p>Issue reports are a pivotal interface between developers and users for receiving information about bugs in their products. In practice, reproducing those bugs is challenging, since issue reports often contain incorrect information or lack sufficient information. Furthermore, the poor quality of issue reports would have the effect of delaying the entire bug-fixing process. To enhance bug comprehension and facilitate bug reproduction, GitHub Issue allows users to embed visuals such as images and videos to complement the textual description. Hence, we conduct an empirical study on 34 active GitHub repositories to quantitatively analyze the difference between visual issue reports and non-visual ones, and qualitatively analyze the characteristics of visuals and the usage of visuals in bug types. Our results show that visual issue reports have a significantly higher probability of reporting bugs. Visual reports also tend to receive the first comment and complete the conversation in a relatively shorter time. Visuals are frequently used to present the program behavior and the user interface, with the major purpose of introducing problems in reports. Additionally, we observe that visuals are commonly used to report GUI-related bugs, but they are rarely used to report configuration bugs in comparison to non-visual issue reports. To summarize, our work highlights the role of visual play in the bug-fixing process and lays the foundation for future research to support bug comprehension by exploiting visuals.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-08DOI: 10.1007/s10664-024-10496-y
Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude
Identifying security issues early is encouraged to reduce the latent negative impacts on the software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.
{"title":"Toward effective secure code reviews: an empirical study of security-related coding weaknesses","authors":"Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude","doi":"10.1007/s10664-024-10496-y","DOIUrl":"https://doi.org/10.1007/s10664-024-10496-y","url":null,"abstract":"<p>Identifying security issues early is encouraged to reduce the latent negative impacts on the software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"204 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141521634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-08DOI: 10.1007/s10664-024-10454-8
Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz
This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the learning approach, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.
{"title":"The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects","authors":"Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz","doi":"10.1007/s10664-024-10454-8","DOIUrl":"https://doi.org/10.1007/s10664-024-10454-8","url":null,"abstract":"<p>This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the <i>learning approach</i>, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"65 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141521632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The COVID-19 pandemic changed the way we live, work and the way we conduct research. With the restrictions of lockdowns and social distancing, various impacts were experienced by many software engineering researchers, especially whose studies depend on human participants. We conducted a mixed methods study to understand the extent of this impact. Through a detailed survey with 89 software engineering researchers working with human participants around the world and a further nine follow-up interviews, we identified the key challenges faced, the adaptations made, and the surprising fringe benefits of conducting research involving human participants during the pandemic. Our findings also revealed that in retrospect, many researchers did not wish to revert to the old ways of conducting human-orienfted research. Based on our analysis and insights, we share recommendations on how to conduct remote studies with human participants effectively in an increasingly hybrid world when face-to-face engagement is not possible or where remote participation is preferred.
{"title":"Challenges, adaptations, and fringe benefits of conducting software engineering research with human participants during the COVID-19 pandemic","authors":"Anuradha Madugalla, Tanjila Kanij, Rashina Hoda, Dulaji Hidellaarachchi, Aastha Pant, Samia Ferdousi, John Grundy","doi":"10.1007/s10664-024-10490-4","DOIUrl":"https://doi.org/10.1007/s10664-024-10490-4","url":null,"abstract":"<p>The COVID-19 pandemic changed the way we live, work and the way we conduct research. With the restrictions of lockdowns and social distancing, various impacts were experienced by many software engineering researchers, especially whose studies depend on human participants. We conducted a mixed methods study to understand the extent of this impact. Through a detailed survey with 89 software engineering researchers working with human participants around the world and a further nine follow-up interviews, we identified the key challenges faced, the adaptations made, and the surprising fringe benefits of conducting research involving human participants during the pandemic. Our findings also revealed that in retrospect, many researchers did not wish to revert to the old ways of conducting human-orienfted research. Based on our analysis and insights, we share recommendations on how to conduct remote studies with human participants effectively in an increasingly hybrid world when face-to-face engagement is not possible or where remote participation is preferred.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"238 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141521635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10487-z
Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo
With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses difficulties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. Most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author’s intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums.
随着开发人员社区的迅速发展,在线技术论坛上的帖子数量也在快速增长,这给用户筛选有用帖子和查找重要信息带来了困难。标签为用户提供了一个简明的功能维度,便于他们查找感兴趣的帖子,也便于搜索引擎根据查询结果索引最相关的帖子。大多数标签只侧重于技术角度(如程序语言、平台、工具)。在大多数情况下,在线开发者社区的论坛帖子会显示作者解决问题、寻求建议、分享信息等的意图。对帖子意图的建模可以为当前的标签分类法提供一个额外的维度。通过参考以前的研究并借鉴行业观点,我们为技术论坛帖子的意图创建了一个完善的分类标准。通过对从在线论坛中提取的帖子数据集进行手动标记和分析,我们了解了帖子的构成(代码、错误信息)与其意图之间的相关性。此外,受人工研究的启发,我们设计了一个基于转换器的预训练模型来自动预测帖子的意图。我们的意图预测框架的最佳变体取得了 0.589 的 Micro F1 分数、62.6% 到 87.8% 的 Top 1-3 准确率和 0.787 的平均 AUC,优于最先进的基线方法。我们对论坛帖子意图的表征和自动分类可以帮助论坛维护者或第三方工具开发人员改进技术论坛帖子的组织和检索。
{"title":"Characterizing and classifying developer forum posts with their intentions","authors":"Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo","doi":"10.1007/s10664-024-10487-z","DOIUrl":"https://doi.org/10.1007/s10664-024-10487-z","url":null,"abstract":"<p>With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses difficulties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. Most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author’s intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"25 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10448-6
Zeyang Ma, Shouvick Mondal, Tse-Hsun (Peter) Chen, Haoxiang Zhang, Ahmed E. Hassan
Developers rely on software ecosystems such as Maven to manage and reuse external libraries (i.e., dependencies). Due to the complexity of the used dependencies, developers may face challenges in choosing which library to use and whether they should upgrade or downgrade a library. One important factor that affects this decision is the number of potential vulnerabilities in a library and its dependencies. Therefore, state-of-the-art platforms such as Maven Repository (MVN) and Open Source Insights (OSI) help developers in making such a decision by presenting vulnerability information associated with every dependency. In this paper, we first conduct an empirical study to understand how the two platforms, MVN and OSI, present and categorize vulnerability information. We found that these two platforms may either overestimate or underestimate the number of associated vulnerabilities in a dependency, and they lack prioritization mechanisms on which dependencies are more likely to cause an issue. Hence, we propose a tool named VulNet to address the limitations we found in MVN and OSI. Through an evaluation of 19,886 versions of the top 200 popular libraries, we find VulNet includes 90.5% and 65.8% of the dependencies that were omitted by MVN and OSI, respectively. VulNet also helps reduce 27% of potentially unreachable or less impactful vulnerabilities listed by OSI in test dependencies. Finally, our user study with 24 participants gave VulNet an average rating of 4.5/5 in presenting and prioritizing vulnerable dependencies, compared to 2.83 (MVN) and 3.14 (OSI).
开发人员依赖 Maven 等软件生态系统来管理和重用外部库(即依赖库)。由于所用依赖库的复杂性,开发人员在选择使用哪个库以及是否应升级或降级某个库时可能会面临挑战。影响这一决定的一个重要因素是库及其依赖关系中潜在漏洞的数量。因此,最先进的平台,如 Maven Repository (MVN) 和 Open Source Insights (OSI),通过提供与每个依赖关系相关的漏洞信息,帮助开发人员做出这样的决定。在本文中,我们首先进行了一项实证研究,以了解 MVN 和 OSI 这两个平台是如何呈现和分类漏洞信息的。我们发现,这两个平台可能会高估或低估依赖关系中相关漏洞的数量,而且它们缺乏优先排序机制,无法确定哪些依赖关系更有可能导致问题。因此,我们提出了一个名为 VulNet 的工具,以解决我们在 MVN 和 OSI 中发现的局限性。通过对前 200 个流行库的 19886 个版本进行评估,我们发现 VulNet 分别包含了 MVN 和 OSI 遗漏的 90.5% 和 65.8% 的依赖关系。VulNet 还帮助减少了 OSI 在测试依赖项中列出的 27% 可能无法访问或影响较小的漏洞。最后,我们对 24 位参与者进行的用户研究显示,VulNet 在呈现和优先处理易受攻击的依赖性方面的平均评分为 4.5/5,而 MVN 和 OSI 的评分分别为 2.83 和 3.14。
{"title":"VulNet: Towards improving vulnerability management in the Maven ecosystem","authors":"Zeyang Ma, Shouvick Mondal, Tse-Hsun (Peter) Chen, Haoxiang Zhang, Ahmed E. Hassan","doi":"10.1007/s10664-024-10448-6","DOIUrl":"https://doi.org/10.1007/s10664-024-10448-6","url":null,"abstract":"<p>Developers rely on software ecosystems such as Maven to manage and reuse external libraries (i.e., dependencies). Due to the complexity of the used dependencies, developers may face challenges in choosing which library to use and whether they should upgrade or downgrade a library. One important factor that affects this decision is the number of potential vulnerabilities in a library and its dependencies. Therefore, state-of-the-art platforms such as Maven Repository (MVN) and Open Source Insights (OSI) help developers in making such a decision by presenting vulnerability information associated with every dependency. In this paper, we first conduct an empirical study to understand how the two platforms, MVN and OSI, present and categorize vulnerability information. We found that these two platforms may either overestimate or underestimate the number of associated vulnerabilities in a dependency, and they lack prioritization mechanisms on which dependencies are more likely to cause an issue. Hence, we propose a tool named VulNet to address the limitations we found in MVN and OSI. Through an evaluation of 19,886 versions of the top 200 popular libraries, we find VulNet includes 90.5% and 65.8% of the dependencies that were omitted by MVN and OSI, respectively. VulNet also helps reduce 27% of potentially unreachable or less impactful vulnerabilities listed by OSI in test dependencies. Finally, our user study with 24 participants gave VulNet an average rating of 4.5/5 in presenting and prioritizing vulnerable dependencies, compared to 2.83 (MVN) and 3.14 (OSI).</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"38 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-05DOI: 10.1007/s10664-024-10494-0
Antônio José A. da Silva, Renan G. Vieira, Diego P. P. Mesquita, João Paulo P. Gomes, Lincoln S. Rocha
<h3 data-test="abstract-sub-heading">Context</h3><p>Exception handling (EH) bugs stem from incorrect usage of exception handling mechanisms (EHMs) and often incur severe consequences (e.g., system downtime, data loss, and security risk). Tracking EH bugs is particularly relevant for contemporary systems (e.g., cloud- and AI-based systems), in which the software’s sophisticated logic is an additional threat to the correct use of the EHM. On top of that, bug reporters seldom can tag EH bugs — since it may require an encompassing knowledge of the software’s EH strategy. Surprisingly, to the best of our knowledge, there is no automated procedure to identify EH bugs from report descriptions.</p><h3 data-test="abstract-sub-heading">Objective</h3><p>First, we aim to evaluate the extent to which Natural Language Processing (NLP) and Machine Learning (ML) can be used to reliably label EH bugs using the text fields from bug reports (e.g., summary, description, and comments). Second, we aim to provide a reliably labeled dataset that the community can use in future endeavors. Overall, we expect our work to raise the community’s awareness regarding the importance of EH bugs.</p><h3 data-test="abstract-sub-heading">Method</h3><p>We manually analyzed 4,516 bug reports from the four main components of Apache’s Hadoop project, out of which we labeled <span>(approx 20%)</span> (943) as EH bugs. We also labeled 2,584 non-EH bugs analyzing their bug-fixing code and creating a dataset composed of 7,100 bug reports. Then, we used word embedding techniques (Bag-of-Words and TF-IDF) to summarize the textual fields of bug reports. Subsequently, we used these embeddings to fit five classes of ML methods and evaluate them on unseen data. We also evaluated a pre-trained transformer-based model using the complete textual fields. We have also evaluated whether considering only EH keywords is enough to achieve high predictive performance.</p><h3 data-test="abstract-sub-heading">Results</h3><p>Our results show that using a pre-trained DistilBERT with a linear layer trained with our proposed dataset can reasonably label EH bugs, achieving ROC-AUC scores of up to 0.88. The combination of NLP and ML traditional techniques achieved ROC-AUC scores of up to 0.74 and recall up to 0.56. As a sanity check, we also evaluate methods using embeddings extracted solely from keywords. Considering ROC-AUC as the primary concern, for the majority of ML methods tested, the analysis suggests that keywords alone are not sufficient to characterize reports of EH bugs, although this can change based on other metrics (such as recall and precision) or ML methods (e.g., Random Forest).</p><h3 data-test="abstract-sub-heading">Conclusions</h3><p>To the best of our knowledge, this is the first study addressing the problem of automatic labeling of EH bugs. Based on our results, we can conclude that the use of ML techniques, specially transformer-base models, sounds promising to automate the task of labeling
{"title":"Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop","authors":"Antônio José A. da Silva, Renan G. Vieira, Diego P. P. Mesquita, João Paulo P. Gomes, Lincoln S. Rocha","doi":"10.1007/s10664-024-10494-0","DOIUrl":"https://doi.org/10.1007/s10664-024-10494-0","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Context</h3><p>Exception handling (EH) bugs stem from incorrect usage of exception handling mechanisms (EHMs) and often incur severe consequences (e.g., system downtime, data loss, and security risk). Tracking EH bugs is particularly relevant for contemporary systems (e.g., cloud- and AI-based systems), in which the software’s sophisticated logic is an additional threat to the correct use of the EHM. On top of that, bug reporters seldom can tag EH bugs — since it may require an encompassing knowledge of the software’s EH strategy. Surprisingly, to the best of our knowledge, there is no automated procedure to identify EH bugs from report descriptions.</p><h3 data-test=\"abstract-sub-heading\">Objective</h3><p>First, we aim to evaluate the extent to which Natural Language Processing (NLP) and Machine Learning (ML) can be used to reliably label EH bugs using the text fields from bug reports (e.g., summary, description, and comments). Second, we aim to provide a reliably labeled dataset that the community can use in future endeavors. Overall, we expect our work to raise the community’s awareness regarding the importance of EH bugs.</p><h3 data-test=\"abstract-sub-heading\">Method</h3><p>We manually analyzed 4,516 bug reports from the four main components of Apache’s Hadoop project, out of which we labeled <span>(approx 20%)</span> (943) as EH bugs. We also labeled 2,584 non-EH bugs analyzing their bug-fixing code and creating a dataset composed of 7,100 bug reports. Then, we used word embedding techniques (Bag-of-Words and TF-IDF) to summarize the textual fields of bug reports. Subsequently, we used these embeddings to fit five classes of ML methods and evaluate them on unseen data. We also evaluated a pre-trained transformer-based model using the complete textual fields. We have also evaluated whether considering only EH keywords is enough to achieve high predictive performance.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Our results show that using a pre-trained DistilBERT with a linear layer trained with our proposed dataset can reasonably label EH bugs, achieving ROC-AUC scores of up to 0.88. The combination of NLP and ML traditional techniques achieved ROC-AUC scores of up to 0.74 and recall up to 0.56. As a sanity check, we also evaluate methods using embeddings extracted solely from keywords. Considering ROC-AUC as the primary concern, for the majority of ML methods tested, the analysis suggests that keywords alone are not sufficient to characterize reports of EH bugs, although this can change based on other metrics (such as recall and precision) or ML methods (e.g., Random Forest).</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>To the best of our knowledge, this is the first study addressing the problem of automatic labeling of EH bugs. Based on our results, we can conclude that the use of ML techniques, specially transformer-base models, sounds promising to automate the task of labeling","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10493-1
Tavian Barnes, Ken Jen Lee, Cristina Tavares, Gema Rodríguez-Pérez, Meiyappan Nagappan
The traditional path to a software engineering career usually involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many individuals working as software engineers take a non-traditional path to their careers, starting from other industries or fields of study. This paper explores the barriers that individuals with non-traditional educational and occupational backgrounds face when pursuing a software engineering career and potential strategies to overcome those barriers. A two-stage methodology was used, consisting of an exploratory study followed by a follow-up survey. The exploratory study consisted of a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings were then supplemented through a follow-up survey. Understanding these barriers and what strategies could be effective is an important step towards making software engineering more accessible to individuals with non-traditional backgrounds. In addition to fostering functional diversity, this might also serve to tackle labor shortages within the software engineering industry.
{"title":"Towards understanding barriers and mitigation strategies of software engineers with non-traditional educational and occupational backgrounds","authors":"Tavian Barnes, Ken Jen Lee, Cristina Tavares, Gema Rodríguez-Pérez, Meiyappan Nagappan","doi":"10.1007/s10664-024-10493-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10493-1","url":null,"abstract":"<p>The traditional path to a software engineering career usually involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many individuals working as software engineers take a non-traditional path to their careers, starting from other industries or fields of study. This paper explores the barriers that individuals with non-traditional educational and occupational backgrounds face when pursuing a software engineering career and potential strategies to overcome those barriers. A two-stage methodology was used, consisting of an exploratory study followed by a follow-up survey. The exploratory study consisted of a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings were then supplemented through a follow-up survey. Understanding these barriers and what strategies could be effective is an important step towards making software engineering more accessible to individuals with non-traditional backgrounds. In addition to fostering functional diversity, this might also serve to tackle labor shortages within the software engineering industry.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"34 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10480-6
Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin, Chen Yang, Zengyang Li
Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, there is a lack of clarity on what literature on mining architectural information is available. Consequently, this may create difficulty for practitioners to understand and adopt the state-of-the-art research results, such as what approaches should be adopted to mine what architectural information in order to support architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and December 2022. Of the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 11 categories of sources have been leveraged for mining architectural information, among which version control system (e.g., GitHub) is the most popular source; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.
{"title":"Mining architectural information: A systematic mapping study","authors":"Musengamana Jean de Dieu, Peng Liang, Mojtaba Shahin, Chen Yang, Zengyang Li","doi":"10.1007/s10664-024-10480-6","DOIUrl":"https://doi.org/10.1007/s10664-024-10480-6","url":null,"abstract":"<p>Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information (e.g., architectural models) to support architecting activities, such as architecture understanding, has received significant attention in recent years. However, there is a lack of clarity on what literature on mining architectural information is available. Consequently, this may create difficulty for practitioners to understand and adopt the state-of-the-art research results, such as what approaches should be adopted to mine what architectural information in order to support architecting activities. It also hinders researchers from being aware of the challenges and remedies for the identified research gaps. We aim to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and December 2022. Of the 104 primary studies finally selected, 7 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 11 categories of sources have been leveraged for mining architectural information, among which version control system (e.g., GitHub) is the most popular source; 11 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 95 approaches and 56 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-04DOI: 10.1007/s10664-024-10485-1
Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan
The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry (e.g., network applications, microservices, and IoT). As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. A contract that implements the proxy pattern (proxy contract) acts as a layer between the clients and the target contract, enabling greater flexibility (e.g., data validation checks) and upgradeability (e.g., online smart contract replacement with zero downtime) in DApp development. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. In this paper, we present a large-scale exploratory study on the use of the proxy pattern in Ethereum. We analyze a dataset of all Ethereum smart contracts as of Sep. 2022 containing 50M smart contracts and 1.6B transactions, and apply both quantitative and qualitative methods in order to (i) determine the prevalence of proxy contracts, (ii) understand the ways they are deployed and integrated into applications, and (iii) uncover the prevalence of different types of proxy contracts. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and latter being employed in 39.1% and 60.9% of identified usage contexts in turn. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enables upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts’ functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.
{"title":"A large-scale exploratory study on the proxy pattern in Ethereum","authors":"Amir M. Ebrahimi, Bram Adams, Gustavo A. Oliva, Ahmed E. Hassan","doi":"10.1007/s10664-024-10485-1","DOIUrl":"https://doi.org/10.1007/s10664-024-10485-1","url":null,"abstract":"<p>The proxy pattern is a well-known design pattern with numerous use cases in several sectors of the software industry (e.g., network applications, microservices, and IoT). As such, the use of the proxy pattern is also a common approach in the development of complex decentralized applications (DApps) on the Ethereum blockchain. A contract that implements the proxy pattern (proxy contract) acts as a layer between the clients and the target contract, enabling greater flexibility (e.g., data validation checks) and upgradeability (e.g., online smart contract replacement with zero downtime) in DApp development. Despite the importance of proxy contracts, little is known about (i) how their prevalence changed over time, (ii) the ways in which developers integrate proxies in the design of DApps, and (iii) what proxy types are being most commonly leveraged by developers. In this paper, we present a large-scale exploratory study on the use of the proxy pattern in Ethereum. We analyze a dataset of all Ethereum smart contracts as of Sep. 2022 containing 50M smart contracts and 1.6B transactions, and apply both quantitative and qualitative methods in order to (i) determine the prevalence of proxy contracts, (ii) understand the ways they are deployed and integrated into applications, and (iii) uncover the prevalence of different types of proxy contracts. Our findings reveal that 14.2% of all deployed smart contracts are proxy contracts. We show that proxy contracts are being more actively used than non-proxy contracts. Also, the usage of proxy contracts in various contexts, transactions involving proxy contracts, and adoption of proxy contracts by users have shown an upward trend over time, peaking at the end of our study period. They are either deployed through off-chain scripts or on-chain factory contracts, with the former and latter being employed in 39.1% and 60.9% of identified usage contexts in turn. We found that while the majority (67.8%) of proxies act as an interceptor, 32.2% enables upgradeability. Proxy contracts are typically (79%) implemented based on known reference implementations with 29.4% being of type ERC-1167, a class of proxies that aims to cheaply reuse and clone contracts’ functionality. Our evaluation shows that our proposed behavioral proxy detection method has a precision and recall of 100% in detecting active proxies. Finally, we derive a set of practical recommendations for developers and introduce open research questions to guide future research on the topic.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"51 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}