Context. Empirical research consistently demonstrates that scholarly peer review is ineffective, unreliable, and prejudiced. In principle, the solution is to move from contemporary, unstructured, essay-like reviewing to more structured, checklist-like reviewing. The Task Force created models—called “empirical standards”—of the software engineering community’s expectations for different popular methodologies. Objective. This paper presents a tool for facilitating more structured reviewing by generating review checklists from the empirical standards. Design. A tool that generates pre-submission and review forms from the empirical standards for software engineering research was designed and implemented. The pre-submission and review forms can be used by authors and reviewers, respectively, to determine whether a manuscript meets the software engineering community’s expectations for the particular kind of research conducted. Evaluation. The proposed tool can be empirically evaluated using lab or field randomized experiments as well as qualitative research. Huge, impractical studies involving splitting a conference program committee are not necessary to establish the effectiveness of the standards, checklists, and structured review. Conclusions. The checklist generator enables more structured peer reviews, which in turn should improve review quality, reliability, thoroughness, and readability. Empirical research is needed to assess the effectiveness of the tool and the standards.
{"title":"Towards a More Structured Peer Review Process with Empirical Standards","authors":"Arham Arshad, Taher Ahmed Ghaleb, P. Ralph","doi":"10.1145/3463274.3463359","DOIUrl":"https://doi.org/10.1145/3463274.3463359","url":null,"abstract":"Context. Empirical research consistently demonstrates that that scholarly peer review is ineffective, unreliable, and prejudiced. In principle, the solution is to move from contemporary, unstructured, essay-like reviewing to more structured, checklist-like reviewing. The Task Force created models—called “empirical standards”—of the software engineering community’s expectations for different popular methodologies. Objective. This paper presents a tool for facilitating more structured reviewing by generating review checklists from the empirical standards. Design. A tool that generates pre-submission and review forms using the empirical standards for software engineering research was designed and implemented. The pre-submission and review forms can be used by authors and reviewers, respectively, to determine whether a manuscript meets the software engineering community’s expectations for the particular kind of research conducted. Evaluation. The proposed tool can be empirically evaluated using lab or field randomized experiments as well as qualitative research. Huge, impractical studies involving splitting a conference program committee are not necessary to establish the effectiveness of the standards, checklists and structured review. Conclusions. The checklist generator enables more structured peer reviews, which in turn should improve review quality, reliability, thoroughness, and readability. Empirical research is needed to assess the effectiveness of the tool and the standards.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130488054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","authors":"","doi":"10.1145/3463274","DOIUrl":"https://doi.org/10.1145/3463274","url":null,"abstract":"","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128983452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context: It is impossible to imagine our everyday and professional lives without software. Consequently, software products, especially socio-technical systems, have more or less obvious impacts on almost all areas of our society. To examine these impacts, a group of scientists worldwide has developed the Sustainability Awareness Framework (SusAF), which examines the impacts on five interrelated dimensions: social, individual, environmental, economic, and technical. According to this framework, we should design software to maintain or improve the Sustainability Impacts. Designing for sustainability is a major challenge that can profoundly change the field of activity – particularly for Software Engineers. Objectives: The aim of the thesis work is to analyze the current role of Software Engineers and relate it to the Sustainability Impacts of Software Products in order to contribute to this paradigm shift. This should provide a basis for follow-up work. The scientific community still owes an answer to the question of in which direction exactly the Software Engineer should develop and how exactly this path can be followed. Perhaps universities will have to adapt the curriculum in the training of Software Engineers, politics could possibly initiate support programs in the field of sustainability for software companies, or maybe software sustainability certifications could emerge. In any case, Software Engineers must adapt to the times and acquire the necessary knowledge, skills, and competencies. Results: The results of the dissertation are a better understanding of the needed paradigm shift for Software Engineers and a complement to the SusAF to better support sustainability design. The extended SusAF is intended for both training and corporate use.
{"title":"The Connection between the Sustainability Impacts of Software Products and the Role of Software Engineers","authors":"Dominic Lammert","doi":"10.1145/3463274.3463346","DOIUrl":"https://doi.org/10.1145/3463274.3463346","url":null,"abstract":"Context: It is impossible to imagine our everyday and professional lives without software. Consequently, software products, especially socio-technical systems, have more or less obvious impacts on almost all areas of our society. For this purpose, a group of scientists worldwide has developed the Sustainability Awareness Framework (SusAF) which examines the impacts on five interrelated dimensions: social, individual, environmental, economic, and technical. According to this framework, we should design software to maintain or improve the Sustainability Impacts. Designing for sustainability is a major challenge that can profoundly change the field of activity – particular for Software Engineers. Objectives: The aim of the thesis work is to analyze the current role of Software Engineers and relate it to Sustainability Impacts of Software Products in order to contribute to this paradigm shift. This should provide a basis for follow-up works. The question in which direction exactly the Software Engineer should develop and how exactly this path can be followed is still owed by the scientific community. Perhaps universities will have to adapt the curriculum in the training of Software Engineers, politics could possibly initiate support programs in the field of sustainability for software companies, or maybe software sustainability certifications could emerge. In any case, Software Engineers must adapt to the times and acquire the necessary knowledge, the skills and the competencies. Results: The results of the dissertation are a better understanding of the needed paradigm shift of Software Engineers and complement the SusAF that to better support sustainability design. The extended SusAF is intended for both training and corporate use.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122664012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to limited time, budget, or resources, a team is prone to introducing code that does not follow the best software development practices. Such code, which introduces instability into software projects, is known as Technical Debt (TD). Often, TD is intentionally admitted in source code, which is known as Self-Admitted Technical Debt (SATD). This paper presents DebtHunter, a natural language processing (NLP)- and machine learning (ML)-based approach for identifying and classifying SATD in source code comments. The proposed classification approach combines two classification phases to differentiate between the multiple debt types. Evaluations over 10 open source systems, containing more than 259k comments, showed that the approach outperformed others in the literature. The presented approach is supported by a tool that can help developers effectively manage SATD. The tool complements the analysis of Java source code by allowing developers to also examine the associated issue tracker. DebtHunter can be used in a continuous evolution environment to monitor the development process and make developers aware of how and where SATD is introduced, thus helping them to manage and resolve it.
{"title":"DebtHunter: A Machine Learning-based Approach for Detecting Self-Admitted Technical Debt","authors":"Irene Sala, Antonela Tommasel, F. Fontana","doi":"10.1145/3463274.3464455","DOIUrl":"https://doi.org/10.1145/3463274.3464455","url":null,"abstract":"Due to limited time, budget or resources, a team is prone to introduce code that does not follow the best software development practices. This code that introduces instability in the software projects is known as Technical Debt (TD). Often, TD intentionally manifests in source code, which is known as Self-Admitted Technical Debt (SATD). This paper presents DebtHunter, a natural language processing (NLP)- and machine learning (ML)- based approach for identifying and classifying SATD in source code comments. The proposed classification approach combines two classification phases for differentiating between the multiple debt types. Evaluations over 10 open source systems, containing more than 259k comments, showed that the approach was able to improve the performance of others in the literature. The presented approach is supported by a tool that can help developers to effectively manage SATD. The tool complements the analysis over Java source code by allowing developers to also examine the associated issue tracker. DebtHunter can be used in a continuous evolution environment to monitor the development process and make developers aware of how and where SATD is introduced, thus helping them to manage and resolve it.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124451878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There is ongoing interest in the Software Engineering field in multivocal literature reviews that include grey literature. However, at the same time, the role of grey literature is still controversial, and the benefits of its inclusion in systematic reviews are the object of discussion. Some of these arguments concern the quality assessment of grey literature entries, which is often considered a challenging and critical task. On the one hand, apart from a few proposals, there is a lack of acknowledged methodological support for the inclusion of Software Engineering grey literature in systematic surveys. On the other hand, the unstructured shape of grey literature contents could lead to bias in the evaluation process, impacting the quality of the surveys. This work leverages an approach based on fuzzy Likert scales and proposes a methodology for managing the explicit uncertainties emerging during the assessment of entries from the grey literature. The methodology also strengthens the adoption of consensus policies that take into account the individual confidence level expressed for each of the collected scores.
{"title":"About the Assessment of Grey Literature in Software Engineering","authors":"G. D. Angelis, F. Lonetti","doi":"10.1145/3463274.3463362","DOIUrl":"https://doi.org/10.1145/3463274.3463362","url":null,"abstract":"There is an ongoing interest in the Software Engineering field for multivocal literature reviews including grey literature. However, at the same time, the role of the grey literature is still controversial, and the benefits of its inclusion in systematic reviews are object of discussion. Some of these arguments concern the quality assessment methods for grey literature entries, which is often considered a challenging and critical task. On the one hand, apart from a few proposals, there is a lack of an acknowledged methodological support for the inclusion of Software Engineering grey literature in systematic surveys. On the other hand, the unstructured shape of the grey literature contents could lead to bias in the evaluation process impacting on the quality of the surveys. This work leverages an approach on fuzzy Likert scales, and it proposes a methodology for managing the explicit uncertainties emerging during the assessment of entries from the grey literature. The methodology also strengthens the adoption of consensus policies that take into account the individual confidence level expressed for each of the collected scores.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114416693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The design of recommendation systems is based on complex information processing and big data interaction. Personalized recommendation has evolved into a hot area over the past decade, and its applications have shown promise for solving problems in the software development field. Therefore, with the evolution of Recommendation Systems in Software Engineering (RSSE), the coordination of software projects with their stakeholders is improving. This experiment examined four open source recommender systems and implemented a customized recommender engine with two industry-oriented packages: Lenskit and Mahout. Each of the main functions was examined and issues were identified during the experiment.
{"title":"Recommender Systems for Software Project Managers","authors":"Liang Wei, Luiz Fernando Capretz","doi":"10.1145/3463274.3463951","DOIUrl":"https://doi.org/10.1145/3463274.3463951","url":null,"abstract":"The design of recommendation systems is based on complex information processing and big data interaction. This personalized view has evolved into a hot area in the past decade, where applications might have been proved to help for solving problem in the software development field. Therefore, with the evolvement of Recommendation System in Software Engineering (RSSE), the coordination of software projects with their stakeholders is improving. This experiment examines four open source recommender systems and implemented a customized recommender engine with two industrial-oriented packages: Lenskit and Mahout. Each of the main functions was examined and issues were identified during the experiment.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125558233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bridging the gap between academic research and industrial application is an important issue in promoting Jackson's Problem Frames approach (PF) to the software engineering community. Various attempts have been made to tackle this problem, such as defining formal semantics of PF for software development and providing a semi-formal approach to model transformations of problem diagrams, with automated tool support. In this paper, we propose to focus exclusively on exploring and evaluating the effectiveness of Jackson's problem diagrams for modeling the context of cyber-physical systems, by developing a suite of support tools enhanced with adaptive user interfaces and empirically and comprehensively assessing its usability. This paper introduces the state of the art, the corresponding research questions, the research methodology, and the current progress of our research.
{"title":"Evaluating the Effectiveness of Problem Frames for Contextual Modeling of Cyber-Physical Systems: a Tool Suite with Adaptive User Interfaces","authors":"Waqas Junaid","doi":"10.1145/3463274.3463344","DOIUrl":"https://doi.org/10.1145/3463274.3463344","url":null,"abstract":"Bridging the gap between academic research and industrial application is an important issue to promote Jackson's Problem Frames approach (PF) to the software engineering community. Various attempts have been made to tackle this problem, such as defining formal semantics of PF for software development, and providing a semi-formal approach to model transformations of problem diagrams, with automated tool support. In this paper, we propose to exclusively focus on exploring and evaluating the effectiveness of Jackson's problem diagrams for modeling the context of cyber-physical systems, by developing a suite of support tools enhanced with adaptive user interfaces, and empirically and comprehensively assess its usability. This paper introduces the state of the art, corresponding research questions, research methodologies and current progress of our research.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116879830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context: Software development is moving towards a place where data about development is gathered in a systematic fashion in order to improve the practice, for example, in the tuning of static code analysis. However, this kind of data gathering has so far primarily happened within organizations, which is unfortunate as it tends to favor larger organizations with more resources for the maintenance of developer tools. Objective: Over the years, we have seen many benefits from open source, and recently there has been substantial development in open data. We see this as an opportunity for cross-organisation community building and wonder to what extent the views on using and sharing open source software developer tools carry across to open data-driven tuning of software development tools. Method: An exploratory study with 11 participants divided into 3 focus groups discussing the use and sharing of static code analyzers and of data about these analyzers. Results: While using and sharing open-source code (analyzers in this case) is perceived in a positive light as part of the practice of modern software development, sharing data is met with skepticism and uncertainty. Developers are concerned about threats to the company brand, exposure of intellectual property, legal liabilities, and the extent to which data is context-specific to a certain organisation. Conclusions: Sharing data in software development is different from sharing data about software development. We need to better understand how we can provide solutions for sharing software development data in a fashion that reduces risk and enables openness.
{"title":"Open Data-driven Usability Improvements of Static Code Analysis and its Challenges","authors":"Emma Söderberg, Luke Church, Martin Höst","doi":"10.1145/3463274.3463808","DOIUrl":"https://doi.org/10.1145/3463274.3463808","url":null,"abstract":"Context: Software development is moving towards a place where data about development is gathered in a systematic fashion in order to improve the practice, for example, in tuning of static code analysis. However, this kind of data gathering has so far primarily happened within organizations, which is unfortunate as it tends to favor larger organizations with more resources for maintenance of developer tools. Objective: Over the years, we have seen a lot of benefits from open source and recently there has been a lot of development in open data. We see this as an opportunity for cross-organisation community building and wonder to what extent the views on using and sharing open source software developer tools carry across to open data-driven tuning of software development tools. Method: An exploratory study with 11 participants divided into 3 focus groups discussing using and sharing of static code analyzers and data about these analyzers. Results: While using and sharing open-source code (analyzers in this case) is perceived in a positive light as part of the practice of modern software development, sharing data is met with skepticism and uncertainty. Developers are concerned about threats to the company brand, exposure of intellectual property, legal liabilities, and to what extent data is context-specific to a certain organisation. Conclusions: Sharing data in software development is different from sharing data about software development. We need to better understand how we can provide solutions for sharing of software development data in a fashion that reduces risk and enables openness.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115468378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Blogs are a source of grey literature widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state of practice for software engineering research. However, there are challenges in identifying higher quality content from the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with 'argumentation' and 'evidence' being two important criteria. Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss' Kappa). We present a preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). Results: We train four systems (BERT, KNN, Decision Tree, and SVM) using three feature sets (Bag of Words, Topic Modelling, and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM. Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence-level annotations can infer the credibility of the overall document.
{"title":"Towards a corpus for credibility assessment in software practitioner blog articles","authors":"Ashley Williams, M. Shardlow, A. Rainer","doi":"10.1145/3463274.3463330","DOIUrl":"https://doi.org/10.1145/3463274.3463330","url":null,"abstract":"Background: Blogs are a source of grey literature which are widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state–of–practice for software engineering research. However, there are challenges in identifying higher quality content from the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with ’argumentation’ and ’evidence’ being two important criteria. Objective: We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Method: Three annotators label the corpus with a series of conceptual credibility criteria, reaching an agreement of 0.82 (Fleiss’ Kappa). We present preliminary analysis of the corpus by using it to investigate the identification of claim sentences (one of our ten labels). Results: We train four systems (Bert, KNN, Decision Tree and SVM) using three feature sets (Bag of Words, Topic Modelling and InferSent), achieving an F1 score of 0.64 using InferSent and a Linear SVM. Conclusions: Our preliminary results are promising, indicating that the corpus can help future studies in detecting the credibility of grey literature. Future research will investigate the degree to which the sentence level annotations can infer the credibility of the overall document.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115506553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models based on datasets collected from real projects using open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the “ground truth.” In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or evaluate a model’s performance. In a project dataset used to build a code reviewer recommendation system, the code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. However, in practice, the picked code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers that do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets — HIVE and QT Creator — and with five code reviewer recommendation techniques: Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree. Our debiasing approach appears promising, since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
{"title":"Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems","authors":"K. A. Tecimer, Eray Tüzün, Hamdi Dibeklioğlu, H. Erdogmus","doi":"10.1145/3463274.3463336","DOIUrl":"https://doi.org/10.1145/3463274.3463336","url":null,"abstract":"Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models based on datasets collected from real projects using open-source or industrial practices. The techniques invariably presume that these datasets reliably represent the “ground truth.” In the context of a classification problem, ground truth refers to the objectively correct labels of a class used to build models from a dataset or evaluate a model’s performance. In a project dataset used to build a code reviewer recommendation system, the recommended code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. However, in practice, the recommended code reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that the datasets used tend to suffer from systematic labeling bias, making the ground truth unreliable. Therefore, models and recommendation systems built on such datasets may perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers that do not ensure a permanently successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets —HIVE and QT Creator— and with five code reviewer recommendation techniques —Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree. Our debiasing approach appears promising since it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques up to 26% in the datasets used.","PeriodicalId":328024,"journal":{"name":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117175825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}