Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis

Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement. Pub Date: 2022-07-18. DOI: 10.1145/3544902.3546239

Zeinab Abou Khalil, Stefano Zacchiroli

Abstract

Background: Software development produces various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived by mining those artifacts to uncover the inner workings of software development and improve its practices. Which artifacts the field studies, however, is a moving target, and we examine it empirically in this paper.

Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support.

Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004–2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts.

Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts together with source code, and (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal.

Conclusions: Our study presents a meta-analysis, based on NLP techniques, of the usage of software artifacts in the field over a period of 16 years.
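
The abstract does not spell out the NLP pipeline, so the following is only a minimal sketch of the general idea of counting mined and co-mined artifact types across paper texts. It assumes a simple keyword match; the keyword table `ARTIFACT_KEYWORDS` and the helpers `detect_artifacts` and `count_mining` are hypothetical names introduced for illustration and are not the taxonomy or method used by the authors.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical keyword lists for illustration only; not the paper's taxonomy.
ARTIFACT_KEYWORDS = {
    "source code": ["source code"],
    "bug reports": ["bug report", "issue report"],
    "test data": ["test data", "test case", "test suite"],
    "mailing lists": ["mailing list"],
}


def detect_artifacts(text):
    """Return the set of artifact types whose keywords occur in the text."""
    lowered = text.lower()
    return {
        artifact
        for artifact, keywords in ARTIFACT_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords)
    }


def count_mining(papers):
    """Count artifact mentions and co-mentioned artifact pairs over papers."""
    singles, pairs = Counter(), Counter()
    for text in papers:
        found = sorted(detect_artifacts(text))
        singles.update(found)                 # one count per paper per artifact
        pairs.update(combinations(found, 2))  # unordered pairs, sorted by name
    return singles, pairs
```

Calling `count_mining` on a list of paper texts yields two counters: one for how many papers mention each artifact type and one for how many mention each pair together. A real study would need a far richer vocabulary and a way to distinguish artifacts that are actually mined from those merely mentioned.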


