Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis

Zeinab Abou Khalil, Stefano Zacchiroli
{"title":"Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis","authors":"Zeinab Abou Khalil, Stefano Zacchiroli","doi":"10.1145/3544902.3546239","DOIUrl":null,"url":null,"abstract":"Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004–2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts. Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal. Conclusions: Our study presents a meta analysis of the usage of software artifacts in the field over a period of 16 years using NLP techniques.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3544902.3546239","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004–2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts. Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal. Conclusions: Our study presents a meta analysis of the usage of software artifacts in the field over a period of 16 years using NLP techniques.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
软件工程会议中的软件工件挖掘:一个元分析
背景:软件开发会产生各种类型的工件:源代码、版本控制系统元数据、bug报告、邮件列表对话、测试数据等。经验软件工程(ESE)通过挖掘这些工件来揭示软件开发的内部工作原理并改进其实践而蓬勃发展。但在该领域研究哪些伪影是一个移动的目标,本文对此进行了实证研究。目的:我们定量地描述了ESE研究中最频繁挖掘和共同挖掘的软件工件,以及它们支持的研究目的。方法:我们对发表在11个ESE顶级会议上的工件挖掘研究进行了荟萃分析,共计9621篇论文。我们使用自然语言处理(NLP)技术来描述最常被挖掘的软件工件类型及其在16年期间(2004-2020年)的演变。我们分析了最常一起挖掘的工件类型的组合,以及研究目的和挖掘的工件之间的关系。结果:我们发现:(1)挖掘发生在绝大多数被分析的论文中,(2)源代码和测试数据是挖掘最多的工件,(3)对挖掘新工件和源代码的兴趣越来越大,(4)研究人员对软件系统的评估最感兴趣,并使用所有可能的经验信号来支持这一目标。结论:我们的研究对该领域16年来使用NLP技术的软件工件的使用情况进行了荟萃分析。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study A Preliminary Investigation of MLOps Practices in GitHub PG-VulNet: Detect Supply Chain Vulnerabilities in IoT Devices using Pseudo-code and Graphs On the Relationship Between Story Points and Development Effort in Agile Open-Source Software DevOps Practitioners’ Perceptions of the Low-code Trend
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1