A Taxonomy of Data Quality Challenges in Empirical Software Engineering

M. Bosu, Stephen G. MacDonell
Published in: 2013 22nd Australian Software Engineering Conference
Publication date: 2013-06-04
DOI: 10.1109/ASWEC.2013.21 (https://doi.org/10.1109/ASWEC.2013.21)
Citations: 34

Abstract

Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling; second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set; and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research.