The necessity of assuring quality in software measurement data

T. Khoshgoftaar, Naeem Seliya
{"title":"The necessity of assuring quality in software measurement data","authors":"T. Khoshgoftaar, Naeem Seliya","doi":"10.1109/METRIC.2004.1357896","DOIUrl":null,"url":null,"abstract":"Software measurement data is often used to model software quality classification models. Related literature has focussed on developing new classification techniques and schemes with the aim of improving classification accuracy. However, the quality of software measurement data used to build such classification models plays a critical role in their accuracy and usefulness. We present empirical case studies, which demonstrate that despite using a very large number of diverse classification techniques for building software quality classification models, the classification accuracy does not show a dramatic improvement. For example, a simple lines-of-code based classification performs comparatively to some other more advanced classification techniques such as neural networks, decision trees, and case-based reasoning. Case studies of the NASA JM1 and KC2 software measurement datasets (obtained through the NASA Metrics Data Program) are presented. Some possible reasons that affect the quality of a software measurement dataset include presence of data noise, errors due to improper software data collection, exclusion of software metrics that are better representative software quality indicators, and improper recording of software fault data. This study shows, through an empirical study, that instead of searching for a classification technique that perform well for given software measurement dataset, the software quality and development teams should focus on improving the quality of the software measurement dataset.","PeriodicalId":261807,"journal":{"name":"10th International Symposium on Software Metrics, 2004. Proceedings.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"10th International Symposium on Software Metrics, 2004. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/METRIC.2004.1357896","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 58

Abstract

Software measurement data is often used to model software quality classification models. Related literature has focussed on developing new classification techniques and schemes with the aim of improving classification accuracy. However, the quality of software measurement data used to build such classification models plays a critical role in their accuracy and usefulness. We present empirical case studies, which demonstrate that despite using a very large number of diverse classification techniques for building software quality classification models, the classification accuracy does not show a dramatic improvement. For example, a simple lines-of-code based classification performs comparatively to some other more advanced classification techniques such as neural networks, decision trees, and case-based reasoning. Case studies of the NASA JM1 and KC2 software measurement datasets (obtained through the NASA Metrics Data Program) are presented. Some possible reasons that affect the quality of a software measurement dataset include presence of data noise, errors due to improper software data collection, exclusion of software metrics that are better representative software quality indicators, and improper recording of software fault data. This study shows, through an empirical study, that instead of searching for a classification technique that perform well for given software measurement dataset, the software quality and development teams should focus on improving the quality of the software measurement dataset.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
软件测量数据质量保证的必要性
软件测量数据经常被用来建立软件质量分类模型。相关文献集中于开发新的分类技术和方案,目的是提高分类精度。然而,用于构建此类分类模型的软件测量数据的质量对其准确性和有用性起着至关重要的作用。我们提出的实证案例研究表明,尽管使用了大量不同的分类技术来构建软件质量分类模型,但分类精度并没有显着提高。例如,与其他一些更高级的分类技术(如神经网络、决策树和基于案例的推理)相比,简单的基于代码行的分类执行得更好。介绍了NASA JM1和KC2软件测量数据集(通过NASA度量数据计划获得)的案例研究。影响软件度量数据集质量的一些可能原因包括数据噪声的存在、由于软件数据收集不当而导致的错误、排除了更好地代表软件质量指标的软件度量,以及软件故障数据的不当记录。本研究通过实证研究表明,软件质量和开发团队应该关注于提高软件度量数据集的质量,而不是寻找一种对给定软件度量数据集表现良好的分类技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The necessity of assuring quality in software measurement data Resource estimation for Web applications A survey on software estimation in the Norwegian industry A controlled experiment for evaluating a metric-based reading technique for requirements inspection Assessing usability through perceptions of information scent
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1