Imputation methods on retrospective breast cancer data in Tanzania: A comparative study

Rahibu A. Abassi, Amina S. Msengwa, Rocky R. J. Akarro
Journal: Women health care and issues
Publication date: 2022-06-06
DOI: https://doi.org/10.31579/2642-9756/118

Abstract

Background: Clinical datasets are prone to missing data for several reasons, including patients' failure to attend clinical measurements and defects in measurement recorders. Missing data can significantly affect an analysis: omitting incomplete records introduces bias that makes results doubtful, especially when the dataset is small. This study compares several imputation methods in terms of their efficiency in filling in missing data, so as to increase prediction and classification accuracy in a breast cancer dataset.

Methodology: Several imputation methods, namely series mean, k-nearest neighbour, hot deck, predictive mean matching, expectation-maximisation with bootstrapping, and multiple imputation by chained equations, were applied to replace the missing values in a real breast cancer dataset. The methods were compared using the root mean square error (RMSE) and the mean absolute error (MAE) to obtain a suitable complete dataset. Binary logistic regression and linear discriminant analysis classifiers were then applied to the imputed dataset to compare their classification and discrimination performance.

Results: Predictive mean matching performed better than the other imputation methods. Binary logistic regression and linear discriminant analysis yielded very similar overall classification rates, sensitivity and specificity.

Conclusion: Predictive mean matching showed higher accuracy in estimating and replacing missing values in the real breast cancer dataset under study, and is an effective approach to handling missing data. We recommend replacing missing data using predictive mean matching, since it is a plausible approach to multiple imputation for numerical variables. It improves estimation and prediction accuracy over complete-case analysis, especially when the percentage of missing data is not very small.
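
The kind of comparison described in the methodology can be illustrated with a minimal sketch. It is not the authors' code and does not use their data: it assumes synthetic data, a single numeric predictor, and a simplified predictive mean matching that skips the random draw of regression parameters used in full multiple imputation. The function names (`rmse`, `mae`, `mean_impute`, `pmm_impute`) and the donor-pool size `k` are hypothetical choices made for illustration. Known values are masked, imputed by each method, and scored against the held-back truth with RMSE and MAE.

```python
import numpy as np


def rmse(true, imputed):
    """Root mean square error between true and imputed values."""
    true, imputed = np.asarray(true, float), np.asarray(imputed, float)
    return float(np.sqrt(np.mean((true - imputed) ** 2)))


def mae(true, imputed):
    """Mean absolute error between true and imputed values."""
    true, imputed = np.asarray(true, float), np.asarray(imputed, float)
    return float(np.mean(np.abs(true - imputed)))


def mean_impute(y):
    """Series-mean imputation: replace each NaN with the mean of the observed values."""
    y = np.asarray(y, float).copy()
    y[np.isnan(y)] = np.nanmean(y)
    return y


def pmm_impute(x, y, k=5, rng=None):
    """Simplified predictive mean matching with one predictor (illustrative only).

    Fits y ~ x on the observed cases, then for each missing case picks at
    random one of the k observed cases with the closest predicted mean and
    copies that donor's observed value.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, float)
    y = np.asarray(y, float).copy()
    miss = np.isnan(y)
    obs = ~miss

    # Least-squares fit of y on x (with intercept), observed cases only.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta

    donor_pred, donor_y = pred[obs], y[obs]
    for i in np.flatnonzero(miss):
        # Indices of the k donors whose predictions are closest to this case's.
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        y[i] = donor_y[rng.choice(nearest)]
    return y


# Synthetic example (assumed data, not the study's): mask about 20% of y at random.
data_rng = np.random.default_rng(0)
x = data_rng.normal(50, 10, 200)
y_true = 2.0 * x + data_rng.normal(0, 5, 200)
mask = data_rng.random(200) < 0.2
y_missing = y_true.copy()
y_missing[mask] = np.nan

results = {
    "series mean": mean_impute(y_missing),
    "PMM (simplified)": pmm_impute(x, y_missing, k=5, rng=1),
}
for name, filled in results.items():
    print(f"{name}: RMSE={rmse(y_true[mask], filled[mask]):.2f}, "
          f"MAE={mae(y_true[mask], filled[mask]):.2f}")
```

Because predictive mean matching copies values from observed donors, every imputed value is one that actually occurs in the data, which is one reason it is often considered plausible for numerical clinical variables. The scores from this sketch depend entirely on the synthetic setup and say nothing about the study's own results.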