An Approach for Assessing Quality of Labeled Data for a Machine Learning Task in Malaria Detection

Rose Nakasi, Ernest Mwebaze, A. Zawedde, J. Tusubira, Gilbert Maiga
Published in: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 2020-06-15. DOI: 10.1145/3378393.3402265 (https://doi.org/10.1145/3378393.3402265)

Abstract

Supervised learning for image analysis contributes notably to microscopy-based malaria diagnosis, but it has limitations. Among its principal challenges is the manual, tedious process of annotating data for the classification task. Manual annotation is prone to inaccuracies caused by bias, subjectivity and unclear images, which result in many false positives. These inaccuracies usually arise because independent judgements vary among individual microscopists, and they cumulatively affect the accuracy of the model. In this paper, we investigate the possibility of classifying negative examples that are far from the positives and negative examples that are near the positives in thick blood smear images for malaria detection. Assessing the classification performance could inform us of the quality of the training dataset and guide the selection of the best training dataset for a malaria parasite detection task. We employ the Mean Squared Error (MSE) to distinguish between positive and negative images. We then investigate the performance of the VGG-16 classification model based on how close or far the negative examples are from the positives. Experimental results showed that negative examples far from the positives produce better results than those near them, and that the proposed method could potentially be used to reduce false positives and bias in the training data.
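
The abstract does not spell out how the MSE is applied, so the following is only a minimal sketch of one possible reading: each negative patch is scored by its MSE against the mean positive patch, and the negatives are split into "near" and "far" groups at the median score. The function names, the choice of the mean positive patch as the reference, and the median threshold are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally shaped image patches."""
    # Cast to float first so that uint8 pixel values do not wrap around.
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))


def split_negatives_by_distance(positives, negatives):
    """Split negative patches into 'near' and 'far' index lists.

    Assumptions (hypothetical, not taken from the paper):
    - the reference is the pixel-wise mean of all positive patches;
    - the near/far threshold is the median MSE over the negatives.
    """
    reference = np.mean(np.stack(positives).astype(np.float64), axis=0)
    distances = np.array([mse(n, reference) for n in negatives])
    threshold = np.median(distances)
    near = [i for i, d in enumerate(distances) if d <= threshold]
    far = [i for i, d in enumerate(distances) if d > threshold]
    return near, far, distances


# Synthetic 4x4 grayscale patches standing in for blood smear crops.
positives = [np.full((4, 4), 200, dtype=np.uint8) for _ in range(3)]
negatives = [
    np.full((4, 4), 190, dtype=np.uint8),
    np.full((4, 4), 195, dtype=np.uint8),
    np.full((4, 4), 50, dtype=np.uint8),
    np.full((4, 4), 40, dtype=np.uint8),
]
near, far, distances = split_negatives_by_distance(positives, negatives)
print("near negatives:", near)
print("far negatives:", far)
```

Under these assumptions, the two index lists could be used to build separate training sets (positives plus near negatives, positives plus far negatives) whose classifier performance is then compared, in the spirit of the comparison the abstract describes.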