An Approach for Assessing Quality of Labeled Data for a Machine Learning Task in Malaria Detection

Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies. Pub Date: 2020-06-15. DOI: 10.1145/3378393.3402265

Rose Nakasi, Ernest Mwebaze, A. Zawedde, J. Tusubira, Gilbert Maiga



Abstract

While microscopy diagnosis through supervised learning for image analysis contributes notably to malaria detection, it has limitations. Among its principal challenges is the manual, tedious process of annotating data for the classification task. Manual annotation is prone to inaccuracy due to bias, subjectivity and unclear images, resulting in many false positives. This normally stems from independent personal judgements that vary among individual microscopists, which cumulatively affects the accuracy of the model. In this paper, we investigate the possibility of classifying the far negative examples and the near positive examples relative to the positives in thick blood smear images for malaria detection. Assessing the classification performance could inform us of the quality of a training dataset and guide us in selecting the best training dataset for a malaria parasite detection task. We employ the Mean Squared Error (MSE) to distinguish between positive and negative images. We then investigate the performance of the VGG-16 classification model based on how close to or far from the positives the negative examples are. Experimental results showed that negative examples far from the positives produce better results than those near them, and that the proposed method could potentially be used to reduce false positives and bias in the training data.
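The abstract does not spell out how the MSE is applied, so the following is only a minimal sketch of one plausible reading: each negative patch is scored by its MSE to the nearest positive patch, and the negatives are then split into "near" and "far" groups. The function names, the nearest-positive rule and the median threshold are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally shaped image patches."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def split_negatives_by_distance(positives, negatives):
    """Split negatives into 'near' and 'far' groups (hypothetical helper).

    Assumption: a negative's distance is its MSE to the closest positive,
    and the near/far cut is the median of those distances.
    """
    pos = np.asarray(positives, dtype=np.float64).reshape(len(positives), -1)
    neg = np.asarray(negatives, dtype=np.float64).reshape(len(negatives), -1)

    # Pairwise MSE via broadcasting: result has shape (n_neg, n_pos).
    diffs = neg[:, None, :] - pos[None, :, :]
    pairwise = np.mean(diffs ** 2, axis=2)

    # Distance of each negative to its nearest positive.
    nearest = pairwise.min(axis=1)

    threshold = np.median(nearest)  # assumed cut-off, not from the paper
    near_idx = np.flatnonzero(nearest <= threshold)
    far_idx = np.flatnonzero(nearest > threshold)
    return near_idx, far_idx, nearest
```

Under these assumptions, the two index arrays could be used to build separate training sets (positives plus near negatives, positives plus far negatives) whose downstream classification performance is then compared. Pairwise broadcasting holds an array of size n_neg × n_pos × pixels in memory, so a large dataset would need to be processed in batches.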


