Semi-supervised labelling of chest x-ray images using unsupervised clustering for ground-truth generation

Victor Ikechukwu Agughasi, Murali Srinivasiah
DOI: 10.31763/aet.v2i3.1143 (https://doi.org/10.31763/aet.v2i3.1143)
Journal: Research Journal of Applied Sciences, Engineering and Technology
Publication date: 2023-09-12

Abstract

Supervised classifiers need large amounts of accurately labelled data to learn to recognize chest X-ray (CXR) images. However, manually labelling an extensive collection of CXR images is time-consuming and costly. To address this issue, a method is proposed for the semi-supervised labelling of extensive CXR image collections that leverages unsupervised clustering with minimal expert knowledge to generate ground-truth images. The proposed methodology has four steps. First, unsupervised clustering techniques such as K-Means and Self-Organizing Maps are applied. Second, the images are represented by five different feature vectors so that the differences between features can be exploited fully. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote decides the ground-truth image. The number of clusters created by the chosen method strictly limits the amount of human involvement. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labelling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data gives performance similar to labelling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground-truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground-truth images from publicly available CXR datasets.
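The cluster, label and vote steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only a plain k-means (no Self-Organizing Map), a deterministic farthest-point initialisation chosen for reproducibility, and three toy one-dimensional "feature views" in place of the five image feature vectors. It also assumes that the majority vote is taken across feature views, which the abstract does not state. All names (`kmeans`, `propagate_labels`, `majority_vote`, `ask_expert`) and all data are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-point initialisation (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

def propagate_labels(points, k, ask_expert):
    """Ask the expert for one label per cluster and copy it to all members."""
    assign, centers = kmeans(points, k)
    labels = [None] * len(points)
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # The member closest to the center stands in for the cluster center.
        rep = min(members, key=lambda i: sq_dist(points[i], centers[c]))
        cluster_label = ask_expert(rep)
        for i in members:
            labels[i] = cluster_label
    return labels

def majority_vote(labels_per_view):
    """Per-sample majority vote over the labels produced by each view."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labels_per_view)]

# Toy data: six samples, three hypothetical feature views.
truth = ["normal"] * 3 + ["abnormal"] * 3
view_a = [[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]]
view_b = [[1.0], [1.1], [0.9], [9.0], [9.1], [8.9]]
view_c = [[0.0], [0.1], [4.9], [5.0], [5.1], [5.2]]  # sample 2 sits in the wrong blob here

queried = []  # records every sample index shown to the (simulated) expert

def ask_expert(index):
    queried.append(index)
    return truth[index]

per_view = [propagate_labels(v, 2, ask_expert) for v in (view_a, view_b, view_c)]
final_labels = majority_vote(per_view)
print(final_labels)
print("expert queries:", len(queried))
```

In this toy setup the expert is consulted once per cluster per view, which mirrors the abstract's point that the number of clusters bounds the human effort. The third view mislabels one sample on its own, and the vote across views is what corrects it.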