Comparing Human-Level and Machine Learning Model Performance in White Blood Cell Morphology Assessment

European Journal of Haematology, 114(1), pp. 115-119. Pub Date: 2024-10-06. DOI: 10.1111/ejh.14318. IF 2.3, JCR Q2 (HEMATOLOGY), Region 3 (Medicine).

https://onlinelibrary.wiley.com/doi/10.1111/ejh.14318

Patrick Lawrence, Christina Brown

Citations: 0

Abstract




Introduction

There is an increasing research focus on the role of machine learning in the haematology laboratory, particularly in blood cell morphologic assessment. Human-level performance is an important baseline and goal for machine learning. This study aims to assess the interobserver variability and human-level performance in blood cell morphologic assessment.

Methods

A dataset of 1000 single white blood cell images was independently labelled by 10 doctors and morphology scientists. Interobserver variability was calculated using Fleiss' kappa. Observers' labels were then separated into consensus labels, used to determine ground truth, and performance labels, used to assess observer performance. A machine learning model was trained and assessed using the same cell images. Explainability images (XRAI and IG) were generated for each of the test images.
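As an illustration of the agreement statistic named above, the following is a minimal sketch of the standard Fleiss' kappa formula in plain Python. It is not the authors' code: the function name `fleiss_kappa`, the input layout (one list of rater labels per image), and the cell-class names in the usage lines are all hypothetical, chosen only for the example.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings    -- list of items; each item is a list with one label per rater
    categories -- iterable of all possible labels
    (Both names are illustrative assumptions, not taken from the study.)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    # counts[i][c] = number of raters who assigned category c to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the fraction of agreeing rater pairs
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    p_cats = [sum(c[cat] for c in counts) / total for cat in categories]
    p_e = sum(p ** 2 for p in p_cats)

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical usage: 3 raters labelling 4 cell images
example = [
    ["neutrophil", "neutrophil", "neutrophil"],
    ["lymphocyte", "lymphocyte", "monocyte"],
    ["monocyte", "monocyte", "monocyte"],
    ["lymphocyte", "neutrophil", "lymphocyte"],
]
kappa = fleiss_kappa(example, ["neutrophil", "lymphocyte", "monocyte"])
print(round(kappa, 3))
```

In the study's setting the input would have 1000 items with 10 labels each. Kappa is 1 when all raters agree on every item and falls to 0 or below when agreement is no better than chance.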

Results

Fleiss' kappa for all 10 observers was 0.608, indicating substantial agreement between observers. The accuracy of human observers was 95%, with sensitivity 72% and specificity 97%. The accuracy of the machine learning model was 95%, with sensitivity 71% and specificity 97%. The model's performance across labels was similar to that of the human observers. Explainability metrics demonstrated that the machine learning model was able to differentiate between the cytoplasm and nucleus of the cells, and used these features to perform predictions.
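The abstract does not state how accuracy, sensitivity and specificity were aggregated over cell classes. One common approach, assumed here purely for illustration, is to compute each metric one-vs-rest per label and then macro-average. The sketch below shows that calculation; the function names and the toy labels are hypothetical. With several classes, per-label accuracy and specificity are dominated by true negatives, which is one way accuracy and specificity can sit well above sensitivity.

```python
def one_vs_rest_metrics(y_true, y_pred, labels):
    """Per-label accuracy, sensitivity and specificity (one-vs-rest)."""
    n = len(y_true)
    per_label = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        per_label[label] = {
            "accuracy": (tp + tn) / n,
            # Undefined when the label has no positives / no negatives
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return per_label


def macro_average(per_label, metric):
    """Unweighted mean of one metric over all labels."""
    values = [m[metric] for m in per_label.values()]
    return sum(values) / len(values)


# Hypothetical toy data: consensus labels vs. one observer's (or model's) labels
y_true = ["n", "n", "n", "l", "l", "m"]
y_pred = ["n", "n", "l", "l", "l", "n"]
metrics = one_vs_rest_metrics(y_true, y_pred, ["n", "l", "m"])

for name in ("accuracy", "sensitivity", "specificity"):
    print(name, round(macro_average(metrics, name), 3))
```

The same function can score a human observer's performance labels or the model's predictions against the consensus labels, so both are compared on identical terms.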

Conclusion

The substantial, though not perfect, agreement between human observers highlights the inherent subjectivity in white blood cell morphologic assessment. A machine learning model performed similarly to human observers in single white blood cell identification. Further research is needed to compare human-level and machine learning performance in ways that more closely reflect the typical process of morphologic assessment.


Source journal

European Journal of Haematology (Medicine, Hematology)

CiteScore: 5.50

Self-citation rate: 0.00%

Articles published: 168

Review time: 4-8 weeks

About the journal: European Journal of Haematology is an international journal for communication of basic and clinical research in haematology. The journal welcomes manuscripts on molecular, cellular and clinical research on diseases of the blood, vascular and lymphatic tissue, and on basic molecular and cellular research related to normal development and function of the blood, vascular and lymphatic tissue. The journal also welcomes reviews on clinical haematology and basic research, case reports, and clinical pictures.