The CrowdGleason dataset: Learning the Gleason grade from crowds and experts
Miguel López-Pérez, Alba Morquecho, Arne Schmidt, Fernando Pérez-Bueno, Aurelio Martín-Castro, Javier Mateos, Rafael Molina
Computer Methods and Programs in Biomedicine, Volume 257, Article 108472. Published 2024-10-28. DOI: 10.1016/j.cmpb.2024.108472
Background:
Currently, prostate cancer (PCa) diagnosis relies on the human analysis of prostate biopsy Whole Slide Images (WSIs) using the Gleason score. Since this process is error-prone and time-consuming, recent advances in machine learning have promoted the use of automated systems to assist pathologists. Unfortunately, labeled datasets for training and validation are scarce due to the need for expert pathologists to provide ground-truth labels.
Methods:
This work introduces a new prostate histopathological dataset named CrowdGleason, which consists of 19,077 patches from 1045 WSIs with various Gleason grades. The dataset was annotated using a crowdsourcing protocol involving seven pathologists-in-training to distribute the labeling effort. To provide a baseline analysis, two crowdsourcing methods based on Gaussian Processes (GPs) were evaluated for Gleason grade prediction: SVGPCR, which learns a model from the CrowdGleason dataset, and SVGPMIX, which combines data from the public dataset SICAPv2 and the CrowdGleason dataset. The performance of these methods was compared with other crowdsourcing and expert label-based methods through comprehensive experiments.
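The simplest baseline the abstract compares against is majority voting over the crowdsourced labels. A minimal sketch of that aggregation step, using hypothetical patch identifiers and a simplified label set (the actual CrowdGleason label scheme and data format are not specified here):

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate per-patch crowd labels by majority voting.

    `annotations` maps a patch id to the list of labels assigned by
    the individual annotators; ties resolve to the label that reaches
    the top count first (Counter.most_common order).
    """
    return {patch: Counter(labels).most_common(1)[0][0]
            for patch, labels in annotations.items()}

# Hypothetical example: three annotators per patch, with classes
# NC (non-cancerous), G3, G4, G5.
crowd = {
    "patch_001": ["G3", "G3", "G4"],
    "patch_002": ["NC", "NC", "NC"],
    "patch_003": ["G4", "G5", "G4"],
}
print(majority_vote(crowd))
```

Unlike this baseline, the GP-based crowdsourcing methods in the paper model each annotator's reliability rather than weighting all votes equally.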
Results:
The results demonstrate that our GP-based crowdsourcing approach outperforms other methods for aggregating crowdsourced labels (κ = 0.7048 ± 0.0207 for SVGPCR vs. κ = 0.6576 ± 0.0086 for SVGP with majority voting). SVGPCR trained with crowdsourced labels performs better than GP trained with expert labels from SICAPv2 (κ = 0.6583 ± 0.0220) and outperforms most individual pathologists-in-training (mean κ = 0.5432). Additionally, SVGPMIX trained with a combination of SICAPv2 and CrowdGleason achieves the highest performance on both datasets (κ = 0.7814 ± 0.0083 and κ = 0.7276 ± 0.0260).
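The abstract does not state which κ variant is reported; for ordinal labels such as Gleason grades, the quadratic-weighted Cohen's kappa is the usual choice. A minimal sketch, assuming grades are integer-encoded 0..n_classes-1 (the encoding is an assumption for illustration):

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic-weighted Cohen's kappa for integer-encoded ordinal labels."""
    n = len(y_true)
    # Observed confusion matrix of raw counts.
    obs = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        obs[t][p] += 1
    # Marginal histograms for the expected (chance) matrix.
    hist_t = [sum(row) for row in obs]
    hist_p = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic disagreement weight, 0 on the diagonal.
            w = ((i - j) ** 2) / ((n_classes - 1) ** 2)
            num += w * obs[i][j]
            den += w * hist_t[i] * hist_p[j] / n
    return 1.0 - num / den

# Perfect agreement gives kappa = 1.0; disagreements farther apart
# on the grade scale are penalized more heavily.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], 4))
```

Note that `den` is zero in the degenerate case where both raters use a single class; a production implementation would guard against that.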
Conclusion:
The experiments show that the CrowdGleason dataset can be successfully used for training and validating supervised and crowdsourcing methods. Furthermore, the crowdsourcing methods trained on this dataset obtain competitive results against those using expert labels. Interestingly, the combination of expert and non-expert labels opens the door to a future of massive labeling by incorporating both expert and non-expert pathologist annotators.
Journal description:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to encourage the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.