Ying Zhang, Asma Amjad, Jie Ding, Christina Sarosiek, Mohammad Zarenia, Renae Conlin, William A Hall, Beth Erickson, Eric Paulson
Practical Radiation Oncology. Published 2024-09-02. DOI: 10.1016/j.prro.2024.07.007 (https://doi.org/10.1016/j.prro.2024.07.007)
Comprehensive Clinical Usability-Oriented Contour Quality Evaluation for Deep Learning Auto-segmentation: Combining Multiple Quantitative Metrics Through Machine Learning.
Purpose: The metrics commonly used to evaluate the quality of auto-segmented contours have limitations and do not always reflect the clinical usefulness of the contours. This work aims to develop a novel contour quality classification (CQC) method that combines multiple quantitative metrics for clinical usability-oriented evaluation of contours produced by deep learning-based auto-segmentation (DLAS).
Methods and materials: The CQC was designed to categorize contours on slices as acceptable, minor edit, or major edit based on the expected editing effort/time, using supervised ensemble tree classification models with 7 quantitative metrics as inputs. Organ-specific models were trained for 5 abdominal organs (pancreas, duodenum, stomach, small bowel, and large bowel) using 50 magnetic resonance imaging (MRI) data sets. Twenty additional MRI and 9 computed tomography (CT) data sets were employed for testing. Interobserver variation (IOV) was assessed among 6 observers, and consensus labels were established through majority vote for evaluation. The CQC was also compared with a threshold-based baseline approach.
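The consensus-labeling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `consensus_label`, the example votes, and in particular the tie-breaking rule (ties resolved toward the more severe label) are assumptions made for illustration, since the abstract does not state how ties among the 6 observers were handled.

```python
from collections import Counter

# The three slice-level classes named in the abstract, ordered mild to severe.
LABELS = ("acceptable", "minor edit", "major edit")


def consensus_label(votes):
    """Return the majority-vote label for one contour slice.

    Assumption of this sketch: when several labels tie for the highest
    count, the most severe of them is returned.
    """
    unknown = set(votes) - set(LABELS)
    if not votes or unknown:
        raise ValueError(f"votes must be non-empty and drawn from {LABELS}")
    counts = Counter(votes)
    top = max(counts.values())
    tied = [label for label in LABELS if counts.get(label, 0) == top]
    return tied[-1]  # most severe label among those with the top count


# Hypothetical votes from 6 observers on three slices.
slice_votes = [
    ["acceptable"] * 4 + ["minor edit"] * 2,   # clear majority
    ["minor edit"] * 3 + ["major edit"] * 3,   # tie, resolved by assumption
    ["major edit"] * 5 + ["minor edit"],       # clear majority
]
consensus = [consensus_label(votes) for votes in slice_votes]
print(consensus)
```

With an even number of observers and three classes, ties are possible, so any real pipeline would need an explicit rule at this point; the conservative choice here is only one option.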
Results: For the 5 organs, the average area under the curve was 0.982 ± 0.01 and 0.979 ± 0.01, the mean accuracy was 95.8% ± 1.7% and 94.3% ± 2.1%, and the mean risk rate was 0.8% ± 0.4% and 0.7% ± 0.5% for the MRI and CT testing data sets, respectively. The CQC results closely matched the IOV results (mean accuracy of 94.2% ± 0.8% and 94.8% ± 1.7%) and were significantly higher than those obtained using the threshold-based method (mean accuracy of 80.0% ± 4.7%, 83.8% ± 5.2%, and 77.3% ± 6.6% using 1, 2, and 3 metrics).
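As a rough illustration of how slice-level accuracy and a risk rate might be computed against consensus labels, consider the Python sketch below. The abstract does not define "risk rate"; the definition used here (the fraction of all slices whose consensus label is major edit but which the classifier calls acceptable) is an assumption, as are the function names and the toy labels.

```python
def accuracy(predicted, reference):
    """Fraction of slices whose predicted class equals the reference class."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def risk_rate(predicted, reference):
    """Assumed definition, not taken from the abstract: share of all slices
    that need a major edit according to the reference but are predicted
    as acceptable."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    risky = sum(
        p == "acceptable" and r == "major edit"
        for p, r in zip(predicted, reference)
    )
    return risky / len(reference)


# Hypothetical consensus labels and classifier outputs for five slices.
reference = ["acceptable", "acceptable", "minor edit", "major edit", "major edit"]
predicted = ["acceptable", "minor edit", "minor edit", "major edit", "acceptable"]

print(f"accuracy:  {accuracy(predicted, reference):.1%}")
print(f"risk rate: {risk_rate(predicted, reference):.1%}")
```

Separating the two numbers reflects that errors are not equally costly: calling a poor contour acceptable matters more clinically than asking for an unnecessary minor edit.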
Conclusions: The CQC models demonstrated high performance in classifying the quality of contour slices. This method can address the limitations of existing metrics and offers an intuitive and comprehensive solution for clinically oriented evaluation and comparison of DLAS systems.
About the journal:
The overarching mission of Practical Radiation Oncology is to improve the quality of radiation oncology practice. PRO's purpose is to document the state of current practice, providing background for those in training and continuing education for practitioners, through discussion and illustration of new techniques, evaluation of current practices, and publication of case reports. PRO strives to provide its readers content that emphasizes knowledge "with a purpose." The content of PRO includes:
Original articles focusing on patient safety, quality measurement, or quality improvement initiatives
Original articles focusing on imaging, contouring, target delineation, simulation, treatment planning, immobilization, organ motion, and other practical issues
ASTRO guidelines, position papers, and consensus statements
Essays that highlight enriching personal experiences in caring for cancer patients and their families.