Jason Holmes PhD, Lian Zhang PhD, Yuzhen Ding PhD, Hongying Feng PhD, Zhengliang Liu MS, Tianming Liu PhD, William W. Wong MD, Sujay A. Vora MD, Jonathan B. Ashman MD, PhD, Wei Liu PhD
Practical Radiation Oncology, Volume 14, Issue 6, Pages e515-e521. Published 2024-11-01. DOI: 10.1016/j.prro.2024.04.017. https://www.sciencedirect.com/science/article/pii/S1879850024000985
Benchmarking a Foundation Large Language Model on its Ability to Relabel Structure Names in Accordance With the American Association of Physicists in Medicine Task Group-263 Report
Purpose
To introduce the concept of using large language models (LLMs) to relabel structure names in accordance with the American Association of Physicists in Medicine Task Group-263 standard and to establish a benchmark for future studies to reference.
Methods and Materials
Generative Pretrained Transformer (GPT)-4 was implemented within a Digital Imaging and Communications in Medicine server. Upon receiving a structure-set Digital Imaging and Communications in Medicine file, the server prompts GPT-4 to relabel the structure names according to the American Association of Physicists in Medicine Task Group-263 report. The results were evaluated for 3 disease sites: prostate, head and neck, and thorax. For each disease site, 150 patients were randomly selected for manually tuning the instructions prompt (in batches of 50), and 50 patients were randomly selected for evaluation. Structure names considered were those that were most likely to be relevant for studies using structure contours for many patients.
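The relabeling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the JSON reply format, the function names (`build_prompt`, `relabel`, `fake_model`), and the example structure names are all assumptions made for illustration. The call to GPT-4 is replaced by a stand-in function so the example runs without any external service or DICOM library.

```python
import json
from typing import Callable, Dict, List


def build_prompt(structure_names: List[str], disease_site: str) -> str:
    """Assemble an instruction prompt (hypothetical wording, not the paper's)."""
    listing = "\n".join(f"- {name}" for name in structure_names)
    return (
        f"Disease site: {disease_site}\n"
        "Relabel each structure name below according to the AAPM TG-263 report.\n"
        "Reply with a JSON object mapping each original name to its new name.\n"
        f"{listing}"
    )


def relabel(
    structure_names: List[str],
    disease_site: str,
    ask_model: Callable[[str], str],
) -> Dict[str, str]:
    """Ask the model for new names and return an original -> new mapping.

    Any name the model omits, or any reply that is not valid JSON,
    falls back to the original name so that no structure is lost.
    """
    reply = ask_model(build_prompt(structure_names, disease_site))
    try:
        proposed = json.loads(reply)
    except json.JSONDecodeError:
        proposed = {}
    if not isinstance(proposed, dict):
        proposed = {}

    mapping: Dict[str, str] = {}
    for name in structure_names:
        new_name = proposed.get(name)
        is_usable = isinstance(new_name, str) and new_name.strip() != ""
        mapping[name] = new_name if is_usable else name
    return mapping


def fake_model(prompt: str) -> str:
    """Stand-in for the GPT-4 call; returns a fixed, illustrative reply."""
    return json.dumps({"prostate ctv": "CTV", "bladder": "Bladder"})


if __name__ == "__main__":
    names = ["prostate ctv", "bladder", "couch"]
    print(relabel(names, "prostate", fake_model))
```

Falling back to the original name keeps the structure set complete when the model's reply is incomplete or malformed; in a real server the new names would then be written back to the structure-set file.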
Results
The per-patient accuracy was 97.2%, 98.3%, and 97.1% for prostate, head and neck, and thorax disease sites, respectively. On a per-structure basis, the clinical target volume was relabeled correctly in 100%, 95.3%, and 92.9% of cases, respectively.
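The abstract does not spell out how the two accuracy figures are computed. The sketch below shows one plausible reading, stated as an assumption: per-patient accuracy is the fraction of a patient's structures relabeled correctly, averaged over patients, and per-structure accuracy is the fraction of cases in which a given expected label was reproduced. The function names and the toy data are hypothetical and do not come from the study.

```python
from typing import Dict, List, Tuple

# For each patient: a list of (predicted label, expected label) pairs.
PatientResult = List[Tuple[str, str]]


def per_patient_accuracy(results: Dict[str, PatientResult]) -> float:
    """Mean over patients of the fraction of correctly relabeled structures."""
    fractions = [
        sum(pred == truth for pred, truth in pairs) / len(pairs)
        for pairs in results.values()
        if pairs  # skip patients with no evaluated structures
    ]
    return sum(fractions) / len(fractions) if fractions else 0.0


def per_structure_accuracy(
    results: Dict[str, PatientResult], structure: str
) -> float:
    """Fraction of cases with expected label `structure` that were matched."""
    hits = [
        pred == truth
        for pairs in results.values()
        for pred, truth in pairs
        if truth == structure
    ]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = {
        "patient_1": [("CTV", "CTV"), ("Bladder", "Bladder")],
        "patient_2": [("PTV", "CTV"), ("Rectum", "Rectum")],
    }
    print(per_patient_accuracy(toy))           # mean of 1.0 and 0.5
    print(per_structure_accuracy(toy, "CTV"))  # 1 of 2 cases matched
```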
Conclusions
Given the accuracy of GPT-4 in relabeling structure names as presented in this work, LLMs are poised to become an important method for standardizing structure names in radiation oncology, especially considering the rapid advancements in LLM capabilities that are likely to continue.
About the journal:
The overarching mission of Practical Radiation Oncology is to improve the quality of radiation oncology practice. PRO's purpose is to document the state of current practice, providing background for those in training and continuing education for practitioners, through discussion and illustration of new techniques, evaluation of current practices, and publication of case reports. PRO strives to provide its readers content that emphasizes knowledge "with a purpose." The content of PRO includes:
Original articles focusing on patient safety, quality measurement, or quality improvement initiatives
Original articles focusing on imaging, contouring, target delineation, simulation, treatment planning, immobilization, organ motion, and other practical issues
ASTRO guidelines, position papers, and consensus statements
Essays that highlight enriching personal experiences in caring for cancer patients and their families.