Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai

arXiv - PHYS - Medical Physics, 2024-07-10. https://doi.org/arxiv-2407.07296
Abstract
Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies entirely on manual contouring by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variation. Although advances in artificial intelligence (AI) have significantly enhanced the auto-contouring of normal tissues, accurate delineation of RT target volumes remains a challenge. In this study, we propose a visual-language-model-based RT target volume auto-delineation network termed Radformer. Radformer uses a hierarchical vision transformer as its backbone and incorporates large language models to extract text-rich features from clinical data.
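The abstract does not say which language model extracts the text features, so the snippet below is only a plausible sketch of that step using a publicly available clinical text encoder; the model name and the example note are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: token-level text features from clinical data.
# Bio_ClinicalBERT is an assumed stand-in for the paper's unspecified LLM.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # assumption, not from the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

note = "Stage III oropharyngeal SCC, left base of tongue."  # hypothetical note
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=64)
with torch.no_grad():
    out = encoder(**inputs)

txt_tokens = out.last_hidden_state  # (1, seq_len, 768): features for fusion
```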
We introduce a visual language attention module (VLAM) that integrates visual and linguistic features for language-aware visual encoding (LAVE).
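The paper does not publish the VLAM architecture, so the following PyTorch sketch shows one standard way such a module could be realized: cross-attention in which visual tokens query language tokens, with a residual connection so the encoding stays visually grounded. All dimensions and layer choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VisualLanguageAttention(nn.Module):
    """Sketch of a VLAM-style block: visual tokens attend to language tokens."""

    def __init__(self, vis_dim: int, txt_dim: int, num_heads: int = 8):
        super().__init__()
        # Project language features into the visual embedding space.
        self.txt_proj = nn.Linear(txt_dim, vis_dim)
        # Visual tokens are queries; language tokens are keys/values.
        self.cross_attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (B, N_vis, vis_dim); txt_tokens: (B, N_txt, txt_dim)
        txt = self.txt_proj(txt_tokens)
        attended, _ = self.cross_attn(query=vis_tokens, key=txt, value=txt)
        # Residual connection keeps the original visual signal.
        return self.norm(vis_tokens + attended)

# Example: fuse 256 ViT patch tokens with a 32-token clinical-note embedding.
vlam = VisualLanguageAttention(vis_dim=256, txt_dim=768)
vis = torch.randn(2, 256, 256)   # (batch, patches, channels)
txt = torch.randn(2, 32, 768)    # (batch, text tokens, LLM hidden size)
fused = vlam(vis, txt)           # language-aware visual encoding, (2, 256, 256)
```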
Radformer was evaluated on a dataset comprising 2985 patients with head-and-neck cancer who underwent RT. Metrics including the Dice similarity coefficient (DSC), intersection over union (IoU), and 95th percentile Hausdorff distance (HD95) were used to evaluate model performance quantitatively.
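These three metrics have standard definitions on binary segmentation masks; the sketch below implements them under those definitions. The authors' exact evaluation code is not given, and the dense distance matrix in hd95 is only practical for small volumes (real evaluations typically use surface voxels or distance transforms).

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    return inter / np.logical_or(pred, gt).sum()

def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    # 95th percentile Hausdorff distance over foreground voxel coordinates.
    p, g = np.argwhere(pred), np.argwhere(gt)
    d = cdist(p, g)                  # pairwise Euclidean distances
    d_pg = d.min(axis=1)             # pred -> gt directed distances
    d_gp = d.min(axis=0)             # gt -> pred directed distances
    return max(np.percentile(d_pg, 95), np.percentile(d_gp, 95))

# Toy masks to exercise the metrics.
pred = np.zeros((32, 32, 32), bool); pred[8:20, 8:20, 8:20] = True
gt = np.zeros((32, 32, 32), bool); gt[10:22, 10:22, 10:22] = True
print(dice(pred, gt), iou(pred, gt), hd95(pred, gt))
```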
Our results demonstrate that Radformer achieves superior segmentation performance compared with other state-of-the-art models, validating its potential for adoption in RT practice.