Dabin Min, Kwang Nam Jin, SangHeum Bang, Moon Young Kim, Hack-Lyoung Kim, Won Gi Jeong, Hye-Jeong Lee, Kyongmin Sarah Beck, Sung Ho Hwang, Eun Young Kim, Chang Min Park
Objective: To evaluate the accuracy of large language models (LLMs) in extracting Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 components from coronary CT angiography (CCTA) reports, and assess the impact of prompting strategies.
Materials and methods: In this multi-institutional study, we collected 319 synthetic, semi-structured CCTA reports from six institutions to protect patient privacy while maintaining clinical relevance. The dataset included 150 reports from a primary institution (100 for instruction development and 50 for internal testing) and 169 reports from five external institutions for external testing. Board-certified radiologists established reference standards following the CAD-RADS 2.0 guidelines for all three components: stenosis severity, plaque burden, and modifiers. Six LLMs (GPT-4, GPT-4o, Claude-3.5-Sonnet, o1-mini, Gemini-1.5-Pro, and DeepSeek-R1-Distill-Qwen-14B) were evaluated using an optimized instruction with prompting strategies, including zero-shot or few-shot with or without chain-of-thought (CoT) prompting. The accuracy was assessed and compared using McNemar's test.
Results: LLMs demonstrated robust accuracy across all CAD-RADS 2.0 components. Peak stenosis severity accuracies reached 0.980 (48/49, Claude-3.5-Sonnet and o1-mini) in internal testing and 0.946 (158/167, GPT-4o and o1-mini) in external testing. Plaque burden extraction showed exceptional accuracy, with multiple models achieving perfect accuracy (43/43) in internal testing and 0.993 (137/138, GPT-4o, and o1-mini) in external testing. Modifier detection demonstrated consistently high accuracy (≥0.990) across most models. One open-source model, DeepSeek-R1-Distill-Qwen-14B, showed a relatively low accuracy for stenosis severity: 0.898 (44/49, internal) and 0.820 (137/167, external). CoT prompting significantly enhanced the accuracy of several models, with GPT-4 showing the most substantial improvements: stenosis severity accuracy increased by 0.192 (P < 0.001) and plaque burden accuracy by 0.152 (P < 0.001) in external testing.
Conclusion: LLMs demonstrated high accuracy in automated extraction of CAD-RADS 2.0 components from semi-structured CCTA reports, particularly when used with CoT prompting.
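The reported accuracies follow directly from the stated counts, and the paired comparison between prompting strategies can be sketched as below. This is a minimal illustration, not the authors' analysis code: the abstract names McNemar's test but does not say whether the exact or chi-square form was used, so the exact binomial form is an assumption, and the discordant-pair counts in the usage lines are hypothetical.

```python
from math import comb


def accuracy(correct: int, total: int) -> float:
    """Proportion of reports whose extracted component matched the reference."""
    return correct / total


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pairs.

    b: reports correct under strategy A only
    c: reports correct under strategy B only
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as observed
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts taken from the abstract
print(round(accuracy(48, 49), 3))    # internal stenosis severity, best models
print(round(accuracy(158, 167), 3))  # external stenosis severity, best models

# Hypothetical discordant counts, for illustration only
print(mcnemar_exact(b=0, c=10))
```

The exact form is a common choice when discordant pairs are few, which is plausible for models that agree on most reports.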
{"title":"Large Language Models for CAD-RADS 2.0 Extraction From Semi-Structured Coronary CT Angiography Reports: A Multi-Institutional Study.","authors":"Dabin Min, Kwang Nam Jin, SangHeum Bang, Moon Young Kim, Hack-Lyoung Kim, Won Gi Jeong, Hye-Jeong Lee, Kyongmin Sarah Beck, Sung Ho Hwang, Eun Young Kim, Chang Min Park","doi":"10.3348/kjr.2025.0293","DOIUrl":"10.3348/kjr.2025.0293","url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the accuracy of large language models (LLMs) in extracting Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 components from coronary CT angiography (CCTA) reports, and assess the impact of prompting strategies.</p><p><strong>Materials and methods: </strong>In this multi-institutional study, we collected 319 synthetic, semi-structured CCTA reports from six institutions to protect patient privacy while maintaining clinical relevance. The dataset included 150 reports from a primary institution (100 for instruction development and 50 for internal testing) and 169 reports from five external institutions for external testing. Board-certified radiologists established reference standards following the CAD-RADS 2.0 guidelines for all three components: stenosis severity, plaque burden, and modifiers. Six LLMs (GPT-4, GPT-4o, Claude-3.5-Sonnet, o1-mini, Gemini-1.5-Pro, and DeepSeek-R1-Distill-Qwen-14B) were evaluated using an optimized instruction with prompting strategies, including zero-shot or few-shot with or without chain-of-thought (CoT) prompting. The accuracy was assessed and compared using McNemar's test.</p><p><strong>Results: </strong>LLMs demonstrated robust accuracy across all CAD-RADS 2.0 components. Peak stenosis severity accuracies reached 0.980 (48/49, Claude-3.5-Sonnet and o1-mini) in internal testing and 0.946 (158/167, GPT-4o and o1-mini) in external testing. 
Plaque burden extraction showed exceptional accuracy, with multiple models achieving perfect accuracy (43/43) in internal testing and 0.993 (137/138, GPT-4o, and o1-mini) in external testing. Modifier detection demonstrated consistently high accuracy (≥0.990) across most models. One open-source model, DeepSeek-R1-Distill-Qwen-14B, showed a relatively low accuracy for stenosis severity: 0.898 (44/49, internal) and 0.820 (137/167, external). CoT prompting significantly enhanced the accuracy of several models, with GPT-4 showing the most substantial improvements: stenosis severity accuracy increased by 0.192 (<i>P</i> < 0.001) and plaque burden accuracy by 0.152 (<i>P</i> < 0.001) in external testing.</p><p><strong>Conclusion: </strong>LLMs demonstrated high accuracy in automated extraction of CAD-RADS 2.0 components from semi-structured CCTA reports, particularly when used with CoT prompting.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 9","pages":"817-831"},"PeriodicalIF":5.3,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144959333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Sung Park, Jisun Hwang, Pyeong Hwa Kim, Woo Hyun Shim, Min Jeong Seo, Dahyun Kim, Jeong In Shin, In Hwa Kim, Hwon Heo, Chong Hyun Suh
Objective: To evaluate the accuracy of multimodal large language models (LLMs) in detecting cases requiring immediate radiology reporting in pediatric radiology.
Materials and methods: Seventy-one publicly available, paraphrased pediatric clinical vignettes with images (sourced from the New England Journal of Medicine, The Lancet, Archives of Pediatrics & Adolescent Medicine, and Radiology) were assessed by seven vision-capable LLMs (temperature levels 0 and 1; t0 and t1) and four human readers (an expert pediatric radiologist, a trainee radiologist, an expert pediatrician, and a trainee pediatrician). Cases were classified as requiring immediate reporting (n = 33) if they corresponded to Korean Triage and Acuity Scale (KTAS) levels 1-2 (n = 24) or met the criteria for a critical value report (CVR) (n = 11). The most accurate LLM was compared with each human reader, with significance set at P < 0.013.
Results: LLMs demonstrated 60.6%-83.1% accuracy in detecting cases requiring immediate radiology reporting: 57.7%-81.7% and 53.5%-87.3% for KTAS levels 1-2 and CVR cases, respectively. Gemini-Flash with t1 showed the highest accuracy among the LLMs: 83.1% (95% confidence interval [CI]: 74.6%-91.5%), 81.7% (95% CI: 71.8%-90.1%), and 87.3% (95% CI: 78.9%-94.4%) for identifying cases requiring immediate reporting, KTAS level 1-2 cases, and CVR cases, respectively, despite its low sensitivity for CVR detection (3/11, 27.3%). Human readers demonstrated 62.0%-84.5% accuracy for immediate radiology reporting, 73.2%-84.5% for KTAS levels 1-2, and 39.4%-94.4% for CVR cases. The accuracy of Gemini-Flash t1 in identifying cases requiring immediate radiology reporting was comparable to that of the most accurate human reader (vs. expert pediatrician: 84.5% [95% CI: 76.1%-93.0%]; P < 0.99).
Conclusion: Multimodal LLMs may achieve overall accuracy comparable to or exceeding that of human readers in identifying cases requiring immediate radiology reporting, supporting their potential use for pediatric radiology worklist prioritization. However, the models' sensitivity in detecting such cases was not reliable.
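The gap between high accuracy and low sensitivity for CVR cases can be made concrete with a confusion-matrix sketch. Only the 3/11 sensitivity, the 87.3% accuracy, and the 71-case total come from the abstract. The true-negative and false-positive counts below are back-calculated on the assumption that all 71 vignettes received a binary CVR decision, and are not reported values.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Reported: 3 of 11 CVR cases detected; 87.3% accuracy over 71 vignettes.
# Assumed (back-calculated): 62 correct decisions overall, hence 59 true
# negatives and 1 false positive among the 60 non-CVR vignettes.
metrics = confusion_metrics(tp=3, fn=8, tn=59, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With only 11 positives among 71 cases, correct negatives dominate the accuracy figure, which is why the conclusion treats sensitivity separately.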
{"title":"Accuracy of Large Language Models in Detecting Cases Requiring Immediate Reporting in Pediatric Radiology: A Feasibility Study Using Publicly Available Clinical Vignettes.","authors":"Jun Sung Park, Jisun Hwang, Pyeong Hwa Kim, Woo Hyun Shim, Min Jeong Seo, Dahyun Kim, Jeong In Shin, In Hwa Kim, Hwon Heo, Chong Hyun Suh","doi":"10.3348/kjr.2025.0240","DOIUrl":"https://doi.org/10.3348/kjr.2025.0240","url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the accuracy of multimodal large language models (LLMs) in detecting cases requiring immediate radiology reporting in pediatric radiology.</p><p><strong>Materials and methods: </strong>Seventy-one publicly available, paraphrased pediatric clinical vignettes with images-sourced from the <i>New England Journal of Medicine</i>, <i>The Lancet</i>, <i>Archives of Pediatrics & Adolescent Medicine</i>, and <i>Radiology</i>-were assessed by seven vision-capable LLMs (temperature levels 0 and 1; t0 and t1) and four human readers (an expert pediatric radiologist, a trainee radiologist, an expert pediatrician, and a trainee pediatrician). Cases were classified as requiring immediate reporting (n = 33) if they corresponded to Korean Triage and Acuity Scale (KTAS) levels 1-2 (n = 24) or met the criteria for a critical value report (CVR) (n = 11). The most accurate LLM was compared with each human reader, with significance set at <i>P</i> < 0.013.</p><p><strong>Results: </strong>LLMs demonstrated 60.6%-83.1% accuracy in detecting cases requiring immediate radiology reporting: 57.7%-81.7% and 53.5%-87.3% for KTAS levels 1-2 and CVR cases, respectively. Gemini-Flash with t1 showed the highest accuracy among the LLMs: 83.1% (95% confidence interval [CI]: 74.6%-91.5%), 81.7% (95% CI: 71.8%-90.1%), and 87.3% (95% CI: 78.9%-94.4%) for identifying cases requiring immediate reporting, KTAS level 1-2 cases, and CVR cases, respectively, despite its low sensitivity for CVR detection (3/11, 27.3%). 
Human readers demonstrated 62.0%-84.5% accuracy for immediate radiology reporting, 73.2%-84.5% for KTAS levels 1-2, and 39.4%-94.4% for CVR cases. The accuracy of Gemini-Flash t1 in identifying cases requiring immediate radiology reporting was comparable to that of the most accurate human reader (vs. expert pediatrician: 84.5% [95% CI: 76.1%-93.0%]; <i>P</i> < 0.99).</p><p><strong>Conclusion: </strong>Multimodal LLMs may achieve overall accuracy comparable to or exceeding that of human readers in identifying cases requiring immediate radiology reporting, supporting their potential use for pediatric radiology worklist prioritization. However, the models' sensitivity in detecting such cases was not reliable.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 9","pages":"855-866"},"PeriodicalIF":5.3,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394824/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144959245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selina Chiu, Yvonne Tsitsiou, Andrea Da Silva, Cathy Qin, Christina Fotopoulou, Andrea Rockall
Ovarian cancer (OC) remains one of the leading causes of gynecologic cancer-related mortality, with most patients presenting with disseminated disease, particularly within the peritoneal cavity. Standard treatment includes cytoreductive surgery, platinum-based chemotherapy, and targeted maintenance approaches depending on the patient's and tumor's genetic profile. Even with treatment advancements, approximately 25% of high-grade serous OC cases relapse within a year despite optimal primary treatment with complete tumor clearance at cytoreduction. Advances in contrast-enhanced CT (CE-CT) and MRI have revolutionized the evaluation and treatment planning of advanced OC. CT remains the gold standard for staging and assessing tumor extent, effectively identifying peritoneal, lymphatic, and distant metastases. However, it is less effective in detecting small-volume peritoneal dissemination. MRI, with superior soft-tissue contrast, complements CT by providing a detailed assessment of peritoneal disease and characterizing sonographically indeterminate adnexal masses. Diffusion-weighted imaging and gadolinium-enhanced MRI have improved the diagnostic sensitivity for peritoneal disease but are unable to predict treatment response, recurrence risk, or prognosis. Radiomics, which extracts quantitative tumor features from imaging data, holds promise for personalizing treatment and identifying patients at risk for early recurrence despite optimal therapy. The integration of CT, MRI, and radiomics could enhance surgical planning and improve long-term survival outcomes in patients with advanced OC.
{"title":"CT and MRI in Advanced Ovarian Cancer: Advances in Imaging Techniques.","authors":"Selina Chiu, Yvonne Tsitsiou, Andrea Da Silva, Cathy Qin, Christina Fotopoulou, Andrea Rockall","doi":"10.3348/kjr.2025.0357","DOIUrl":"https://doi.org/10.3348/kjr.2025.0357","url":null,"abstract":"<p><p>Ovarian cancer (OC) remains one of the leading causes of gynecologic cancer-related mortality, with most patients presenting with disseminated disease, particularly within the peritoneal cavity. Standard treatment includes cytoreductive surgery, platinum-based chemotherapy, and targeted maintenance approaches depending on the patient's and tumor's genetic profile. Despite treatment advancements, approximately 25% of high-grade serous OC cases relapse within a year despite optimal primary treatment with complete tumor clearance at cytoreduction. Advances in contrast-enhanced CT (CE-CT) and MRI have revolutionized the evaluation and treatment planning of advanced OC. CT remains the gold standard for staging and assessing tumor extent, effectively identifying peritoneal, lymphatic, and distant metastases. However, it is less effective in detecting small-volume peritoneal dissemination. MRI, with superior soft-tissue contrast, complements CT by providing a detailed assessment of peritoneal disease, characterizing sonographically indeterminate adnexal masses. Diffusion-weighted imaging and gadolinium-enhanced MRI have improved the diagnostic sensitivity for peritoneal disease but are unable to predict treatment response, recurrence risk, and prognosis. Radiomics, which extracts quantitative tumor features from imaging data, holds promise for personalizing treatment and identifying patients at risk for early recurrence despite optimal therapy. 
The integration of CT, MRI, and radiomics could enhance surgical planning and improve long-term survival outcomes in patients with advanced OC.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 9","pages":"841-854"},"PeriodicalIF":5.3,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394823/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144959357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sungwon Ham, Gayoung Choi, Bo-Kyung Je, Saelin Oh
Objective: To develop a deep learning model for estimating newborn gestational age (GA) based on the shape of the lumbar vertebral bodies on cross-table lateral radiographs obtained on the first day after birth.
Materials and methods: This retrospective study included 423 cross-table lateral radiographs of 423 newborns (242 boys and 181 girls) taken within 24 hours after birth at two hospitals. Of these, 256 radiographs (157 boys and 99 girls) obtained from one institution were used for model development, and 167 radiographs (85 boys and 82 girls) from the other institution were used for model external testing. Clinical data, including medical history of underlying disorders, GA determined by ultrasound parameters, birth date, birth weight, sex, examination date, and reason for requesting radiographs, were obtained. The radiographs underwent manual labeling of the five lumbar vertebral bodies, followed by preprocessing steps such as normalization, resizing, denoising, cropping, and augmentation via horizontal flipping and rotation. Subsequently, we trained a deep learning model using a DeepLabv3+ network with a ResNet50 backbone for lumbar segmentation and used a customized AgeClassifier model with two parallel ResNet18 backbones for GA estimation. Model performance was evaluated using an external test dataset after image cropping.
Results: Neither GA nor birth weight differed significantly between boys and girls. In the segmentation model, the mean dice similarity coefficient ± standard deviation (SD) was 0.801 ± 0.031. For GA estimation, the mean absolute error ± SD was 5.2 ± 0.5 days. The Bland-Altman bias (AI-estimated GA - ground truth GA) and 95% limits of agreement were -0.4 days and -13.0 to 12.3 days, respectively.
Conclusion: Our deep learning model showed promising performance in lumbar vertebral body segmentation and GA estimation using plain radiographs, suggesting its potential utility as a supportive tool for neonatal maturity assessment in clinical practice.
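The two headline metrics, the Dice similarity coefficient for segmentation and the Bland-Altman bias with 95% limits of agreement for GA estimation, can be sketched with the standard definitions. This is an illustrative sketch only: the function names and toy inputs are hypothetical, and the limits are assumed to be bias ± 1.96 SD of the differences, which the abstract does not state explicitly.

```python
from statistics import mean, stdev


def dice(mask_a, mask_b) -> float:
    """Dice similarity coefficient for two flattened binary masks (0/1)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(mask_a) + sum(mask_b)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2 * intersection / size_sum


def bland_altman(estimated, reference):
    """Bias and 95% limits of agreement for paired measurements."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy inputs, not study data
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
print(bland_altman([10, 12, 14], [11, 11, 11]))
```

Under that assumption, the reported limits of -13.0 to 12.3 days imply a standard deviation of the differences of roughly 6.5 days around the -0.4 day bias.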
{"title":"Development of a Deep-Learning Model for Estimating Newborn Gestational Age via Lumbar Vertebral Segmentation on Plain Radiography.","authors":"Sungwon Ham, Gayoung Choi, Bo-Kyung Je, Saelin Oh","doi":"10.3348/kjr.2025.0172","DOIUrl":"10.3348/kjr.2025.0172","url":null,"abstract":"<p><strong>Objective: </strong>To develop a deep learning model for estimating newborn gestational age (GA) based on the shape of the lumbar vertebral bodies on cross-table lateral radiographs obtained on the first day after birth.</p><p><strong>Materials and methods: </strong>This retrospective study included 423 cross-table lateral radiographs of 423 newborns (242 boys and 181 girls) taken within 24 hours after birth at two hospitals. Of these, 256 radiographs (157 boys and 99 girls) obtained from one institution were used for model development, and 167 radiographs (85 boys and 82 girls) from the other institution were used for model external testing. Clinical data, including medical history of underlying disorders, GA determined by ultrasound parameters, birth date, birth weight, sex, examination date, and reason for requesting radiographs, were obtained. The radiographs underwent manual labeling of the five lumbar vertebral bodies, followed by preprocessing steps such as normalization, resizing, denoising, cropping, and augmentation via horizontal flipping and rotation. Subsequently, we trained a deep learning model using a DeepLabv3+ network with a ResNet50 backbone for lumbar segmentation and used a customized AgeClassifier model with two parallel ResNet18 backbones for GA estimation. Model performance was evaluated using an external test dataset after image cropping.</p><p><strong>Results: </strong>Neither GA nor birth weight differed significantly between boys and girls. In the segmentation model, the mean dice similarity coefficient ± standard deviation (SD) was 0.801 ± 0.031. For GA estimation, the mean absolute error ± SD was 5.2 ± 0.5 days. 
The Bland-Altman bias (AI-estimated GA - ground truth GA) and 95% limits of agreement were -0.4 days and -13.0 to 12.3 days, respectively.</p><p><strong>Conclusion: </strong>Our deep learning model showed promising performance in lumbar vertebral body segmentation and GA estimation using plain radiographs, suggesting its potential utility as a supportive tool for neonatal maturity assessment in clinical practice.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 9","pages":"867-876"},"PeriodicalIF":5.3,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144959313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Niketa Chotai, Rupa Renganathan, Takayoshi Uematsu, Jane Wang, Qingli Zhu, Kartini Rahmat, Varanatjaa Pradaranon, Julian Cy Fong, Lina Choridah, Jung Min Chang
In 2022, nearly 2.3 million new cases of breast cancer were reported globally, with fewer than half of these cases originating from Asia. Despite the relatively low incidence of breast cancer in most parts of Asia, the mortality-to-incidence ratio remains high. Low-income countries lack resources for breast cancer screening, whereas high-income countries fail to fully benefit from national breast screening programs because of the underutilization of preventive healthcare services. There is a notable difference in the age distribution of breast cancer cases between Asian and Western populations, with the prevalence peaking approximately a decade earlier in Asian women and most commonly affecting those aged 40-50 years. Existing literature on breast cancer trends, screening guidelines, and clinical practices in Asian countries, particularly regarding regional variations and healthcare system differences, is relatively sparse. Gaining a deeper understanding of how different Asian countries are implementing breast cancer screening in response to the rising incidence of the disease can help identify tailored strategies for early detection, ultimately contributing to a reduction in breast cancer-related mortality. This review explored the current breast cancer landscape, including breast cancer screening guidelines and outcomes of screening examinations in Asia, highlighting key challenges and future directions.
{"title":"Breast Cancer Screening in Asian Countries: Epidemiology, Screening Practices, Outcomes, Challenges, and Future Directions.","authors":"Niketa Chotai, Rupa Renganathan, Takayoshi Uematsu, Jane Wang, Qingli Zhu, Kartini Rahmat, Varanatjaa Pradaranon, Julian Cy Fong, Lina Choridah, Jung Min Chang","doi":"10.3348/kjr.2025.0338","DOIUrl":"10.3348/kjr.2025.0338","url":null,"abstract":"<p><p>In 2022, nearly 2.3 million new cases of breast cancer were reported globally, with less than half of these cases originating from Asia. Despite the relatively low incidence of breast cancer in most parts of Asia, the mortality-to-incidence ratio remains high. Low-income countries lack resources for breast cancer screening, whereas high-income countries fail to fully benefit from national breast screening programs because of the underutilization of preventive healthcare services. There is a notable difference in the age distribution of breast cancer cases between Asian and Western populations, with the prevalence peaking approximately a decade earlier in Asian women and most commonly affecting those aged 40-50 years. Existing literature on breast cancer trends, screening guidelines, and clinical practices in Asian countries, particularly regarding regional variations and healthcare system differences, is relatively sparse. Gaining a deeper understanding of how different Asian countries are implementing breast cancer screening in response to the rising incidence of the disease can help identify tailored strategies for early detection, ultimately contributing to a reduction in breast cancer-related mortality. 
This review explored the current breast cancer landscape, including breast cancer screening guidelines and outcomes of screening examinations in Asia, highlighting key challenges and future directions.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 8","pages":"743-758"},"PeriodicalIF":5.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318657/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144742423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arum Choi, Dayeon Bak, Jimin Kim, Se Won Oh, Yoonho Nam, Hyun Gi Kim
Objective: To evaluate the association between hypoxic-ischemic injury (HII) of the brain and glymphatic function using MRI-derived parameters in neonates.
Materials and methods: This retrospective, single-institution study collected brain MRI scans of 127 neonates between July 2020 and July 2022. The volume and fraction of the basal ganglia perivascular space (BG-PVS) were automatically extracted using three-dimensional T2-weighted image processing. Diffusion-tensor imaging (DTI) along the PVS (DTI-ALPS) index values were derived from the DTI maps. BG-PVS and DTI-ALPS parameters were compared between neonates with and without HII. The correlations between MRI-derived glymphatic parameters and corrected gestational age (CGA), as well as between BG-PVS measurements and the DTI-ALPS index, were analyzed using Spearman coefficients. Multivariable logistic regression adjusted for age, sex, birth weight, and mode of delivery was performed to examine the associations between each glymphatic parameter and HII.
Results: This study included 97 neonates without HII (median gestational age [GA]: 252 days) and 30 with HII (median GA: 252 days). Neonates with HII had smaller BG-PVS volumes (19 mm³ vs. 33 mm³, P = 0.001) and fractions (0.29% vs. 0.54%, P = 0.003) compared to neonates without HII. The DTI-ALPS index values did not differ significantly between neonates with and without HII (P = 0.54). CGA correlated negatively with BG-PVS measurements (ρ = -0.21 to -0.26, all P < 0.05) and positively with DTI-ALPS index values (ρ = 0.22, P = 0.014). BG-PVS measurements and DTI-ALPS index values were not significantly correlated (ρ = -0.28 to -0.08, all P > 0.05). Multivariable logistic regression revealed a negative association between BG-PVS volume (odds ratio [OR]: 0.96 per mm³ increase, 95% confidence interval [CI]: 0.93-0.99) and fraction (OR: 0.15 per % increase, 95% CI: 0.03-0.79) with HII, while DTI-ALPS index values were not significantly associated with HII (OR: 0.10, 95% CI: 0.00-25.41).
Conclusion: Neonates with HII demonstrated smaller BG-PVS volume and fraction compared with those without HII, indicating potential alterations in glymphatic function among affected newborns.
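The per-unit odds ratios reported above scale multiplicatively, which a short sketch can make explicit. This assumes the usual logistic-regression reading, in which the log-odds are linear in the predictor. Applying the OR to the 14 mm³ gap between the two group medians is purely illustrative and is not an analysis the authors report.

```python
import math


def scaled_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio implied for a change of `delta` units.

    Assumes log-odds are linear in the predictor, so the per-unit OR is
    raised to the power `delta`.
    """
    return math.exp(math.log(or_per_unit) * delta)


# Reported: OR 0.96 per mm^3 increase in BG-PVS volume (95% CI: 0.93-0.99)
# Illustrative delta: 33 mm^3 - 19 mm^3 = 14 mm^3 (difference in group medians)
print(round(scaled_odds_ratio(0.96, 14), 2))
```

An OR close to 1 per mm³ can therefore still correspond to a sizable change in odds over a difference of this magnitude.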
{"title":"Association of Hypoxic-Ischemic Injury of the Brain With MRI-Derived Glymphatic Function Parameters in Neonates.","authors":"Arum Choi, Dayeon Bak, Jimin Kim, Se Won Oh, Yoonho Nam, Hyun Gi Kim","doi":"10.3348/kjr.2025.0300","DOIUrl":"10.3348/kjr.2025.0300","url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the association between hypoxic-ischemic injury (HII) of the brain and glymphatic function using MRI-derived parameters in neonates.</p><p><strong>Materials and methods: </strong>This retrospective, single-institution study collected brain MRI scans of 127 neonates between July 2020 and July 2022. The volume and fraction of the basal ganglia perivascular space (BG-PVS) were automatically extracted using three-dimensional T2-weighted image processing. Diffusion-tensor imaging (DTI) along the PVS (DTI-ALPS) index values were derived from the DTI maps. BG-PVS and DTI-ALPS parameters were compared between neonates with and without HII. The correlations between MRI-derived glymphatic parameters and corrected gestational age (CGA), as well as between BG-PVS measurements and the DTI-ALPS index, were analyzed using Spearman coefficients. Multivariable logistic regression adjusted for age, sex, birth weight, and mode of delivery was performed to examine the associations between each glymphatic parameter and HII.</p><p><strong>Results: </strong>This study included 97 neonates without HII (median gestational age [GA]: 252 days) and 30 with HII (median GA: 252 days). Neonates with HII had smaller BG-PVS volumes (19 mm³ vs. 33 mm³, <i>P</i> = 0.001) and fractions (0.29% vs. 0.54%, <i>P</i> = 0.003) compared to neonates without HII. The DTI-ALPS index values did not differ significantly between neonates with and without HII (<i>P</i> = 0.54). CGA correlated negatively with BG-PVS measurements (ρ = -0.21 to -0.26, all <i>P</i> < 0.05) and positively with DTI-ALPS index values (ρ = 0.22, <i>P</i> = 0.014). 
BG-PVS measurements and DTI-ALPS index values were not significantly correlated (ρ = -0.28 to -0.08, all <i>P</i> > 0.05). Multivariable logistic regression revealed a negative association between BG-PVS volume (odds ratio [OR]: 0.96 per mm³ increase, 95% confidence interval [CI]: 0.93-0.99) and fraction (OR: 0.15 per % increase, 95% CI: 0.03-0.79) with HII, while DTI-ALPS index values were not significantly associated with HII (OR: 0.10, 95% CI: 0.00-25.41).</p><p><strong>Conclusion: </strong>Neonates with HII demonstrated smaller BG-PVS volume and fraction compared with those without HII, indicating potential alterations in glymphatic function among affected newborns.</p>","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":"26 8","pages":"782-792"},"PeriodicalIF":5.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144742392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyeon Park, Chae Young Lim, So Yeon Won, Han Kyu Na, Phil Hyu Lee, Sun-Young Baek, Yun Hwa Roh, Minjung Seong, Yongsik Sim, Eung Yeop Kim, Sung Tae Kim, Beomseok Sohn
Objective: To evaluate the effect of deep learning (DL)-based artificial intelligence (AI) software on the diagnostic performance of radiologists with different experience levels in detecting nigrosome 1 (N1) abnormalities on susceptibility map-weighted imaging (SMwI).
Materials and methods: This retrospective diagnostic case-control study analyzed 139 SMwI scans of 59 patients with Parkinson's disease (PD) and 80 healthy participants. Participants were imaged using 3T MRI, and AI-generated assessments for N1 abnormalities were obtained using an AI model (version 1.0.1.0; Heuron Corporation, Seoul, Korea), which utilized YOLOX-based object detection and SparseInst segmentation models. Four radiologists (two experienced neuroradiologists and two less experienced residents) evaluated N1 abnormalities with and without AI in a crossover study design. Diagnostic performance metrics, inter-reader agreements, and reader responses to AI-generated assessments were evaluated.
Results: Use of AI significantly improved diagnostic performance compared with interpretation without it in three readers, with significant increases in specificity (0.86 vs. 0.94, P = 0.004; 0.91 vs. 0.97, P = 0.024; and 0.90 vs. 0.97, P = 0.012). Inter-reader agreement also improved with AI, as Fleiss's kappa increased from 0.73 (95% confidence interval [CI]: 0.61-0.84) to 0.87 (95% CI: 0.76-0.99). The net reclassification index (NRI) demonstrated significant improvement in three of the four readers. When grouped by experience level, less experienced readers showed greater improvement (NRI = 12.8%, 95% CI: 6.7% to 19.0%) than experienced readers (NRI = 0.8%, 95% CI: -3.7% to 5.1%). In the less experienced group, reader-AI disagreement was significantly higher in the PD group than in the normal group (8.1% vs. 3.8%, P = 0.029).
Conclusion: DL-based AI enhances the diagnostic performance in detecting N1 abnormalities on SMwI, particularly benefiting less experienced radiologists. These findings underscore the potential for improving diagnostic workflows for PD.
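The inter-reader agreement statistic reported in this abstract, Fleiss's kappa, can be illustrated with a minimal sketch. This is not the authors' analysis code; the function name `fleiss_kappa` and the toy rating table are hypothetical, and the sketch assumes every scan is rated by the same number of readers. It does not compute the confidence intervals given in the abstract.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of readers who assigned subject i to
    category j. Every row must sum to the same number of readers.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("at least one subject is required")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two readers are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of readers")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject: fraction of reader pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy table: 4 scans, 4 readers, categories (normal, abnormal).
toy_counts = [[4, 0], [0, 4], [2, 2], [2, 2]]
toy_kappa = fleiss_kappa(toy_counts)  # one third for this toy table
```

Two scans with full agreement and two with an even split give a kappa of one third, which shows how quickly split readings pull the statistic down from 1.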
Effect of Deep Learning-Based Artificial Intelligence on Radiologists' Performance in Identifying Nigrosome 1 Abnormalities on Susceptibility Map-Weighted Imaging. Korean Journal of Radiology, vol. 26, no. 8, pp. 771-781. Published 2025-08-01. DOI: 10.3348/kjr.2025.0208. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318656/pdf/
Lu Qiu, Haoxiang Jiang
Response to "Determining Whether the Glymphatic System is Truly Impaired in Pediatric Patients With Refractory Epilepsy Requires Appropriately Designed Studies". Korean Journal of Radiology, vol. 26, no. 8, pp. 799-800. Published 2025-08-01. DOI: 10.3348/kjr.2025.0708. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318653/pdf/
Pub Date: 2025-08-01; Epub Date: 2025-06-13; DOI: 10.3348/kjr.2025.0177
Cherry Kim, Sehyun Hong, Hangseok Choi, Won-Seok Yoo, Jin Young Kim, Suyon Chang, Chan Ho Park, Su Jin Hong, Dong Hyun Yang, Hwan Seok Yong, Marly van Assen, Carlo N De Cecco, Young Joo Suh
Objective: To evaluate the impact of deep learning-based image conversion on the accuracy of automated coronary artery calcium quantification using thin-slice, sharp-kernel, non-gated, low-dose chest computed tomography (LDCT) images collected from multiple institutions.
Materials and methods: A total of 225 pairs of LDCT and calcium scoring CT (CSCT) images scanned at 120 kVp and acquired from the same patient within a 6-month interval were retrospectively collected from four institutions. Image conversion was performed for LDCT images using proprietary software programs to simulate conventional CSCT. This process included 1) deep learning-based kernel conversion of low-dose, high-frequency, sharp kernels to simulate standard-dose, low-frequency kernels, and 2) thickness conversion using the raysum method to convert 1-mm or 1.25-mm thickness images to 3-mm thickness. Automated Agatston scoring was conducted on the LDCT scans before (LDCT-Orgauto) and after the image conversion (LDCT-CONVauto). Manual scoring was performed on the CSCT images (CSCTmanual) and used as a reference standard. The accuracy of the automated Agatston scores and of the risk severity categorization derived from them on LDCT scans was compared with the reference standard using Bland-Altman analysis, the concordance correlation coefficient (CCC), and the weighted kappa (κ) statistic.
Results: Compared with CSCTmanual, LDCT-CONVauto showed a smaller bias in Agatston score than LDCT-Orgauto did (-3.45 vs. 206.7). LDCT-CONVauto also showed a higher CCC than LDCT-Orgauto did (0.881 [95% confidence interval {CI}, 0.750-0.960] vs. 0.269 [95% CI, 0.129-0.430]). In terms of risk category assignment, LDCT-Orgauto exhibited poor agreement with CSCTmanual (weighted κ = 0.115 [95% CI, 0.082-0.154]), whereas LDCT-CONVauto achieved good agreement (weighted κ = 0.792 [95% CI, 0.731-0.847]).
Conclusion: Deep learning-based conversion of LDCT images originally obtained with thin slices and a sharp kernel can enhance the accuracy of automated coronary artery calcium scoring on those images.
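Two of the agreement measures named in this abstract, the Bland-Altman bias and the concordance correlation coefficient, can be sketched as below. This is an illustration under stated assumptions, not the study's code: the function names `bland_altman` and `lin_ccc` and the score lists are hypothetical, the bias is taken as automated minus reference, the limits of agreement use the conventional 1.96 standard deviations, and the CCC uses population (1/n) moments as in Lin's definition. The abstract's confidence intervals are not reproduced.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def bland_altman(test: Sequence[float], reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired scores.

    Bias is the mean of (test - reference); the limits are bias +/- 1.96
    sample standard deviations of the paired differences.
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population variances and covariance (divide by n, not n - 1).
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov_xy / denominator


# Hypothetical scores: the automated score overshoots the manual one by 10.
manual_scores = [0.0, 10.0, 100.0, 400.0]
auto_scores = [s + 10.0 for s in manual_scores]
bias, lower, upper = bland_altman(auto_scores, manual_scores)
ccc = lin_ccc(auto_scores, manual_scores)
```

In this toy case the bias equals the constant offset of 10 and the CCC falls slightly below 1, because the CCC penalizes a systematic shift even when the two score sets are perfectly correlated.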
Impact of Deep Learning-Based Image Conversion on Fully Automated Coronary Artery Calcium Scoring Using Thin-Slice, Sharp-Kernel, Non-Gated, Low-Dose Chest CT Scans: A Multi-Center Study. Korean Journal of Radiology, pp. 759-770. Published 2025-08-01. DOI: 10.3348/kjr.2025.0177. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318652/pdf/