Radiology: Latest Articles

Perivascular Epithelioid Cell Tumor of the Liver: A Rare and Difficult Case Diagnosis.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.241611
Amara A Cuello, Miguel E Nazar
Radiology 2025;314(1):e241611.
Citations: 0
Case 337.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.241909
Brian H Mu, Faris Galambo, Hadeer W Al-Ali, Sumeet G Dua, Chanae D Dixon, Xinhai R Zhang, Mustafa A Mafraji

History: A 38-year-old previously healthy male patient presented with left-sided facial pain over the prior 5 weeks. He first noticed the pain while washing and applying pressure to his face. The pain was described as shock-like, sharp and shooting, and radiating along the left cheek and temple. It began as 1-2-second episodes occurring two to three times per day, sometimes spontaneously, progressing in severity and frequency over time. Mild progressive left facial weakness also developed a few weeks after initial symptoms. Physical examination demonstrated reproducible pain in the distribution of the maxillary division of the trigeminal nerve (V2), with normal motor and sensory function. A recent routine dental examination demonstrated healthy teeth and gums, and there was no history of dental procedures or trauma. The rest of the physical and neurologic examinations revealed no abnormalities. The patient was afebrile with normal vital signs. Findings of routine laboratory testing, including complete blood count, metabolic panel with electrolytes, kidney and liver function, and inflammatory markers such as C-reactive protein, were all within normal limits. Following the neurologic and otolaryngologic evaluations, imaging was recommended. The patient was also started on treatment with carbamazepine for trigeminal neuralgia, with modest improvement of symptoms. He initially underwent MRI of the temporal bones at an outside hospital. After subsequent referral to our hospital, follow-up concomitant MRI and CT (Figs 1-4) were performed approximately 3 months after the initial imaging.

Radiology 2025;314(1):e241909.
Citations: 0
Deep Learning MRI Reconstruction Delivers Superior Resolution and Improved Diagnostics.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.242952
Mika T Nevalainen
Radiology 2025;314(1):e242952.
Citations: 0
Open-Source Large Language Models in Radiology: A Review and Tutorial for Practical Research and Clinical Deployment.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.241073
Cody H Savage, Adway Kanhere, Vishwa Parekh, Curtis P Langlotz, Anupam Joshi, Heng Huang, Florence X Doo

Integrating large language models (LLMs) into health care holds substantial potential to enhance clinical workflows and care delivery. However, LLMs also pose serious risks if integration is not thoughtfully executed, with complex challenges spanning accuracy, accessibility, privacy, and regulation. Proprietary commercial LLMs (eg, GPT-4 [OpenAI], Claude 3 Sonnet and Claude 3 Opus [Anthropic], Gemini [Google]) have received much attention from researchers in the medical domain, including radiology. Interestingly, open-source LLMs (eg, Llama 3 and LLaVA-Med) have received comparatively little attention. Yet, open-source LLMs hold several key advantages over proprietary LLMs for medical institutions, hospitals, and individual researchers. The wider adoption of open-source LLMs has been slower, perhaps in part due to the lack of familiarity, accessible computational infrastructure, and community-built tools to streamline their local implementation and customize them for specific use cases. Thus, this article provides a tutorial for the implementation of open-source LLMs in radiology, including examples of commonly used tools for text generation and techniques for troubleshooting issues with prompt engineering, retrieval-augmented generation, and fine-tuning. Implementation-ready code for each tool is provided at https://github.com/UM2ii/Open-Source-LLM-Tools-for-Radiology. In addition, this article compares the benefits and drawbacks of open-source and proprietary LLMs, discusses the differentiating characteristics of popular open-source LLMs, and highlights recent advancements that may affect their adoption.

Radiology 2025;314(1):e241073.
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783163/pdf/
Citations: 0
Unraveling the Obesity Paradox in Non-Small Cell Lung Cancer.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.243509
Michael W Vannier
Radiology 2025;314(1):e243509.
Citations: 0
Incorporating Lymph Node Size at CT as an N1 Descriptor in Clinical N Staging for Lung Cancer.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.241603
Yura Ahn, Sang Min Lee, Jooae Choe, Se Hoon Choi, Kyung-Hyun Do, Joon Beom Seo

Background: The ninth edition of the TNM classification for lung cancer revised the N2 categorization, improving patient stratification, but prognostic heterogeneity remains for the N1 category.
Purpose: To define the optimal size cutoff for a bulky lymph node (LN) on CT scans and to evaluate the prognostic value of bulky LN in the clinical N staging of lung cancer.
Materials and Methods: This retrospective study analyzed patients who underwent lobectomy or pneumonectomy for lung cancer between January 2013 and December 2021, divided into development (2016-2021) and validation (2013-2015) cohorts. The optimal threshold for a bulky LN was defined based on the short-axis diameter of the largest clinically positive LN at CT. Prognostic differences in overall survival (OS) according to the presence of bulky LN in the cN1 category were evaluated using multivariable Cox analysis. Survival discrimination was assessed using the Harrell concordance index (C-index).
Results: A total of 3426 patients (mean age, 64.0 years ± 9.3 [SD]; 1837 male) and 1327 patients (mean age, 63.0 years ± 9.7; 813 male) were included in the development and validation cohorts, respectively. The cutoff size for a bulky LN was established at 15 mm, and the presence of bulky LN was an independent risk factor for OS (hazard ratio [HR], 1.54; 95% CI: 1.10, 2.16; P = .01). In the development and validation cohorts, the cN1-bulky group had higher mortality risk than the cN1-nonbulky group (HR, 2.82 [95% CI: 1.73, 4.58; P < .001] and 2.29 [95% CI: 1.34, 3.92; P = .002], respectively). The bulky LN descriptor improved prognostic discrimination within the cN1 category compared with the current staging (C-index from 0.50 to 0.60 and to 0.58 in the development and validation cohorts, respectively [P < .001 and P = .006]).
Conclusion: Defining bulky LN with a size cutoff of 15 mm was an effective descriptor in the clinical staging of N1 lung cancer.
© RSNA, 2025. Supplemental material is available for this article. See also the editorial by Horst in this issue.
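
The abstract above assesses survival discrimination with the Harrell concordance index (C-index). As a rough illustration only, and not the authors' analysis code, the statistic can be sketched in plain Python. The function name, the toy follow-up data, and the decision to skip pairs with tied follow-up times are all assumptions made for this sketch.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the patient with the
    higher risk score has the shorter observed survival.

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification for this sketch: skip tied times, and skip pairs
        # where the shorter follow-up ended in censoring (not comparable).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): 1 = bulky LN, 0 = nonbulky LN as the risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [1, 1, 0, 0]))  # 0.9
print(harrell_c_index(times, events, [1, 1, 1, 1]))  # 0.5
```

The second call shows why a score that assigns every patient the same value yields 0.5: every comparable pair is a tie, so the score carries no discriminating information.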

Radiology 2025;314(1):e241603.
Citations: 0
Consideration of Thermal Ablation for Secondary Hyperparathyroidism in Patients with Chronic Kidney Disease.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.243288
Joseph J Gemmete
Radiology 2025;314(1):e243288.
Citations: 0
Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.240895
Sebastian Nowak, Benjamin Wulff, Yannik C Layer, Maike Theis, Alexander Isaak, Babak Salam, Wolfgang Block, Daniel Kuetting, Claus C Pieper, Julian A Luetkens, Ulrike Attenberger, Alois M Sprinkart

Background: Large-scale secondary use of clinical databases requires automated tools for retrospective extraction of structured content from free-text radiology reports.
Purpose: To share data and insights on the application of privacy-preserving open-weights large language models (LLMs) for reporting content extraction with comparison to standard rule-based systems and the closed-weights LLMs from OpenAI.
Materials and Methods: In this retrospective exploratory study conducted between May 2024 and September 2024, zero-shot prompting of 17 open-weights LLMs was performed. These LLMs with model weights released under open licenses were compared with rule-based annotation and with OpenAI's GPT-4o, GPT-4o-mini, GPT-4-turbo, and GPT-3.5-turbo on a manually annotated public English chest radiography dataset (Indiana University, 3927 patients and reports). An annotated nonpublic German chest radiography dataset (18 500 reports, 16 844 patients [10 340 male; mean age, 62.6 years ± 21.5 {SD}]) was used to compare local fine-tuning of all open-weights LLMs via low-rank adaptation and 4-bit quantization to bidirectional encoder representations from transformers (BERT) with different subsets of reports (from 10 to 14 580). Nonoverlapping 95% CIs of macro-averaged F1 scores were defined as relevant differences.
Results: For the English reports, the highest zero-shot macro-averaged F1 score was observed for GPT-4o (92.4% [95% CI: 87.9, 95.9]); GPT-4o outperformed the rule-based CheXpert [Stanford University] (73.1% [95% CI: 65.1, 79.7]) but was comparable in performance to several open-weights LLMs (top three: Mistral-Large [Mistral AI], 92.6% [95% CI: 88.2, 96.0]; Llama-3.1-70b [Meta AI], 92.2% [95% CI: 87.1, 95.8]; and Llama-3.1-405b [Meta AI], 90.3% [95% CI: 84.6, 94.5]). For the German reports, Mistral-Large (91.6% [95% CI: 90.5, 92.7]) had the highest zero-shot macro-averaged F1 score among the six other open-weights LLMs and outperformed the rule-based annotation (74.8% [95% CI: 73.3, 76.1]). Using 1000 reports for fine-tuning, all LLMs (top three: Mistral-Large, 94.3% [95% CI: 93.5, 95.2]; OpenBioLLM-70b [Saama], 93.9% [95% CI: 92.9, 94.8]; and Mixtral-8×22b [Mistral AI], 93.8% [95% CI: 92.8, 94.7]) achieved significantly higher macro-averaged F1 scores than did BERT (86.7% [95% CI: 85.0, 88.3]); however, the differences were not relevant when 2000 or more reports were used for fine-tuning.
Conclusion: LLMs have the potential to outperform rule-based systems for zero-shot "out-of-the-box" structuring of report databases, with privacy-ensuring open-weights LLMs being competitive with closed-weights GPT-4o. Additionally, the open-weights LLMs outperformed BERT when moderate numbers of reports were used for fine-tuning.
Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Gee and Yao in this issue.
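
The comparison above rests on two simple ideas: a macro-averaged F1 score (the unweighted mean of per-finding F1 scores) and the rule that only nonoverlapping 95% CIs count as relevant differences. The following is a minimal sketch of both, not the study's evaluation code. The function names, the finding labels, and the toy annotations are hypothetical; the CI endpoints in the usage lines are copied from the abstract.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary finding; returns 0.0 when undefined (a convention
    chosen for this sketch)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(labels_true, labels_pred):
    """Unweighted mean of per-finding F1 scores.

    Both arguments map a finding name to a list of 0/1 labels per report.
    """
    scores = [f1_score(labels_true[k], labels_pred[k]) for k in labels_true]
    return sum(scores) / len(scores)


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical annotations for four reports and two findings.
truth = {"effusion": [1, 1, 0, 0], "pneumothorax": [1, 0, 0, 0]}
model = {"effusion": [1, 0, 0, 0], "pneumothorax": [1, 1, 0, 0]}
print(round(macro_f1(truth, model), 3))  # 0.667

# CIs reported in the abstract (percent).
print(intervals_overlap((93.5, 95.2), (85.0, 88.3)))  # False: fine-tuned Mistral-Large vs BERT
print(intervals_overlap((87.9, 95.9), (88.2, 96.0)))  # True: zero-shot GPT-4o vs Mistral-Large
```

Under the abstract's criterion, the first pair is a relevant difference and the second is not, which matches the reported conclusion that the open-weights models were comparable to GPT-4o.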

Radiology 2025;314(1):e240895.
Citations: 0
Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4V in Challenging Brain MRI Cases.
IF 12.1 | Zone 1, Medicine | Q1, Radiology, Nuclear Medicine & Medical Imaging | Pub Date: 2025-01-01 | DOI: 10.1148/radiol.240689
Severin Schramm, Silas Preis, Marie-Christin Metz, Kirsten Jung, Benita Schmitz-Koep, Claus Zimmer, Benedikt Wiestler, Dennis M Hedderich, Su Hwan Kim

Background: Studies have explored the application of multimodal large language models (LLMs) in radiologic differential diagnosis. Yet, how different multimodal input combinations affect diagnostic performance is not well understood.
Purpose: To evaluate the impact of varying multimodal input elements on the accuracy of OpenAI's GPT-4 with vision (GPT-4V)-based brain MRI differential diagnosis.
Materials and Methods: Sixty brain MRI cases with a challenging yet verified diagnosis were selected. Seven prompt groups with variations of four input elements (image without modifiers [I], annotation [A], medical history [H], and image description [D]) were defined. For each MRI case and prompt group, three identical queries were performed using an LLM-based search engine (Perplexity AI, powered by GPT-4V). The accuracy of LLM-generated differential diagnoses was rated using a binary and a numeric scoring system and analyzed using a χ² test and a Kruskal-Wallis test. Results were corrected for false-discovery rate with use of the Benjamini-Hochberg procedure. Regression analyses were performed to determine the contribution of each input element to diagnostic performance.
Results: The prompt group containing I, A, H, and D as input exhibited the highest diagnostic accuracy (124 of 180 responses [69%]). Significant differences were observed between prompt groups that contained D among their inputs and those that did not. Unannotated (I) (four of 180 responses [2.2%]) or annotated radiologic images alone (I and A) (two of 180 responses [1.1%]) yielded very low diagnostic accuracy. Regression analyses confirmed a large positive effect of D on diagnostic accuracy (odds ratio [OR], 68.03; P < .001), as well as a moderate positive effect of H (OR, 4.18; P < .001).
Conclusion: The textual description of radiologic image findings was identified as the strongest contributor to the performance of GPT-4V in brain MRI differential diagnosis, followed by the medical history; unannotated or annotated images alone yielded very low diagnostic performance.
© RSNA, 2025. Supplemental material is available for this article.
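
The abstract above states that results were corrected for false-discovery rate with the Benjamini-Hochberg procedure. A minimal sketch of how that procedure produces adjusted P values is given below. It is an illustration under stated assumptions, not the study's code: the function name and the example P values are hypothetical and do not come from the article.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by m / rank (rank 1 = smallest), and a
    running minimum taken from the largest rank downward keeps the
    adjusted values monotone in rank.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])  # [0.04, 0.0533, 0.0533, 0.2]
print([p < 0.05 for p in adj])     # [True, False, False, False]
```

In this toy example, three raw P values fall below .05 but only one survives the correction, which is the kind of tightening the procedure is meant to provide when many prompt groups are compared.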

Citations: 0
Building Rome: TNM Lung Cancer Staging and an Illustration of the Scientific Method.
IF 12.1 Zone 1 Medicine Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING Pub Date: 2025-01-01 DOI: 10.1148/radiol.243715
Carolyn Horst
Citations: 0