Large Language Models can Help with Biostatistics and Coding Needed in Radiology Research
Adarsh Ghosh MD, Hailong Li PhD, Andrew T. Trout MD
Academic Radiology, Volume 32, Issue 2, Pages 604-611. Published 2025-02-01. Impact factor: 3.8 (JCR Q1, Radiology, Nuclear Medicine & Medical Imaging).
DOI: 10.1016/j.acra.2024.09.042
https://www.sciencedirect.com/science/article/pii/S1076633224006913
Citations: 0
Abstract
Introduction
Original research in radiology often involves handling large datasets, data manipulation, statistical tests, and coding. Recent studies show that large language models (LLMs) can solve bioinformatics tasks, suggesting their potential in radiology research. This study evaluates the ability of LLMs to provide statistical and deep learning solutions and code for radiology research.
Materials and Methods
We used web-based chat interfaces available for ChatGPT-4o, ChatGPT-3.5, and Google Gemini.
Experiment 1: Biostatistics and Data Visualization
We assessed each LLM's ability to suggest biostatistical tests and generate the corresponding R code using a Cancer Imaging Archive dataset. Prompts were based on statistical analyses from a peer-reviewed manuscript. The generated code was tested in RStudio for correctness, runtime errors, and the ability to generate the requested visualization.
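The abstract does not list the eight questions or the tests judged correct. As a rough illustration of the kind of mapping being assessed (study design to test name), here is a minimal Python sketch of a textbook rule of thumb. The function `suggest_test` and its parameters are hypothetical; they are not the authors' grading rubric and not the R code used in the study.

```python
def suggest_test(outcome, groups=2, paired=False, normal=True):
    """Return a conventional textbook test name for a simple group comparison.

    Assumption: this is a simplified rule of thumb for illustration only.
    """
    if outcome == "categorical":
        if paired:
            return "McNemar test (binary outcome, two paired measurements)"
        return "chi-square test (Fisher exact test for small counts)"
    if outcome != "continuous":
        raise ValueError("outcome must be 'continuous' or 'categorical'")
    if groups < 2:
        raise ValueError("groups must be 2 or more")
    if groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "two-sample t-test" if normal else "Mann-Whitney U test"
    # Three or more groups
    if paired:
        return "repeated-measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"


# Hypothetical scenario: a skewed continuous measurement compared across three groups.
print(suggest_test("continuous", groups=3, normal=False))
```

A real analysis needs more than this lookup (sample size, censoring, multiple comparisons), which is one reason the generated suggestions still need review by someone with statistical training.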
Experiment 2: Deep Learning
We used the RSNA-STR Pneumonia Detection Challenge dataset to evaluate the ability of ChatGPT-4o and Gemini to generate Python code for transformer-based image classification models (Vision Transformer ViT-B/16). The generated code was tested in a Jupyter Notebook for functionality and runtime errors.
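To make the model name concrete: the "16" in ViT-B/16 is the patch size. The model cuts each input image into non-overlapping 16×16 patches and treats each patch as one token. The sketch below shows only that first step in plain Python. The 224×224 single-channel input is an assumption (a common ViT input size), and `split_into_patches` is a hypothetical helper, not code generated in the study.

```python
def split_into_patches(image, patch_size=16):
    """Split a 2D image (list of equal-length rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block in row-major order.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Assumed input: a 224x224 single-channel image filled with zeros.
image = [[0.0] * 224 for _ in range(224)]
patches = split_into_patches(image)
num_tokens = len(patches) + 1  # patch tokens plus one class token
print(len(patches), len(patches[0]), num_tokens)  # 196 256 197
```

In practice a deep learning library handles this step along with the transformer layers. The sketch only shows why the input resolution fixes the sequence length the model sees.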
Results
Of the 8 statistical questions posed, correct statistical answers were suggested for 7 (ChatGPT-4o), 6 (ChatGPT-3.5), and 5 (Gemini) scenarios. The R code output by ChatGPT-4o had fewer runtime errors (6 out of the 7 total codes provided) than that of ChatGPT-3.5 (5/7) and Gemini (5/7). Both ChatGPT-4o and Gemini were able to generate the requested visualization, with a few runtime errors. Iteratively copying the runtime errors raised by the ChatGPT-4o code into the chat helped resolve them. Gemini initially hallucinated during code generation but provided accurate code when the experiment was restarted.
ChatGPT-4o and Gemini successfully generated initial Python code for deep learning tasks. Errors encountered during implementation were resolved through iterations using the chat interface, demonstrating the utility of LLMs in providing baseline code for further refinement and in resolving runtime errors.
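The iterative workflow described above (run the code, copy the runtime error back into the chat, try the revised code) was done by hand in the study. It can be sketched as a small loop under stated assumptions: `ask_llm` is a hypothetical callable standing in for the chat interface, `fake_llm` is a stub used only so the example runs, and none of these names come from the paper.

```python
import traceback


def run_and_capture(code):
    """Execute a code string; return None on success or the traceback text on failure."""
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return None
    except Exception:
        return traceback.format_exc()


def build_followup_prompt(code, error_text):
    """Compose the message a user would paste back into the chat."""
    return (
        "The code below raised a runtime error. Please fix it.\n\n"
        f"Code:\n{code}\n\nError:\n{error_text}"
    )


def refine(code, ask_llm, max_rounds=3):
    """Run the code, send any error back to the model, and retry with its revision."""
    for _ in range(max_rounds):
        error_text = run_and_capture(code)
        if error_text is None:
            return code
        code = ask_llm(build_followup_prompt(code, error_text))
    # Check the last revision once more before giving up.
    return code if run_and_capture(code) is None else None


def fake_llm(prompt):
    """Stub for a chat model (assumption): always returns a corrected snippet."""
    return "total = sum([1, 2, 3])"


fixed = refine("total = sum([1, 2, '3'])", fake_llm)
print(fixed)
```

Executing generated code with `exec` is acceptable for a toy example only. Untrusted model output is better run in an isolated environment, and a run without errors does not show that the analysis is correct.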
Conclusion
LLMs can assist in coding tasks for radiology research, providing initial code for data visualization, statistical tests, and deep learning models, which can help researchers who have foundational biostatistical knowledge. While LLMs can offer a useful starting point, they require users to refine and validate the code, and caution is necessary because of potential errors, the risk of hallucinations, and data privacy regulations.
Summary statement
LLMs can help with coding and statistical problems in radiology research. This can help primary authors troubleshoot the coding needed in radiology research.
About the journal:
Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.