Examining the Role of Large Language Models in Orthopedics: Systematic Review.

Journal of Medical Internet Research | Impact Factor: 6.0 | Zone 2 (Medicine) | JCR Q1 (Health Care Sciences & Services) | Pub Date: 2024-11-15 | DOI: 10.2196/59607

Cheng Zhang, Shanshan Liu, Xingyu Zhou, Siyu Zhou, Yinglun Tian, Shenglin Wang, Nanfang Xu, Weishi Li

Citation: Journal of Medical Internet Research, 2024, volume 26, e59607. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607553/pdf/

Citations: 0

Abstract

Background: Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which gives them great potential in medical scenarios. Orthopedics is a major branch of medicine, and orthopedic diseases impose a substantial socioeconomic burden that the application of LLMs could help alleviate. Several pioneers in orthopedics have studied LLMs across various subspecialties to explore their performance on different problems. However, few reviews of these studies exist, and a systematic summary of the existing research is absent.

Objective: The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges.

Methods: PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. The terms, which included variants of "large language model," "generative artificial intelligence," "ChatGPT," and "orthopaedics," were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment.

Results: A total of 68 studies were selected. The applications of LLMs in orthopedics spanned clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to management. Only 8 (12%) of the 68 studies recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. The definition, measurement, and evaluation of LLM performance varied considerably across studies. For diagnostic tasks alone, accuracy ranged from 55% to 93%. In disease classification tasks, the accuracy of ChatGPT with GPT-4 ranged from 2% to 100%. On orthopedic examination questions, scores ranged from 45% to 73.6%, depending on the model and the test selected.
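The field breakdown reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the review itself: the counts are taken from the Results paragraph, while the names `counts`, `share`, and `shares` are hypothetical and the rounding rule (nearest whole percent) is an assumption about how the percentages were derived.

```python
# Study counts per application field, as reported in the Results paragraph.
counts = {
    "clinical practice": 47,
    "education": 12,
    "research": 8,
    "management": 1,
}
total = sum(counts.values())  # should equal the 68 included studies


def share(n, total):
    """Percentage of the total, rounded to the nearest whole percent (assumed rule)."""
    return round(100 * n / total)


# Whole-percent share of each field among all included studies.
shares = {field: share(n, total) for field, n in counts.items()}

for field, pct in shares.items():
    print(f"{field}: {counts[field]}/{total} = {pct}%")
```

Under that rounding assumption, the four counts sum to the 68 included studies and reproduce the reported 69%, 18%, 12%, and 1%.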

Conclusions: LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be a practical way to improve work efficiency at present. More high-quality clinical trials are needed to identify the optimal applications of LLMs and to advance orthopedics toward higher efficiency and precision.


Source journal

Journal of Medical Internet Research (category: Medicine, Health Care)

CiteScore

14.40

Self-citation rate

5.40%

Articles published

654

Review time

1 month

About the journal: The Journal of Medical Internet Research (JMIR), founded in 1999, is a long-established journal in health informatics and health services. It focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.