Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization.

IF 2 | Zone 4 (Medicine) | Q3, Medicine, Research & Experimental | Current Medical Science | Pub Date: 2024-10-01 | Epub Date: 2024-10-05 | DOI: 10.1007/s11596-024-2929-4
Hong Zhou, Hong-Lin Wang, Yu-Yu Duan, Zi-Neng Yan, Rui Luo, Xiang-Xin Lv, Yi Xie, Jia-Yao Zhang, Jia-Ming Yang, Ming-di Xue, Ying Fang, Lin Lu, Peng-Ran Liu, Zhe-Wei Ye
Journal Article. Current Medical Science, pp. 1001-1005.
Citations: 0

Abstract

Objective: This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in the field of orthopedics to explore optimization strategies for the application of LLMs in specific fields.

Methods: This research constructed a specialized knowledge base using clinical guidelines from the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic-related questions covering anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of GPT-4, ChatGLM, and Spark LLM, and the generated responses were recorded. The overall quality, accuracy, and comprehensiveness of these responses were evaluated by 3 experienced orthopedic surgeons.

Results: Compared with its unoptimized counterpart, the optimized version of GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness; ChatGLM showed improvements of 24.8%, 16.1%, and 19.6%, respectively; and Spark LLM showed improvements of 6.5%, 14.5%, and 24.7%, respectively.
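
The abstract does not state how the percentage improvements were derived. As a minimal sketch, assuming each figure is the relative change in the mean rating pooled over all questions and all raters, the calculation could look like the following. The function names and the example ratings are hypothetical placeholders and are not data from the study.

```python
from statistics import mean


def mean_rating(ratings):
    """Pool every rater's score for every question and return the average.

    `ratings` is a list with one entry per question; each entry is a list
    of scores, one per rater.
    """
    return mean(score for per_question in ratings for score in per_question)


def relative_improvement(baseline, optimized):
    """Percent change of the optimized mean rating relative to the baseline.

    Assumption: improvement is reported as a relative change, which the
    abstract does not confirm.
    """
    base = mean_rating(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean_rating(optimized) - base) / base * 100.0


# Hypothetical placeholder ratings (NOT from the study): 2 questions x 3 raters.
baseline = [[3, 4, 3], [4, 4, 3]]
optimized = [[4, 4, 4], [4, 5, 4]]

print(f"{relative_improvement(baseline, optimized):.1f}%")
```

If the study instead reported absolute differences in score, the division by the baseline mean would be dropped; the full paper would need to be consulted to tell which applies.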

Conclusion: The optimization of knowledge bases significantly enhances the quality, accuracy, and comprehensiveness of the responses provided by the 3 models in the orthopedic field. Therefore, knowledge base optimization is an effective method for improving the performance of LLMs in specific fields.

Source journal
Current Medical Science
Subject area: Biochemistry, Genetics and Molecular Biology-Genetics
CiteScore: 4.70
Self-citation rate: 0.00%
Articles published: 126
About the journal: Current Medical Science provides a forum for peer-reviewed papers in the medical sciences, to promote academic exchange between Chinese researchers and doctors and their foreign counterparts. The journal covers biomedical subjects such as physiology, biochemistry, molecular biology, pharmacology, pathology, and pathophysiology, as well as clinical research in areas such as surgery, internal medicine, obstetrics and gynecology, pediatrics, and otorhinolaryngology. The articles appearing in Current Medical Science are mainly in English, with a very small number of papers in German as a tribute to its German founder. This journal is the only medical periodical in Western languages sponsored by an educational institution located in the central part of China.