Enhancing systematic reviews in orthodontics: a comparative examination of GPT-3.5 and GPT-4 for generating PICO-based queries with tailored prompts and configurations.

IF 2.8 | Zone 3 (Medicine) | JCR Q1, DENTISTRY, ORAL SURGERY & MEDICINE | European journal of orthodontics | Pub Date: 2024-04-01 | DOI: 10.1093/ejo/cjae011
Gizem Boztaş Demir, Yağızalp Süküt, Gökhan Serhat Duran, Kübra Gülnur Topsakal, Serkan Görgülü
Volume 46, issue 2. Publication type: Journal Article. https://doi.org/10.1093/ejo/cjae011
Citations: 0

Abstract

Objectives: The rapid advancement of large language models (LLMs) has prompted exploration of their efficacy in generating PICO-based (Patient, Intervention, Comparison, Outcome) queries, especially in orthodontics. This study aimed to assess the usability of LLMs in aiding systematic review processes, with a specific focus on comparing the performance of ChatGPT 3.5 and ChatGPT 4 using a specialized prompt tailored for orthodontics.

Materials/methods: Five databases were searched to curate a sample of 77 systematic reviews and meta-analyses published between 2016 and 2021. Using prompt engineering techniques, the LLMs were directed to formulate PICO questions, Boolean queries, and relevant keywords. Independent researchers then evaluated the outputs for accuracy and consistency using three-point and six-point Likert scales. Furthermore, the PICO records of 41 studies, which were compatible with the PROSPERO records, were compared with the responses provided by the models.
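
To make the idea of a PICO-based Boolean query concrete, the following is a minimal sketch of how the four PICO elements could be combined into a single search string: synonyms within an element are joined with OR, and the elements are joined with AND. This is not the authors' prompt or pipeline. The `Pico` class, the `build_boolean_query` function, and the example search terms are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pico:
    """Hypothetical container: each PICO element is a list of synonyms."""
    patient: list[str]
    intervention: list[str]
    comparison: list[str] = field(default_factory=list)
    outcome: list[str] = field(default_factory=list)


def _or_group(terms: list[str]) -> str:
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_boolean_query(pico: Pico) -> str:
    """AND together the non-empty PICO groups, in P, I, C, O order."""
    groups = [pico.patient, pico.intervention, pico.comparison, pico.outcome]
    return " AND ".join(_or_group(g) for g in groups if g)


# Illustrative terms only; they are not taken from the study.
example = Pico(
    patient=["orthodontic patients", "malocclusion"],
    intervention=["clear aligners"],
    comparison=["fixed appliances"],
    outcome=["treatment duration"],
)
print(build_boolean_query(example))
```

Empty elements are skipped, so a question without an explicit comparison still yields a valid query. Any string produced this way, like any LLM-generated query, would still need review by a researcher before being run against a database.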

Results: ChatGPT 3.5 and 4 showed a consistent ability to craft PICO-based queries. Statistically significant differences in accuracy were observed in specific categories, with GPT-4 often outperforming GPT-3.5.

Limitations: The study's test set might not encapsulate the full range of LLM application scenarios. Emphasis on specific question types may also not reflect the complete capabilities of the models.

Conclusions/implications: Both ChatGPT 3.5 and 4 can be pivotal tools for generating PICO-driven queries in orthodontics when optimally configured. However, the precision required in medical research necessitates judicious and critical evaluation of LLM-generated outputs, and these models should be integrated into scientific investigations with circumspection.

Source journal
European journal of orthodontics (Medicine: Dentistry & Oral Surgery)
CiteScore: 5.50
Self-citation rate: 7.70%
Publication volume: 71
Review time: 4-8 weeks
Journal description: The European Journal of Orthodontics publishes papers of excellence on all aspects of orthodontics including craniofacial development and growth. The emphasis of the journal is on full research papers. Succinct and carefully prepared papers are favoured in terms of impact as well as readability.
Latest articles from this journal
Clinical risk factors caused by third molar levelling following extraction of a mandibular second molar.
Does incisor inclination change during orthodontic treatment affect gingival thickness and the width of keratinized gingiva? A prospective controlled study.
Roles of B-cell lymphoma 6 in orthodontic tooth movement of rat molars.
Influence of genetic and environmental factors on transverse growth.
The effect of micro-osteoperforation (MOP) in molar distalization treatments: an exploratory systematic review and meta-analysis of RCTs.