PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation.

IF 2.4 | CAS Tier 3 (Medicine) | JCR Q1 (Nursing) | Journal of Nursing Scholarship | Pub Date: 2024-11-24 | DOI: 10.1111/jnu.13036
Lucija Gosak, Gregor Štiglic, Lisiane Pruinelli, Dominika Vrbnjak
Citations: 0

Abstract

Aim: The aim of this study was to evaluate and compare artificial intelligence (AI)-based large language models (LLMs) (ChatGPT-3.5, Bing, and Bard) with human-based formulations in generating relevant clinical queries, using comprehensive methodological evaluations.

Methods: To interact with the major LLMs ChatGPT-3.5, Bing Chat, and Google Bard, scripts and prompts were designed to formulate PICOT (population, intervention, comparison, outcome, time) clinical questions and search strategies. The quality of the LLMs' responses was assessed descriptively and through independent assessment by two researchers. To determine the number of hits, the search strings generated by the three LLMs, along with one formulated by a human expert, were run separately and without restrictions in PubMed, Web of Science, Cochrane Library, and CINAHL Ultimate. Hits from one scenario were also exported for relevance evaluation; a single scenario was used to keep the analysis focused. Cronbach's alpha and the intraclass correlation coefficient (ICC) were also calculated.
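The abstract reports Cronbach's alpha and the ICC for inter-rater agreement but does not include the evaluators' raw scores. The sketch below shows how Cronbach's alpha for two raters can be computed; the `rater_a`/`rater_b` values are illustrative made-up scores, not the study's data.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha; `ratings` is a list of per-rater score lists
    (each inner list holds one score per evaluated item)."""
    k = len(ratings)  # number of raters
    item_totals = [sum(col) for col in zip(*ratings)]  # total score per item
    rater_vars = sum(variance(r) for r in ratings)
    return k / (k - 1) * (1 - rater_vars / variance(item_totals))

# Hypothetical scores from two evaluators across five scenarios
rater_a = [48, 47, 49, 48, 50]
rater_b = [49, 47, 48, 48, 50]
print(round(cronbach_alpha([rater_a, rater_b]), 3))  # → 0.894
```

Values near 1 indicate the two evaluators scored the responses consistently, matching the high agreement the study reports.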

Results: Across five scenarios, ChatGPT-3.5 generated 11,859 hits, Bing 1,376,854, Bard 16,583, and the expert 5,919. The first scenario was then used to assess the relevance of the retrieved results. The human expert's search yielded 65.22% (56/105) relevant articles. Bing was the most accurate AI-based LLM with 70.79% (63/89) relevant hits, followed by ChatGPT-3.5 with 21.05% (12/45) and Bard with 13.29% (42/316). Based on the assessment of the two evaluators, ChatGPT-3.5 received the highest score (M = 48.50; SD = 0.71). The two evaluators showed a high level of agreement. Although ChatGPT-3.5 returned a lower percentage of relevant hits than Bing, this reflects the nuanced evaluation criteria: the subjective evaluation prioritized contextual accuracy and quality over relevance alone.
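The reported relevance rates are simple proportions of relevant to screened hits. A quick check using the two count pairs from the abstract that reproduce their stated percentages exactly:

```python
# Relevance rate = relevant hits / total screened hits * 100
counts = {
    "Bing": (63, 89),    # (relevant, total screened), from the abstract
    "Bard": (42, 316),
}
for model, (relevant, total) in counts.items():
    print(f"{model}: {relevant / total * 100:.2f}% relevant")
# → Bing: 70.79% relevant
# → Bard: 13.29% relevant
```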

Conclusion: This study provides valuable insights into the ability of LLMs to formulate PICOT clinical questions and search strategies. AI-based LLMs, such as ChatGPT-3.5, demonstrate significant potential for augmenting clinical workflows, improving clinical query development, and supporting search strategies. However, the findings also highlight limitations that necessitate further refinement and continued human oversight.

Clinical relevance: AI could assist nurses in formulating PICOT clinical questions and search strategies. AI-based LLMs offer valuable support to healthcare professionals by improving the structure of clinical questions and enhancing search strategies, thereby significantly increasing the efficiency of information retrieval.

Source journal metrics:
- CiteScore: 6.30
- Self-citation rate: 5.90%
- Articles per year: 85
- Review time: 6-12 weeks
About the journal: This widely read and respected journal features peer-reviewed, thought-provoking articles representing research by some of the world's leading nurse researchers. Reaching health professionals, faculty, and students in 103 countries, the Journal of Nursing Scholarship focuses on the health of people throughout the world. It is the official journal of Sigma Theta Tau International and reflects the society's dedication to providing the tools necessary to improve nursing care around the world.