Comprehensiveness of Large Language Models in Patient Queries on Gingival and Endodontic Health

IF 3.2 · CAS Tier 3 (Medicine) · JCR Q1 (Dentistry, Oral Surgery & Medicine) · International Dental Journal · Pub Date: 2025-02-01 · DOI: 10.1016/j.identj.2024.06.022
Qian Zhang , Zhengyu Wu , Jinlin Song , Shuicai Luo , Zhaowu Chai
International Dental Journal, Volume 75, Issue 1, pp. 151–157. Full text: https://www.sciencedirect.com/science/article/pii/S0020653924001953
Citations: 0

Abstract


Aim

Given the increasing interest in using large language models (LLMs) for self-diagnosis, this study aimed to evaluate the comprehensiveness of two prominent LLMs, ChatGPT-3.5 and ChatGPT-4, in addressing common queries related to gingival and endodontic health across different language contexts and query types.

Methods

We assembled a set of 33 common real-life questions related to gingival and endodontic healthcare: 17 common-sense questions and 16 expert questions. Each question was presented to both LLMs in English and in Chinese. Three specialists rated the comprehensiveness of each response on a five-point Likert scale, with higher scores indicating more comprehensive, higher-quality responses.

Results

The LLMs performed significantly better in English than in Chinese, with average scores of 4.53 and 3.95, respectively (Mann–Whitney U test, P < .05). Responses to common-sense questions scored higher than responses to expert questions, with averages of 4.46 and 4.02 (Mann–Whitney U test, P < .05). Among the LLMs, ChatGPT-4 consistently outperformed ChatGPT-3.5, achieving average scores of 4.45 and 4.03 (Mann–Whitney U test, P < .05).
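The comparisons above rest on the Mann–Whitney U test, a nonparametric test suited to ordinal Likert data because it compares rank sums rather than means. As a minimal sketch (using hypothetical scores, not the study's data), the U statistic can be computed in pure Python:

```python
def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic for samples a and b.

    Tied values receive their average rank, the standard convention
    for ordinal data such as Likert scores.
    """
    # Pool both samples, tagging each value with its group (0 = a, 1 = b).
    combined = sorted((v, grp) for grp, vals in ((0, a), (1, b)) for v in vals)
    values = [v for v, _ in combined]
    rank_of_index = [0.0] * len(combined)
    i = 0
    while i < len(values):
        # Find the run of tied values starting at i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            rank_of_index[k] = avg_rank
        i = j + 1
    # Rank sum of group a, then the usual U formula; report the smaller U.
    r_a = sum(r for r, (_, grp) in zip(rank_of_index, combined) if grp == 0)
    n_a, n_b = len(a), len(b)
    u_a = r_a - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)

# Hypothetical 5-point Likert comprehensiveness scores for illustration only.
english_scores = [5, 5, 4, 5, 4, 5, 4, 5]
chinese_scores = [4, 4, 3, 4, 3, 5, 4, 4]
print(mann_whitney_u(english_scores, chinese_scores))  # prints 13.0
```

In practice one would obtain the P value from `scipy.stats.mannwhitneyu`, which also applies the tie correction; the sketch only shows where the statistic itself comes from.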

Conclusions

ChatGPT-4 provides more comprehensive responses than ChatGPT-3.5 for queries related to gingival and endodontic health. Both LLMs perform better in English and on common-sense questions. However, the performance discrepancies across language contexts and the presence of inaccurate responses suggest that further evaluation and a clear understanding of their limitations are crucial to avoid potential misunderstandings.

Clinical Relevance

This study revealed the performance differences of ChatGPT-3.5 and ChatGPT-4 in handling gingival and endodontic health issues across different language contexts, providing insights into the comprehensiveness and limitations of LLMs in addressing common oral healthcare queries.
Source Journal

International Dental Journal (Medicine – Dentistry, Oral Surgery & Medicine)
CiteScore: 4.80
Self-citation rate: 6.10%
Annual articles: 159
Review time: 63 days
Journal description: The International Dental Journal features peer-reviewed, scientific articles relevant to international oral health issues, as well as practical, informative articles aimed at clinicians.