ScholarGPT's performance in oral and maxillofacial surgery

Journal of Stomatology Oral and Maxillofacial Surgery | IF 2.0 | Zone 3 (Medicine) | JCR Q2 (Dentistry, Oral Surgery & Medicine) | Volume 126, Issue 4, Article 102114 | Pub Date: 2025-09-01 | Epub Date: 2024-10-09 | DOI: 10.1016/j.jormas.2024.102114

Yunus Balel

Citations: 0

Abstract

ScholarGPT's performance in oral and maxillofacial surgery

Objective

The purpose of this study is to evaluate the performance of Scholar GPT in answering technical questions in the field of oral and maxillofacial surgery and to conduct a comparative analysis with the results of a previous study that assessed the performance of ChatGPT.

Materials and Methods

Scholar GPT was accessed via ChatGPT (www.chatgpt.com) on March 20, 2024. A total of 60 technical questions (15 each on impacted teeth, dental implants, temporomandibular joint disorders, and orthognathic surgery) from our previous study were used. Scholar GPT's responses were evaluated using a modified Global Quality Scale (GQS). The questions were randomized before scoring using an online randomizer (www.randomizer.org). A single researcher performed the evaluations at three different times, three weeks apart, with each evaluation preceded by a new randomization. In cases of score discrepancies, a fourth evaluation was conducted to determine the final score.
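The scoring protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used the online tool at www.randomizer.org, whereas this sketch substitutes Python's `random` module; the names `build_question_ids`, `randomized_order`, and `final_score` are hypothetical; and the abstract does not say how the fourth evaluation settles a discrepancy, so the sketch assumes the fourth score is simply taken as final.

```python
import random
from typing import Callable, Sequence

# The four topic groups named in the abstract (15 questions each, 60 in total).
TOPICS = [
    "impacted teeth",
    "dental implants",
    "temporomandibular joint disorders",
    "orthognathic surgery",
]


def build_question_ids(per_topic: int = 15) -> list[tuple[str, int]]:
    """Return (topic, index) pairs standing in for the real question texts."""
    return [(topic, i) for topic in TOPICS for i in range(1, per_topic + 1)]


def randomized_order(questions: Sequence[tuple[str, int]], seed: int) -> list[tuple[str, int]]:
    """Shuffle a copy of the questions; a stand-in for the online randomizer."""
    rng = random.Random(seed)
    order = list(questions)
    rng.shuffle(order)
    return order


def final_score(round_scores: Sequence[int], fourth_evaluation: Callable[[], int]) -> int:
    """Use the shared score if all rounds agree; otherwise defer to a fourth
    evaluation (assumption: the fourth score is taken as the final score)."""
    if len(set(round_scores)) == 1:
        return round_scores[0]
    return fourth_evaluation()


questions = build_question_ids()
# One fresh randomization before each of the three evaluation rounds.
rounds = [randomized_order(questions, seed) for seed in (1, 2, 3)]
```

For example, `final_score([5, 5, 5], lambda: 4)` returns the agreed score of 5, while `final_score([4, 5, 5], lambda: 4)` falls back to the fourth evaluation.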

Results

Scholar GPT performed well across all technical questions, with an average GQS score of 4.48 (SD = 0.93). By comparison, ChatGPT's average GQS score in the previous study was 3.1 (SD = 1.492). The Wilcoxon signed-rank test indicated a statistically significantly higher average score for Scholar GPT than for ChatGPT (Mean Difference = 2.00, SE = 0.163, p < 0.001). The Kruskal-Wallis test showed no statistically significant differences among the topic groups (χ² = 0.799, df = 3, p = 0.850, ε² = 0.0135).
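As a rough consistency check on the Kruskal-Wallis figures, the p-value and effect size can be recomputed from the reported statistic alone. The sketch below assumes two things the abstract does not state: that the p-value comes from the usual chi-square approximation with 3 degrees of freedom, and that ε² follows the common definition ε² = H / (n − 1) with n = 60 questions. The function names are illustrative, not taken from the paper.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    For df = 3 the survival function has the closed form
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    z = x / 2.0
    return math.erfc(math.sqrt(z)) + math.sqrt(2.0 * x / math.pi) * math.exp(-z)


def epsilon_squared(h: float, n: int) -> float:
    """Epsilon-squared effect size, assuming the definition H / (n - 1)."""
    return h / (n - 1)


H = 0.799  # reported Kruskal-Wallis statistic
N = 60     # total number of questions (15 per topic, 4 topics)

p_value = chi2_sf_df3(H)
effect = epsilon_squared(H, N)
print(f"p = {p_value:.3f}, epsilon^2 = {effect:.4f}")
```

Under these assumptions the recomputed values agree with the reported p = 0.850 and ε² = 0.0135 to the precision shown.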

Conclusion

Scholar GPT demonstrated generally high performance on technical questions within oral and maxillofacial surgery and produced more consistent and higher-quality responses than ChatGPT. The findings suggest that GPT models based on academic databases can provide more accurate and reliable information. Additionally, developing a specialized GPT model for oral and maxillofacial surgery could ensure higher quality and consistency in artificial intelligence-generated information.

Source journal