Paul G Mastrokostas, Leonidas E Mastrokostas, Ahmed K Emara, Ian J Wellington, Elizabeth Ginalis, John K Houten, Amrit S Khalsa, Ahmed Saleh, Afshin E Razi, Mitchell K Ng
{"title":"GPT-4 as a Source of Patient Information for Anterior Cervical Discectomy and Fusion: A Comparative Analysis Against Google Web Search.","authors":"Paul G Mastrokostas, Leonidas E Mastrokostas, Ahmed K Emara, Ian J Wellington, Elizabeth Ginalis, John K Houten, Amrit S Khalsa, Ahmed Saleh, Afshin E Razi, Mitchell K Ng","doi":"10.1177/21925682241241241","DOIUrl":null,"url":null,"abstract":"<p><strong>Study design: </strong>Comparative study.</p><p><strong>Objectives: </strong>This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 most frequently asked questions (FAQs) about anterior cervical discectomy and fusion (ACDF).</p><p><strong>Methods: </strong>\"Anterior cervical discectomy and fusion\" was searched on Google and GPT-4 on December 18, 2023. Top 10 FAQs were classified according to the Rothwell system. Source quality was evaluated using <i>JAMA</i> benchmark criteria and readability was assessed using Flesch Reading Ease and Flesch-Kincaid grade level. Differences in <i>JAMA</i> scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between platforms were analyzed using Student's t-tests. Statistical significance was set at the .05 level.</p><p><strong>Results: </strong>Frequently asked questions from Google were varied, while GPT-4 focused on technical details and indications/management. GPT-4 showed a higher Flesch-Kincaid grade level (12.96 vs 9.28, <i>P</i> = .003), lower Flesch Reading Ease score (37.07 vs 54.85, <i>P</i> = .005), and higher <i>JAMA</i> scores for source quality (3.333 vs 1.800, <i>P</i> = .016). 
Numerically, 6 out of 10 responses varied between platforms, with GPT-4 providing broader recovery timelines for ACDF.</p><p><strong>Conclusions: </strong>This study demonstrates GPT-4's ability to elevate patient education by providing high-quality, diverse information tailored to those with advanced literacy levels. As AI technology evolves, refining these tools for accuracy and user-friendliness remains crucial, catering to patients' varying literacy levels and information needs in spine surgery.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529100/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/21925682241241241","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/21 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
Citations: 0
Abstract
Study design: Comparative study.
Objectives: This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 most frequently asked questions (FAQs) about anterior cervical discectomy and fusion (ACDF).
Methods: "Anterior cervical discectomy and fusion" was searched on Google and GPT-4 on December 18, 2023. Top 10 FAQs were classified according to the Rothwell system. Source quality was evaluated using JAMA benchmark criteria and readability was assessed using Flesch Reading Ease and Flesch-Kincaid grade level. Differences in JAMA scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between platforms were analyzed using Student's t-tests. Statistical significance was set at the .05 level.
Results: Frequently asked questions from Google were varied, while GPT-4 focused on technical details and indications/management. GPT-4 showed a higher Flesch-Kincaid grade level (12.96 vs 9.28, P = .003), lower Flesch Reading Ease score (37.07 vs 54.85, P = .005), and higher JAMA scores for source quality (3.333 vs 1.800, P = .016). Numerically, 6 out of 10 responses varied between platforms, with GPT-4 providing broader recovery timelines for ACDF.
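The between-platform comparisons above are Student's t-tests, i.e. the pooled-variance two-sample t statistic. A minimal sketch using only the standard library (the sample values are hypothetical stand-ins, not the study's measurements):

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled sample variance across the two groups.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical per-response Flesch-Kincaid grade levels, for illustration only.
gpt4_fkgl = [13.1, 12.4, 13.8, 12.0, 13.5]
google_fkgl = [9.6, 8.9, 9.1, 9.8, 8.7]
t_stat, df = student_t(gpt4_fkgl, google_fkgl)
```

The resulting t statistic would then be compared against the t distribution with the returned degrees of freedom at the study's .05 significance level.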
Conclusions: This study demonstrates GPT-4's ability to elevate patient education by providing high-quality, diverse information tailored to those with advanced literacy levels. As AI technology evolves, refining these tools for accuracy and user-friendliness remains crucial so they can serve patients' varying literacy levels and information needs in spine surgery.