Chat Generative Pre-Trained Transformer (ChatGPT) - 3.5 Responses Require Advanced Readability for the General Population and May Not Effectively Supplement Patient-Related Information Provided by the Treating Surgeon Regarding Common Questions About Rotator Cuff Repair.
Emma Eng, Colton Mowers, Divesh Sachdev, Payton Yerke-Hansen, Garrett R Jackson, Derrick M Knapik, Vani J Sabesan
Arthroscopy-The Journal of Arthroscopic and Related Surgery, pages 42-52. Published 2025-01-01 (Epub 2024-05-21). DOI: 10.1016/j.arthro.2024.05.009. Journal Impact Factor 4.4; JCR Q1 (Orthopedics).
Citations: 0
Abstract
Purpose: To investigate the accuracy of Chat Generative Pre-Trained Transformer (ChatGPT)'s responses to frequently asked questions prior to rotator cuff repair surgery.
Methods: The 10 most common frequently asked questions related to rotator cuff repair were compiled from 4 institution websites. Questions were then input into ChatGPT-3.5 in 1 session. The provided ChatGPT-3.5 responses were analyzed by 2 orthopaedic surgeons for reliability, quality, and readability using the Journal of the American Medical Association Benchmark criteria, the DISCERN score, and the Flesch-Kincaid Grade Level.
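The Flesch-Kincaid Grade Level used in the methods above is a standard formula over average sentence length and average syllables per word: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal Python sketch is shown below; the vowel-group syllable counter is a naive assumption for illustration (validated readability tools use pronunciation dictionaries or dedicated libraries):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups (an assumption,
    # not the dictionary-based counting used by validated tools).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Split sentences on terminal punctuation and extract word tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid Grade Level formula.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A grade level of 13.4, as reported in the results, corresponds to college-level text, well above the sixth-to-eighth-grade level commonly recommended for patient education materials.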
Results: The Journal of the American Medical Association Benchmark criteria score was 0, indicating the absence of reliable source material citations. The mean Flesch-Kincaid Grade Level was 13.4 (range, 11.2-15.0). The mean DISCERN score was 43.4 (range, 36-51), indicating that the overall quality of the responses was fair. All responses recommended that final decisions be made with the treating physician.
Conclusions: ChatGPT-3.5 provided substandard patient-related information to supplement the recommendations of the treating surgeon regarding common questions about rotator cuff repair surgery. Additionally, the responses lacked reliable source material citations, and their readability was relatively advanced, with a complex language style.
Clinical relevance: The findings of this study suggest that ChatGPT-3.5 may not effectively supplement patient-related information in the context of recommendations provided by the treating surgeon prior to rotator cuff repair surgery.
About the journal:
Nowhere is minimally invasive surgery explained better than in Arthroscopy, the leading peer-reviewed journal in the field. Every issue enables you to put the usefulness of various emerging arthroscopic techniques into perspective. The advantages and disadvantages of these methods, along with their applications in various situations, are discussed in relation to their efficiency, efficacy, and cost benefit. As a special incentive, paid subscribers also receive access to the journal's expanded website.