Alexander Artamonov M.D., Ira Bachar-Avnieli M.D., Eyal Klang M.D., Omri Lubovsky M.D., Ehud Atoun M.D., Alexander Bermant M.D., Philip J. Rosinsky M.D.
Arthroscopy, Sports Medicine, and Rehabilitation, June 2024. DOI: 10.1016/j.asmr.2024.100923
Responses From ChatGPT-4 Show Limited Correlation With Expert Consensus Statement on Anterior Shoulder Instability
Purpose
To compare the similarity of answers provided by Generative Pretrained Transformer-4 (GPT-4) with those of a consensus statement on diagnosis, nonoperative management, and Bankart repair in anterior shoulder instability (ASI).
Methods
An expert consensus statement on ASI published by Hurley et al. in 2022 was reviewed, and the questions posed to the expert panel were extracted. GPT-4, the subscription version of ChatGPT, was queried with the same set of questions. Answers provided by GPT-4 were compared with those of the expert panel and subjectively rated for similarity by 2 experienced shoulder surgeons. GPT-4 was then used to rate the similarity of its own responses to the consensus statement, classifying them as low, medium, or high. The similarity ratings assigned by the shoulder surgeons and by GPT-4 were then compared, and interobserver reliability was calculated using weighted κ scores.
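The weighted κ statistic used above for interobserver reliability penalizes disagreements between ordinal categories (low/medium/high) in proportion to their distance. A minimal sketch of a linearly weighted Cohen's κ is shown below; the function and the example ratings are illustrative, not the study's actual data.

```python
from collections import Counter

def weighted_kappa(rater_a, rater_b, categories=("low", "medium", "high")):
    """Linearly weighted Cohen's kappa for two raters over ordinal categories.

    Disagreement weight between categories i and j is |i - j| / (k - 1),
    so adjacent-category disagreements count less than extreme ones.
    """
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)
    a = [idx[x] for x in rater_a]
    b = [idx[x] for x in rater_b]

    # Observed joint counts.
    obs = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        obs[i][j] += 1

    # Marginal category counts per rater (Counter returns 0 for unseen keys).
    pa, pb = Counter(a), Counter(b)

    # Observed vs. chance-expected weighted disagreement.
    num = sum(abs(i - j) / (k - 1) * obs[i][j]
              for i in range(k) for j in range(k))
    den = sum(abs(i - j) / (k - 1) * pa[i] * pb[j] / n
              for i in range(k) for j in range(k))
    return 1.0 - num / den
```

For example, two raters who differ on one adjacent-category call out of four questions yield a κ of 0.75 under linear weighting, whereas an unweighted κ would treat that disagreement the same as a low-vs-high split.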
Results
The degree of similarity between responses of GPT-4 and the ASI consensus statement, as defined by the shoulder surgeons, was high in 25.8%, medium in 45.2%, and low in 29.0% of questions. GPT-4 assessed similarity as high in 48.3%, medium in 41.9%, and low in 9.7% of questions. Surgeons and GPT-4 agreed on the classification of 18 questions (58.1%) and disagreed on 13 questions (41.9%).
Conclusions
The responses generated by artificial intelligence exhibit limited correlation with an expert statement on the diagnosis and treatment of ASI.
Clinical Relevance
As the use of artificial intelligence becomes more prevalent, it is important to understand how closely AI-generated information resembles content produced by human authors.