Eric J Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L Andrew DiFronzo, David Rhew, Mark I Feng
{"title":"泌尿外科医生与人工智能生成的信息:患者和医生对准确性、完整性和偏好的评估。","authors":"Eric J Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L Andrew DiFronzo, David Rhew, Mark I Feng","doi":"10.1007/s00345-024-05399-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference for AI and urologist responses to patient messages concerning common BPH questions across phases of care.</p><p><strong>Methods: </strong>Cross-sectional study evaluating responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses assessed by experts using Likert scales, and preferences and perceptions of authorship (chatbot vs. human) rated by non-medical evaluators.</p><p><strong>Results: </strong>Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses (n = 600 total). For volunteer evaluations, the mean (SD) score of chatbots, 3.0 (1.4) (moderately empathetic) was significantly higher than urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); mean (SD) and preference ranking for chatbots, 2.6 (1.6), was significantly higher than urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologist and 2 chatbots, n = 240 total). For SME evaluations, mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct) and not significantly different than urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than urologists, 1.6 (0.6) (adequate) (p < 0.001).</p><p><strong>Conclusion: </strong>Answers to patient BPH messages generated by chatbots were evaluated by experts as equally accurate and more complete than urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic compared to answers generated by urologists.</p>","PeriodicalId":23954,"journal":{"name":"World Journal of Urology","volume":"43 1","pages":"48"},"PeriodicalIF":2.8000,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11680670/pdf/","citationCount":"0","resultStr":"{\"title\":\"Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians.\",\"authors\":\"Eric J Robinson, Chunyuan Qiu, Stuart Sands, Mohammad Khan, Shivang Vora, Kenichiro Oshima, Khang Nguyen, L Andrew DiFronzo, David Rhew, Mark I Feng\",\"doi\":\"10.1007/s00345-024-05399-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference for AI and urologist responses to patient messages concerning common BPH questions across phases of care.</p><p><strong>Methods: </strong>Cross-sectional study evaluating responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses assessed by experts using Likert scales, and preferences and perceptions of authorship (chatbot vs. 
human) rated by non-medical evaluators.</p><p><strong>Results: </strong>Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses (n = 600 total). For volunteer evaluations, the mean (SD) score of chatbots, 3.0 (1.4) (moderately empathetic) was significantly higher than urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); mean (SD) and preference ranking for chatbots, 2.6 (1.6), was significantly higher than urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologist and 2 chatbots, n = 240 total). For SME evaluations, mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct) and not significantly different than urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than urologists, 1.6 (0.6) (adequate) (p < 0.001).</p><p><strong>Conclusion: </strong>Answers to patient BPH messages generated by chatbots were evaluated by experts as equally accurate and more complete than urologist answers. Non-medical volunteers preferred chatbot-generated messages and considered them more empathetic compared to answers generated by urologists.</p>\",\"PeriodicalId\":23954,\"journal\":{\"name\":\"World Journal of Urology\",\"volume\":\"43 1\",\"pages\":\"48\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-12-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11680670/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Journal of Urology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00345-024-05399-y\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Urology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00345-024-05399-y","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians.
Purpose: To evaluate the accuracy, comprehensiveness, empathetic tone, and patient preference for AI and urologist responses to patient messages concerning common BPH questions across phases of care.
Methods: Cross-sectional study evaluating responses to 20 BPH-related questions generated by 2 AI chatbots and 4 urologists in a simulated clinical messaging environment without direct patient interaction. Accuracy, completeness, and empathetic tone of responses were assessed by experts using Likert scales; preferences and perceptions of authorship (chatbot vs. human) were rated by non-medical evaluators.
Results: Five non-medical volunteers independently evaluated, ranked, and inferred the source for 120 responses each (n = 600 total). For volunteer evaluations, the mean (SD) empathy score for chatbots, 3.0 (1.4) (moderately empathetic), was significantly higher than for urologists, 2.1 (1.1) (slightly empathetic) (p < 0.001); the mean (SD) preference ranking for chatbots, 2.6 (1.6), was significantly better than the urologist ranking, 3.9 (1.6) (p < 0.001). Two subject matter experts (SMEs) independently evaluated 120 responses each (answers to 20 questions from 4 urologists and 2 chatbots, n = 240 total). For SME evaluations, the mean (SD) accuracy score for chatbots was 4.5 (1.1) (nearly all correct), not significantly different from that of urologists, 4.6 (1.2). The mean (SD) completeness score for chatbots was 2.4 (0.8) (comprehensive), significantly higher than for urologists, 1.6 (0.6) (adequate) (p < 0.001).
Conclusion: Chatbot-generated answers to patient BPH messages were evaluated by experts as being as accurate as, and more complete than, urologist answers. Non-medical volunteers preferred the chatbot-generated messages and considered them more empathetic than the answers written by urologists.
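For readers who want to reproduce this style of analysis, the minimal sketch below shows how mean (SD) Likert ratings for two groups of responses could be summarized and compared. The abstract does not report which statistical test was used, so the Mann-Whitney U test chosen here (a common option for ordinal Likert data) and all rating values are illustrative assumptions, not the study's method or data.

```python
# Illustrative sketch only: compares hypothetical 1-5 Likert ratings for
# chatbot- vs. urologist-authored responses. The test choice (Mann-Whitney U)
# and the data below are assumptions; the abstract does not specify either.
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical empathy ratings from non-medical evaluators
chatbot_ratings = np.array([3, 4, 2, 5, 3, 4, 3, 2, 4, 3])
urologist_ratings = np.array([2, 1, 3, 2, 2, 3, 1, 2, 3, 2])

# Summarize as mean (sample SD), matching the abstract's reporting format
print(f"Chatbot mean (SD): {chatbot_ratings.mean():.1f} ({chatbot_ratings.std(ddof=1):.1f})")
print(f"Urologist mean (SD): {urologist_ratings.mean():.1f} ({urologist_ratings.std(ddof=1):.1f})")

# Two-sided nonparametric test of whether the two rating distributions differ
stat, p_value = mannwhitneyu(chatbot_ratings, urologist_ratings, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}")
```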
About the journal:
The WORLD JOURNAL OF UROLOGY regularly conveys the essential results of urological research, and their practical and clinical relevance, to a broad audience of urologists in research and clinical practice. To guarantee a balanced program, published articles reflect developments in all fields of urology at an internationally advanced level. Each issue treats a main topic through review articles by invited international experts. Free papers are articles unrelated to the main topic.