ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople.
Jiyoung Kim, Seo-Young Lee, Jee Hyun Kim, Dong-Hyeon Shin, Eun Hye Oh, Jin A Kim, Jae Wook Cho
Sleep Health, published online 2024-09-21. DOI: 10.1016/j.sleh.2024.08.011
Abstract
Background: Many individuals use the Internet, including generative artificial intelligence like ChatGPT, for sleep-related information before consulting medical professionals. This study compared responses from sleep disorder specialists and ChatGPT to common sleep queries, with experts and laypersons evaluating the responses' accuracy and clarity.
Methods: We assessed responses from sleep medicine specialists and ChatGPT-4 to 140 sleep-related questions from the Korean Sleep Research Society's website. In a blinded study design, sleep disorder experts and laypersons rated the medical helpfulness, emotional supportiveness, and sentence comprehensibility of the responses on a 1-5 scale.
Results: Laypersons rated ChatGPT higher for medical helpfulness (3.79 ± 0.90 vs. 3.44 ± 0.99, p < .001), emotional supportiveness (3.48 ± 0.79 vs. 3.12 ± 0.98, p < .001), and sentence comprehensibility (4.24 ± 0.79 vs. 4.14 ± 0.96, p = .028). Experts also rated ChatGPT higher for emotional supportiveness (3.33 ± 0.62 vs. 3.01 ± 0.67, p < .001) but preferred the specialists' responses for sentence comprehensibility (4.15 ± 0.74 vs. 3.94 ± 0.90, p < .001). For medical helpfulness, experts rated the specialists' responses slightly, but not significantly, higher than ChatGPT's (3.70 ± 0.84 vs. 3.63 ± 0.87, p = .109). Experts slightly preferred specialist responses overall (56.0%), while laypersons favored ChatGPT (54.3%; p < .001). ChatGPT's responses were significantly longer (186.76 ± 39.04 vs. 113.16 ± 95.77 words, p < .001).
Discussion: Generative artificial intelligence like ChatGPT may help disseminate sleep-related medical information online. Laypersons appear to prefer ChatGPT's detailed, emotionally supportive responses over those from sleep disorder specialists.
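The abstract does not state which statistical test produced the p-values reported above. As an illustration only, the sketch below shows one plausible way such paired 1-5 ratings could be compared per question: the data are simulated, and the choice of a Wilcoxon signed-rank test (a common option for paired ordinal ratings) is an assumption, not the paper's documented method.

```python
# Hypothetical sketch: comparing blinded 1-5 ratings of ChatGPT vs. specialist
# answers to the same set of questions. All data here are simulated; the
# study's actual ratings and statistical test are not given in the abstract.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_questions = 140  # matches the number of sleep queries described in Methods

# Simulate layperson medical-helpfulness ratings (1-5 Likert) per question,
# loosely centered on the means reported in the Results paragraph.
chatgpt = np.clip(np.round(rng.normal(3.79, 0.90, n_questions)), 1, 5)
specialist = np.clip(np.round(rng.normal(3.44, 0.99, n_questions)), 1, 5)

# Wilcoxon signed-rank test: a standard paired test for ordinal ratings.
stat, p = stats.wilcoxon(chatgpt, specialist)
print(f"ChatGPT:    {chatgpt.mean():.2f} ± {chatgpt.std():.2f}")
print(f"Specialist: {specialist.mean():.2f} ± {specialist.std():.2f}")
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p:.4f}")
```

With real data from multiple raters, a per-rater paired analysis or a mixed-effects model would likely be more appropriate than pooling all ratings, since ratings of the same question by the same rater are not independent.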
About the Journal
Sleep Health: Journal of the National Sleep Foundation is a multidisciplinary journal that explores sleep's role in population health and elucidates the social science perspective on sleep and health. Aligned with the National Sleep Foundation's global, authoritative, evidence-based voice for sleep health, the journal serves as the foremost publication for manuscripts that advance the sleep health of all members of society. The scope of the journal extends across diverse sleep-related fields, including anthropology, education, health services research, human development, international health, law, mental health, nursing, nutrition, psychology, public health, public policy, fatigue management, transportation, social work, and sociology. The journal welcomes original research articles, review articles, brief reports, special articles, letters to the editor, editorials, and commentaries.