Assessing the Adherence of ChatGPT Chatbots to Public Health Guidelines for Smoking Cessation: Content Analysis

Journal of Medical Internet Research (IF 8.2; Zone 2, Medicine; JCR Q1, Health Care Sciences & Services). Pub Date: 2025-01-30. DOI: 10.2196/66896

Lorien C Abroms, Artin Yousefi, Christina N Wysota, Tien-Chin Wu, David A Broniatowski

Journal of Medical Internet Research, volume 27, e66896. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11826940/pdf/

Citations: 0



Assessing the Adherence of ChatGPT Chatbots to Public Health Guidelines for Smoking Cessation: Content Analysis.

Background: Large language model (LLM) artificial intelligence chatbots using generative language can offer smoking cessation information and advice. However, little is known about the reliability of the information provided to users.

Objective: This study aims to examine whether 3 ChatGPT chatbots (the World Health Organization's Sarah, BeFreeGPT, and BasicGPT) provide reliable information on how to quit smoking.

Methods: A list of quit smoking queries was generated from frequent quit smoking searches on Google related to "how to quit smoking" (n=12). Each query was given to each chatbot, and responses were analyzed for their adherence to an index developed from the US Preventive Services Task Force public health guidelines for quitting smoking and counseling principles. Responses were independently coded by 2 reviewers, and differences were resolved by a third coder.
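The coding procedure described in the Methods can be illustrated with a minimal sketch. It assumes each index item is coded as present or absent by two reviewers, that a third coder's rating is used only where they disagree, and that a response's adherence is the share of items coded as present. The item names and the example ratings are hypothetical; the study's actual index and scoring rules are not reproduced in the abstract.

```python
from typing import Dict, Optional


def resolve(coder_a: bool, coder_b: bool, coder_c: Optional[bool] = None) -> bool:
    """Return the agreed code, falling back to a third coder on disagreement."""
    if coder_a == coder_b:
        return coder_a
    if coder_c is None:
        raise ValueError("coders disagree and no third rating was supplied")
    return coder_c


def final_codes(
    a: Dict[str, bool],
    b: Dict[str, bool],
    c: Optional[Dict[str, bool]] = None,
) -> Dict[str, bool]:
    """Combine two coders' item-level ratings into one final rating per item."""
    tie_breaks = c or {}
    return {item: resolve(a[item], b[item], tie_breaks.get(item)) for item in a}


def adherence_percent(codes: Dict[str, bool]) -> float:
    """Percentage of index items that the response was coded as satisfying."""
    return 100.0 * sum(codes.values()) / len(codes)


# Hypothetical ratings for a single chatbot response (not data from the study).
reviewer_1 = {"clear_language": True, "counseling": True, "nrt": False, "social_support": True}
reviewer_2 = {"clear_language": True, "counseling": True, "nrt": True, "social_support": False}
third_coder = {"nrt": True, "social_support": False}  # only the disputed items

codes = final_codes(reviewer_1, reviewer_2, third_coder)
print(adherence_percent(codes))  # 75.0
```

Averaging such per-response percentages across queries would give a per-chatbot figure comparable in form to those reported in the Results, under the assumptions stated above.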

Results: Across chatbots and queries, on average, chatbot responses were rated as being adherent to 57.1% of the items on the adherence index. Sarah's adherence (72.2%) was significantly higher than BeFreeGPT (50%) and BasicGPT (47.8%; P<.001). The majority of chatbot responses had clear language (97.3%) and included a recommendation to seek out professional counseling (80.3%). About half of the responses included the recommendation to consider using nicotine replacement therapy (52.7%), the recommendation to seek out social support from friends and family (55.6%), and information on how to deal with cravings when quitting smoking (44.4%). The least common was information about considering the use of non-nicotine replacement therapy prescription drugs (14.1%). Finally, some types of misinformation were present in 22% of responses. Specific queries that were most challenging for the chatbots included queries on "how to quit smoking cold turkey," "...with vapes," "...with gummies," "...with a necklace," and "...with hypnosis." All chatbots showed resilience to adversarial attacks that were intended to derail the conversation.

Conclusions: LLM chatbots varied in their adherence to quit-smoking guidelines and counseling principles. While chatbots reliably provided some types of information, they omitted other types, as well as occasionally provided misinformation, especially for queries about less evidence-based methods of quitting. LLM chatbot instructions can be revised to compensate for these weaknesses.


Source journal

Journal of Medical Internet Research (Medicine: Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

About the journal: The Journal of Medical Internet Research (JMIR) is a publication in the field of health informatics and health services. Founded in 1999, it has been active in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor in these disciplines and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.