Wahram Andrikyan, Sophie Marie Sametinger, Frithjof Kosfeld, Lea Jung-Poppe, Martin F Fromm, Renke Maas, Hagen F Nicolaus
{"title":"Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients.","authors":"Wahram Andrikyan, Sophie Marie Sametinger, Frithjof Kosfeld, Lea Jung-Poppe, Martin F Fromm, Renke Maas, Hagen F Nicolaus","doi":"10.1136/bmjqs-2024-017476","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Search engines often serve as a primary resource for patients to obtain drug information. However, the search engine market is rapidly changing due to the introduction of artificial intelligence (AI)-powered chatbots. The consequences for medication safety when patients interact with chatbots remain largely unexplored.</p><p><strong>Objective: </strong>To explore the quality and potential safety concerns of answers provided by an AI-powered chatbot integrated within a search engine.</p><p><strong>Methodology: </strong>Bing copilot was queried on 10 frequently asked patient questions regarding the 50 most prescribed drugs in the US outpatient market. Patient questions covered drug indications, mechanisms of action, instructions for use, adverse drug reactions and contraindications. Readability of chatbot answers was assessed using the Flesch Reading Ease Score. Completeness and accuracy were evaluated based on corresponding patient drug information in the pharmaceutical encyclopaedia drugs.com. On a preselected subset of inaccurate chatbot answers, healthcare professionals evaluated likelihood and extent of possible harm if patients follow the chatbot's given recommendations.</p><p><strong>Results: </strong>Of 500 generated chatbot answers, overall readability implied that responses were difficult to read according to the Flesch Reading Ease Score. Overall median completeness and accuracy of chatbot answers were 100.0% (IQR 50.0-100.0%) and 100.0% (IQR 88.1-100.0%), respectively. Of the subset of 20 chatbot answers, experts found 66% (95% CI 50% to 85%) to be potentially harmful. 42% (95% CI 25% to 60%) of these 20 chatbot answers were found to potentially cause moderate to mild harm, and 22% (95% CI 10% to 40%) to cause severe harm or even death if patients follow the chatbot's advice.</p><p><strong>Conclusions: </strong>AI-powered chatbots are capable of providing overall complete and accurate patient drug information. Yet, experts deemed a considerable number of answers incorrect or potentially harmful. Furthermore, complexity of chatbot answers may limit patient understanding. Hence, healthcare professionals should be cautious in recommending AI-powered search engines until more precise and reliable alternatives are available.</p>","PeriodicalId":9077,"journal":{"name":"BMJ Quality & Safety","volume":null,"pages":null},"PeriodicalIF":5.6000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Quality & Safety","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1136/bmjqs-2024-017476","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Search engines often serve as a primary resource for patients to obtain drug information. However, the search engine market is rapidly changing due to the introduction of artificial intelligence (AI)-powered chatbots. The consequences for medication safety when patients interact with chatbots remain largely unexplored.
Objective: To explore the quality and potential safety concerns of answers provided by an AI-powered chatbot integrated within a search engine.
Methodology: Bing copilot was queried on 10 frequently asked patient questions regarding the 50 most prescribed drugs in the US outpatient market. Patient questions covered drug indications, mechanisms of action, instructions for use, adverse drug reactions and contraindications. Readability of chatbot answers was assessed using the Flesch Reading Ease Score. Completeness and accuracy were evaluated based on corresponding patient drug information in the pharmaceutical encyclopaedia drugs.com. On a preselected subset of inaccurate chatbot answers, healthcare professionals evaluated likelihood and extent of possible harm if patients follow the chatbot's given recommendations.
Results: Of 500 generated chatbot answers, overall readability implied that responses were difficult to read according to the Flesch Reading Ease Score. Overall median completeness and accuracy of chatbot answers were 100.0% (IQR 50.0-100.0%) and 100.0% (IQR 88.1-100.0%), respectively. Of the subset of 20 chatbot answers, experts found 66% (95% CI 50% to 85%) to be potentially harmful. 42% (95% CI 25% to 60%) of these 20 chatbot answers were found to potentially cause moderate to mild harm, and 22% (95% CI 10% to 40%) to cause severe harm or even death if patients follow the chatbot's advice.
Conclusions: AI-powered chatbots are capable of providing overall complete and accurate patient drug information. Yet, experts deemed a considerable number of answers incorrect or potentially harmful. Furthermore, complexity of chatbot answers may limit patient understanding. Hence, healthcare professionals should be cautious in recommending AI-powered search engines until more precise and reliable alternatives are available.
期刊介绍:
BMJ Quality & Safety (previously Quality & Safety in Health Care) is an international peer review publication providing research, opinions, debates and reviews for academics, clinicians and healthcare managers focused on the quality and safety of health care and the science of improvement.
The journal receives approximately 1000 manuscripts a year and has an acceptance rate for original research of 12%. Time from submission to first decision averages 22 days and accepted articles are typically published online within 20 days. Its current impact factor is 3.281.