Enhancing AI Chatbot Responses in Health Care: The SMART Prompt Structure in Head and Neck Surgery
Luigi Angelo Vaira, Jerome R Lechien, Vincenzo Abbate, Guido Gabriele, Andrea Frosolini, Andrea De Vito, Antonino Maniaci, Miguel Mayo-Yáñez, Paolo Boscolo-Rizzo, Alberto Maria Saibene, Fabio Maglitto, Giovanni Salzano, Gianluigi Califano, Stefania Troise, Carlos Miguel Chiesa-Estomba, Giacomo De Riu
OTO Open, vol. 9, no. 1, e70075. Published 2025-01-16. DOI: 10.1002/oto2.70075. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11736147/pdf/
Citations: 0
Abstract
Objective: This study aims to evaluate the impact of prompt construction on the quality of artificial intelligence (AI) chatbot responses in the context of head and neck surgery.
Study design: Observational and evaluative study.
Setting: An international collaboration involving 16 researchers from 11 European centers specializing in head and neck surgery.
Methods: A total of 24 questions, divided into clinical scenarios, theoretical questions, and patient inquiries, were developed. These questions were entered into ChatGPT-4o both with and without the use of a structured prompt format, known as SMART (Seeker, Mission, AI Role, Register, Targeted Question). The AI-generated responses were evaluated by experienced head and neck surgeons using the Quality Analysis of Medical Artificial Intelligence instrument (QAMAI), which assesses accuracy, clarity, relevance, completeness, source quality, and usefulness.
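The five SMART components described above can be pictured as a simple template that is filled in before the question is submitted to the chatbot. The sketch below is only an illustration of that structure; the field contents and function name are hypothetical examples, not text taken from the study.

```python
# Minimal sketch of assembling a SMART-structured prompt
# (Seeker, Mission, AI Role, Register, Targeted Question).
# All field contents below are illustrative assumptions.

def build_smart_prompt(seeker, mission, ai_role, register, targeted_question):
    """Concatenate the five SMART components into one labeled prompt string."""
    return (
        f"Seeker: {seeker}\n"
        f"Mission: {mission}\n"
        f"AI Role: {ai_role}\n"
        f"Register: {register}\n"
        f"Targeted Question: {targeted_question}"
    )

prompt = build_smart_prompt(
    seeker="Head and neck surgeon",
    mission="Support clinical decision-making ahead of a tumor board",
    ai_role="Act as an evidence-based otolaryngology consultant",
    register="Formal, technical language with cited sources",
    targeted_question=(
        "What are the indications for elective neck dissection "
        "in early-stage oral cavity cancer?"
    ),
)
print(prompt)
```

The labeled-field layout makes the context explicit to the model, which is the contrast the study draws against entering the targeted question alone.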
Results: The responses generated using the SMART prompt scored significantly higher across all QAMAI dimensions than those generated without contextualized prompts. Median QAMAI scores were 27.5 (interquartile range [IQR] 25-29) for SMART prompts versus 24 (IQR 21.8-25) for unstructured prompts (P < .001). Clinical scenarios and patient inquiries showed the largest improvements; theoretical questions also benefited, though to a lesser extent. The quality of the AI's sources improved notably with the SMART prompt, particularly for theoretical questions.
Conclusion: This study suggests that the structured SMART prompt format significantly enhances the quality of AI chatbot responses in head and neck surgery. This approach improves the accuracy, relevance, and completeness of AI-generated information, underscoring the importance of well-constructed prompts in clinical applications. Further research is warranted to explore the applicability of SMART prompts across different medical specialties and AI platforms.