Cochrane Evidence Synthesis and Methods, vol. 2, no. 2. Published 2024-02-04. DOI: 10.1002/cesm.12041
Authors: Colleen Ovelman, Shannon Kugley, Gerald Gartlehner, Meera Viswanathan
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cesm.12041
The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study
Introduction
Plain language summaries (PLSs) make complex healthcare evidence accessible to patients and the public. Large language models (LLMs) may assist in generating accurate, readable PLSs. This study explored using the LLM Claude 2 to create PLSs of evidence reviews from the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program.
Methods
We selected 10 evidence reviews published from 2021 to 2023, representing a range of methods and topics. We iteratively developed a prompt to guide Claude 2 in creating PLSs, which included specifications for plain language, reading level, length, organizational structure, active voice, and inclusive language. PLSs were assessed for adherence to prompt specifications, comprehensiveness, accuracy, readability, and cultural sensitivity.
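As an illustration of how a reading-level check like the one described above can be automated, the sketch below computes the Flesch-Kincaid grade level, a standard readability formula that maps text to a US school grade. This is a minimal, assumption-laden implementation (the syllable counter is a rough vowel-group heuristic, and the study does not specify which tool or formula it actually used):

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: score a candidate summary sentence against the 6th-8th grade target.
sample = "Plain language helps people understand health evidence."
grade = flesch_kincaid_grade(sample)
```

In practice, a published readability library (e.g., one implementing Flesch-Kincaid with a dictionary-based syllable counter) would be more reliable than this heuristic, but the structure of the check, scoring each PLS and comparing it to the 6th-8th grade target, would be the same.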
Results
All PLSs met the word count specification. We judged one PLS as fully comprehensive and seven as mostly comprehensive. We judged two PLSs as fully capturing the PICO elements and five as having minor PICO errors. We judged three PLSs as accurately reporting the results and four as having minor result errors. We judged three PLSs as having major result errors because they incorrectly reported the total number of participants. Five PLSs met the target 6th- to 8th-grade reading level. Passive voice use averaged 16%. All PLSs used inclusive language.
Conclusions
LLMs show promise for assisting in PLS creation but likely require human input to ensure accuracy, comprehensiveness, and the appropriate nuances of interpretation. Iterative prompt refinement may improve results and address the needs of specific reviews and audiences. As text-only summaries, the AI-generated PLSs could not meet all consumer communication criteria, such as textual design and visual representations. Further testing should include consumer reviewers and explore how best to leverage LLM support in drafting PLS text for complex evidence reviews.