Can AI-generated clinical vignettes in Japanese be used medically and linguistically?

medRxiv - Medical Education. Pub Date: 2024-03-02. DOI: 10.1101/2024.02.28.24303173

Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Yu Li, Takanori Uehara, Masatomi Ikusaka



Abstract

Background: Creating clinical vignettes requires considerable effort. Recent developments in generative artificial intelligence (AI) for natural language processing have been remarkable and may allow diverse clinical vignettes to be created easily and immediately.

Objective: In this study, we evaluated the medical accuracy and grammatical correctness of AI-generated clinical vignettes in Japanese and verified their usefulness.

Methods: Clinical vignettes in Japanese were created using the generative AI model GPT-4-0613. The input prompts for the clinical vignettes specified the following seven elements: 1) age, 2) sex, 3) chief complaint and time course since onset, 4) physical findings, 5) examination results, 6) diagnosis, and 7) treatment course. The list of diseases integrated into the vignettes was based on 202 cases considered in the management of diseases and symptoms in Japan's Primary Care Physicians Training Program. The clinical vignettes were evaluated for medical and Japanese-language accuracy by three physicians using a five-point scale. A total score of 13 points or above was defined as 'sufficiently beneficial and immediately usable with minor revisions,' a score between 10 and 12 points was defined as 'partly insufficient and in need of modifications,' and a score of 9 points or below was defined as 'insufficient.'

Results: Regarding medical accuracy, of the 202 clinical vignettes, 118 scored 13 points or above, 78 scored between 10 and 12 points, and 6 scored 9 points or below. Regarding Japanese-language accuracy, 142 vignettes scored 13 points or above, 56 scored between 10 and 12 points, and 4 scored 9 points or below. Overall, 97% (196/202) of the vignettes were usable with some modifications.

Conclusions: Overall, 97% of the clinical vignettes proved practically useful, based on confirmation and revision by Japanese physicians. Given the significant effort required by physicians to create vignettes without AI assistance, the use of GPT is expected to greatly optimize this process.
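The scoring thresholds and the headline percentage can be checked with a short sketch. It assumes that the total score is the sum of the three physicians' ratings, each from 1 to 5 (so a maximum of 15); the abstract does not state this explicitly. The function names and short category labels are illustrative and do not come from the paper. The 196/202 figure appears to correspond to the two upper bands of the medical-accuracy ratings (118 + 78).

```python
def total_score(ratings):
    """Sum three ratings; assumes each rater scores on a 1-5 scale (assumption)."""
    if len(ratings) != 3 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("expected three ratings, each between 1 and 5")
    return sum(ratings)


def classify(total):
    """Map a total score to the three bands defined in the Methods."""
    if total >= 13:
        return "usable with minor revisions"
    if total >= 10:
        return "needs modifications"
    return "insufficient"


# Medical-accuracy counts as reported in the Results.
medical_counts = {
    "usable with minor revisions": 118,
    "needs modifications": 78,
    "insufficient": 6,
}

n_vignettes = sum(medical_counts.values())
n_usable = (medical_counts["usable with minor revisions"]
            + medical_counts["needs modifications"])
share_usable = n_usable / n_vignettes

print(f"{n_usable}/{n_vignettes} = {share_usable:.0%}")
```

For example, ratings of 5, 4, and 4 give a total of 13, which falls in the top band, while ratings of 3, 3, and 3 give 9, which falls in the 'insufficient' band.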


