Can AI-generated clinical vignettes in Japanese be used medically and linguistically?
Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Yu Li, Takanori Uehara, Masatomi Ikusaka
medRxiv - Medical Education, published 2024-03-02
DOI: 10.1101/2024.02.28.24303173 (https://doi.org/10.1101/2024.02.28.24303173)
Abstract
Background
Creating clinical vignettes requires considerable effort. Recent developments in generative artificial intelligence (AI) for natural language processing have been remarkable and may allow for the easy and immediate creation of diverse clinical vignettes.
Objective
In this study, we evaluated the medical accuracy and grammatical correctness of AI-generated clinical vignettes in Japanese and verified their usefulness.
Methods
Clinical vignettes in Japanese were created using the generative AI model GPT-4-0613. The input prompts specified seven elements for each vignette: 1) age, 2) sex, 3) chief complaint and time course since onset, 4) physical findings, 5) examination results, 6) diagnosis, and 7) treatment course. The diseases covered by the vignettes were based on the 202 cases considered in the management of diseases and symptoms in Japan's Primary Care Physicians Training Program. Three physicians evaluated the vignettes for medical and Japanese-language accuracy using a five-point scale. A total score of 13 points or above was defined as 'sufficiently beneficial and immediately usable with minor revisions,' a score between 10 and 12 points was defined as 'partly insufficient and in need of modifications,' and a score of 9 points or below was defined as 'insufficient.'
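The scoring bands above can be illustrated with a short sketch. It assumes that the total score is the sum of the three physicians' ratings, each an integer from 1 to 5 (so the maximum is 15); the abstract does not state this explicitly. The function name and the short category labels are hypothetical and are not taken from the study.

```python
# Minimal sketch of the three scoring bands described in the Methods.
# Assumption (not stated in the abstract): the total is the sum of three
# physicians' ratings, each an integer on a 1-5 scale.

def classify_vignette(ratings):
    """Map three 1-5 ratings to one of the three bands."""
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    if any(not isinstance(r, int) or not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")

    total = sum(ratings)
    if total >= 13:
        return "usable with minor revisions"
    if total >= 10:
        return "needs modifications"
    return "insufficient"


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    for example in ([5, 4, 4], [4, 3, 3], [3, 3, 3]):
        print(example, sum(example), classify_vignette(example))
```

Under this assumption, the same rule would be applied twice per vignette: once to the medical-accuracy ratings and once to the Japanese-language ratings.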
Results
Regarding medical accuracy, of the 202 clinical vignettes, 118 scored 13 points or above, 78 scored between 10 and 12 points, and 6 scored 9 points or below. Regarding Japanese-language accuracy, 142 vignettes scored 13 points or above, 56 scored between 10 and 12 points, and 4 scored 9 points or below. Overall, 97% (196/202) of the vignettes were usable with some modifications.
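As a quick arithmetic check, the counts reported above can be tallied as follows. The sketch treats a vignette as usable when it scored 10 points or above, which is an interpretation of "available with some modifications"; the variable and function names are hypothetical.

```python
# Tally of the counts reported in the Results.
# Assumption: "usable" means a score of 10 or above (the top two bands).

TOTAL_VIGNETTES = 202

# (13 or above, 10 to 12, 9 or below)
medical_counts = (118, 78, 6)
japanese_counts = (142, 56, 4)


def usable_share(counts):
    """Return (usable count, usable percentage) for one set of band counts."""
    top, middle, low = counts
    if top + middle + low != TOTAL_VIGNETTES:
        raise ValueError("band counts do not add up to the total")
    usable = top + middle
    return usable, 100 * usable / TOTAL_VIGNETTES


if __name__ == "__main__":
    for label, counts in (("medical", medical_counts),
                          ("Japanese-language", japanese_counts)):
        usable, pct = usable_share(counts)
        print(f"{label}: {usable}/{TOTAL_VIGNETTES} = {pct:.1f}%")
```

On these figures, the reported 196/202 matches the medical-accuracy tally (118 + 78), while the Japanese-language tally gives 198/202.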
Conclusions
Overall, 97% of the clinical vignettes proved practically useful, with confirmation and revision by Japanese physicians. Given the considerable effort physicians need to create vignettes without AI assistance, the use of GPT is expected to make this process far more efficient.