Development and Evaluation of a Digital Scribe: Conversation Summarization Pipeline for Emergency Department Counseling Sessions towards Reducing Documentation Burden

Emre Sezgin, Joseph Sirrianni, Kelly Kranz
medRxiv - Emergency Medicine. Published 2023-12-07. DOI: 10.1101/2023.12.06.23299573 (https://doi.org/10.1101/2023.12.06.23299573)

Abstract

Objective: We present a proof-of-concept digital scribe system as an ED clinical conversation summarization pipeline and report its performance. Materials and Methods: We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN, each evaluated via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report ROUGE-1, ROUGE-2, and ROUGE-L to compare model performance. In addition, we annotated transcriptions to assess the quality of generated summaries. Results: The fine-tuned BART-Large-CNN model performs best on the summarization task, with the highest ROUGE scores (F1 ROUGE-1 = 0.49, F1 ROUGE-2 = 0.23, F1 ROUGE-L = 0.35). In contrast, PEGASUS-PubMed lags notably (F1 ROUGE-1 = 0.28, F1 ROUGE-2 = 0.11, F1 ROUGE-L = 0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN achieves 71.4% recall in identifying key information and a 67.7% accuracy rate. Discussion: The BART-Large-CNN model demonstrates a high level of understanding of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, the model's performance is variable, particularly in achieving consistent correctness, which suggests room for refinement. The model's recall also varies across information categories. Conclusion: The study provides evidence of the potential of AI-assisted tools to reduce clinical documentation burden. Future work should expand the research scope to larger language models and include comparative analysis to measure documentation effort and time.
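The ROUGE F1 scores reported above measure overlap between a generated summary and a reference summary. The following is a minimal sketch of how ROUGE-N and ROUGE-L F1 can be computed, assuming lowercased whitespace tokenization and no stemming. It is not the authors' evaluation code (the abstract does not name the ROUGE implementation used), and the two example sentences are hypothetical, not taken from the study's dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1 using clipped n-gram counts (assumed simple tokenization)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical reference and generated summaries, for illustration only.
reference = "patient referred to social work for housing support"
generated = "patient referred for housing support"

print(round(rouge_n_f1(generated, reference, 1), 2))  # 0.77
print(round(rouge_n_f1(generated, reference, 2), 2))  # 0.55
print(round(rouge_l_f1(generated, reference), 2))     # 0.77
```

In this toy case every generated token appears in the reference (precision 1.0) but three reference tokens are missing (recall 5/8), so ROUGE-1 F1 is 10/13. Published implementations typically add stemming and more careful tokenization, so their scores can differ from this sketch.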