ICDBigBird: A Contextual Embedding Model for ICD Code Classification

Workshop on Biomedical Natural Language Processing. Pub Date: 2022-04-21. DOI: 10.48550/arXiv.2204.10408

George Michalopoulos, Michal Malyska, Nicola Sahar, Alexander Wong, Helen H. Chen


Citations: 10

Abstract

The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management purposes. Assigning correct codes for clinical procedures is important for clinical, operational and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results in the ICD classification task, since one of their main disadvantages is that they can only process documents that contain a small number of tokens, which is rarely the case with real patient notes. In this paper, we introduce ICDBigBird, a BigBird-based model that integrates a Graph Convolutional Network (GCN), which takes advantage of the relations between ICD codes to create ‘enriched’ representations of their embeddings, with a BigBird contextual model that can process larger documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task, as it outperforms the previous state-of-the-art models.
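The abstract names two components, a GCN over ICD code relations and a long-document encoder, without giving implementation details. The following is a minimal NumPy sketch of how such a combination could look, not the authors' implementation. The symmetric-normalized GCN layer is the standard formulation; the toy adjacency matrix, the random stand-in for BigBird token states, the label-wise attention step, and all function names are assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One graph convolution with ReLU: each code mixes in its neighbours."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def label_attention_scores(token_states, label_embeddings):
    """Assumed scoring step: every code attends over the document tokens."""
    logits = label_embeddings @ token_states.T            # (codes, tokens)
    logits = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)
    per_code_doc = attn @ token_states                    # (codes, dim)
    scores = (per_code_doc * label_embeddings).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-scores))                  # multi-label sigmoid


rng = np.random.default_rng(0)
num_codes, dim, num_tokens = 4, 8, 16

# Hypothetical relation graph: codes 0-1-2 are linked, code 3 is isolated.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
adj_norm = normalize_adjacency(adj)

code_embeddings = rng.normal(size=(num_codes, dim))
gcn_weight = rng.normal(size=(dim, dim)) * 0.1

# Random stand-in for the per-token output of a long-document encoder.
token_states = rng.normal(size=(num_tokens, dim))

enriched = gcn_layer(adj_norm, code_embeddings, gcn_weight)
probs = label_attention_scores(token_states, enriched)
```

In this sketch `probs` holds one independent probability per code, matching the multi-label nature of ICD coding; in a real system the random `token_states` would be replaced by the encoder's hidden states and the weights would be trained.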

