Chenwei Yan, Xiangling Fu, Xien Liu, Yuanqiu Zhang, Yue Gao, Ji Wu, Qiang Li
A survey of automated International Classification of Diseases coding: development, challenges, and applications
Intelligent Medicine, volume 2, issue 3, pages 161-173. Published 2022-08-01.
DOI: 10.1016/j.imed.2022.03.003
URL: https://www.sciencedirect.com/science/article/pii/S2667102622000092
Citations: 6
Abstract
The International Classification of Diseases (ICD) is an international standard for epidemiological investigation, health management, and clinical diagnosis, and it plays a fundamental role in intelligent medical care. Assigning ICD codes to health-related documents has become a focus of academic research, and numerous studies have moved ICD coding from manual work toward automation. In this survey, we review in depth how the task has developed over recent decades, from the rule-based stage, through the traditional machine learning stage, to the neural-network-based stage. We compare the performance of the different methods on the publicly available Medical Information Mart for Intensive Care (MIMIC) dataset. Next, we summarize four major challenges of the task: (1) the large label space, (2) the unbalanced label distribution, (3) the long text of documents, and (4) the interpretability of coding, and we analyze the solutions that have been proposed for each. We then discuss the applications of ICD coding, from mortality statistics to payments based on diagnosis-related groups and hospital performance management. In addition, we discuss the different ways of framing and evaluating the task and how it has been transformed into a learnable problem, and we provide details of the commonly used datasets. Overall, this survey aims to provide a reference and possible directions for follow-up research.
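One common way to make the task learnable is to treat it as multi-label classification: each document maps to a binary vector over the code set, and predictions are scored with micro- and macro-averaged F1. The sketch below is illustrative only and is not taken from the survey. The four ICD-9 codes, the toy gold and predicted code sets, and the helper names `binarize` and `micro_macro_f1` are all assumptions chosen to show how an unbalanced label distribution pulls the two averages apart.

```python
from typing import List, Sequence, Set, Tuple


def binarize(code_sets: Sequence[Set[str]], label_space: Sequence[str]) -> List[List[int]]:
    """Turn each document's set of codes into a 0/1 vector over label_space."""
    index = {code: i for i, code in enumerate(label_space)}
    rows = []
    for codes in code_sets:
        row = [0] * len(label_space)
        for code in codes:
            row[index[code]] = 1
        rows.append(row)
    return rows


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for multi-label 0/1 matrices."""
    n_labels = len(y_true[0])
    tps, fps, fns = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tps[j] += 1
            elif p:
                fps[j] += 1
            elif t:
                fns[j] += 1
    # Micro: pool counts over all labels, so frequent codes dominate.
    micro = f1(sum(tps), sum(fps), sum(fns))
    # Macro: average per-label F1, so every code counts equally.
    macro = sum(f1(tps[j], fps[j], fns[j]) for j in range(n_labels)) / n_labels
    return micro, macro


# Hypothetical label space and toy data (assumed for illustration).
LABELS = ["401.9", "428.0", "250.00", "038.9"]
gold = [{"401.9", "428.0"}, {"401.9"}, {"401.9", "250.00"}, {"038.9"}]
pred = [{"401.9", "428.0"}, {"401.9"}, {"401.9"}, {"401.9"}]

micro, macro = micro_macro_f1(binarize(gold, LABELS), binarize(pred, LABELS))
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the predictor mostly outputs the frequent code, so micro-F1 (about 0.727) is well above macro-F1 (about 0.464). This gap is why evaluations on skewed code distributions often report both averages.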