Automatic ICD-10 codes association to diagnosis: Bulgarian case

Boris Velichkov, Simeon Gerginov, P. Panayotov, S. Vassileva, Gerasim Velchev, I. Koychev, S. Boytcheva
DOI: 10.1145/3429210.3429224
Journal: CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics
Publication date: 2020-11-19
Citations: 5

Abstract

This paper presents an approach for the automatic association of diagnoses in the Bulgarian language to ICD-10 codes. Since this task is currently performed manually by medical professionals, automating it would save time and allow doctors to focus more on patient care. The presented approach employs a fine-tuned language model (i.e. BERT) as a multi-class classifier. As there are several different types of BERT models, we conduct experiments to assess the applicability of domain- and language-specific model adaptation. To train our models we use a large corpus of about 350,000 textual descriptions of diagnoses in Bulgarian, annotated with ICD-10 codes. We conduct experiments comparing the accuracy of ICD-10 code prediction using different types of BERT language models. The results show that the Multilingual BERT model (Top-1 accuracy: 81%; macro F1: 86%; Top-5 MRR: 88%) outperforms the other models. However, all models appear to suffer from the class imbalance in the training dataset. The prediction accuracy achieved in the experiments can be considered very high, given the large number of classes and the noisiness of the data. The results also provide evidence that the collected dataset and the proposed approach can be useful for building an application to help medical practitioners with this task, and they encourage further research to improve the prediction accuracy of the models. By design, the proposed approach strives to be as language-independent as possible and can easily be adapted to other languages.
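The evaluation metrics quoted in the abstract (Top-1 accuracy and mean reciprocal rank over the top 5 predictions, MRR@5) can be illustrated with a minimal, self-contained sketch. The ranked predictions and gold codes below are hypothetical toy data, not examples from the paper's corpus.

```python
# Illustrative computation of the metrics reported in the abstract:
# Top-1 accuracy and Mean Reciprocal Rank over the top-5 list (MRR@5).

def top1_accuracy(ranked_preds, gold):
    """Fraction of examples whose highest-ranked code equals the gold code."""
    hits = sum(1 for preds, g in zip(ranked_preds, gold) if preds[0] == g)
    return hits / len(gold)

def mrr_at_k(ranked_preds, gold, k=5):
    """Mean of 1/rank of the gold code within the top-k list (0 if absent)."""
    total = 0.0
    for preds, g in zip(ranked_preds, gold):
        topk = preds[:k]
        if g in topk:
            total += 1.0 / (topk.index(g) + 1)
    return total / len(gold)

# Each inner list: ICD-10 codes ranked by the classifier's probability.
ranked = [
    ["J18.9", "J20.9", "J15.9"],  # gold at rank 1
    ["I10", "I11.9", "I15.9"],    # gold at rank 2
    ["E11.9", "E10.9", "E14.9"],  # gold at rank 1
]
gold = ["J18.9", "I11.9", "E11.9"]

print(top1_accuracy(ranked, gold))  # 2/3
print(mrr_at_k(ranked, gold, k=5))  # (1 + 1/2 + 1) / 3
```

MRR rewards a model that places the correct code near the top even when it misses rank 1, which is why the paper can report a Top-5 MRR (88%) above its Top-1 accuracy (81%).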
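The abstract notes that all evaluated models suffer from class imbalance in the training data. One common mitigation (illustrative here, not necessarily the technique the authors used) is to weight the training loss by inverse class frequency, so that rare ICD-10 codes contribute more per example than frequent ones. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (num_classes * count): rare ICD-10 codes
    get weights above 1, frequent codes get weights below 1."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {label: n / (c * cnt) for label, cnt in counts.items()}

# Hypothetical skewed label distribution over three ICD-10 codes.
labels = ["I10"] * 6 + ["J18.9"] * 3 + ["E11.9"] * 1
weights = inverse_frequency_weights(labels)
print(weights)  # "I10" is down-weighted, rare "E11.9" is up-weighted
```

These per-class weights can then be passed to a weighted cross-entropy loss when fine-tuning the classifier; this matches the "balanced" heuristic used by libraries such as scikit-learn.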