NLP Analysis of COVID-19 Radiology Reports in Indonesian using IndoBERT
N. N. Qomariyah, Tianda Sun, D. Kazakov
2022 4th International Conference on Biomedical Engineering (IBIOMED), published 2022-10-18
DOI: 10.1109/IBIOMED56408.2022.9988223 (https://doi.org/10.1109/IBIOMED56408.2022.9988223)
Abstract
COVID-19, a respiratory disease, can be detected through medical imaging such as chest X-ray (CXR) and computed tomography (CT) scans. These radiology images can also show how a patient's condition progresses. Radiologists must write a report for each image so that other clinicians can use it in their decision making. In this study, we applied IndoBERT, a Natural Language Processing (NLP) model, to analyze radiology reports of COVID-19 patients written in Indonesian. We performed two tasks: clustering, to group reports by meaning and understand their content, and text classification, to predict one of five possible outcomes for each patient. We report the most frequent topics in the radiology reports and the word scores within each topic. We also fine-tuned IndoBERT on a medical text, ‘Kamus Kedokteran Dorland’, in an attempt to improve it further. This proved unnecessary: the fine-tuning brought no additional benefit, and the standard model alone achieved a satisfactory classification accuracy of over 90%.