Background:
Extracting the principal diagnosis from patient discharge summaries is an essential task for the meaningful use of medical data. The extraction process, usually performed by medical staff, is laborious and time-consuming. Although automatic models have been proposed to retrieve principal diagnoses from medical records, the large number of rare diagnoses and the small amount of training data available for each of them pose significant statistical and computational challenges.
Objective:
This study aimed to extract principal diagnoses from discharge summaries when only limited training data are available.
Methods:
We proposed OLR-Net (Object Label Retrieval Network) to extract principal diagnoses from discharge summaries. Our approach comprised semantic extraction, label localization, label retrieval, and recommendation. First, the semantic information of each discharge summary was mapped onto the diagnosis set. Then, one-dimensional convolutional neural networks were slid over the bottom-up region to localize diagnoses and enrich rare diagnoses. Finally, OLR-Net detected the principal diagnosis in the localized region. The evaluation metrics were the hit ratio (HR), the mean reciprocal rank (MRR), and the area under the receiver operating characteristic curve (AUROC).
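As an illustration of two of the ranking metrics named above, the following minimal sketch computes the hit ratio at k and the mean reciprocal rank using their standard definitions. It is not the authors' evaluation code; the function names, the diagnosis codes, and the toy rankings are hypothetical and chosen only for the example.

```python
from typing import Sequence


def hit_ratio_at_k(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of records whose true diagnosis appears in the top-k recommendations."""
    hits = sum(1 for recs, label in zip(ranked, truth) if label in recs[:k])
    return hits / len(truth)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Average of 1/rank of the true diagnosis; a record contributes 0 if it is never ranked."""
    total = 0.0
    for recs, label in zip(ranked, truth):
        if label in recs:
            total += 1.0 / (list(recs).index(label) + 1)
    return total / len(truth)


# Hypothetical toy data: one ranked list of diagnosis codes per discharge summary.
ranked = [
    ["C34", "C18", "C50"],  # true label ranked 1st
    ["C50", "C34", "C18"],  # true label ranked 2nd
    ["C18", "C50", "C34"],  # true label absent from the list
]
truth = ["C34", "C34", "C16"]

hr1 = hit_ratio_at_k(ranked, truth, 1)
hr2 = hit_ratio_at_k(ranked, truth, 2)
mrr = mean_reciprocal_rank(ranked, truth)
print(f"HR@1={hr1:.4f} HR@2={hr2:.4f} MRR={mrr:.4f}")
```

In this toy case, one of three records is hit at k=1 and two of three at k=2, while the reciprocal ranks 1, 1/2, and 0 average to 0.5.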
Results:
A total of 12,788 de-identified discharge summary records were collected from the oncology department at Hainan Hospital of Chinese People’s Liberation Army General Hospital. We designed five distinct settings based on the number of training samples per diagnosis: the full dataset, the top-50 dataset, the few-shot dataset, the one-shot dataset, and the zero-shot dataset. Our model achieved the highest hit ratio at 5 (HR@5) of 0.8778 and the highest macro-AUROC of 0.9851. In the limited-data settings (few-shot and one-shot), the macro-AUROC values were 0.9833 and 0.9485, respectively.
Conclusions:
OLR-Net shows great potential for extracting principal diagnoses from limited available data through label localization and retrieval.