Background
Most neurological care is delivered in outpatient settings without mandated clinical coding, so clinical records remain stored as unstructured text with inconsistent formatting. Automated clinical coding utilising natural language processing (NLP) therefore offers a significant opportunity to increase the value of these data. While existing models for full ICD-10 clinical coding lack sufficient accuracy for clinical use, 60% of neurology outpatient cases fall into just five diagnostic categories, suggesting that a simplified coding system could improve feasibility and serve as a foundation for more complex coding schemes.
Objective
We propose a simplified coding system of 29 codes for neurology outpatient episodes and evaluate several machine learning methods on a supervised, single-label classification task using real-world outpatient care notes.
Methods
We collected outpatient care notes created between 15 November 2018 and 2 December 2022. The training dataset included 14,917 care notes, most of which were annotated with ICD-10 codes during routine care and subsequently mapped to 29 simplified diagnostic categories. An external validation set of 1,042 randomly selected encounters was retrospectively coded.
Models included logistic regression, a support vector machine, a bidirectional LSTM (BiLSTM), BERT-based models (DistilBERT, RoBERTa), and a generative large language model (LLM), Mistral 7B. All models except the LLM were evaluated via 10-fold stratified cross-validation; final models were then trained on the complete dataset.
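The stratified cross-validation in the Methods can be illustrated with a minimal sketch. This is not the study's code: the fold-assignment function, the toy label distribution, and the round-robin dealing strategy are all illustrative assumptions (a library such as scikit-learn's `StratifiedKFold` would normally be used).

```python
from collections import defaultdict

def stratified_kfold(labels, k=10):
    """Illustrative sketch: assign each sample index to one of k folds so
    that every fold roughly preserves the overall label distribution."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab, idxs in by_label.items():
        # deal this label's samples round-robin across the folds,
        # keeping per-fold class proportions close to the global ones
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# hypothetical imbalanced label set, mimicking skewed diagnostic codes
labels = ["headache"] * 50 + ["epilepsy"] * 30 + ["other"] * 20
folds = stratified_kfold(labels, k=10)
# every fold ends up with 5 "headache", 3 "epilepsy", 2 "other" samples
```

Stratification matters here because, with highly imbalanced categories, plain random folds could leave some folds with no examples of a rare code at all.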
Results
DistilBERT and RoBERTa outperformed the traditional models, with F1-scores of 81.73% (95% CI: 79.02–84.13) and 81.16% (95% CI: 78.84–83.76), respectively. A hybrid LLM–DistilBERT approach performed worse than all models except the bidirectional LSTM and produced “medical hallucinations,” making it unsuitable for clinical use. Although the training data were highly imbalanced, the BERT-based models performed strongly on high-frequency categories, with F1-scores above 85% for the five most common classes. At a confidence threshold of 0.85, DistilBERT achieved 96% accuracy on 64% of the external validation set.
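The thresholded result above implies a triage step: notes the model codes with high confidence are accepted automatically, and the rest are routed to a human coder. A minimal sketch of such a split is shown below; the `triage` function, the 0.85 cut-off applied to softmax probabilities, and the toy logits are assumptions for illustration (real scores would come from the fine-tuned DistilBERT).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logit_rows, threshold=0.85):
    """Illustrative sketch: split predictions into an auto-coded queue and
    a human-review queue based on the top-class softmax confidence."""
    auto, review = [], []
    for i, logits in enumerate(logit_rows):
        probs = softmax(logits)
        conf = max(probs)
        record = (i, probs.index(conf), conf)  # (note index, predicted code, confidence)
        (auto if conf >= threshold else review).append(record)
    return auto, review

# hypothetical logits for three notes over three diagnostic codes
rows = [[4.0, 0.2, 0.1],   # confident -> auto-coded
        [1.0, 0.9, 0.8],   # uncertain -> human review
        [0.1, 5.0, 0.2]]   # confident -> auto-coded
auto, review = triage(rows)
```

Raising the threshold shrinks the auto-coded fraction but increases its accuracy, which is the trade-off behind the 96%-accuracy-on-64%-of-cases figure.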
Conclusions
BERT-based NLP models perform well in classifying neurology outpatient clinic notes when a reduced set of diagnostic categories is used. In a human-in-the-loop workflow, such models can meaningfully reduce the manual coding workload while preserving accuracy. To our knowledge, this is the first applied study of automated clinical coding in neurology outpatient care.
