Introduction: Electronic health records contain both structured and unstructured data, with unstructured clinical notes widely used in addiction psychiatry. Clinical notes frequently contain errors and require proofreading to ensure accuracy and readability. This study evaluates natural language processing methods and adapts a Large Language Model (LLM) for proofreading clinical notes and extracting substance-related information.
Methods: We analysed clinical notes from a 5-year addiction medicine electronic health record dataset (2018-2023), selecting 6500 notes. The proofreading task involved correcting spelling and expanding abbreviations, while the information extraction task identified the presence of substance use and quantified the time since last use. Annotations by a team of doctors and nurses provided the gold standard. Against this standard, we compared the performance of existing solutions, including LLMs, and adapted an LLM for these tasks. The final model (a fine-tuned Llama-3.2-3B) was also compared against a state-of-the-art commercial model, Generative Pretrained Transformer-4-o (GPT-4o), and a human-preference experiment was conducted in which masked raters chose between model-generated and human-generated proofread versions.
Results: Proofreading improved readability and decreased out-of-vocabulary words. LLM-based solutions outperformed simpler approaches, and the fine-tuned model outperformed GPT-4o on both tasks. Masked human evaluators chose model-corrected clinical notes over the human-corrected versions in 62% of trials (p < 0.001). On the information extraction task, overall performance was strong (mean F1 0.99) but poor on rarer substance classes such as hallucinogens.
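The gap between a high mean F1 and poor performance on rare classes can be made concrete with a per-class F1 computation. The sketch below is illustrative only, not the authors' evaluation code: the labels are hypothetical, and the task is simplified to single-label classification, whereas the study's extraction task may be multi-label.

```python
from collections import Counter

def per_class_f1(gold, pred):
    """Compute F1 per class from parallel gold/predicted label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p predicted but gold was g
            fn[g] += 1          # g missed
    scores = {}
    for c in set(gold) | set(pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

# Hypothetical labels: a frequent class dominates the dataset, so a single
# error on the rare class drags its F1 down while the frequent class stays high.
gold = ["alcohol"] * 8 + ["hallucinogen"] * 2
pred = ["alcohol"] * 8 + ["alcohol", "hallucinogen"]
scores = per_class_f1(gold, pred)
```

Averaging these per-class scores (macro-F1) exposes weak rare-class performance that a micro-averaged or accuracy-style summary would mask.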
Discussion and conclusions: Fine-tuned LLMs effectively standardised clinical notes and extracted structured information from addiction psychiatry records. Both functionalities have important applications. Standardisation improves the readability of clinical documentation and facilitates communication within and between interdisciplinary teams. Automated information extraction can reduce the burden on clinical staff, enable the creation of research cohorts from existing records and improve treatment outcomes by extracting critical information, such as 'time since last drink', which can be used to raise alerts. Even with limited computational resources, it is possible to adapt open-source LLMs for bespoke tasks in addiction psychiatry. Our proposed solution is a model that can be deployed on consumer-grade servers, thus ensuring data privacy and security.