Clinical progress notes are critical artifacts for modeling patient trajectories, auditing clinical decision-making, and powering downstream applications in clinical natural language processing (NLP). However, public resources such as MIMIC-III provide only a limited number of progress notes, which constrains the development of robust and generalizable machine learning models. This work proposes a novel hybrid prompting framework, TriMedPrompt, to generate high-quality, structurally and semantically coherent synthetic progress notes using large language models (LLMs). Our approach conditions the LLMs on a triad of complementary biomedical signals: (1) real-world progress notes from MIMIC-III, (2) clinically aligned case reports from the PMC Patients dataset, selected via embedding-based retrieval, and (3) structured disease-centric knowledge from PrimeKG. We design a multi-source, layout-aware prompting pipeline that dynamically integrates structured and unstructured information to produce notes across standard clinical formats (e.g., SOAP, BIRP, PIE, DAP).
Through rigorous evaluations, including layout adherence, entity extraction comparisons, semantic similarity analysis, and controlled ablations, we demonstrate that our generated notes achieve a 98.6% semantic entity alignment score with real clinical notes while maintaining high structural fidelity. Ablation studies further confirm the critical role of combining structured biomedical knowledge with unstructured narrative data in improving note quality. In addition, we illustrate the potential of our synthetic notes in privacy-preserving clinical NLP, where they offer a safe alternative for model development and benchmarking in sensitive healthcare settings. This work establishes a scalable, controllable paradigm for clinical text synthesis, significantly expanding access to realistic, diverse progress notes and laying the foundation for advancing trustworthy clinical NLP research.
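As a rough illustration of the three-source, layout-aware prompting idea described above, the following minimal Python sketch retrieves the most similar case report for a seed note and assembles a prompt from the seed note, the retrieved case, and a few knowledge-graph triples. It is not the authors' implementation. The function names (`embed`, `retrieve_case`, `build_prompt`), the prompt wording, and the toy bag-of-words embedding are all assumptions made for illustration; a real pipeline would presumably use a learned text encoder and actual MIMIC-III, PMC Patients, and PrimeKG records.

```python
import math
from collections import Counter

# Section headings for the note layouts named in the abstract.
LAYOUTS = {
    "SOAP": ["Subjective", "Objective", "Assessment", "Plan"],
    "BIRP": ["Behavior", "Intervention", "Response", "Plan"],
    "PIE": ["Problem", "Intervention", "Evaluation"],
    "DAP": ["Data", "Assessment", "Plan"],
}


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_case(seed_note, case_reports):
    """Return the case report most similar to the seed note."""
    query = embed(seed_note)
    return max(case_reports, key=lambda case: cosine(query, embed(case)))


def build_prompt(seed_note, case_reports, kg_triples, layout):
    """Combine the three signals and the target layout into one prompt string.

    kg_triples is a list of hypothetical (head, relation, tail) tuples.
    """
    case = retrieve_case(seed_note, case_reports)
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in kg_triples)
    headings = "\n".join(f"{name}:" for name in LAYOUTS[layout])
    return (
        f"Write a synthetic progress note in {layout} format.\n\n"
        f"Example progress note:\n{seed_note}\n\n"
        f"Related case report:\n{case}\n\n"
        f"Disease knowledge:\n{facts}\n\n"
        f"Use exactly these sections:\n{headings}\n"
    )
```

Calling `build_prompt(note, cases, triples, "SOAP")` returns a single string that could be sent to an LLM. Changing the `layout` argument changes only the requested section headings, which is one simple way to keep the note structure controllable.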

