Background
Extracting temporally sensitive outcomes such as tumor progression from unstructured electronic medical records (EMRs) remains a major challenge in oncology. This study evaluates a domain-adapted natural language processing (NLP) pipeline designed to extract structured, temporally anchored clinical outcomes from narrative EMR data.
Patients and methods
Patients with oncogene-addicted advanced or metastatic non-small-cell lung cancer (NSCLC) treated with oral targeted therapies between January 2020 and June 2023 at a French academic hospital were included. Facts extracted by the pipeline were benchmarked against expert annotations. All outputs were mapped to Observational Medical Outcomes Partnership (OMOP) vocabularies. F1-scores were calculated for correct concept detection alone and for concept detection together with correct temporality. Real-world progression-free survival (rwPFS) was estimated from the retrieved clinical outcomes.
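The two F1-scores described above can be illustrated with a minimal sketch. It assumes exact matching of (patient, concept) pairs for the concept-only score and of (patient, concept, date) triples for the score with temporality. The `Fact` type, its fields, and the exact-match rule are hypothetical and are not taken from the study, which may use a different matching tolerance.

```python
from typing import Iterable, NamedTuple, Tuple


class Fact(NamedTuple):
    """Hypothetical representation of one extracted or annotated fact."""
    patient_id: str
    concept: str  # e.g. a label mapped to an OMOP vocabulary
    date: str     # temporal anchor as an ISO date string (assumption)


def f1_score(predicted: set, reference: set) -> float:
    """F1 = harmonic mean of precision and recall over exact set matches."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: Iterable[Fact], reference: Iterable[Fact]) -> Tuple[float, float]:
    """Return (F1 for concepts only, F1 for concepts with temporality)."""
    predicted, reference = list(predicted), list(reference)
    # Concept-only: ignore the date, compare (patient, concept) pairs.
    concept_f1 = f1_score(
        {(f.patient_id, f.concept) for f in predicted},
        {(f.patient_id, f.concept) for f in reference},
    )
    # With temporality: the date must match as well.
    temporal_f1 = f1_score(set(predicted), set(reference))
    return concept_f1, temporal_f1


# Toy data (invented for illustration, not study data).
reference = [
    Fact("p1", "stable disease", "2020-09-01"),
    Fact("p1", "progression", "2021-03-01"),
]
predicted = [
    Fact("p1", "stable disease", "2020-09-01"),
    Fact("p1", "progression", "2021-04-01"),  # right concept, wrong date
]
concept_f1, temporal_f1 = evaluate(predicted, reference)
```

In this toy case the concept-only score is perfect while the temporal score is halved, which mirrors why a score that includes temporality can never exceed the concept-only score under exact matching.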
Results
Among 1030 NSCLC patients treated between 2020 and 2023, 112 were confirmed to have advanced or metastatic disease with an oncogenic driver mutation, primarily EGFR (n = 66), ALK (n = 23), and KRAS (n = 16). For tumor evolution concepts, the NLP pipeline achieved an F1-score of 79.7% for concept extraction and 62.0% when temporality was included. Overall performance across all domains reached F1-scores of 76.5% for concept extraction and 63.7% with temporality. Median rwPFS was 21.9 months for EGFR-mutated, 52.4 months for ALK-translocated, and 5.0 months for KRAS-mutant tumors, in line with published benchmarks. Reviewing automatically collected data was 5.8 times faster than manual collection.
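A median rwPFS of this kind is commonly obtained from a Kaplan-Meier curve. The abstract does not name the estimator, so the sketch below is an assumption about the general approach and not the study's method. The function name `km_median` and the toy inputs are hypothetical. Exact fractions are used so that the comparison with 0.5 is not affected by floating-point rounding.

```python
from fractions import Fraction
from typing import Optional, Sequence


def km_median(durations: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Kaplan-Meier median: first time at which estimated survival is <= 0.5.

    durations: time from treatment start to progression or last follow-up.
    events:    1 if progression (or death) was observed, 0 if censored.
    Returns None when the curve never reaches 0.5 (median not reached).
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = Fraction(1)
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += int(data[i][1])
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= Fraction(at_risk - observed, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None


# Toy follow-up times in months (invented for illustration, not study data).
median_months = km_median([5, 10, 12, 20, 30], [1, 0, 1, 1, 0])
```

Censored patients (event flag 0) leave the risk set without lowering the curve, which is what allows patients still on treatment at last follow-up to contribute to the estimate.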
Conclusions
Our solution demonstrates robust performance for extracting temporally structured tumor outcomes from EMRs and supports the reconstruction of real-world endpoints in oncology.
