Andrés J. Muñoz Martín, Ramón Lecumberri, Juan Carlos Souto, Berta Obispo, Antonio Sanchez, Jorge Aparicio, Cristina Aguayo, David Gutierrez, Andrés García Palomo, Diego Benavent, Miren Taberna, María Carmen Viñuela-Benéitez, Daniel Arumi, Miguel Ángel Hernández-Presa
Clinical and Translational Oncology, published 2024-09-14. DOI: 10.1007/s12094-024-03586-2 (https://doi.org/10.1007/s12094-024-03586-2)
Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing
Purpose
We developed a predictive model to assess the risk of major bleeding (MB) within six months of primary venous thromboembolism (VTE) in cancer patients receiving anticoagulant treatment. We also sought to describe the prevalence and incidence of VTE in cancer patients, as well as baseline clinical characteristics and bleeding events during follow-up in patients receiving anticoagulants.
Methods
This observational, retrospective, multicenter study used natural language processing and machine learning (ML) to analyze unstructured clinical data from the electronic health records of nine Spanish hospitals between 2014 and 2018. All adult cancer patients with VTE receiving anticoagulants were included. Both clinically driven and ML-driven feature selection were performed to identify MB predictors. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train predictive models, which were validated in a hold-out dataset and compared with the previously developed CAT-BLEED score.
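As an illustration of what validating in a hold-out dataset involves, the sketch below splits a set of outcome labels into training and hold-out parts while preserving the event rate in each. The abstract does not state the split ratio or whether the split was stratified, so the 80/20 ratio, the stratification, the function name `stratified_holdout`, and the synthetic labels are all assumptions made for illustration only.

```python
import random

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class balance in both parts.

    Hypothetical helper: the split ratio and stratification are assumptions,
    not details reported in the abstract.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic labels only: 1 = major bleeding within six months, 0 = no event.
# 109 events in 1,000 patients mimics the reported 10.9% event rate.
labels = [1] * 109 + [0] * 891
train_idx, test_idx = stratified_holdout(labels)

holdout_events = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), holdout_events)
```

Models are then fitted on the training indices only, and performance is reported on the hold-out indices, which the models never saw during training.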
Results
Of the 2,893,108 cancer patients screened, in-hospital VTE prevalence was 5.8% and the annual incidence ranged from 2.7% to 3.9%. We identified 21,227 patients with active cancer and VTE receiving anticoagulants (53.9% men, median age of 70 years). MB events after VTE diagnosis occurred in 10.9% of patients within the first six months. MB predictors included hemoglobin, metastasis, age, platelets, leukocytes, and serum creatinine. The LR, DT, and RF models had AUC-ROC (95% confidence interval) values of 0.60 (0.55, 0.65), 0.60 (0.55, 0.65), and 0.61 (0.56, 0.66), respectively. These models outperformed the CAT-BLEED score, which had an AUC-ROC of 0.53 (0.48, 0.59).
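For readers unfamiliar with how an AUC-ROC and its confidence interval can be obtained, the following is a minimal sketch using only the Python standard library. It computes the AUC as the probability that a randomly chosen patient with an event receives a higher score than a randomly chosen patient without one, and derives a percentile bootstrap interval. The abstract does not say how the authors computed their intervals, so the bootstrap is an assumption, and the function names and the synthetic data are hypothetical.

```python
import random

def auc_roc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Small hand-checkable case: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Synthetic hold-out set: weakly informative scores, roughly 11% events.
rng = random.Random(42)
y = [1 if rng.random() < 0.11 else 0 for _ in range(200)]
y[0], y[1] = 1, 0  # guarantee both classes are present
s = [rng.gauss(0.3 if label else 0.0, 1.0) for label in y]

point = auc_roc(y, s)
lower, upper = bootstrap_ci(y, s)
print(f"AUC {point:.2f} ({lower:.2f}, {upper:.2f})")
```

An AUC of 0.5 corresponds to chance-level discrimination, which is the reference point for reading the reported values of 0.60 to 0.61 for the ML models and 0.53 for CAT-BLEED.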
Conclusions
Our study shows encouraging results in identifying anticoagulated patients with cancer-associated VTE who are at high risk of MB.