Aymen Meddeb, Philipe Ebert, Keno Kyrill Bressem, Dmitriy Desser, Andrea Dell'Orco, Georg Bohner, Justus F Kleine, Eberhard Siebert, Nils Grauhan, Marc A Brockmann, Ahmed Othman, Michael Scheel, Jawed Nawabi
Journal of NeuroInterventional Surgery. Journal Article. Published 2025-01-26. DOI: 10.1136/jnis-2024-022078 (https://doi.org/10.1136/jnis-2024-022078)
Evaluating local open-source large language models for data extraction from unstructured reports on mechanical thrombectomy in patients with ischemic stroke.
Background: A study was undertaken to assess the effectiveness of open-source large language models (LLMs) in extracting clinical data from unstructured mechanical thrombectomy reports in patients with ischemic stroke caused by a vessel occlusion.
Methods: We deployed local open-source LLMs to extract data points from free-text procedural reports in patients who underwent mechanical thrombectomy between September 2020 and June 2023 in our institution. The external dataset was obtained from a second university hospital and comprised consecutive cases treated between September 2023 and March 2024. Ground truth labeling was facilitated by a human-in-the-loop (HITL) approach, with time metrics recorded for both automated and manual data extractions. We tested three models (Mixtral, Qwen, and BioMistral), assessing their performance on precision, recall, and F1 score across 15 clinical categories such as National Institutes of Health Stroke Scale (NIHSS) scores, occluded vessels, and medication details.
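The per-category metrics named in the Methods can be illustrated with a minimal sketch. The abstract does not specify how true positives, false positives, and false negatives were counted, so the convention below is an assumption: an extracted value that exactly matches the ground truth is a true positive, a wrong or spurious extracted value is a false positive, and a ground-truth value that was missed or extracted incorrectly is a false negative. The function name `score_by_category` and the category keys are hypothetical.

```python
from collections import defaultdict


def score_by_category(gold, pred):
    """Compute precision, recall, and F1 per category.

    gold, pred: lists of dicts (one per report) mapping a category name
    to its value; None means "not reported" or "not extracted".
    The counting convention is an assumption, not the authors' stated method.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for g, p in zip(gold, pred):
        for cat in set(g) | set(p):
            gv, pv = g.get(cat), p.get(cat)
            c = counts[cat]
            if pv is not None and pv == gv:
                c["tp"] += 1          # extracted value matches ground truth
            else:
                if pv is not None:
                    c["fp"] += 1      # extracted a wrong or spurious value
                if gv is not None:
                    c["fn"] += 1      # ground-truth value was not recovered

    scores = {}
    for cat, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[cat] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

Under this convention a wrong value counts against both precision and recall, which is a common choice for slot-filling evaluation but only one of several possible ones.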
Results: The study included 1000 consecutive reports from our primary institution and 50 reports from a secondary institution. Mixtral showed the highest precision, achieving 0.99 for first series time extraction and 0.69 for occluded vessel identification within the internal dataset. In the external dataset, precision ranged from 1.00 for NIHSS scores to 0.70 for occluded vessels. Qwen showed moderate precision with a high of 0.85 for NIHSS scores and a low of 0.28 for occluded vessels. BioMistral had the broadest range of precision, from 0.81 for first series times to 0.14 for medication details. The HITL approach yielded an average time savings of 65.6% per case, ranging from 45.95% to 79.56%.
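The abstract does not state how the time savings percentage was derived. One plausible reading, given that times were recorded for both manual and automated extraction, is the relative reduction in time per case, averaged over cases. The sketch below assumes that definition; the function names and the units (seconds) are hypothetical.

```python
def time_savings_percent(manual_seconds, hitl_seconds):
    """Relative time saved by the HITL workflow for one case, in percent.

    Assumed definition: (manual - HITL) / manual * 100.
    """
    if manual_seconds <= 0:
        raise ValueError("manual_seconds must be positive")
    return 100.0 * (manual_seconds - hitl_seconds) / manual_seconds


def mean_time_savings(cases):
    """Average the per-case savings over (manual_seconds, hitl_seconds) pairs."""
    per_case = [time_savings_percent(m, h) for m, h in cases]
    return sum(per_case) / len(per_case)
```

Averaging per-case percentages weights every case equally; computing the savings from total times instead would weight long reports more heavily, so the two can differ.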
Conclusion: This study highlights the potential of using LLMs for automated clinical data extraction from medical reports. Incorporating HITL annotations enhances precision and also ensures the reliability of the extracted data. This methodology presents a scalable privacy-preserving option that can significantly support clinical documentation and research endeavors.
About the journal:
The Journal of NeuroInterventional Surgery (JNIS) is a leading peer-reviewed journal for scientific research and literature pertaining to the field of neurointerventional surgery. The journal's launch followed growing professional interest in neurointerventional techniques for the treatment of a range of neurological and vascular problems, including stroke, aneurysms, brain tumors, and spinal compression. The journal is owned by SNIS and is also the official journal of the Interventional Chapter of the Australian and New Zealand Society of Neuroradiology (ANZSNR), the Canadian Interventional Neuro Group, the Hong Kong Neurological Society (HKNS) and the Neuroradiological Society of Taiwan.