LLM-IE: a python package for biomedical generative information extraction with large language models.
Authors: Enshuo Hsu, Kirk Roberts
Journal: JAMIA Open, 8(2), ooaf012. Published 2025-03-12.
DOI: https://doi.org/10.1093/jamiaopen/ooaf012
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11901043/pdf/
Abstract
Objectives: Despite the recent adoption of large language models (LLMs) for biomedical information extraction (IE), challenges in prompt engineering and algorithms persist, with no dedicated software available. To address this, we developed LLM-IE: a Python package for building complete IE pipelines.
Materials and methods: LLM-IE supports named entity recognition, entity attribute extraction, and relation extraction tasks. We benchmarked it on the i2b2 clinical datasets.
Results: The sentence-based prompting algorithm achieved the best 8-shot performance, with over 70% strict F1 for entity extraction and about 60% F1 for entity attribute extraction.
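The strict F1 figure reported above can be read with a small worked example. The sketch below assumes the common definition of strict matching, in which a predicted entity counts as correct only when its span boundaries and its label both match a gold entity exactly; the abstract does not spell out the definition, and the `Entity` type and `strict_f1` function are hypothetical names used for illustration, not part of LLM-IE.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type, e.g. "problem" or "treatment"


def strict_f1(gold: set[Entity], predicted: set[Entity]) -> float:
    """F1 where a prediction counts only if span and label match exactly (assumed definition)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the second prediction ends two characters early,
# so under strict matching it is both a false positive and a missed gold entity.
gold = {Entity(0, 12, "problem"), Entity(20, 29, "treatment"), Entity(35, 41, "test")}
predicted = {Entity(0, 12, "problem"), Entity(20, 27, "treatment"), Entity(35, 41, "test")}

score = strict_f1(gold, predicted)
print(f"{score:.3f}")  # prints 0.667 (precision 2/3, recall 2/3)
```

A relaxed (overlap-based) metric would credit the truncated span, which is why strict scores are typically the lower of the two.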
Discussion: We developed a Python package, LLM-IE, that features (1) an interactive LLM agent to support schema definition and prompt design, (2) state-of-the-art prompting algorithms, and (3) visualization features.
Conclusion: LLM-IE provides essential building blocks for developing robust information extraction pipelines. Future work will aim to expand its features and further optimize computational efficiency.
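The abstract names a sentence-based prompting algorithm without describing it. One plausible reading, offered here only as an assumption, is that the document is split into sentences, the model is prompted once per sentence, and each returned mention is mapped back to character offsets in the full document. The sketch below illustrates that idea with a keyword-lookup stand-in for the LLM call; every name in it (`split_sentences`, `extract_by_sentence`, `toy_extract`) is hypothetical and none of it is the LLM-IE API.

```python
import re
from typing import Callable

# Hypothetical interface: takes one sentence, returns the mention strings found in it.
ExtractFn = Callable[[str], list[str]]


def split_sentences(text: str) -> list[tuple[int, str]]:
    """Naive splitter returning (document offset, sentence) pairs."""
    return [
        (m.start(), m.group())
        for m in re.finditer(r"[^.!?]+[.!?]?", text)
        if m.group().strip()
    ]


def extract_by_sentence(text: str, extract: ExtractFn) -> list[dict]:
    """Call the extractor once per sentence and convert hits to document offsets."""
    frames = []
    for offset, sentence in split_sentences(text):
        for mention in extract(sentence):
            local = sentence.find(mention)
            if local == -1:
                continue  # drop output that does not appear verbatim in the sentence
            start = offset + local
            frames.append({"text": mention, "start": start, "end": start + len(mention)})
    return frames


def toy_extract(sentence: str) -> list[str]:
    """Stand-in for an LLM call: simple keyword lookup, for demonstration only."""
    vocabulary = ["chest pain", "aspirin"]
    return [term for term in vocabulary if term in sentence]


note = "Patient reports chest pain. Started aspirin today."
frames = extract_by_sentence(note, toy_extract)
for frame in frames:
    print(frame)
```

Keeping each prompt to a single sentence shortens the model's input and makes it straightforward to verify that every returned mention actually occurs in the text, at the cost of one model call per sentence.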