Esther Camilo Dos Reis, Santiago Caneppa, Pedro Vasconcelos, Paulo Caleb Júnior de Lima Santos
Title: Advancing pharmacogenomics research: automated extraction of insights from PubMed using SpaCy NLP framework.
Journal: Pharmacogenomics, pages 1-6 (impact factor 1.9; JCR Q3, Pharmacology & Pharmacy)
Publication date: 2024-11-20
Publication type: Journal Article
DOI: 10.1080/14622416.2024.2429946 (https://doi.org/10.1080/14622416.2024.2429946)
Citations: 0
Abstract
This paper presents a methodology for automatically extracting insights from PubMed articles using a Natural Language Processing (NLP) framework. Our approach, leveraging advanced NLP techniques and Named Entity Recognition (NER), is crucial for advancing pharmacogenomics and other scientific fields that benefit from streamlined access to literature through automated services like RESTful APIs.
Building a new NLP model presents several challenges. First, it is essential to have a thorough understanding of the field in order to define relevant entities. Second, the construction of a diverse and consistent set of examples is crucial. Finally, the effective utilization of pre-established models is of paramount importance, as demonstrated in this work.
Our model, validated via ten-fold cross-validation, achieved over 70% recall and precision for all entities in the training set. We provide a reproducible pipeline for the scientific community and propose a structured approach for qualitative analysis and clustering of results. This methodology refines literature reviews, optimizes knowledge extraction, and supports broader application across diverse research domains. An online platform could further extend these benefits to researchers, educators, and practitioners.
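The abstract reports per-entity precision and recall under ten-fold cross-validation but does not describe how they were computed. The following is a minimal sketch of one common way to do it, not the authors' pipeline: it assumes entities are represented as `(start, end, label)` tuples, that a prediction counts only on an exact span-and-label match, and that folds are contiguous. The function names `kfold_indices` and `per_label_scores` and the labels `DRUG` and `GENE` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def kfold_indices(n_items, k=10):
    """Yield (train, test) index lists for k contiguous folds over range(n_items)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


def per_label_scores(gold, predicted):
    """Exact-match precision and recall for each entity label.

    gold and predicted are lists with one set per document; each set holds
    (start, end, label) tuples. This representation is an assumption.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, predicted):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1  # predicted and present in gold
            else:
                fp[span[2]] += 1  # predicted but not in gold
        for span in gold_spans - pred_spans:
            fn[span[2]] += 1      # in gold but missed by the model
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / p_den if p_den else 0.0,
            "recall": tp[label] / r_den if r_den else 0.0,
        }
    return scores


# Hypothetical annotations for two documents (labels are illustrative only).
gold = [{(0, 8, "DRUG"), (20, 27, "GENE")}, {(5, 13, "DRUG")}]
predicted = [{(0, 8, "DRUG"), (20, 27, "GENE")}, {(5, 13, "DRUG"), (30, 36, "GENE")}]
scores = per_label_scores(gold, predicted)
```

In a full evaluation, a model would be trained on each `train` split and scored on the matching `test` split, and the per-label scores would then be averaged across the ten folds. Exact matching is the strictest choice; a partial-overlap criterion would give higher numbers for the same predictions.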
About the journal:
Pharmacogenomics (ISSN 1462-2416) is a peer-reviewed journal presenting reviews and reports by the researchers and decision-makers closely involved in this rapidly developing area. Key objectives are to provide the community with an essential resource for keeping abreast of the latest developments in all areas of this exciting field.
Pharmacogenomics is the leading source of commentary and analysis, bringing you the highest quality expert analyses from corporate and academic opinion leaders in the field.