B-119 Comparative performance of GPT-4 and CNV-ETLAI in extracting copy number variations from medical journals: Bridging the gap between large language models and specialized NLP tools in genomic data interpretation
Author: J Choi
Journal: Clinical Chemistry (impact factor 7.1; JCR Q1, Medical Laboratory Technology)
Publication date: 2024-10-02
Publication type: Journal Article
DOI: 10.1093/clinchem/hvae106.480 (https://doi.org/10.1093/clinchem/hvae106.480)
Citation count: 0
Abstract
Background: Copy Number Variations (CNVs) are critical genetic markers in diversity and disease, yet their accurate extraction from medical literature remains challenging due to the complexity of genetic data. While specialized NLP models like CNV-ETLAI have been developed for this task, the advent of Large Language Models (LLMs) such as GPT-4 presents a potential alternative with broader applicability. This study evaluates the efficacy of GPT-4 against CNV-ETLAI in extracting CNVs from medical journal articles, aiming to enhance genetic research and clinical decision-making.
Methods: We configured GPT-4 to process and interpret medical journal PDFs, designing custom prompts for CNV information extraction. The performance of GPT-4 was benchmarked against CNV-ETLAI using a dataset of 146 true positive CNVs extracted from 23 journal articles. Performance metrics focused on accuracy in extracting CNVs from both text and tables, recognizing the importance of structured data interpretation in genomic analysis.
Results: CNV-ETLAI demonstrated superior accuracy, achieving a 98% success rate in CNV extraction, compared to GPT-4's 49%. Specifically, CNV-ETLAI outperformed GPT-4 in table extraction accuracy (99% vs. 41.2%) and context extraction accuracy (96% vs. 63.2%). Despite GPT-4's lower performance, its capacity for improvement and adaptability was noted, indicating potential future applicability in medical data extraction.
Conclusions: The study highlights CNV-ETLAI's current superiority in extracting CNVs from medical texts, particularly in interpreting structured data like tables. However, the adaptability and potential for growth in LLMs like GPT-4 suggest they could soon become valuable tools for medical data extraction, offering a more versatile and powerful solution across a broader range of applications. The promise of LLMs, despite their current limitations, underscores the need for continued research and development in AI technologies for genomic data interpretation.
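The abstract reports accuracy against a set of true positive CNVs, both overall and by where the CNV appears in the article, but it does not state how a match was scored. The following is a minimal sketch of one plausible scoring scheme, assuming accuracy means the share of gold-standard CNVs that a tool recovers. The `CNV` schema, the `extraction_accuracy` function, and the four toy records are hypothetical illustrations, not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """A gold-standard CNV record (hypothetical schema, not from the study)."""
    article_id: str
    region: str   # toy locus label
    kind: str     # "gain" or "loss"
    source: str   # "table" or "text": where the CNV appears in the article


def extraction_accuracy(gold, extracted):
    """Return the share of gold CNVs recovered, overall and per source."""
    # Assumption: a CNV counts as recovered on an exact (article, region, kind) match.
    found = {(c.article_id, c.region, c.kind) for c in extracted}
    totals, hits = {}, {}
    for c in gold:
        for key in ("overall", c.source):
            totals[key] = totals.get(key, 0) + 1
            if (c.article_id, c.region, c.kind) in found:
                hits[key] = hits.get(key, 0) + 1
    return {key: hits.get(key, 0) / totals[key] for key in totals}


# Toy data for illustration only: two CNVs from tables, two from running text.
gold = [
    CNV("a1", "22q11.2", "loss", "table"),
    CNV("a1", "16p11.2", "gain", "table"),
    CNV("a2", "1q21.1", "gain", "text"),
    CNV("a2", "15q13.3", "loss", "text"),
]
extracted = [gold[0], gold[2], gold[3]]  # the tool misses one table CNV

scores = extraction_accuracy(gold, extracted)
print(scores)  # {'overall': 0.75, 'table': 0.5, 'text': 1.0}
```

Scoring per source as well as overall is what lets a gap like the one reported for tables show up separately from the headline figure. A real evaluation would also need a rule for partial or approximate matches, which this sketch omits.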
About the journal:
Clinical Chemistry is a peer-reviewed scientific journal that is the premier publication for the science and practice of clinical laboratory medicine. It was established in 1955 and is associated with the Association for Diagnostics & Laboratory Medicine (ADLM).
The journal focuses on laboratory diagnosis and management of patients, and has expanded to include other clinical laboratory disciplines such as genomics, hematology, microbiology, and toxicology. It also publishes articles relevant to clinical specialties including cardiology, endocrinology, gastroenterology, genetics, immunology, infectious diseases, maternal-fetal medicine, neurology, nutrition, oncology, and pediatrics.
In addition to original research, editorials, and reviews, Clinical Chemistry features recurring sections such as clinical case studies, perspectives, podcasts, and Q&A articles. It has the highest impact factor among journals of clinical chemistry, laboratory medicine, pathology, analytical chemistry, transfusion medicine, and clinical microbiology.
The journal is indexed in databases such as MEDLINE and Web of Science.