Xiao Liang, Di Wang, Haodi Zhong, Quan Wang, Ronghan Li, Rui Jia, Bo Wan
Journal: Information Processing & Management (Impact Factor 7.4; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-06-21
Publication type: Journal Article
DOI: 10.1016/j.ipm.2024.103805
URL: https://www.sciencedirect.com/science/article/pii/S030645732400164X
Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs
Medical Visual Question Answering (MedVQA) aims to answer natural language questions about medical images. Existing methods, which largely adopt the cross-modal pre-training and fine-tuning paradigm, are limited in accuracy by data scarcity and by insufficient incorporation of medical knowledge. Drawing inspiration from Knowledge-Based Visual Question Answering (KB-VQA), which leverages Large Language Models (LLMs) and external knowledge bases, we introduce the Candidate-Heuristic In-Context Learning (CH-ICL) framework, a novel approach that uses LLMs augmented with external knowledge to directly enhance existing MedVQA models. Specifically, we collect a pathology terminology dictionary from a public digital pathology library as an external knowledge base and use it to train a knowledge scope discriminator, which identifies the knowledge scope required to answer a question. Then, we employ existing MedVQA models to provide reliable answer candidates along with their confidence scores. Finally, the knowledge scope and candidates, combined with retrieved in-context exemplars, are aggregated into prompts that heuristically guide LLMs in answer generation. Experimental results on the PathVQA, VQA-RAD, and SLAKE public benchmarks show state-of-the-art performance, with improvements over the baseline of 1.91%, 1.88%, and 2.17%, respectively. Code and dataset are available at https://github.com/ecoxial2007/CH-ICL.
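The final aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The prompt template, the `Exemplar` type, the `top_k` cutoff, and the function names are all assumptions introduced here, and the knowledge scope, candidates, and exemplars are assumed to have been produced already by the discriminator, the MedVQA model, and the retriever.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A candidate is an (answer, confidence) pair, as a MedVQA model might emit.
Candidate = Tuple[str, float]


@dataclass
class Exemplar:
    """One retrieved in-context example (hypothetical structure)."""
    question: str
    knowledge_scope: str
    candidates: List[Candidate]
    answer: str


def format_candidates(candidates: List[Candidate], top_k: int = 3) -> str:
    """Render the top_k candidates, highest confidence first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    return ", ".join(f"{answer} ({score:.2f})" for answer, score in ranked)


def build_prompt(
    question: str,
    knowledge_scope: str,
    candidates: List[Candidate],
    exemplars: List[Exemplar],
    top_k: int = 3,
) -> str:
    """Aggregate scope, candidates, and exemplars into one text prompt.

    The template below is an assumed layout, not the one used in the paper.
    """
    blocks = ["Answer the medical question using the knowledge scope and candidates."]
    # Each exemplar is shown in the same layout as the test question,
    # but with its ground-truth answer filled in.
    for ex in exemplars:
        blocks.append(
            f"Knowledge scope: {ex.knowledge_scope}\n"
            f"Question: {ex.question}\n"
            f"Candidates: {format_candidates(ex.candidates, top_k)}\n"
            f"Answer: {ex.answer}"
        )
    # The test question ends with an open "Answer:" slot for the LLM to complete.
    blocks.append(
        f"Knowledge scope: {knowledge_scope}\n"
        f"Question: {question}\n"
        f"Candidates: {format_candidates(candidates, top_k)}\n"
        f"Answer:"
    )
    return "\n\n".join(blocks)
```

The returned string would then be sent to an LLM of choice. Listing each candidate with its confidence lets the LLM weigh the MedVQA model's suggestions against the knowledge scope, which is the heuristic guidance the abstract refers to.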
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.