Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation

IF 15.5 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Information Fusion · Pub Date: 2025-06-01 · Epub Date: 2025-02-04 · DOI: 10.1016/j.inffus.2025.102985
Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin
{"title":"Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation","authors":"Bo Zhang ,&nbsp;Hui Ma ,&nbsp;Jian Ding ,&nbsp;Jian Wang ,&nbsp;Bo Xu ,&nbsp;Hongfei Lin","doi":"10.1016/j.inffus.2025.102985","DOIUrl":null,"url":null,"abstract":"<div><div>Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image–text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code is available at <span><span>https://github.com/zhangbo-nlp/VIKDF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102985"},"PeriodicalIF":15.5000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525000582","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/4 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image–text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code is available at https://github.com/zhangbo-nlp/VIKDF.
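
To make the described data flow concrete, below is a minimal PyTorch sketch of the two stages the abstract names. The module internals here are illustrative assumptions: the Implicit Query Transformer is rendered as a Q-Former-style set of learnable queries cross-attending to image features, and Bidirectional Variational Information Fusion is reduced to a single reparameterized latent fusion. All names, dimensions, and pooling choices are hypothetical; the authors' actual implementation is in the linked repository.

```python
# Illustrative sketch of the two VIKDF stages described in the abstract.
# All module internals, names, and dimensions are assumptions, not the
# authors' implementation (see https://github.com/zhangbo-nlp/VIKDF).
import torch
import torch.nn as nn

class ImplicitQueryTransformer(nn.Module):
    """Stage 1 (knowledge distillation): a Q-Former-style module in which a
    fixed set of learnable queries cross-attends to frozen image features,
    producing a compact set of 'knowledge vectors'."""
    def __init__(self, d_model=768, num_queries=32, num_layers=4, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, image_feats):           # (B, N_patches, d_model)
        B = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        return self.decoder(q, image_feats)   # (B, num_queries, d_model)

class VariationalFusion(nn.Module):
    """Stage 2 (knowledge integration): fuse knowledge vectors with the
    dialogue-context hidden states through a reparameterized latent variable,
    one plausible reading of 'Bidirectional Variational Information Fusion'."""
    def __init__(self, d_model=768, d_latent=256):
        super().__init__()
        self.to_mu = nn.Linear(2 * d_model, d_latent)
        self.to_logvar = nn.Linear(2 * d_model, d_latent)
        self.proj = nn.Linear(d_latent, d_model)

    def forward(self, knowledge, context):    # (B, Q, d), (B, T, d)
        # Pool knowledge vectors and pair them with every context position.
        k = knowledge.mean(dim=1, keepdim=True).expand_as(context)
        h = torch.cat([k, context], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return context + self.proj(z), kl     # fused states for the LLM, KL term

# Toy forward pass with random tensors standing in for a vision encoder / LLM.
img = torch.randn(2, 196, 768)                # e.g. ViT patch features
ctx = torch.randn(2, 20, 768)                 # dialogue-context hidden states
knowledge = ImplicitQueryTransformer()(img)
fused, kl = VariationalFusion()(knowledge, ctx)
print(fused.shape, kl.item())                 # torch.Size([2, 20, 768])
```

In a training loop of this shape, the KL term would typically be added to the generation loss as a weighted regularizer, as is standard for variational latent-variable models; whether VIKDF does exactly this is not stated in the abstract.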
Source journal: Information Fusion
Category: Engineering & Technology – Computer Science: Theory & Methods
CiteScore: 33.20
Self-citation rate: 4.30%
Annual article count: 161
Average review time: 7.9 months
Journal introduction: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.
Latest articles in this journal:
- GTEE: A global timestamp encoding enhanced method for robust time series imputation in complex missing scenarios
- Resilient distributed Kalman filtering for cyber-physical systems via mean subsequence reduction
- Learning across modalities: a systematic survey of multimodal models for financial analysis
- MuBe4D: A mutual benefit framework for generalizable motion segmentation and geometry-first 4D reconstruction
- Decoding multilingual imagined speech from scalp EEG via dynamic differentiable graph hierarchical fusion network