{"title":"具有双知识增强生成预训练语言模型的多模态对话系统","authors":"Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie","doi":"10.1145/3606368","DOIUrl":null,"url":null,"abstract":"Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) overlook the benefit of generative pre-training , and 2) ignore the textual context related knowledge . To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection , dual knowledge-enhanced context learning , and knowledge-enhanced response generation . To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"19 1","pages":"0"},"PeriodicalIF":5.4000,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model\",\"authors\":\"Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie\",\"doi\":\"10.1145/3606368\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) overlook the benefit of generative pre-training , and 2) ignore the textual context related knowledge . To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection , dual knowledge-enhanced context learning , and knowledge-enhanced response generation . To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.\",\"PeriodicalId\":50936,\"journal\":{\"name\":\"ACM Transactions on Information Systems\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2023-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3606368\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3606368","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) overlook the benefit of generative pre-training , and 2) ignore the textual context related knowledge . To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection , dual knowledge-enhanced context learning , and knowledge-enhanced response generation . To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced for explicitly utilizing the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
期刊介绍:
The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines, recommender systems) that contain:
new principled information retrieval models or algorithms with sound empirical validation;
observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking;
accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques;
formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks;
development of content (text, image, speech, video, etc) analysis methods to support information retrieval and information seeking;
development of computational models of user information preferences and interaction behaviors;
creation and analysis of evaluation methodologies for information retrieval and information seeking; or
surveys of existing work that propose a significant synthesis.
The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues'' work.