基于证据生成的以目标为导向、以文件为基础的对话

IF 2.7 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data & Knowledge Engineering Pub Date : 2024-11-22 DOI:10.1016/j.datak.2024.102378
Yong Song , Hongjie Fan , Junfei Liu , Yunxin Liu , Xiaozhou Ye , Ye Ouyang
{"title":"基于证据生成的以目标为导向、以文件为基础的对话","authors":"Yong Song ,&nbsp;Hongjie Fan ,&nbsp;Junfei Liu ,&nbsp;Yunxin Liu ,&nbsp;Xiaozhou Ye ,&nbsp;Ye Ouyang","doi":"10.1016/j.datak.2024.102378","DOIUrl":null,"url":null,"abstract":"<div><div>Goal-oriented Document-grounded Dialogue (DGD) is used for retrieving specific domain documents, assisting users in document content retrieval, question answering, and document management. Existing methods typically employ keyword extraction and vector space models to understand the content of documents, identify the intent of questions, and generate answers based on the capabilities of generation models. However, challenges remain in semantic understanding, long text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in context learning and step-by-step reasoning. These models, combined with Retrieval Augmented Generation(RAG) methods, have made significant breakthroughs in text comprehension, intent detection, language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. It designs and implements methods for document content retrieval &amp; reranking, fine-tuning and inference, and evidence generation. Through experiments, the method of combining LLMs with vector space model, or with key information matching technique is used as a comparison, the accuracy of the proposed method is improved by 21.91% and 12.81%, while the comprehensiveness is increased by 10.89% and 69.83%, coherence is enhanced by 38.98% and 53.27%, and completeness is boosted by 16.13% and 36.97%, respectively, on average. Additional, ablation analysis conducted reveals that the evidence generation method also contributes significantly to the comprehensiveness and completeness.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"155 ","pages":"Article 102378"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A goal-oriented document-grounded dialogue based on evidence generation\",\"authors\":\"Yong Song ,&nbsp;Hongjie Fan ,&nbsp;Junfei Liu ,&nbsp;Yunxin Liu ,&nbsp;Xiaozhou Ye ,&nbsp;Ye Ouyang\",\"doi\":\"10.1016/j.datak.2024.102378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Goal-oriented Document-grounded Dialogue (DGD) is used for retrieving specific domain documents, assisting users in document content retrieval, question answering, and document management. Existing methods typically employ keyword extraction and vector space models to understand the content of documents, identify the intent of questions, and generate answers based on the capabilities of generation models. However, challenges remain in semantic understanding, long text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in context learning and step-by-step reasoning. These models, combined with Retrieval Augmented Generation(RAG) methods, have made significant breakthroughs in text comprehension, intent detection, language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. It designs and implements methods for document content retrieval &amp; reranking, fine-tuning and inference, and evidence generation. Through experiments, the method of combining LLMs with vector space model, or with key information matching technique is used as a comparison, the accuracy of the proposed method is improved by 21.91% and 12.81%, while the comprehensiveness is increased by 10.89% and 69.83%, coherence is enhanced by 38.98% and 53.27%, and completeness is boosted by 16.13% and 36.97%, respectively, on average. Additional, ablation analysis conducted reveals that the evidence generation method also contributes significantly to the comprehensiveness and completeness.</div></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"155 \",\"pages\":\"Article 102378\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X24001022\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X24001022","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

面向目标的文档基础对话(DGD)用于检索特定领域的文档,协助用户进行文档内容检索、问题解答和文档管理。现有方法通常采用关键词提取和向量空间模型来理解文档内容、识别问题的意图,并根据生成模型的能力生成答案。然而,在语义理解、长文本处理和上下文理解方面仍然存在挑战。大型语言模型(LLM)的出现为上下文学习和逐步推理带来了新的能力。这些模型与检索增强生成(RAG)方法相结合,在文本理解、意图检测和语言组织方面取得了重大突破,为 DGD 研究提供了令人振奋的前景。然而,LLMs 产生的 "幻觉 "问题需要补充方法来确保其输出结果的可信度。在本文中,我们提出了一种基于使用 LLMs 生成证据的目标导向文档基础对话方法。它设计并实现了文档内容检索&;重排、微调和推理以及证据生成的方法。通过实验,将 LLMs 与向量空间模型或与关键信息匹配技术相结合的方法进行比较,发现所提方法的准确率分别提高了 21.91% 和 12.81%,全面性分别提高了 10.89% 和 69.83%,一致性分别提高了 38.98% 和 53.27%,完整性平均提高了 16.13% 和 36.97%。此外,进行的消融分析表明,证据生成方法对全面性和完整性也有显著的促进作用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A goal-oriented document-grounded dialogue based on evidence generation
Goal-oriented Document-grounded Dialogue (DGD) is used for retrieving specific domain documents, assisting users in document content retrieval, question answering, and document management. Existing methods typically employ keyword extraction and vector space models to understand the content of documents, identify the intent of questions, and generate answers based on the capabilities of generation models. However, challenges remain in semantic understanding, long text processing, and context understanding. The emergence of Large Language Models (LLMs) has brought new capabilities in context learning and step-by-step reasoning. These models, combined with Retrieval Augmented Generation(RAG) methods, have made significant breakthroughs in text comprehension, intent detection, language organization, offering exciting prospects for DGD research. However, the “hallucination” issue arising from LLMs requires complementary methods to ensure the credibility of their outputs. In this paper we propose a goal-oriented document-grounded dialogue approach based on evidence generation using LLMs. It designs and implements methods for document content retrieval & reranking, fine-tuning and inference, and evidence generation. Through experiments, the method of combining LLMs with vector space model, or with key information matching technique is used as a comparison, the accuracy of the proposed method is improved by 21.91% and 12.81%, while the comprehensiveness is increased by 10.89% and 69.83%, coherence is enhanced by 38.98% and 53.27%, and completeness is boosted by 16.13% and 36.97%, respectively, on average. Additional, ablation analysis conducted reveals that the evidence generation method also contributes significantly to the comprehensiveness and completeness.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Data & Knowledge Engineering
Data & Knowledge Engineering 工程技术-计算机:人工智能
CiteScore
5.00
自引率
0.00%
发文量
66
审稿时长
6 months
期刊介绍: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.
期刊最新文献
White box specification of intervention policies for prescriptive process monitoring A goal-oriented document-grounded dialogue based on evidence generation Data-aware process models: From soundness checking to repair Context normalization: A new approach for the stability and improvement of neural network performance An assessment taxonomy for self-adaptation business process solutions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1