同时思考和说话的大语言模型的双层训练和解码

Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
{"title":"同时思考和说话的大语言模型的双层训练和解码","authors":"Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji","doi":"arxiv-2409.12059","DOIUrl":null,"url":null,"abstract":"Large Language Model can reasonably understand and generate human expressions\nbut may lack of thorough thinking and reasoning mechanisms. Recently there have\nbeen several studies which enhance the thinking ability of language models but\nmost of them are not data-driven or training-based. In this paper, we are\nmotivated by the cognitive mechanism in the natural world, and design a novel\nmodel architecture called TaS which allows it to first consider the thoughts\nand then express the response based upon the query. We design several pipelines\nto annotate or generate the thought contents from prompt-response samples, then\nadd language heads in a middle layer which behaves as the thinking layer. We\ntrain the language model by the thoughts-augmented data and successfully let\nthe thinking layer automatically generate reasonable thoughts and finally\noutput more reasonable responses. Both qualitative examples and quantitative\nresults validate the effectiveness and performance of TaS. Our code is\navailable at https://anonymous.4open.science/r/TadE.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking\",\"authors\":\"Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji\",\"doi\":\"arxiv-2409.12059\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Model can reasonably understand and generate human expressions\\nbut may lack of thorough thinking and reasoning mechanisms. Recently there have\\nbeen several studies which enhance the thinking ability of language models but\\nmost of them are not data-driven or training-based. In this paper, we are\\nmotivated by the cognitive mechanism in the natural world, and design a novel\\nmodel architecture called TaS which allows it to first consider the thoughts\\nand then express the response based upon the query. We design several pipelines\\nto annotate or generate the thought contents from prompt-response samples, then\\nadd language heads in a middle layer which behaves as the thinking layer. We\\ntrain the language model by the thoughts-augmented data and successfully let\\nthe thinking layer automatically generate reasonable thoughts and finally\\noutput more reasonable responses. Both qualitative examples and quantitative\\nresults validate the effectiveness and performance of TaS. Our code is\\navailable at https://anonymous.4open.science/r/TadE.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12059\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

大型语言模型可以合理地理解和生成人类的表达方式,但可能缺乏全面的思考和推理机制。近来有一些增强语言模型思维能力的研究,但大多不是数据驱动或基于训练的。在本文中,我们从自然世界中的认知机制出发,设计了一种名为 TaS 的新型模型架构,使其能够首先考虑思维,然后根据查询表达响应。我们设计了多个管道来注释或生成提示-响应样本中的思维内容,然后在中间层添加语言头,该层就像思维层。通过思维增强数据对语言模型进行训练,成功地让思维层自动生成合理的思维,并最终输出更合理的回复。定性实例和定量结果都验证了 TaS 的有效性和性能。我们的代码见 https://anonymous.4open.science/r/TadE。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
Large Language Model can reasonably understand and generate human expressions but may lack of thorough thinking and reasoning mechanisms. Recently there have been several studies which enhance the thinking ability of language models but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world, and design a novel model architecture called TaS which allows it to first consider the thoughts and then express the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model by the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
LLMs + Persona-Plug = Personalized LLMs MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources Human-like Affective Cognition in Foundation Models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1