LLMs + Persona-Plug = Personalized LLMs (arXiv:2409.11901)
Jiongnan Liu, Yutao Zhu, Shuting Wang, Xiaochi Wei, Erxue Min, Yu Lu, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou
Personalization plays a critical role in numerous language tasks and applications, since users with the same requirements may prefer diverse outputs based on their individual interests. This has led to the development of various personalized approaches aimed at adapting large language models (LLMs) to generate customized outputs aligned with user preferences. Some of these approaches fine-tune a unique personalized LLM for each user, which is too expensive for widespread application. Alternative approaches introduce personalization information in a plug-and-play manner by retrieving the user's relevant historical texts as demonstrations. However, this retrieval-based strategy may break the continuity of the user history and fail to capture the user's overall styles and patterns, leading to sub-optimal performance. To address these challenges, we propose a novel personalized LLM model, PPlug. It constructs a user-specific embedding for each individual by modeling all of the user's historical contexts through a lightweight plug-in user embedder module. By attaching this embedding to the task input, LLMs can better understand and capture user habits and preferences, thereby producing more personalized outputs without tuning their own parameters. Extensive experiments on various tasks in the language model personalization (LaMP) benchmark demonstrate that the proposed model significantly outperforms existing personalized LLM approaches.
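The plug-in mechanism is simple enough to sketch. Below is a minimal, hypothetical PyTorch rendering, assuming a Hugging Face-style encoder and LLM; the mean pooling over history items is a simplification of whatever aggregation the paper actually uses, and all names are illustrative.

```python
import torch
import torch.nn as nn

class UserEmbedder(nn.Module):
    """Lightweight plug-in that turns a user's full history into one embedding."""

    def __init__(self, encoder, llm_hidden_size):
        super().__init__()
        self.encoder = encoder  # small text encoder for history items
        self.proj = nn.Linear(encoder.config.hidden_size, llm_hidden_size)

    def forward(self, history_input_ids, history_attention_mask):
        # Encode every historical text, then pool across the whole history
        # (mean pooling here; the paper may weight items differently).
        out = self.encoder(input_ids=history_input_ids,
                           attention_mask=history_attention_mask)
        doc_embs = out.last_hidden_state[:, 0]   # [num_docs, enc_hidden]
        user_emb = doc_embs.mean(dim=0)          # [enc_hidden]
        return self.proj(user_emb)               # [llm_hidden]

def personalized_inputs(llm, input_ids, user_emb):
    # Prepend the user embedding as a soft token; the LLM itself stays frozen.
    token_embs = llm.get_input_embeddings()(input_ids)     # [1, seq, hidden]
    return torch.cat([user_emb.view(1, 1, -1), token_embs], dim=1)
```
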
{"title":"LLMs + Persona-Plug = Personalized LLMs","authors":"Jiongnan Liu, Yutao Zhu, Shuting Wang, Xiaochi Wei, Erxue Min, Yu Lu, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou","doi":"arxiv-2409.11901","DOIUrl":"https://doi.org/arxiv-2409.11901","url":null,"abstract":"Personalization plays a critical role in numerous language tasks and\u0000applications, since users with the same requirements may prefer diverse outputs\u0000based on their individual interests. This has led to the development of various\u0000personalized approaches aimed at adapting large language models (LLMs) to\u0000generate customized outputs aligned with user preferences. Some of them involve\u0000fine-tuning a unique personalized LLM for each user, which is too expensive for\u0000widespread application. Alternative approaches introduce personalization\u0000information in a plug-and-play manner by retrieving the user's relevant\u0000historical texts as demonstrations. However, this retrieval-based strategy may\u0000break the continuity of the user history and fail to capture the user's overall\u0000styles and patterns, hence leading to sub-optimal performance. To address these\u0000challenges, we propose a novel personalized LLM model, ours{}. It constructs a\u0000user-specific embedding for each individual by modeling all her historical\u0000contexts through a lightweight plug-in user embedder module. By attaching this\u0000embedding to the task input, LLMs can better understand and capture user habits\u0000and preferences, thereby producing more personalized outputs without tuning\u0000their own parameters. Extensive experiments on various tasks in the language\u0000model personalization (LaMP) benchmark demonstrate that the proposed model\u0000significantly outperforms existing personalized LLM approaches.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enabling Real-Time Conversations with Minimal Training Costs (arXiv:2409.11727)
Wang Xu, Shuo Wang, Weilin Zhao, Xu Han, Yukun Yan, Yudi Zhang, Zhe Tao, Zhiyuan Liu, Wanxiang Che
Large language models (LLMs) have demonstrated the ability to improve human efficiency through conversational interactions. Conventional LLM-powered dialogue systems, operating on a turn-based paradigm, preclude real-time interaction during response generation. To address this limitation, researchers have proposed duplex models, which can dynamically adapt to user input and provide real-time interactive feedback. However, these methods typically require substantial computational resources to acquire this ability. To reduce the overhead, this paper presents a new duplex decoding approach that endows LLMs with duplex ability while requiring minimal additional training. Specifically, our method decodes queries and responses in parallel within a conversation, effectively implementing a channel-division-multiplexing decoding strategy. Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
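One way to picture the channel-division-multiplexing strategy is as alternating "listen" and "speak" slots in a single token stream. The toy loop below is an assumption-laden sketch, not the paper's implementation: `incoming_query_tokens` is an iterator of user tokens arriving in real time, and `model.next_token` is a hypothetical greedy-decoding helper.

```python
def duplex_decode(model, tokenizer, incoming_query_tokens, max_steps=256):
    """Toy duplex loop: even slots ingest user tokens, odd slots emit response tokens."""
    context = []    # the single multiplexed token stream fed to the model
    response = []
    for step in range(max_steps):
        if step % 2 == 0:  # "listen" slot
            # Take the next user token if one has arrived, else a pad placeholder.
            context.append(next(incoming_query_tokens, tokenizer.pad_token_id))
        else:              # "speak" slot
            tok = model.next_token(context)  # hypothetical decoding helper
            if tok == tokenizer.eos_token_id:
                break
            context.append(tok)
            response.append(tok)
    return tokenizer.decode(response)
```
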
{"title":"Enabling Real-Time Conversations with Minimal Training Costs","authors":"Wang Xu, Shuo Wang, Weilin Zhao, Xu Han, Yukun Yan, Yudi Zhang, Zhe Tao, Zhiyuan Liu, Wanxiang Che","doi":"arxiv-2409.11727","DOIUrl":"https://doi.org/arxiv-2409.11727","url":null,"abstract":"Large language models (LLMs) have demonstrated the ability to improve human\u0000efficiency through conversational interactions. Conventional LLM-powered\u0000dialogue systems, operating on a turn-based paradigm, preclude real-time\u0000interaction during response generation. To address this limitation, researchers\u0000have proposed duplex models. These models can dynamically adapt to user input,\u0000facilitating real-time interactive feedback. However, these methods typically\u0000require substantial computational resources to acquire the ability. To reduce\u0000overhead, this paper presents a new duplex decoding approach that enhances LLMs\u0000with duplex ability, requiring minimal additional training. Specifically, our\u0000method employs parallel decoding of queries and responses in conversations,\u0000effectively implementing a channel-division-multiplexing decoding strategy.\u0000Experimental results indicate that our proposed method significantly enhances\u0000the naturalness and human-likeness of user-AI interactions with minimal\u0000training costs.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring Human and AI Values based on Generative Psychometrics with Large Language Models (arXiv:2409.12106)
Haoran Ye, Yuhang Xie, Yuanyi Ren, Hanjun Fang, Xin Zhang, Guojie Song
Human values and their measurement are a long-standing interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. We begin by fine-tuning an LLM for accurate perception-level value measurement and verifying the capability of LLMs to parse texts into perceptions, forming the core of the GPV pipeline. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the current state of the art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, revealing the response biases of prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.
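A rough sketch of the two-stage GPV pipeline, under stated assumptions: `llm` is a text-in/text-out callable, and the prompts and the value set are illustrative placeholders rather than the paper's instruments.

```python
VALUES = ["self-direction", "benevolence", "security", "achievement"]  # illustrative

def parse_perceptions(llm, text):
    # Stage 1: decompose the text into value-revealing perceptions.
    prompt = ("List the value-revealing perceptions expressed in the text, "
              "one per line.\n\nText: " + text)
    return [p.strip() for p in llm(prompt).splitlines() if p.strip()]

def measure_values(llm, text):
    # Stage 2: score each perception against each value, then aggregate
    # perception-level judgments into a per-value profile.
    scores = {v: [] for v in VALUES}
    for perception in parse_perceptions(llm, text):
        for value in VALUES:
            prompt = (f"On a scale from -1 (opposes) to 1 (supports), how does "
                      f"this perception relate to the value '{value}'? "
                      f"Answer with a number only.\n\nPerception: {perception}")
            scores[value].append(float(llm(prompt)))
    return {v: sum(s) / len(s) for v, s in scores.items() if s}
```
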
{"title":"Measuring Human and AI Values based on Generative Psychometrics with Large Language Models","authors":"Haoran Ye, Yuhang Xie, Yuanyi Ren, Hanjun Fang, Xin Zhang, Guojie Song","doi":"arxiv-2409.12106","DOIUrl":"https://doi.org/arxiv-2409.12106","url":null,"abstract":"Human values and their measurement are long-standing interdisciplinary\u0000inquiry. Recent advances in AI have sparked renewed interest in this area, with\u0000large language models (LLMs) emerging as both tools and subjects of value\u0000measurement. This work introduces Generative Psychometrics for Values (GPV), an\u0000LLM-based, data-driven value measurement paradigm, theoretically grounded in\u0000text-revealed selective perceptions. We begin by fine-tuning an LLM for\u0000accurate perception-level value measurement and verifying the capability of\u0000LLMs to parse texts into perceptions, forming the core of the GPV pipeline.\u0000Applying GPV to human-authored blogs, we demonstrate its stability, validity,\u0000and superiority over prior psychological tools. Then, extending GPV to LLM\u0000value measurement, we advance the current art with 1) a psychometric\u0000methodology that measures LLM values based on their scalable and free-form\u0000outputs, enabling context-specific measurement; 2) a comparative analysis of\u0000measurement paradigms, indicating response biases of prior methods; and 3) an\u0000attempt to bridge LLM values and their safety, revealing the predictive power\u0000of different value systems and the impacts of various values on LLM safety.\u0000Through interdisciplinary efforts, we aim to leverage AI for next-generation\u0000psychometrics and psychometrics for value-aligned AI.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation (arXiv:2409.11703)
Chunliang Tao, Xiaojing Fan, Yahe Yang
As Large Language Models (LLMs) advance in natural language processing, there is growing interest in leveraging their capabilities to simplify software interactions. In this paper, we propose a novel system that integrates LLMs for both classifying natural language inputs into corresponding API calls and automating the creation of sample datasets tailored to specific API functions. By classifying natural language commands, our system allows users to invoke complex software functionalities through simple inputs, improving interaction efficiency and lowering the barrier to software utilization. Our dataset generation approach also enables the efficient and systematic evaluation of different LLMs in classifying API calls, offering a practical tool for developers or business owners to assess the suitability of LLMs for customized API management. We conduct experiments on several prominent LLMs using generated sample datasets for various API functions. The results show that GPT-4 achieves a high classification accuracy of 0.996, while LLaMA-3-8B performs much worse at 0.759. These findings highlight the potential of LLMs to transform API management and validate the effectiveness of our system in guiding model testing and selection across diverse applications.
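Both halves of the system admit a compact sketch: routing a natural-language command to an API name, and synthesizing labeled commands for evaluation. The API catalog, prompts, and the `llm` callable below are invented for illustration.

```python
APIS = {
    "get_weather": "Return the forecast for a city.",
    "create_invoice": "Create a billing invoice for a customer.",
    "search_orders": "Search past orders by keyword or date.",
}

def classify_api(llm, user_input):
    # Ask the LLM to pick the best-matching function from the catalog.
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in APIS.items())
    prompt = (f"Available API functions:\n{catalog}\n\n"
              f"User request: {user_input}\n"
              "Answer with the single best-matching function name.")
    prediction = llm(prompt).strip()
    return prediction if prediction in APIS else None

def synthesize_examples(llm, api_name, n=5):
    # Generate n user commands that should map to this API, yielding a
    # labeled evaluation set without manual annotation.
    prompt = (f"Write {n} different user requests that should trigger the API "
              f"'{api_name}' ({APIS[api_name]}), one per line.")
    return [(line.strip(), api_name)
            for line in llm(prompt).splitlines() if line.strip()]
```
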
{"title":"Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation","authors":"Chunliang Tao, Xiaojing Fan, Yahe Yang","doi":"arxiv-2409.11703","DOIUrl":"https://doi.org/arxiv-2409.11703","url":null,"abstract":"As Large Language Models (LLMs) advance in natural language processing, there\u0000is growing interest in leveraging their capabilities to simplify software\u0000interactions. In this paper, we propose a novel system that integrates LLMs for\u0000both classifying natural language inputs into corresponding API calls and\u0000automating the creation of sample datasets tailored to specific API functions.\u0000By classifying natural language commands, our system allows users to invoke\u0000complex software functionalities through simple inputs, improving interaction\u0000efficiency and lowering the barrier to software utilization. Our dataset\u0000generation approach also enables the efficient and systematic evaluation of\u0000different LLMs in classifying API calls, offering a practical tool for\u0000developers or business owners to assess the suitability of LLMs for customized\u0000API management. We conduct experiments on several prominent LLMs using\u0000generated sample datasets for various API functions. The results show that\u0000GPT-4 achieves a high classification accuracy of 0.996, while LLaMA-3-8B\u0000performs much worse at 0.759. These findings highlight the potential of LLMs to\u0000transform API management and validate the effectiveness of our system in\u0000guiding model testing and selection across diverse applications.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement (arXiv:2409.12122)
An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang
In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it is possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possesses advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
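In its simplest form, RM-guided sampling at inference time is best-of-N reranking. A minimal sketch, assuming hypothetical `policy.generate` and `reward_model.score` interfaces (the report's actual strategy may be more elaborate).

```python
def rm_guided_generate(policy, reward_model, problem, n=8):
    # Sample n candidate solutions, then keep the one the RM scores highest.
    candidates = [policy.generate(problem, temperature=0.7) for _ in range(n)]
    return max(candidates, key=lambda c: reward_model.score(problem, c))
```
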
{"title":"Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement","authors":"An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang","doi":"arxiv-2409.12122","DOIUrl":"https://doi.org/arxiv-2409.12122","url":null,"abstract":"In this report, we present a series of math-specific large language models:\u0000Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the\u0000Qwen2.5 series lies in integrating the philosophy of self-improvement\u0000throughout the entire pipeline, from pre-training and post-training to\u0000inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized\u0000to generate large-scale, high-quality mathematical data. (2) In the\u0000post-training phase, we develop a reward model (RM) by conducting massive\u0000sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative\u0000evolution of data in supervised fine-tuning (SFT). With a stronger SFT model,\u0000it's possible to iteratively train and update the RM, which in turn guides the\u0000next round of SFT data iteration. On the final SFT model, we employ the\u0000ultimate RM for reinforcement learning, resulting in the Qwen2.5-Math-Instruct.\u0000(3) Furthermore, during the inference stage, the RM is used to guide sampling,\u0000optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possess advanced\u0000mathematical reasoning capabilities, including Chain-of-Thought (CoT) and\u0000Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics\u0000datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and\u0000AIME24, covering a range of difficulties from grade school level to math\u0000competition problems.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty (arXiv:2409.12180)
Arslan Chaudhry, Sridhar Thiagarajan, Dilan Gorur
Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks. Despite their broad utility, LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing. As a result, end-users struggle to consistently align the confidence expressed by LLMs with the accuracy of their predictions, often leading to either blind trust in all outputs or a complete disregard for their reliability. In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty. Specifically, we measure the calibration of pre-trained models and then fine-tune language models to generate calibrated linguistic expressions of uncertainty. Through experiments on various question-answering datasets, we demonstrate that LLMs are well-calibrated in assessing their predictions, and supervised finetuning based on the model's own confidence leads to well-calibrated expressions of uncertainty, particularly for single-claim answers.
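One plausible recipe for uncertainty-augmented targets is to estimate confidence via agreement across samples and map it to a hedging phrase; the bin boundaries, phrasings, sample count, and exact-match agreement below are illustrative assumptions, not the paper's exact construction.

```python
def hedge_phrase(confidence):
    # Map an estimated confidence to a calibrated linguistic hedge.
    if confidence > 0.9:
        return "I am confident that"
    if confidence > 0.6:
        return "I believe that"
    if confidence > 0.3:
        return "I am unsure, but possibly"
    return "I really don't know, but my best guess is"

def build_sft_example(model, question, gold_answer):
    # Estimate confidence as the model's agreement rate across sampled answers,
    # then prefix the gold answer with the matching hedge for finetuning.
    samples = [model.answer(question, temperature=1.0) for _ in range(10)]
    confidence = sum(s == gold_answer for s in samples) / len(samples)
    return {"prompt": question,
            "completion": f"{hedge_phrase(confidence)} {gold_answer}."}
```
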
{"title":"Finetuning Language Models to Emit Linguistic Expressions of Uncertainty","authors":"Arslan Chaudhry, Sridhar Thiagarajan, Dilan Gorur","doi":"arxiv-2409.12180","DOIUrl":"https://doi.org/arxiv-2409.12180","url":null,"abstract":"Large language models (LLMs) are increasingly employed in information-seeking\u0000and decision-making tasks. Despite their broad utility, LLMs tend to generate\u0000information that conflicts with real-world facts, and their persuasive style\u0000can make these inaccuracies appear confident and convincing. As a result,\u0000end-users struggle to consistently align the confidence expressed by LLMs with\u0000the accuracy of their predictions, often leading to either blind trust in all\u0000outputs or a complete disregard for their reliability. In this work, we explore\u0000supervised finetuning on uncertainty-augmented predictions as a method to\u0000develop models that produce linguistic expressions of uncertainty.\u0000Specifically, we measure the calibration of pre-trained models and then\u0000fine-tune language models to generate calibrated linguistic expressions of\u0000uncertainty. Through experiments on various question-answering datasets, we\u0000demonstrate that LLMs are well-calibrated in assessing their predictions, and\u0000supervised finetuning based on the model's own confidence leads to\u0000well-calibrated expressions of uncertainty, particularly for single-claim\u0000answers.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts (arXiv:2409.11844)
Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang
Large Language Models (LLMs) can memorize sensitive information, raising concerns about potential misuse. LLM unlearning, a post-hoc approach to remove this information from trained LLMs, offers a promising solution to mitigate these risks. However, previous practices face three key challenges: 1. Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. 2. Efficiency: many methods either involve adding similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. 3. Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient-descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. Then, we design a new metric, MEMO, to quantify memorization in LLMs. Finally, based on the signals provided by MEMO, we select the most appropriate set of inverted facts and finetune the model on them. We evaluate MEOW on the commonly used unlearning benchmark ToFU with Llama2-7B-Chat and Phi-1.5B, and test it on both NLU and NLG tasks. Results demonstrate that MEOW significantly improves forget quality without substantial loss in model utility; it exhibits no significant degradation in NLU or NLG capabilities, and even shows a slight improvement in NLU performance.
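A simplified sketch of the loop follows; `memo_score` is a likelihood-based stand-in for the paper's MEMO metric, and the model and tokenizer follow Hugging Face-style causal-LM conventions.

```python
import torch

def memo_score(model, tokenizer, text):
    # Simplified memorization proxy: how much probability mass the model
    # assigns to the text (higher => more strongly memorized).
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(-loss).item()

def unlearn(model, tokenizer, optimizer, inverted_facts, steps=3):
    # Plain gradient descent, but the targets are inverted (false) versions of
    # the sensitive facts, overwriting what the model memorized.
    model.train()
    for _ in range(steps):
        for fact in inverted_facts:
            ids = tokenizer(fact, return_tensors="pt").input_ids
            loss = model(input_ids=ids, labels=ids).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```
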
{"title":"MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts","authors":"Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang","doi":"arxiv-2409.11844","DOIUrl":"https://doi.org/arxiv-2409.11844","url":null,"abstract":"Large Language Models (LLMs) can memorize sensitive information, raising\u0000concerns about potential misuse. LLM Unlearning, a post-hoc approach to remove\u0000this information from trained LLMs, offers a promising solution to mitigate\u0000these risks. However, previous practices face three key challenges: 1. Utility:\u0000successful unlearning often causes catastrophic collapse on unrelated tasks. 2.\u0000Efficiency: many methods either involve adding similarly sized models, which\u0000slows down unlearning or inference, or require retain data that are difficult\u0000to obtain. 3. Robustness: even effective methods may still leak data via\u0000extraction techniques. To address these challenges, we propose MEOW, a simple\u0000yet effective gradient descent-based unlearning method. Specifically, we use an\u0000offline LLM to generate a set of inverted facts. Then, we design a new metric,\u0000MEMO, to quantify memorization in LLMs. Finally, based on the signals provided\u0000by MEMO, we select the most appropriate set of inverted facts and finetune the\u0000model based on them. We evaluate MEOW on the commonly used unlearn benchmark,\u0000ToFU, with Llama2-7B-Chat and Phi-1.5B, and test it on both NLU and NLG tasks.\u0000Results demonstrate significant improvement of MEOW in forget quality without\u0000substantial loss in model utility. Meanwhile, MEOW does not exhibit significant\u0000degradation in NLU or NLG capabilities, and there is even a slight improvement\u0000in NLU performance.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking (arXiv:2409.12059)
Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Recently there have been several studies that enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS, which first considers the thoughts and then expresses the response based upon the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add a language head in a middle layer that behaves as the thinking layer. We train the language model on the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts, finally producing more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
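The dual-layer wiring can be sketched as below, assuming a Hugging Face-style causal LM. The middle-layer index, the equal loss weighting, and the assumption that labels arrive pre-shifted for next-token prediction are illustrative choices, not details from the paper.

```python
import torch.nn as nn

class ThinkAndSpeak(nn.Module):
    """Base LM plus a second language head on a middle (thinking) layer."""

    def __init__(self, base_lm, think_layer_idx=12):
        super().__init__()
        self.base_lm = base_lm
        self.think_layer_idx = think_layer_idx
        self.think_head = nn.Linear(base_lm.config.hidden_size,
                                    base_lm.config.vocab_size, bias=False)

    def forward(self, input_ids, thought_labels=None, response_labels=None):
        out = self.base_lm(input_ids, output_hidden_states=True)
        mid_hidden = out.hidden_states[self.think_layer_idx]
        think_logits = self.think_head(mid_hidden)  # decodes the thought
        speak_logits = out.logits                   # decodes the response
        loss = None
        if thought_labels is not None and response_labels is not None:
            ce = nn.CrossEntropyLoss()
            loss = (ce(think_logits.flatten(0, 1), thought_labels.flatten())
                    + ce(speak_logits.flatten(0, 1), response_labels.flatten()))
        return think_logits, speak_logits, loss
```
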
{"title":"Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking","authors":"Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji","doi":"arxiv-2409.12059","DOIUrl":"https://doi.org/arxiv-2409.12059","url":null,"abstract":"Large Language Model can reasonably understand and generate human expressions\u0000but may lack of thorough thinking and reasoning mechanisms. Recently there have\u0000been several studies which enhance the thinking ability of language models but\u0000most of them are not data-driven or training-based. In this paper, we are\u0000motivated by the cognitive mechanism in the natural world, and design a novel\u0000model architecture called TaS which allows it to first consider the thoughts\u0000and then express the response based upon the query. We design several pipelines\u0000to annotate or generate the thought contents from prompt-response samples, then\u0000add language heads in a middle layer which behaves as the thinking layer. We\u0000train the language model by the thoughts-augmented data and successfully let\u0000the thinking layer automatically generate reasonable thoughts and finally\u0000output more reasonable responses. Both qualitative examples and quantitative\u0000results validate the effectiveness and performance of TaS. Our code is\u0000available at https://anonymous.4open.science/r/TadE.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework (arXiv:2409.11827)
Yuping Wu, Hao Li, Hongbo Zhu, Goran Nenadic, Xiao-Jun Zeng
Extract-then-Abstract is a naturally coherent paradigm for abstractive summarization that draws on the salient information identified by an extractive model. Previous works adopting this paradigm train the extractor and abstractor separately and introduce extra parameters to highlight the extracted salients to the abstractor, which results in error accumulation and additional training costs. In this paper, we first introduce a parameter-free highlight method into the encoder-decoder framework: replacing the encoder attention mask with a saliency mask in the cross-attention module to force the decoder to focus only on salient parts of the input. A preliminary analysis compares different highlight methods, demonstrating the effectiveness of our saliency mask. We further propose the novel extract-and-abstract paradigm, ExtAbs, which jointly and seamlessly performs extractive and abstractive summarization tasks within a single encoder-decoder model to reduce error accumulation. In ExtAbs, the vanilla encoder is augmented to extract salients, and the vanilla decoder is modified with the proposed saliency mask to generate summaries. Built upon BART and PEGASUS, experiments on three datasets show that ExtAbs outperforms baselines on the extractive task and performs comparably to, or even better than, the vanilla models on the abstractive task.
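Because the highlight is parameter-free, the inference-time idea fits in a few lines against a Hugging Face-style seq2seq model such as BART or PEGASUS: encode with the ordinary padding mask, then hand the saliency mask to generation so that cross-attention sees only salient tokens. A hedged sketch; the construction of the saliency mask itself is assumed given.

```python
def summarize_with_saliency(model, input_ids, attention_mask, saliency_mask):
    # saliency_mask: shaped like attention_mask, but zero on non-salient tokens.
    encoder_out = model.get_encoder()(input_ids=input_ids,
                                      attention_mask=attention_mask)
    # Passing the saliency mask here makes the decoder's cross-attention
    # treat non-salient source tokens as masked out.
    return model.generate(encoder_outputs=encoder_out,
                          attention_mask=saliency_mask)
```
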
{"title":"Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework","authors":"Yuping Wu, Hao Li, Hongbo Zhu, Goran Nenadic, Xiao-Jun Zeng","doi":"arxiv-2409.11827","DOIUrl":"https://doi.org/arxiv-2409.11827","url":null,"abstract":"Extract-then-Abstract is a naturally coherent paradigm to conduct abstractive\u0000summarization with the help of salient information identified by the extractive\u0000model. Previous works that adopt this paradigm train the extractor and\u0000abstractor separately and introduce extra parameters to highlight the extracted\u0000salients to the abstractor, which results in error accumulation and additional\u0000training costs. In this paper, we first introduce a parameter-free highlight\u0000method into the encoder-decoder framework: replacing the encoder attention mask\u0000with a saliency mask in the cross-attention module to force the decoder to\u0000focus only on salient parts of the input. A preliminary analysis compares\u0000different highlight methods, demonstrating the effectiveness of our saliency\u0000mask. We further propose the novel extract-and-abstract paradigm, ExtAbs, which\u0000jointly and seamlessly performs Extractive and Abstractive summarization tasks\u0000within single encoder-decoder model to reduce error accumulation. In ExtAbs,\u0000the vanilla encoder is augmented to extract salients, and the vanilla decoder\u0000is modified with the proposed saliency mask to generate summaries. Built upon\u0000BART and PEGASUS, experiments on three datasets show that ExtAbs can achieve\u0000superior performance than baselines on the extractive task and performs\u0000comparable, or even better than the vanilla models on the abstractive task.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models (arXiv:2409.12060)
Andrianos Michail, Simon Clematide, Juri Opitz
The task of determining whether two texts are paraphrases has long been a challenge in NLP. However, the prevailing notion of paraphrase is often quite simplistic, offering only a limited view of the vast spectrum of paraphrase phenomena. Indeed, we find that evaluating models on a single paraphrase dataset can leave uncertainty about their true semantic understanding. To alleviate this, we release PARAPHRASUS, a benchmark designed for multi-dimensional assessment of paraphrase detection models and finer-grained model selection. We find that, under a fine-grained evaluation lens, paraphrase detection models exhibit trade-offs that cannot be captured through a single classification dataset.
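The multi-dimensional assessment amounts to scoring one model across several labeled sub-sets and inspecting the per-dimension spread. A minimal sketch with a placeholder benchmark layout and a hypothetical `model.predict` interface.

```python
def evaluate(model, benchmark):
    # benchmark: {dimension_name: [(text1, text2, is_paraphrase), ...]}
    report = {}
    for dimension, pairs in benchmark.items():
        correct = sum(model.predict(t1, t2) == label for t1, t2, label in pairs)
        report[dimension] = correct / len(pairs)
    return report  # per-dimension scores expose trade-offs a single set hides
```
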
{"title":"PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models","authors":"Andrianos Michail, Simon Clematide, Juri Opitz","doi":"arxiv-2409.12060","DOIUrl":"https://doi.org/arxiv-2409.12060","url":null,"abstract":"The task of determining whether two texts are paraphrases has long been a\u0000challenge in NLP. However, the prevailing notion of paraphrase is often quite\u0000simplistic, offering only a limited view of the vast spectrum of paraphrase\u0000phenomena. Indeed, we find that evaluating models in a paraphrase dataset can\u0000leave uncertainty about their true semantic understanding. To alleviate this,\u0000we release paraphrasus, a benchmark designed for multi-dimensional assessment\u0000of paraphrase detection models and finer model selection. We find that\u0000paraphrase detection models under a fine-grained evaluation lens exhibit\u0000trade-offs that cannot be captured through a single classification dataset.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}