Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
{"title":"在语言、语音和视觉任务中通过人工反馈调整偏好:一项调查","authors":"Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu","doi":"arxiv-2409.11564","DOIUrl":null,"url":null,"abstract":"Preference tuning is a crucial process for aligning deep generative models\nwith human preferences. This survey offers a thorough overview of recent\nadvancements in preference tuning and the integration of human feedback. The\npaper is organized into three main sections: 1) introduction and preliminaries:\nan introduction to reinforcement learning frameworks, preference tuning tasks,\nmodels, and datasets across various modalities: language, speech, and vision,\nas well as different policy approaches, 2) in-depth examination of each\npreference tuning approach: a detailed analysis of the methods used in\npreference tuning, and 3) applications, discussion, and future directions: an\nexploration of the applications of preference tuning in downstream tasks,\nincluding evaluation methods for different modalities, and an outlook on future\nresearch directions. Our objective is to present the latest methodologies in\npreference tuning and model alignment, enhancing the understanding of this\nfield for researchers and practitioners. We hope to encourage further\nengagement and innovation in this area.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey\",\"authors\":\"Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu\",\"doi\":\"arxiv-2409.11564\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Preference tuning is a crucial process for aligning deep generative models\\nwith human preferences. This survey offers a thorough overview of recent\\nadvancements in preference tuning and the integration of human feedback. The\\npaper is organized into three main sections: 1) introduction and preliminaries:\\nan introduction to reinforcement learning frameworks, preference tuning tasks,\\nmodels, and datasets across various modalities: language, speech, and vision,\\nas well as different policy approaches, 2) in-depth examination of each\\npreference tuning approach: a detailed analysis of the methods used in\\npreference tuning, and 3) applications, discussion, and future directions: an\\nexploration of the applications of preference tuning in downstream tasks,\\nincluding evaluation methods for different modalities, and an outlook on future\\nresearch directions. Our objective is to present the latest methodologies in\\npreference tuning and model alignment, enhancing the understanding of this\\nfield for researchers and practitioners. 
We hope to encourage further\\nengagement and innovation in this area.\",\"PeriodicalId\":501284,\"journal\":{\"name\":\"arXiv - EE - Audio and Speech Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - EE - Audio and Speech Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11564\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11564","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth examination of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area.
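
For context on the methods such a survey organizes: preference tuning typically trains a policy to favor human-preferred outputs while staying close to a reference model. The sketch below illustrates one widely used objective of this kind, the Direct Preference Optimization (DPO) loss, in PyTorch. It is a minimal illustration under assumed inputs (summed per-sequence log-probabilities); the function and argument names are hypothetical stand-ins, not code from the paper.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Each argument holds, for a batch of preference pairs, the summed
    # per-token log-probability of the preferred ("chosen") or dispreferred
    # ("rejected") response under the trainable policy or the frozen
    # reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the chosen log-ratio above the rejected one; beta plays
    # the role of the KL-regularization strength in the underlying RLHF
    # objective.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Smoke test with random stand-in values for a batch of 4 preference pairs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())

Larger beta keeps the tuned policy closer to the reference model, while smaller values let the preference signal dominate; reward-model-based approaches (e.g., PPO-style RLHF) optimize the same trade-off explicitly with a learned reward and a KL penalty.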