Enhancing Preference-based Linear Bandits via Human Response Time

Shen Li, Yuyang Zhang, Zhaolin Ren, Claire Liang, Na Li, Julie A. Shah
Source: arXiv - ECON - Econometrics. Published: 2024-09-09. DOI: arxiv-2409.05798 (https://doi.org/arxiv-2409.05798)

Abstract

Binary human choice feedback is widely used in interactive preference learning for its simplicity, but it provides limited information about preference strength. To overcome this limitation, we leverage human response times, which inversely correlate with preference strength, as complementary information. Our work integrates the EZ-diffusion model, which jointly models human choices and response times, into preference-based linear bandits. We introduce a computationally efficient utility estimator that reformulates the utility estimation problem using both choices and response times as a linear regression problem. Theoretical and empirical comparisons with traditional choice-only estimators reveal that for queries with strong preferences ("easy" queries), choices alone provide limited information, while response times offer valuable complementary information about preference strength. As a result, incorporating response times makes easy queries more useful. We demonstrate this advantage in the fixed-budget best-arm identification problem, with simulations based on three real-world datasets, consistently showing accelerated learning when response times are incorporated.
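The linear-regression idea in the abstract can be illustrated with a minimal sketch. It assumes a symmetric, unbiased drift-diffusion process with unit noise, barriers at +a and -a, and drift equal to the utility difference x·θ; under those assumptions the standard closed forms are E[choice] = tanh(a·v) and E[time] = (a/v)·tanh(a·v), so their ratio v/a is linear in θ. The function names, the ±1 choice coding, and the omission of non-decision time are assumptions made for illustration; the paper's actual estimator may differ in its details.

```python
import numpy as np


def expected_choice_and_time(drift, barrier):
    """Closed-form means for a symmetric diffusion model with unit noise.

    Choices are coded +1 / -1 and time is decision time only
    (non-decision time is ignored in this sketch).
    """
    drift = np.asarray(drift, dtype=float)
    mean_choice = np.tanh(barrier * drift)
    # E[time] = (a / v) * tanh(a * v); its limit as v -> 0 is a**2.
    safe = np.where(drift == 0.0, 1.0, drift)
    mean_time = np.where(
        drift == 0.0,
        barrier**2,
        barrier * np.tanh(barrier * safe) / safe,
    )
    return mean_choice, mean_time


def estimate_utility(queries, mean_choice, mean_time):
    """Least-squares fit of theta / barrier from per-query choice/time ratios."""
    targets = np.asarray(mean_choice) / np.asarray(mean_time)
    theta_scaled, *_ = np.linalg.lstsq(queries, targets, rcond=None)
    return theta_scaled


# Hypothetical example: each row is a query, the feature difference of two arms.
queries = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
theta = np.array([1.0, -0.5])  # assumed true utility parameters
barrier = 1.5                  # assumed decision barrier a

mean_choice, mean_time = expected_choice_and_time(queries @ theta, barrier)
theta_scaled = estimate_utility(queries, mean_choice, mean_time)
print(theta_scaled, theta / barrier)
```

With exact expectations as input, the fit recovers θ/a up to floating-point error. With finite samples, the per-query means would be replaced by empirical averages of observed choices and response times. The sketch also shows why easy queries remain informative: tanh(a·v) saturates near ±1 when |v| is large, but the ratio of mean choice to mean time keeps scaling with v.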