Data-driven vs model-driven imitative learning

H. Tembine
{"title":"数据驱动vs模型驱动的模仿学习","authors":"H. Tembine","doi":"10.1109/DDCLS.2017.8067719","DOIUrl":null,"url":null,"abstract":"One of the fundamental problems of an interconnected interactive system is the huge amounts of data that are being generated by every entity. Unfortunately, what we seek is not data but information, and therefore, a growing bottleneck is exactly how to extract and learn useful information from data. In this paper, the information-theoretic learning in data-driven games is studied. This learning shows that the imitative Boltzmann-Gibbs strategy is the maximizer of the perturbed payoff where the perturbation function is the relative entropy from the previous strategy to the current one. In particular, the imitative strategy is the best learning scheme with the respect to data-driven games with cost of moves. Based on it, the classical imitative Boltzmann-Gibbs learning in data-driven games is revisited. Due to communication complexity and noisy data measurements, the classical imitative Boltzmann-Gibbs cannot be applied directly in situations were only numerical values of player's own payoff is measured. A combined fully distributed payoff and strategy imitative learning (CODIPAS) is proposed. Connections between the rest points of the resulting game dynamics, equilibria are established.","PeriodicalId":419114,"journal":{"name":"2017 6th Data Driven Control and Learning Systems (DDCLS)","volume":"270 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data-driven vs model-driven imitative learning\",\"authors\":\"H. Tembine\",\"doi\":\"10.1109/DDCLS.2017.8067719\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the fundamental problems of an interconnected interactive system is the huge amounts of data that are being generated by every entity. 
Unfortunately, what we seek is not data but information, and therefore, a growing bottleneck is exactly how to extract and learn useful information from data. In this paper, the information-theoretic learning in data-driven games is studied. This learning shows that the imitative Boltzmann-Gibbs strategy is the maximizer of the perturbed payoff where the perturbation function is the relative entropy from the previous strategy to the current one. In particular, the imitative strategy is the best learning scheme with the respect to data-driven games with cost of moves. Based on it, the classical imitative Boltzmann-Gibbs learning in data-driven games is revisited. Due to communication complexity and noisy data measurements, the classical imitative Boltzmann-Gibbs cannot be applied directly in situations were only numerical values of player's own payoff is measured. A combined fully distributed payoff and strategy imitative learning (CODIPAS) is proposed. Connections between the rest points of the resulting game dynamics, equilibria are established.\",\"PeriodicalId\":419114,\"journal\":{\"name\":\"2017 6th Data Driven Control and Learning Systems (DDCLS)\",\"volume\":\"270 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 6th Data Driven Control and Learning Systems (DDCLS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DDCLS.2017.8067719\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 6th Data Driven Control and Learning Systems 
(DDCLS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DDCLS.2017.8067719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

One of the fundamental problems of an interconnected interactive system is the huge amount of data generated by every entity. Unfortunately, what we seek is not data but information, and therefore a growing bottleneck is exactly how to extract and learn useful information from data. In this paper, information-theoretic learning in data-driven games is studied. This learning shows that the imitative Boltzmann-Gibbs strategy is the maximizer of the perturbed payoff, where the perturbation function is the relative entropy from the previous strategy to the current one. In particular, the imitative strategy is the best learning scheme with respect to data-driven games with a cost of moves. Based on this, the classical imitative Boltzmann-Gibbs learning in data-driven games is revisited. Due to communication complexity and noisy data measurements, classical imitative Boltzmann-Gibbs learning cannot be applied directly in situations where only numerical values of a player's own payoff are measured. A combined fully distributed payoff and strategy imitative learning scheme (CODIPAS) is proposed. Connections between the rest points of the resulting game dynamics and the equilibria are established.
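The closed-form update implied by this result — maximizing the expected payoff minus the relative entropy to the previous strategy — is the multiplicative rule x_{t+1}(a) ∝ x_t(a)·exp(ε·u_t(a)). A minimal sketch of one such update step (not from the paper; the learning rate `eps` and the toy payoffs are illustrative assumptions):

```python
import numpy as np

def imitative_bg_update(x_prev, payoffs, eps):
    # Imitative Boltzmann-Gibbs step: each action's weight is its previous
    # probability reweighted by the exponentiated payoff, then renormalized.
    # This is the maximizer of <x, u> - (1/eps) * KL(x || x_prev) on the simplex.
    weights = x_prev * np.exp(eps * np.asarray(payoffs, dtype=float))
    return weights / weights.sum()

# Toy example: three actions, the first has the highest payoff.
x = np.array([0.5, 0.3, 0.2])
u = np.array([1.0, 0.0, -1.0])
x_new = imitative_bg_update(x, u, eps=1.0)
```

Because the update multiplies the previous strategy rather than replacing it, actions with zero prior probability are never adopted — the "imitative" aspect: players only reinforce what they already play.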
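Under the reading of CODIPAS described in the abstract, each player couples a payoff-estimation recursion, driven only by the numerical payoff it measures for the action it actually played, with the imitative Boltzmann-Gibbs strategy update applied to those estimates. A minimal two-action sketch under that assumption; the function name `codipas_step`, the step sizes `lam` and `eps`, and the noisy toy payoffs are all illustrative, not taken from the paper:

```python
import numpy as np

def codipas_step(x, u_hat, a, r, lam, eps):
    # Payoff learning: move the played action's estimate toward the
    # measured payoff r (Robbins-Monro style averaging).
    u_hat = u_hat.copy()
    u_hat[a] += lam * (r - u_hat[a])
    # Strategy learning: imitative Boltzmann-Gibbs on the *estimated* payoffs,
    # since the true payoff function is never observed.
    w = x * np.exp(eps * u_hat)
    return w / w.sum(), u_hat

# Toy run: action 0 pays 1, action 1 pays 0, both observed with noise.
rng = np.random.default_rng(0)
true_u = np.array([1.0, 0.0])
x, u_hat = np.array([0.5, 0.5]), np.zeros(2)
for t in range(500):
    a = rng.choice(2, p=x)                      # sample an action from the strategy
    r = true_u[a] + 0.1 * rng.standard_normal() # noisy payoff measurement
    x, u_hat = codipas_step(x, u_hat, a, r, lam=0.1, eps=0.5)
```

The point of the combined scheme is that the strategy update needs no model of the game and no observation of opponents: only the player's own measured payoff enters, which is what makes the learning fully distributed.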