Online learning in sequential Bayesian persuasion: Handling unknown priors

Artificial Intelligence · IF 5.1 · CAS Region 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-06 · DOI: 10.1016/j.artint.2024.104245
Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò
{"title":"Online learning in sequential Bayesian persuasion: Handling unknown priors","authors":"Martino Bernasconi,&nbsp;Matteo Castiglioni,&nbsp;Alberto Marchesi,&nbsp;Nicola Gatti,&nbsp;Francesco Trovò","doi":"10.1016/j.artint.2024.104245","DOIUrl":null,"url":null,"abstract":"<div><div>We study a repeated <em>information design</em> problem faced by an informed <em>sender</em> who tries to influence the behavior of a self-interested <em>receiver</em>, through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a <em>sequential decision making</em> (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to <em>persuade</em> them to follow (desirable) action recommendations. We study the case in which the sender does <em>not</em> know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result which also applies to the non-sequential case: <em>no learning algorithm can be persuasive in high probability</em>. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's <em>regret</em> in following recommendations <em>grows sub-linearly</em>. In the <em>full-feedback</em> setting—where the sender observes the realizations of <em>all</em> the possible random events—, we provide an algorithm with <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>T</mi></mrow></msqrt><mo>)</mo></math></span> regret for both the sender and the receiver. Instead, in the <em>bandit-feedback</em> setting—where the sender only observes the realizations of random events actually occurring in the SDM problem—, we design an algorithm that, given an <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>1</mn><mo>]</mo></math></span> as input, guarantees <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>)</mo></math></span> and <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msup><mrow><mi>T</mi></mrow><mrow><mi>max</mi><mo>⁡</mo><mo>{</mo><mi>α</mi><mo>,</mo><mn>1</mn><mo>−</mo><mfrac><mrow><mi>α</mi></mrow><mrow><mn>2</mn></mrow></mfrac><mo>}</mo></mrow></msup><mo>)</mo></math></span> regrets, for the sender and the receiver respectively. 
This result is complemented by a lower bound showing that such a regret trade-off is tight for <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>/</mo><mn>3</mn><mo>]</mo></math></span>.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"338 ","pages":"Article 104245"},"PeriodicalIF":5.1000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224001814","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver through the provision of payoff-relevant information. We consider settings where the receiver repeatedly faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem, which are only partially observable by the receiver. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of the random events and thus has to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of the sender's persuasive information-revelation structures. This is crucial for designing efficient learning algorithms. Next, we prove a negative result that also applies to the non-sequential case: no learning algorithm can be persuasive with high probability. Thus, we relax the persuasiveness requirement, studying algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting, where the sender observes the realizations of all the possible random events, we provide an algorithm with Õ(√T) regret for both the sender and the receiver. Instead, in the bandit-feedback setting, where the sender only observes the realizations of random events actually occurring in the SDM problem, we design an algorithm that, given an α ∈ [1/2, 1] as input, guarantees Õ(T^α) and Õ(T^{max{α, 1 − α/2}}) regrets for the sender and the receiver, respectively. This result is complemented by a lower bound showing that such a regret trade-off is tight for α ∈ [1/2, 2/3].
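To make the bandit-feedback trade-off concrete, the minimal sketch below (ours, not from the paper) tabulates the sender regret exponent α and the receiver regret exponent max{α, 1 − α/2} stated in the abstract for a few choices of α; the function name regret_exponents is purely illustrative.

```python
# Illustrative sketch (not from the paper): tabulate the regret exponents
# reported in the abstract for the bandit-feedback algorithm, which takes a
# parameter alpha in [1/2, 1] and guarantees sender regret O~(T^alpha) and
# receiver regret O~(T^max{alpha, 1 - alpha/2}).

def regret_exponents(alpha: float) -> tuple[float, float]:
    """Return (sender_exponent, receiver_exponent) for a given alpha."""
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [1/2, 1]")
    sender = alpha
    receiver = max(alpha, 1 - alpha / 2)
    return sender, receiver


if __name__ == "__main__":
    for alpha in (0.5, 0.6, 2 / 3, 0.75, 0.9, 1.0):
        s, r = regret_exponents(alpha)
        print(f"alpha = {alpha:.3f}  sender T^{s:.3f}  receiver T^{r:.3f}")
    # For alpha in [1/2, 2/3] the two exponents move in opposite directions
    # (lowering the sender's regret raises the receiver's); the paper's lower
    # bound shows the trade-off is tight on that interval. At alpha = 2/3 both
    # exponents equal 2/3, and for alpha > 2/3 the receiver exponent is alpha.
```

Running the sketch shows, for instance, that α = 1/2 gives sender regret Õ(√T) but receiver regret Õ(T^{3/4}), while α = 2/3 equalizes both at Õ(T^{2/3}).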
Source journal
Artificial Intelligence (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles per year: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.
Latest articles in this journal:
Lifted action models learning from partial traces
Human-AI coevolution
Editorial Board
Separate but equal: Equality in belief propagation for single-cycle graphs
Generative models for grid-based and image-based pathfinding