Contextual Bandits with Budgeted Information Reveal

Proceedings of Machine Learning Research. Pub Date: 2024-05-01

Kyra Gan, Esmaeil Keyvanshokooh, Xueqing Liu, Susan Murphy

Citations: 0

Abstract

Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure that the treatments are effective, patients are often asked to take actions that bring them no immediate benefit; we refer to these as pro-treatment actions. In practice, clinicians have a limited budget for encouraging patients to take these actions and for collecting additional information. We introduce a novel optimization and learning algorithm to address this problem. The algorithm combines two components: 1) an online primal-dual algorithm that decides the optimal timing for reaching out to patients, and 2) a contextual bandit learning algorithm that delivers personalized treatment to each patient. We prove that this algorithm admits a sub-linear regret bound. We illustrate the usefulness of this algorithm on both synthetic and real-world data.
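The abstract does not spell out the algorithm, so the following is only a minimal sketch of how the two components could be wired together under stated assumptions; it is not the authors' method. It swaps in standard LinUCB as the contextual bandit learner and a generic dual-subgradient budget-pacing rule as the online primal-dual step: the dual variable `lam` acts as a price per outreach, and the learner reaches out only when the value of revealing information exceeds that price. Everything else is hypothetical: the feature map `features`, the synthetic environment `true_mean`, the use of the LinUCB confidence width as the "value" of an outreach, and the assumption that an outreach always makes the patient take the pro-treatment action so that the reward is revealed.

```python
import math
import random


class LinUCB:
    """Ridge-regression UCB; A^{-1} is maintained with Sherman-Morrison updates."""

    def __init__(self, d, alpha=1.0, ridge=1.0):
        self.d = d
        self.alpha = alpha
        # A starts as ridge * I, so A^{-1} starts as I / ridge.
        self.A_inv = [[1.0 / ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # running sum of reward * x

    def _A_inv_dot(self, x):
        return [sum(self.A_inv[i][j] * x[j] for j in range(self.d)) for i in range(self.d)]

    def width(self, x):
        # sqrt(x^T A^{-1} x): the model's uncertainty about this feature vector.
        ax = self._A_inv_dot(x)
        return math.sqrt(max(0.0, sum(xi * ai for xi, ai in zip(x, ax))))

    def score(self, x):
        theta = self._A_inv_dot(self.b)  # ridge estimate A^{-1} b
        return sum(t * xi for t, xi in zip(theta, x)) + self.alpha * self.width(x)

    def update(self, x, reward):
        ax = self._A_inv_dot(x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        for i in range(self.d):
            for j in range(self.d):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.d):
            self.b[i] += reward * x[i]


def features(context, arm):
    # Hypothetical feature map: intercept, treatment indicator, interaction.
    return [1.0, float(arm), arm * context]


def true_mean(context, arm):
    # Hypothetical synthetic environment: treatment helps only when context > 0.5.
    return 0.2 + arm * (context - 0.5)


def run(T=2000, budget=300, eta=0.05, seed=0):
    rng = random.Random(seed)
    model = LinUCB(d=3, alpha=1.0)
    lam = 0.0          # dual variable: the "price" of one outreach
    rho = budget / T   # target outreach rate per round
    spent = 0
    for _ in range(T):
        c = rng.random()  # hypothetical scalar patient context in [0, 1)
        arm = max((0, 1), key=lambda a: model.score(features(c, a)))
        x = features(c, arm)
        # Reach out only if the (assumed) information value beats the current price.
        reach_out = spent < budget and model.width(x) > lam
        cost = 1 if reach_out else 0
        spent += cost
        if reach_out:
            # Assumption: outreach makes the pro-treatment action happen,
            # so this round's reward is revealed and used for learning.
            model.update(x, true_mean(c, arm) + rng.gauss(0.0, 0.1))
        # Dual subgradient step: raise the price when overspending, lower it otherwise.
        lam = max(0.0, lam + eta * (cost - rho))
    return model, spent


if __name__ == "__main__":
    model, spent = run()
    print("outreaches used:", spent)
```

A larger `eta` makes the price react faster to overspending at the cost of a noisier outreach schedule. The hard `spent < budget` check keeps the budget from ever being exceeded while `lam` is still adjusting.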
