
Proceedings of Machine Learning Research: Latest Publications

Half-Hop: A graph upsampling approach for slowing down message passing.
Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L Dyer

Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
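
For intuition, here is a minimal NumPy sketch of the edge-upsampling step, assuming an edge list in COO form. The rewiring (replace each sampled edge src→dst with src→slow→dst) and the slow-node feature initialization (interpolating source and target features) follow the description above, but the exact details are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def half_hop(edge_index: np.ndarray, x: np.ndarray, p: float = 1.0, alpha: float = 0.5):
    """Insert a "slow node" on each edge with probability p.

    edge_index: (2, E) array of (source, target) pairs.
    x:          (N, F) node feature matrix.
    Returns an augmented (edge_index, x). The interpolated slow-node
    features and the fixed seed are illustrative assumptions.
    """
    rng = np.random.default_rng(0)
    n = x.shape[0]
    new_edges, new_feats = [], []
    for src, dst in edge_index.T:
        if rng.random() < p:
            slow = n + len(new_feats)                  # id of the new slow node
            new_feats.append(alpha * x[src] + (1 - alpha) * x[dst])
            new_edges += [(src, slow), (slow, dst)]    # slow node mediates the hop
        else:
            new_edges.append((src, dst))
    x_aug = np.vstack([x] + new_feats) if new_feats else x
    return np.array(new_edges).T, x_aug
```

Sampling with p < 1 and re-drawing the inserted nodes per view is one way to obtain the multi-scale augmentations with variable path lengths mentioned above.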

{"title":"Half-Hop: A graph upsampling approach for slowing down message passing.","authors":"Mehdi Azabou,&nbsp;Venkataramana Ganesh,&nbsp;Shantanu Thakoor,&nbsp;Chi-Heng Lin,&nbsp;Lakshmi Sathidevi,&nbsp;Ran Liu,&nbsp;Michal Valko,&nbsp;Petar Veličković,&nbsp;Eva L Dyer","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding \"slow nodes\" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"1341-1360"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559225/pdf/nihms-1931959.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41184447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Controlled Differential Equations on Long Sequences via Non-standard Wavelets.
Sourav Pal, Zhanpeng Zeng, Sathya N Ravi, Vikas Singh

Neural Controlled Differential Equations (NCDEs) are a powerful mechanism to model the dynamics in temporal sequences, e.g., applications involving physiological measures, where apart from the initial condition, the dynamics also depend on subsequent measures or even a different "control" sequence. But NCDEs do not scale well to longer sequences. Existing strategies adapt rough path theory, and instead model the dynamics over summaries known as log signatures. While rigorous and elegant, these summaries are difficult to invert, which limits the scope of problems where these ideas can offer strong benefits (reconstruction, generative modeling). For tasks where it is sensible to assume that the (long) sequences in the training data consist of a fixed number of temporal measurements - this assumption holds in most experiments tackled in the literature - we describe an efficient simplification. First, we recast the regression/classification task as an integral transform. We then show how restricting the class of operators (permissible in the integral transform) allows the use of a known algorithm that leverages non-standard Wavelets to decompose the operator. Thereby, our task (learning the operator) radically simplifies. A neural variant of this idea yields consistent improvements across a wide gamut of use cases tackled in existing works. We also describe a novel application on modeling tasks involving coupled differential equations.
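
The paper's non-standard wavelet machinery is more involved, but the core simplification, learning the operator of an integral transform in a wavelet basis where few coefficients matter, can be sketched in a few lines. The toy below assumes 1-D sequences, a scalar target, an orthonormal Haar basis, and keeps only the coarsest coefficients; all of these choices are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def haar_matrix(n: int) -> np.ndarray:
    # Orthonormal Haar transform for n a power of 2, built recursively.
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])                 # previous level, stretched
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale wavelets
    return np.vstack([coarse, detail]) / np.sqrt(2.0)

# Model y ≈ <k, x> (a discretized integral transform) with the operator k
# learned in the Haar domain, keeping only the m coarsest coefficients.
rng = np.random.default_rng(0)
n, m = 256, 16
X = rng.normal(size=(500, n))                 # 500 length-n sequences
k_true = np.sin(np.linspace(0.0, 3.0, n))     # a smooth ground-truth operator
y = X @ k_true + 0.01 * rng.normal(size=500)

H = haar_matrix(n)                            # rows are the basis vectors
Z = X @ H.T                                   # sequences in the wavelet domain
coef = np.linalg.lstsq(Z[:, :m], y, rcond=None)[0]  # fit m coefficients only
k_hat = H[:m].T @ coef                        # back to the time domain
print(np.linalg.norm(k_hat - k_true) / np.linalg.norm(k_true))
```

Restricting the fit to m ≪ n wavelet coefficients is what makes the operator learnable efficiently on long sequences.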

{"title":"Controlled Differential Equations on Long Sequences via Non-standard Wavelets.","authors":"Sourav Pal, Zhanpeng Zeng, Sathya N Ravi, Vikas Singh","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Neural Controlled Differential equations (NCDE) are a powerful mechanism to model the dynamics in temporal sequences, e.g., applications involving physiological measures, where apart from the initial condition, the dynamics also depend on subsequent measures or even a different \"control\" sequence. But NCDEs do not scale well to longer sequences. Existing strategies adapt rough path theory, and instead model the dynamics over summaries known as <i>log signatures</i>. While rigorous and elegant, invertibility of these summaries is difficult, and limits the scope of problems where these ideas can offer strong benefits (reconstruction, generative modeling). For tasks where it is sensible to assume that the (long) sequences in the training data are a <i>fixed</i> length of temporal measurements - this assumption holds in most experiments tackled in the literature - we describe an efficient simplification. First, we recast the regression/classification task as an integral transform. We then show how restricting the class of operators (permissible in the integral transform), allows the use of a known algorithm that leverages non-standard Wavelets to decompose the operator. Thereby, our task (learning the operator) radically simplifies. A neural variant of this idea yields consistent improvements across a wide gamut of use cases tackled in existing works. We also describe a novel application on modeling tasks involving coupled differential equations.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"26820-26836"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11178150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141332696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ℓp-Regression in the Arbitrary Partition Model of Communication.
Yi Li, Honghao Lin, David P Woodruff
We consider the randomized communication complexity of the distributed ℓp-regression problem in the coordinator model, for p ∈ (0, 2]. In this problem, there is a coordinator and s servers. The i-th server receives A^i ∈ {−M, −M+1, …, M}^{n×d} and b^i ∈ {−M, −M+1, …, M}^n, and the coordinator would like to find a (1+ε)-approximate solution to min_{x ∈ ℝ^d} ‖(∑_i A^i)x − (∑_i b^i)‖_p. Here M ≤ poly(nd) for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For p = 2, i.e., least squares regression, we give the first optimal bound of Θ̃(sd² + sd/ε) bits. For p ∈ (1, 2), we obtain an Õ(sd²/ε + sd/poly(ε)) upper bound. Notably, for d sufficiently large, our leading-order term depends only linearly on 1/ε rather than quadratically. We also show communication lower bounds of Ω(sd² + sd/ε²) for p ∈ (0, 1] and Ω(sd² + sd/ε) for p ∈ (1, 2], which considerably improve the previous bounds of (Woodruff et al., COLT 2013) and (Vempala et al., SODA 2020).
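
For p = 2, the linearity that makes the arbitrary partition model tractable can be sketched as follows: each server applies one shared random sketch to its additive share and sends only the small sketch, and because sketching is linear, the coordinator's sums equal a sketch of the full (A, b). This is plain sketch-and-solve for intuition only; the paper's bit-optimal protocol additionally controls the precision of what is communicated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 10_000, 20, 4
A = rng.integers(-10, 11, size=(n, d)).astype(float)
b = A @ rng.normal(size=d) + rng.normal(size=n)

# Additively share (A, b) across s servers (arbitrary partition model);
# real shares would be bounded integers, Gaussian shares are an assumption.
shares_A = rng.normal(size=(s, n, d)); shares_A[-1] = A - shares_A[:-1].sum(0)
shares_b = rng.normal(size=(s, n));    shares_b[-1] = b - shares_b[:-1].sum(0)

m = 400                                    # sketch rows, on the order of d/eps
S = rng.normal(size=(m, n)) / np.sqrt(m)   # shared randomness (e.g., a common seed)
# Each server sends only its m x d (and m x 1) sketch; by linearity the sums
# below equal S @ A and S @ b, so the coordinator solves a tiny regression.
SA = sum(S @ shares_A[i] for i in range(s))
Sb = sum(S @ shares_b[i] for i in range(s))
x_hat = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(A @ x_hat - b) / np.linalg.norm(A @ x_opt - b))  # ≈ 1 + eps
```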
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\"><ns0:math> <ns0:mrow><ns0:msub><ns0:mi>ℓ</ns0:mi> <ns0:mi>p</ns0:mi></ns0:msub> </ns0:mrow> </ns0:math> -Regression in the Arbitrary Partition Model of Communication.","authors":"Yi Li, Honghao Lin, David P Woodruff","doi":"","DOIUrl":"","url":null,"abstract":"&lt;p&gt;&lt;p&gt;We consider the randomized communication complexity of the distributed &lt;math&gt; &lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;ℓ&lt;/mi&gt; &lt;mi&gt;p&lt;/mi&gt;&lt;/msub&gt; &lt;/mrow&gt; &lt;/math&gt; -regression problem in the coordinator model, for &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;p&lt;/mi&gt; &lt;mo&gt;∈&lt;/mo&gt; &lt;mo&gt;(&lt;/mo&gt; &lt;mn&gt;0&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;2&lt;/mn&gt; &lt;mo&gt;]&lt;/mo&gt;&lt;/mrow&gt; &lt;/math&gt; . In this problem, there is a coordinator and &lt;math&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/math&gt; servers. The &lt;math&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/math&gt; -th server receives &lt;math&gt; &lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;A&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/msup&gt; &lt;mo&gt;∈&lt;/mo&gt; &lt;msup&gt;&lt;mrow&gt;&lt;mo&gt;{&lt;/mo&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mo&gt;…&lt;/mo&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;}&lt;/mo&gt;&lt;/mrow&gt; &lt;mrow&gt;&lt;mi&gt;n&lt;/mi&gt; &lt;mo&gt;×&lt;/mo&gt; &lt;mi&gt;d&lt;/mi&gt;&lt;/mrow&gt; &lt;/msup&gt; &lt;/mrow&gt; &lt;/math&gt; and &lt;math&gt; &lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;b&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/msup&gt; &lt;mo&gt;∈&lt;/mo&gt; &lt;msup&gt;&lt;mrow&gt;&lt;mo&gt;{&lt;/mo&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mo&gt;…&lt;/mo&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;}&lt;/mo&gt;&lt;/mrow&gt; &lt;mi&gt;n&lt;/mi&gt;&lt;/msup&gt; &lt;/mrow&gt; &lt;/math&gt; and the coordinator would like to find a &lt;math&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mi&gt;ε&lt;/mi&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/math&gt; -approximate solution to &lt;math&gt; &lt;mrow&gt; &lt;msub&gt; &lt;mrow&gt;&lt;msub&gt;&lt;mtext&gt;min&lt;/mtext&gt; &lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt; &lt;mo&gt;∈&lt;/mo&gt; &lt;msup&gt;&lt;mtext&gt;R&lt;/mtext&gt; &lt;mi&gt;n&lt;/mi&gt;&lt;/msup&gt; &lt;/mrow&gt; &lt;/msub&gt; &lt;mrow&gt;&lt;mo&gt;‖&lt;/mo&gt; &lt;mrow&gt; &lt;mrow&gt; &lt;mrow&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;msub&gt;&lt;mo&gt;∑&lt;/mo&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/msub&gt; &lt;msup&gt;&lt;mi&gt;A&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/msup&gt; &lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;mi&gt;x&lt;/mi&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;munder&gt;&lt;mo&gt;∑&lt;/mo&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/munder&gt; &lt;msup&gt;&lt;mi&gt;b&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt;&lt;/msup&gt; &lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;mo&gt;‖&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/mrow&gt; &lt;/mrow&gt; &lt;mi&gt;p&lt;/mi&gt;&lt;/msub&gt; &lt;/mrow&gt; &lt;/math&gt; . Here &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;M&lt;/mi&gt; &lt;mo&gt;≤&lt;/mo&gt;&lt;/mrow&gt; &lt;/math&gt; poly(nd) for convenience. 
This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;p&lt;/mi&gt; &lt;mo&gt;=&lt;/mo&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/mrow&gt; &lt;/math&gt; , i.e., least squares regression, we give the first optimal bound of &lt;math&gt; &lt;mrow&gt;&lt;mover&gt;&lt;mtext&gt;Θ&lt;/mtext&gt; &lt;mo&gt;˜&lt;/mo&gt;&lt;/mover&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt; &lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mi&gt;s&lt;/mi&gt; &lt;mi&gt;d&lt;/mi&gt; &lt;mo&gt;/&lt;/mo&gt; &lt;mi&gt;ϵ&lt;/mi&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; ) bits. For &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;p&lt;/mi&gt; &lt;mo&gt;∈&lt;/mo&gt; &lt;mo&gt;(&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;2&lt;/mn&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/math&gt; , we obtain an &lt;math&gt; &lt;mrow&gt;&lt;mover&gt;&lt;mi&gt;O&lt;/mi&gt; &lt;mo&gt;˜&lt;/mo&gt;&lt;/mover&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt; &lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;/&lt;/mo&gt; &lt;mi&gt;ε&lt;/mi&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mi&gt;s&lt;/mi&gt; &lt;mi&gt;d&lt;/mi&gt; &lt;mo&gt;/&lt;/mo&gt; &lt;mtext&gt;poly&lt;/mtext&gt; &lt;mo&gt;(&lt;/mo&gt; &lt;mi&gt;ε&lt;/mi&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; upper bound. Notably, for &lt;math&gt;&lt;mi&gt;d&lt;/mi&gt;&lt;/math&gt; sufficiently large, our leading order term only depends linearly on &lt;math&gt;&lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;/&lt;/mo&gt; &lt;mi&gt;ϵ&lt;/mi&gt;&lt;/mrow&gt; &lt;/math&gt; rather than quadratically. We also show communication lower bounds of &lt;math&gt;&lt;mrow&gt;&lt;mtext&gt;Ω&lt;/mtext&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mi&gt;s&lt;/mi&gt; &lt;msup&gt;&lt;mi&gt;d&lt;/mi&gt; &lt;mn&gt;","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"195 ","pages":"4902-4928"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11646800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142839750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning.
Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A Murphy, Finale Doshi-Velez

Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.
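
For concreteness, here is what discount regularization itself looks like in a certainty-equivalence setup: estimate the transition matrix from counts and plan with a discount below the evaluation discount. The toy MDP below is an assumption for illustration; the result above says the lower discount acts like a uniform transition prior whose effective strength grows with the amount of data per state-action pair, which is why uneven counts hurt.

```python
import numpy as np

def value_iteration(P, R, gamma, iters=500):
    # P: (S, A, S) transition tensor, R: (S, A) rewards.
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)   # P @ V contracts the next-state axis -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

rng = np.random.default_rng(0)
nS, nA = 5, 2
counts = rng.integers(0, 5, size=(nS, nA, nS)).astype(float) + 1e-9  # uneven data
P_hat = counts / counts.sum(axis=2, keepdims=True)   # certainty-equivalence MDP
R = rng.normal(size=(nS, nA))
pi_reg  = value_iteration(P_hat, R, gamma=0.70)      # discount-regularized plan
pi_full = value_iteration(P_hat, R, gamma=0.95)      # unregularized plan
print(pi_reg, pi_full)
```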

{"title":"The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning.","authors":"Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A Murphy, Finale Doshi-Velez","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"28746-28767"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10472113/pdf/nihms-1926341.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10151971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improved Algorithms for White-Box Adversarial Streams.
Ying Feng, David P Woodruff

We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.
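
For background on the recovery primitive involved, here is a standard linear-sketch sparse recovery sketch (orthogonal matching pursuit over a Gaussian sketch, a textbook method rather than the paper's algorithm). It is not robust in the white-box model, where the adversary sees Phi; the paper's contribution is to restore robustness using cryptographic assumptions and fresh per-step randomness.

```python
import numpy as np

def omp(Phi, y, k):
    # Orthogonal matching pursuit: recover a k-sparse x from y = Phi @ x.
    resid, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ resid))))
        coef = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
        resid = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 1000, 100, 4
Phi = rng.normal(size=(m, n)) / np.sqrt(m)  # fixed randomness: fine obliviously,
                                            # but visible to a white-box adversary
sketch = np.zeros(m)
for idx, delta in [(3, 2.0), (500, -1.5), (999, 4.0), (3, 1.0), (42, 0.5)]:
    sketch += delta * Phi[:, idx]           # linear sketch under stream updates
x_hat = omp(Phi, sketch, k)
print(np.nonzero(np.round(x_hat, 6))[0])    # recovered support: [3 42 500 999]
```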

{"title":"Improved Algorithms for White-Box Adversarial Streams.","authors":"Ying Feng, David P Woodruff","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"9962-9975"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Causal isotonic calibration for heterogeneous treatment effects.
Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke

We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. In addition, we introduce a novel data-efficient variant of calibration that avoids the need for hold-out calibration sets, which we refer to as cross-calibration. Causal isotonic cross-calibration takes cross-fitted predictors and outputs a single calibrated predictor obtained using all available data. We establish under weak conditions that causal isotonic calibration and cross-calibration both achieve fast doubly-robust calibration rates so long as either the propensity score or outcome regression is estimated well in an appropriate sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm to provide strong distribution-free calibration guarantees while preserving predictive performance.
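
A minimal sketch of the calibration step, assuming cross-fitted nuisance estimates (outcome regressions mu0, mu1 and propensity scores e) are already available: form doubly-robust AIPW pseudo-outcomes and isotonically regress them on the predicted effects. The names are illustrative, and the cross-calibration variant is omitted.

```python
from sklearn.isotonic import IsotonicRegression

def causal_isotonic_calibrate(tau_hat, y, t, mu0, mu1, e):
    """Calibrate CATE predictions tau_hat with isotonic regression.

    y, t: observed outcome and binary treatment; mu0, mu1, e: cross-fitted
    outcome regressions and propensity scores (assumed given). Returns a
    monotone map to apply on top of the original predictor."""
    # Doubly-robust (AIPW) pseudo-outcome: unbiased for the CATE when either
    # the outcome regressions or the propensity score is well estimated.
    pseudo = (mu1 - mu0
              + t * (y - mu1) / e
              - (1 - t) * (y - mu0) / (1 - e))
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(tau_hat, pseudo)
    return iso   # calibrated prediction for new x: iso.predict(tau_hat_of_x)
```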

{"title":"Causal isotonic calibration for heterogeneous treatment effects.","authors":"Lars van der Laan,&nbsp;Ernesto Ulloa-Pérez,&nbsp;Marco Carone,&nbsp;Alex Luedtke","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. In addition, we introduce a novel data-efficient variant of calibration that avoids the need for hold-out calibration sets, which we refer to as cross-calibration. Causal isotonic cross-calibration takes cross-fitted predictors and outputs a single calibrated predictor obtained using all available data. We establish under weak conditions that causal isotonic calibration and cross-calibration both achieve fast doubly-robust calibration rates so long as either the propensity score or outcome regression is estimated well in an appropriate sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm to provide strong distribution-free calibration guarantees while preserving predictive performance.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"34831-34854"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10416780/pdf/nihms-1900331.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9996727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised Stain Decomposition via Inversion Regulation for Multiplex Immunohistochemistry Images.
Shahira Abousamra, Danielle Fassler, Jiachen Yao, Rajarsi Gupta, Tahsin Kurc, Luisa Escobar-Hoyos, Dimitris Samaras, Kenneth Shroyer, Joel Saltz, Chao Chen

Multiplex Immunohistochemistry (mIHC) is a cost-effective and accessible method for in situ labeling of multiple protein biomarkers in a tissue sample. By assigning a different stain to each biomarker, it allows the visualization of different types of cells within the tumor vicinity for downstream analysis. However, detecting the different types of stains in a given mIHC image is a challenging problem, especially when the number of stains is high. Previous deep-learning-based methods mostly assume full supervision, yet the annotation can be costly. In this paper, we propose a novel unsupervised stain decomposition method to detect different stains simultaneously. Our method does not require any supervision, except for color samples of the different stains. A main technical challenge is that the problem is underdetermined and can have multiple solutions. To overcome this issue, we propose a novel inversion regulation technique, which eliminates most undesirable solutions. On a 7-plexed IHC image dataset, the proposed method achieves high-quality stain decomposition results without human annotation.
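
For context, a classical baseline that uses only per-stain color samples is Beer-Lambert linear unmixing: solve a nonnegative least-squares problem per pixel. With more stains than the three RGB channels, this system is underdetermined, which is precisely the ambiguity the inversion regulation above is designed to resolve. The stain matrix and preprocessing below are assumptions, and this is a baseline sketch, not the paper's method.

```python
import numpy as np
from scipy.optimize import nnls

def unmix(rgb_image, stain_od):
    """rgb_image: (H, W, 3) uint8 image; stain_od: (3, num_stains) optical-
    density color vectors, one column per stain color sample (assumed known)."""
    od = -np.log(np.clip(rgb_image / 255.0, 1e-6, 1.0))  # Beer-Lambert OD space
    h, w, _ = od.shape
    conc = np.zeros((h * w, stain_od.shape[1]))
    for i, pixel in enumerate(od.reshape(-1, 3)):
        conc[i], _ = nnls(stain_od, pixel)   # nonnegative stain concentrations
    return conc.reshape(h, w, -1)
```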

{"title":"Unsupervised Stain Decomposition via Inversion Regulation for Multiplex Immunohistochemistry Images.","authors":"Shahira Abousamra, Danielle Fassler, Jiachen Yao, Rajarsi Gupta, Tahsin Kurc, Luisa Escobar-Hoyos, Dimitris Samaras, Kenneth Shroyer, Joel Saltz, Chao Chen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Multiplex Immunohistochemistry (mIHC) is a cost-effective and accessible method for in situ labeling of multiple protein biomarkers in a tissue sample. By assigning a different stain to each biomarker, it allows the visualization of different types of cells within the tumor vicinity for downstream analysis. However, to detect different types of stains in a given mIHC image is a challenging problem, especially when the number of stains is high. Previous deep-learning-based methods mostly assume full supervision; yet the annotation can be costly. In this paper, we propose a novel unsupervised stain decomposition method to detect different stains simultaneously. Our method does not require any supervision, except for color samples of different stains. A main technical challenge is that the problem is underdetermined and can have multiple solutions. To conquer this issue, we propose a novel inversion regulation technique, which eliminates most undesirable solutions. On a 7-plexed IHC images dataset, the proposed method achieves high quality stain decomposition results without human annotation.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"227 ","pages":"74-94"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138139/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141181231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing.
Sepanta Zeighami, Cyrus Shahabi

A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes are being extensively used to solve this problem, where they learn a model to predict the location of the items in the array. They are empirically shown to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in O(log n) time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same query time of O(log n), but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in O(log log n) expected query time. We also show that, allowing for a slightly larger but still near-linear space overhead, a learned index can achieve O(1) expected query time. Our results prove that learned indexes can be orders of magnitude faster than non-learned methods, theoretically grounding their empirical success.
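
A minimal learned index over a sorted array illustrates the mechanism the bounds formalize: a model predicts the rank of a key, and a local search over a recorded worst-case error window corrects it. A single linear fit stands in for the learned model here; practical learned indexes use piecewise or hierarchical models.

```python
import numpy as np

class LearnedIndex:
    def __init__(self, keys):
        self.keys = np.asarray(keys)            # assumed sorted
        ranks = np.arange(len(self.keys))
        self.slope, self.intercept = np.polyfit(self.keys, ranks, 1)
        pred = np.clip(self.keys * self.slope + self.intercept,
                       0, len(self.keys) - 1)
        # Worst-case model miss on the training keys, plus rounding slack.
        self.max_err = int(np.ceil(np.abs(pred - ranks).max())) + 1

    def lookup(self, q):
        guess = int(np.clip(q * self.slope + self.intercept,
                            0, len(self.keys) - 1))
        lo = max(0, guess - self.max_err)       # bounded correction window
        hi = min(len(self.keys), guess + self.max_err + 1)
        return lo + int(np.searchsorted(self.keys[lo:hi], q))

keys = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 100_000))
idx = LearnedIndex(keys)
print(idx.lookup(keys[1234]) == 1234)           # exact position recovered
```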

{"title":"On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing.","authors":"Sepanta Zeighami, Cyrus Shahabi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes are being extensively used to solve this problem, where they learn a model to predict the location of the items in the array. They are empirically shown to outperform non-learned methods (e.g., B-trees or binary search that answer queries in <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math> time) by orders of magnitude. However, success of learned indexes has not been theoretically justified. Only existing attempt shows the same query time of <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math>, but with a constant factor improvement in space complexity over non-learned methods, under some assumptions on data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and the same space complexity as non-learned methods, learned indexes can answer queries in <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math> expected query time. We also show that allowing for slightly larger but still near-linear space overhead, a learned index can achieve <math><mi>O</mi><mo>(</mo><mn>1</mn><mo>)</mo></math> expected query time. Our results theoretically prove learned indexes are orders of magnitude faster than non-learned methods, theoretically grounding their empirical success.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"40669-40680"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627073/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71489774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Estimating Causal Effects using a Multi-task Deep Ensemble.
Ziyang Jiang, Zhuoran Hou, Yiling Liu, Yiman Ren, Keyu Li, David Carlson

A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating the equivalence of CMDE to a multi-task Gaussian process (GP) with a coregionalization kernel a priori. Compared to multi-task GP, CMDE efficiently handles high-dimensional and multi-modal covariates and provides pointwise uncertainty estimates of causal effects. We evaluate our method across various types of datasets and tasks and find that CMDE outperforms state-of-the-art methods on a majority of these tasks.
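
A toy PyTorch sketch of the multi-task structure: a shared trunk carries population-level information, treatment-specific heads model the two response surfaces, and an ensemble over seeds yields pointwise uncertainty for the estimated effect. Layer sizes, the loss, and the seed-ensemble are assumptions rather than the paper's exact design (which ties the ensemble to a multi-task GP with a coregionalization kernel).

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())  # shared
        self.head0 = nn.Linear(d_hidden, 1)   # control response surface
        self.head1 = nn.Linear(d_hidden, 1)   # treated response surface

    def forward(self, x, t):
        h = self.trunk(x)
        y0, y1 = self.head0(h), self.head1(h)
        return torch.where(t.bool().unsqueeze(-1), y1, y0), y1 - y0

def fit_ensemble(x, t, y, k=5, epochs=200):
    members = []
    for seed in range(k):                      # one member per random seed
        torch.manual_seed(seed)
        net = MultiTaskNet(x.shape[1])
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(epochs):
            pred, _ = net(x, t)
            loss = ((pred.squeeze(-1) - y) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        members.append(net)
    return members

x = torch.randn(256, 5)
t = (torch.rand(256) < 0.5).float()
y = x[:, 0] + t * (1.0 + x[:, 1]) + 0.1 * torch.randn(256)
members = fit_ensemble(x, t, y)
with torch.no_grad():
    taus = torch.stack([m(x, t)[1].squeeze(-1) for m in members])
print(taus.mean(0)[:3], taus.std(0)[:3])       # pointwise CATE mean and spread
```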

{"title":"Estimating Causal Effects using a Multi-task Deep Ensemble.","authors":"Ziyang Jiang, Zhuoran Hou, Yiling Liu, Yiman Ren, Keyu Li, David Carlson","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating equivalency of CDME to a multi-task Gaussian process (GP) with a coregionalization kernel <i>a priori</i>. Compared to multi-task GP, CMDE efficiently handles high-dimensional and multi-modal covariates and provides pointwise uncertainty estimates of causal effects. We evaluate our method across various types of datasets and tasks and find that CMDE outperforms state-of-the-art methods on a majority of these tasks.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"15023-15040"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759931/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.
Zishun Yu, Xinhua Zhang

Deep offline reinforcement learning has recently demonstrated considerable promise in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, such a benefit is often diminished due to the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy. Existing solutions resort to constraining the policy shift or balancing the sample replay based on their online-ness. However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. Instead, we post-process them by aligning with the offline learned policy, so that the Q-values for actions outside the offline policy are also tamed. As a result, the online fine-tuning can be simply performed as in the standard actor-critic algorithms. We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks.
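
The flavor of the post-processing can be shown with a simple, hypothetical alignment rule: subtract from the critic's Q-values a penalty that grows as the offline-learned policy's probability of an action shrinks, so out-of-distribution actions cannot dominate bootstrapping at the start of fine-tuning. The specific penalty below is an assumption for illustration, not the paper's construction.

```python
import torch

def align_q(q_values: torch.Tensor, log_pi_off: torch.Tensor,
            tau: float = 1.0, cap: float = 10.0) -> torch.Tensor:
    """q_values:   (batch, num_actions) critic outputs.
    log_pi_off: (batch, num_actions) log-probs under the offline policy.
    Hypothetical rule: penalize actions the offline policy rarely takes."""
    penalty = tau * torch.clamp(-log_pi_off, max=cap)  # large for OOD actions
    return q_values - penalty

q = torch.tensor([[1.0, 2.0, 1.5]])
log_pi = torch.log(torch.tensor([[0.70, 0.29, 0.01]]))
print(align_q(q, log_pi))   # the rarely-taken third action is tamed the most
```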

{"title":"Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.","authors":"Zishun Yu, Xinhua Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Deep offline reinforcement learning has recently demonstrated considerable promises in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, such a benefit is often diminished due to the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy Existing solutions resort to constraining the policy shift or balancing the sample replay based on their online-ness. However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. Instead, we post-process them by aligning with the offline learned policy, so that the <math><mi>Q</mi></math> -values for actions outside the offline policy are also tamed. As a result, the online fine-tuning can be simply performed as in the standard actor-critic algorithms. We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"40452-40474"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11232493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0