Half-Hop: A graph upsampling approach for slowing down message passing.
Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L Dyer
Message passing neural networks have shown considerable success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing, or can fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
{"title":"Half-Hop: A graph upsampling approach for slowing down message passing.","authors":"Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L Dyer","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding \"slow nodes\" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"1341-1360"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559225/pdf/nihms-1931959.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41184447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Controlled Differential Equations on Long Sequences via Non-standard Wavelets.
Sourav Pal, Zhanpeng Zeng, Sathya N Ravi, Vikas Singh
Neural Controlled Differential Equations (NCDEs) are a powerful mechanism for modeling the dynamics in temporal sequences, e.g., applications involving physiological measures, where apart from the initial condition, the dynamics also depend on subsequent measures or even a different "control" sequence. But NCDEs do not scale well to longer sequences. Existing strategies adapt rough path theory and instead model the dynamics over summaries known as log signatures. While rigorous and elegant, invertibility of these summaries is difficult, which limits the scope of problems where these ideas can offer strong benefits (reconstruction, generative modeling). For tasks where it is sensible to assume that the (long) sequences in the training data are a fixed length of temporal measurements - this assumption holds in most experiments tackled in the literature - we describe an efficient simplification. First, we recast the regression/classification task as an integral transform. We then show how restricting the class of operators (permissible in the integral transform) allows the use of a known algorithm that leverages non-standard wavelets to decompose the operator. Thereby, our task (learning the operator) simplifies radically. A neural variant of this idea yields consistent improvements across a wide gamut of use cases tackled in existing works. We also describe a novel application to modeling tasks involving coupled differential equations.
{"title":"Controlled Differential Equations on Long Sequences via Non-standard Wavelets.","authors":"Sourav Pal, Zhanpeng Zeng, Sathya N Ravi, Vikas Singh","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Neural Controlled Differential equations (NCDE) are a powerful mechanism to model the dynamics in temporal sequences, e.g., applications involving physiological measures, where apart from the initial condition, the dynamics also depend on subsequent measures or even a different \"control\" sequence. But NCDEs do not scale well to longer sequences. Existing strategies adapt rough path theory, and instead model the dynamics over summaries known as <i>log signatures</i>. While rigorous and elegant, invertibility of these summaries is difficult, and limits the scope of problems where these ideas can offer strong benefits (reconstruction, generative modeling). For tasks where it is sensible to assume that the (long) sequences in the training data are a <i>fixed</i> length of temporal measurements - this assumption holds in most experiments tackled in the literature - we describe an efficient simplification. First, we recast the regression/classification task as an integral transform. We then show how restricting the class of operators (permissible in the integral transform), allows the use of a known algorithm that leverages non-standard Wavelets to decompose the operator. Thereby, our task (learning the operator) radically simplifies. A neural variant of this idea yields consistent improvements across a wide gamut of use cases tackled in existing works. We also describe a novel application on modeling tasks involving coupled differential equations.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"26820-26836"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11178150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141332696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
<p><p>We consider the randomized communication complexity of the distributed <math> <mrow><msub><mi>ℓ</mi> <mi>p</mi></msub> </mrow> </math> -regression problem in the coordinator model, for <math><mrow><mi>p</mi> <mo>∈</mo> <mo>(</mo> <mn>0</mn> <mo>,</mo> <mn>2</mn> <mo>]</mo></mrow> </math> . In this problem, there is a coordinator and <math><mi>s</mi></math> servers. The <math><mi>i</mi></math> -th server receives <math> <mrow><msup><mi>A</mi> <mi>i</mi></msup> <mo>∈</mo> <msup><mrow><mo>{</mo> <mo>-</mo> <mi>M</mi> <mo>,</mo> <mo>-</mo> <mi>M</mi> <mo>+</mo> <mn>1</mn> <mo>,</mo> <mo>…</mo> <mo>,</mo> <mi>M</mi> <mo>}</mo></mrow> <mrow><mi>n</mi> <mo>×</mo> <mi>d</mi></mrow> </msup> </mrow> </math> and <math> <mrow><msup><mi>b</mi> <mi>i</mi></msup> <mo>∈</mo> <msup><mrow><mo>{</mo> <mo>-</mo> <mi>M</mi> <mo>,</mo> <mo>-</mo> <mi>M</mi> <mo>+</mo> <mn>1</mn> <mo>,</mo> <mo>…</mo> <mo>,</mo> <mi>M</mi> <mo>}</mo></mrow> <mi>n</mi></msup> </mrow> </math> and the coordinator would like to find a <math><mrow><mo>(</mo> <mn>1</mn> <mo>+</mo> <mi>ε</mi> <mo>)</mo></mrow> </math> -approximate solution to <math> <mrow> <msub> <mrow><msub><mtext>min</mtext> <mrow><mi>x</mi> <mo>∈</mo> <msup><mtext>R</mtext> <mi>n</mi></msup> </mrow> </msub> <mrow><mo>‖</mo> <mrow> <mrow> <mrow><mrow><mo>(</mo> <mrow><msub><mo>∑</mo> <mi>i</mi></msub> <msup><mi>A</mi> <mi>i</mi></msup> </mrow> <mo>)</mo></mrow> <mi>x</mi> <mo>-</mo> <mrow><mo>(</mo> <mrow><munder><mo>∑</mo> <mi>i</mi></munder> <msup><mi>b</mi> <mi>i</mi></msup> </mrow> <mo>)</mo></mrow> </mrow> <mo>‖</mo></mrow> </mrow> </mrow> </mrow> <mi>p</mi></msub> </mrow> </math> . Here <math><mrow><mi>M</mi> <mo>≤</mo></mrow> </math> poly(nd) for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For <math><mrow><mi>p</mi> <mo>=</mo> <mn>2</mn></mrow> </math> , i.e., least squares regression, we give the first optimal bound of <math> <mrow><mover><mtext>Θ</mtext> <mo>˜</mo></mover> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>2</mn></msup> <mo>+</mo> <mi>s</mi> <mi>d</mi> <mo>/</mo> <mi>ϵ</mi></mrow> <mo>)</mo></mrow> </mrow> </math> ) bits. For <math><mrow><mi>p</mi> <mo>∈</mo> <mo>(</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>)</mo></mrow> </math> , we obtain an <math> <mrow><mover><mi>O</mi> <mo>˜</mo></mover> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>2</mn></msup> <mo>/</mo> <mi>ε</mi> <mo>+</mo> <mi>s</mi> <mi>d</mi> <mo>/</mo> <mtext>poly</mtext> <mo>(</mo> <mi>ε</mi> <mo>)</mo></mrow> <mo>)</mo></mrow> </mrow> </math> upper bound. Notably, for <math><mi>d</mi></math> sufficiently large, our leading order term only depends linearly on <math><mrow><mn>1</mn> <mo>/</mo> <mi>ϵ</mi></mrow> </math> rather than quadratically. We also show communication lower bounds of <math><mrow><mtext>Ω</mtext> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>
我们考虑的是协调器模型中分布式 ℓ p - 回归问题的随机通信复杂度,条件是 p∈ ( 0 , 2 ]。在这个问题中,有一个协调器和 s 个服务器。第 i 个服务器接收 A i∈ { - M , - M + 1 , ... , M } n × d 和 b i∈ { - M , - M + 1 , ... , M } n,协调者希望找到一个 ( 1 + ε ) 近似解,即 min x∈ R n ‖ ( ∑ i A i ) x - ( ∑ i b i ) ‖ p 。为方便起见,此处 M≤ poly(nd)。这种数据在不同服务器之间共享的模型通常被称为任意分区模型。我们在这个问题上得到了明显改善的边界。对于 p = 2,即最小二乘回归,我们首次给出了 Θ ˜ ( s d 2 + s d / ϵ ) 位的最优边界。对于 p∈ ( 1 , 2 ) ,我们得到 O ˜ ( s d 2 / ε + s d / poly ( ε ) ) 上限。值得注意的是,对于足够大的 d,我们的前导项仅线性地依赖于 1 / ϵ,而不是二次。我们还展示了 p∈ ( 0 , 1 ] 时的Ω ( s d 2 + s d / ε 2 ) 和 p∈ ( 1 , 2 ] 时的Ω ( s d 2 + s d / ε ) 的通信下界。我们的边界大大改进了之前的边界(Woodruff 等,COLT,2013 年)和(Vempala 等,SODA,2020 年)。
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\"><ns0:math> <ns0:mrow><ns0:msub><ns0:mi>ℓ</ns0:mi> <ns0:mi>p</ns0:mi></ns0:msub> </ns0:mrow> </ns0:math> -Regression in the Arbitrary Partition Model of Communication.","authors":"Yi Li, Honghao Lin, David P Woodruff","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the randomized communication complexity of the distributed <math> <mrow><msub><mi>ℓ</mi> <mi>p</mi></msub> </mrow> </math> -regression problem in the coordinator model, for <math><mrow><mi>p</mi> <mo>∈</mo> <mo>(</mo> <mn>0</mn> <mo>,</mo> <mn>2</mn> <mo>]</mo></mrow> </math> . In this problem, there is a coordinator and <math><mi>s</mi></math> servers. The <math><mi>i</mi></math> -th server receives <math> <mrow><msup><mi>A</mi> <mi>i</mi></msup> <mo>∈</mo> <msup><mrow><mo>{</mo> <mo>-</mo> <mi>M</mi> <mo>,</mo> <mo>-</mo> <mi>M</mi> <mo>+</mo> <mn>1</mn> <mo>,</mo> <mo>…</mo> <mo>,</mo> <mi>M</mi> <mo>}</mo></mrow> <mrow><mi>n</mi> <mo>×</mo> <mi>d</mi></mrow> </msup> </mrow> </math> and <math> <mrow><msup><mi>b</mi> <mi>i</mi></msup> <mo>∈</mo> <msup><mrow><mo>{</mo> <mo>-</mo> <mi>M</mi> <mo>,</mo> <mo>-</mo> <mi>M</mi> <mo>+</mo> <mn>1</mn> <mo>,</mo> <mo>…</mo> <mo>,</mo> <mi>M</mi> <mo>}</mo></mrow> <mi>n</mi></msup> </mrow> </math> and the coordinator would like to find a <math><mrow><mo>(</mo> <mn>1</mn> <mo>+</mo> <mi>ε</mi> <mo>)</mo></mrow> </math> -approximate solution to <math> <mrow> <msub> <mrow><msub><mtext>min</mtext> <mrow><mi>x</mi> <mo>∈</mo> <msup><mtext>R</mtext> <mi>n</mi></msup> </mrow> </msub> <mrow><mo>‖</mo> <mrow> <mrow> <mrow><mrow><mo>(</mo> <mrow><msub><mo>∑</mo> <mi>i</mi></msub> <msup><mi>A</mi> <mi>i</mi></msup> </mrow> <mo>)</mo></mrow> <mi>x</mi> <mo>-</mo> <mrow><mo>(</mo> <mrow><munder><mo>∑</mo> <mi>i</mi></munder> <msup><mi>b</mi> <mi>i</mi></msup> </mrow> <mo>)</mo></mrow> </mrow> <mo>‖</mo></mrow> </mrow> </mrow> </mrow> <mi>p</mi></msub> </mrow> </math> . Here <math><mrow><mi>M</mi> <mo>≤</mo></mrow> </math> poly(nd) for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For <math><mrow><mi>p</mi> <mo>=</mo> <mn>2</mn></mrow> </math> , i.e., least squares regression, we give the first optimal bound of <math> <mrow><mover><mtext>Θ</mtext> <mo>˜</mo></mover> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>2</mn></msup> <mo>+</mo> <mi>s</mi> <mi>d</mi> <mo>/</mo> <mi>ϵ</mi></mrow> <mo>)</mo></mrow> </mrow> </math> ) bits. For <math><mrow><mi>p</mi> <mo>∈</mo> <mo>(</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>)</mo></mrow> </math> , we obtain an <math> <mrow><mover><mi>O</mi> <mo>˜</mo></mover> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>2</mn></msup> <mo>/</mo> <mi>ε</mi> <mo>+</mo> <mi>s</mi> <mi>d</mi> <mo>/</mo> <mtext>poly</mtext> <mo>(</mo> <mi>ε</mi> <mo>)</mo></mrow> <mo>)</mo></mrow> </mrow> </math> upper bound. Notably, for <math><mi>d</mi></math> sufficiently large, our leading order term only depends linearly on <math><mrow><mn>1</mn> <mo>/</mo> <mi>ϵ</mi></mrow> </math> rather than quadratically. 
We also show communication lower bounds of <math><mrow><mtext>Ω</mtext> <mrow><mo>(</mo> <mrow><mi>s</mi> <msup><mi>d</mi> <mn>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"195 ","pages":"4902-4928"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11646800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142839750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning.
Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A Murphy, Finale Doshi-Velez
Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.
{"title":"The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning.","authors":"Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A Murphy, Finale Doshi-Velez","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"28746-28767"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10472113/pdf/nihms-1926341.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10151971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved Algorithms for White-Box Adversarial Streams.
Ying Feng, David P Woodruff
We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.
{"title":"Improved Algorithms for White-Box Adversarial Streams.","authors":"Ying Feng, David P Woodruff","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"9962-9975"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Causal isotonic calibration for heterogeneous treatment effects.
Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke
We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. In addition, we introduce a novel data-efficient variant of calibration that avoids the need for hold-out calibration sets, which we refer to as cross-calibration. Causal isotonic cross-calibration takes cross-fitted predictors and outputs a single calibrated predictor obtained using all available data. We establish under weak conditions that causal isotonic calibration and cross-calibration both achieve fast doubly-robust calibration rates so long as either the propensity score or outcome regression is estimated well in an appropriate sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm to provide strong distribution-free calibration guarantees while preserving predictive performance.
{"title":"Causal isotonic calibration for heterogeneous treatment effects.","authors":"Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. In addition, we introduce a novel data-efficient variant of calibration that avoids the need for hold-out calibration sets, which we refer to as cross-calibration. Causal isotonic cross-calibration takes cross-fitted predictors and outputs a single calibrated predictor obtained using all available data. We establish under weak conditions that causal isotonic calibration and cross-calibration both achieve fast doubly-robust calibration rates so long as either the propensity score or outcome regression is estimated well in an appropriate sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm to provide strong distribution-free calibration guarantees while preserving predictive performance.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"34831-34854"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10416780/pdf/nihms-1900331.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9996727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised Stain Decomposition via Inversion Regulation for Multiplex Immunohistochemistry Images.
Shahira Abousamra, Danielle Fassler, Jiachen Yao, Rajarsi Gupta, Tahsin Kurc, Luisa Escobar-Hoyos, Dimitris Samaras, Kenneth Shroyer, Joel Saltz, Chao Chen
Multiplex Immunohistochemistry (mIHC) is a cost-effective and accessible method for in situ labeling of multiple protein biomarkers in a tissue sample. By assigning a different stain to each biomarker, it allows the visualization of different types of cells within the tumor vicinity for downstream analysis. However, detecting the different stains in a given mIHC image is a challenging problem, especially when the number of stains is high. Previous deep-learning-based methods mostly assume full supervision, yet the annotation can be costly. In this paper, we propose a novel unsupervised stain decomposition method to detect different stains simultaneously. Our method does not require any supervision except for color samples of the different stains. A main technical challenge is that the problem is underdetermined and can have multiple solutions. To address this issue, we propose a novel inversion regulation technique, which eliminates most undesirable solutions. On a 7-plexed IHC image dataset, the proposed method achieves high-quality stain decomposition results without human annotation.
{"title":"Unsupervised Stain Decomposition via Inversion Regulation for Multiplex Immunohistochemistry Images.","authors":"Shahira Abousamra, Danielle Fassler, Jiachen Yao, Rajarsi Gupta, Tahsin Kurc, Luisa Escobar-Hoyos, Dimitris Samaras, Kenneth Shroyer, Joel Saltz, Chao Chen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Multiplex Immunohistochemistry (mIHC) is a cost-effective and accessible method for in situ labeling of multiple protein biomarkers in a tissue sample. By assigning a different stain to each biomarker, it allows the visualization of different types of cells within the tumor vicinity for downstream analysis. However, to detect different types of stains in a given mIHC image is a challenging problem, especially when the number of stains is high. Previous deep-learning-based methods mostly assume full supervision; yet the annotation can be costly. In this paper, we propose a novel unsupervised stain decomposition method to detect different stains simultaneously. Our method does not require any supervision, except for color samples of different stains. A main technical challenge is that the problem is underdetermined and can have multiple solutions. To conquer this issue, we propose a novel inversion regulation technique, which eliminates most undesirable solutions. On a 7-plexed IHC images dataset, the proposed method achieves high quality stain decomposition results without human annotation.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"227 ","pages":"74-94"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138139/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141181231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing.
Sepanta Zeighami, Cyrus Shahabi
A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes have been extensively used to solve this problem: they learn a model to predict the location of the items in the array, and have been empirically shown to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in O(log n) time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same O(log n) query time, but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on the data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in O(log log n) expected query time. We also show that, allowing for a slightly larger but still near-linear space overhead, a learned index can achieve O(1) expected query time. Our results theoretically prove that learned indexes are orders of magnitude faster than non-learned methods, grounding their empirical success.
{"title":"On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing.","authors":"Sepanta Zeighami, Cyrus Shahabi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes are being extensively used to solve this problem, where they learn a model to predict the location of the items in the array. They are empirically shown to outperform non-learned methods (e.g., B-trees or binary search that answer queries in <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math> time) by orders of magnitude. However, success of learned indexes has not been theoretically justified. Only existing attempt shows the same query time of <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math>, but with a constant factor improvement in space complexity over non-learned methods, under some assumptions on data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and the same space complexity as non-learned methods, learned indexes can answer queries in <math><mi>O</mi><mo>(</mo><mi>l</mi><mi>o</mi><mi>g</mi><mi>l</mi><mi>o</mi><mi>g</mi><mspace></mspace><mi>n</mi><mo>)</mo></math> expected query time. We also show that allowing for slightly larger but still near-linear space overhead, a learned index can achieve <math><mi>O</mi><mo>(</mo><mn>1</mn><mo>)</mo></math> expected query time. Our results theoretically prove learned indexes are orders of magnitude faster than non-learned methods, theoretically grounding their empirical success.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"40669-40680"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627073/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71489774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating Causal Effects using a Multi-task Deep Ensemble.
Ziyang Jiang, Zhuoran Hou, Yiling Liu, Yiman Ren, Keyu Li, David Carlson
A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating the equivalency of CMDE to a multi-task Gaussian process (GP) with a coregionalization kernel a priori. Compared to a multi-task GP, CMDE efficiently handles high-dimensional and multi-modal covariates and provides pointwise uncertainty estimates of causal effects. We evaluate our method across various types of datasets and tasks and find that CMDE outperforms state-of-the-art methods on a majority of these tasks.
{"title":"Estimating Causal Effects using a Multi-task Deep Ensemble.","authors":"Ziyang Jiang, Zhuoran Hou, Yiling Liu, Yiman Ren, Keyu Li, David Carlson","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating equivalency of CDME to a multi-task Gaussian process (GP) with a coregionalization kernel <i>a priori</i>. Compared to multi-task GP, CMDE efficiently handles high-dimensional and multi-modal covariates and provides pointwise uncertainty estimates of causal effects. We evaluate our method across various types of datasets and tasks and find that CMDE outperforms state-of-the-art methods on a majority of these tasks.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"15023-15040"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759931/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.
Zishun Yu, Xinhua Zhang
Deep offline reinforcement learning has recently demonstrated considerable promise in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, such a benefit is often diminished due to the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy. Existing solutions resort to constraining the policy shift or balancing the sample replay based on its online-ness; however, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. Instead, we post-process them by aligning with the offline learned policy, so that the Q-values for actions outside the offline policy are also tamed. As a result, online fine-tuning can be performed simply, as in standard actor-critic algorithms. We show empirically that the proposed method improves the performance of fine-tuned robotic agents on various simulated tasks.
{"title":"Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.","authors":"Zishun Yu, Xinhua Zhang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Deep offline reinforcement learning has recently demonstrated considerable promises in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, such a benefit is often diminished due to the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy Existing solutions resort to constraining the policy shift or balancing the sample replay based on their online-ness. However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. Instead, we post-process them by aligning with the offline learned policy, so that the <math><mi>Q</mi></math> -values for actions outside the offline policy are also tamed. As a result, the online fine-tuning can be simply performed as in the standard actor-critic algorithms. We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"202 ","pages":"40452-40474"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11232493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}