Federated $\mathcal{X}$-armed Bandit with Flexible Personalisation
Ali Arabzadeh, James A. Grant, David S. Leslie
This paper introduces a novel approach to personalised federated learning within the $\mathcal{X}$-armed bandit framework, addressing the challenge of optimising both local and global objectives in a highly heterogeneous environment. Our method employs a surrogate objective function that combines individual client preferences with aggregated global knowledge, allowing for a flexible trade-off between personalisation and collective learning. We propose a phase-based elimination algorithm that achieves sublinear regret with logarithmic communication overhead, making it well-suited for federated settings. Theoretical analysis and empirical evaluations demonstrate the effectiveness of our approach compared to existing methods. Potential applications of this work span various domains, including healthcare, smart home devices, and e-commerce, where balancing personalisation with global insights is crucial.
{"title":"Federated $mathcal{X}$-armed Bandit with Flexible Personalisation","authors":"Ali Arabzadeh, James A. Grant, David S. Leslie","doi":"arxiv-2409.07251","DOIUrl":"https://doi.org/arxiv-2409.07251","url":null,"abstract":"This paper introduces a novel approach to personalised federated learning\u0000within the $mathcal{X}$-armed bandit framework, addressing the challenge of\u0000optimising both local and global objectives in a highly heterogeneous\u0000environment. Our method employs a surrogate objective function that combines\u0000individual client preferences with aggregated global knowledge, allowing for a\u0000flexible trade-off between personalisation and collective learning. We propose\u0000a phase-based elimination algorithm that achieves sublinear regret with\u0000logarithmic communication overhead, making it well-suited for federated\u0000settings. Theoretical analysis and empirical evaluations demonstrate the\u0000effectiveness of our approach compared to existing methods. Potential\u0000applications of this work span various domains, including healthcare, smart\u0000home devices, and e-commerce, where balancing personalisation with global\u0000insights is crucial.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Is merging worth it? Securely evaluating the information gain for causal dataset acquisition
Jake Fawkes, Lucile Ter-Minassian, Desi Ivanova, Uri Shalit, Chris Holmes
Merging datasets across institutions is a lengthy and costly procedure, especially when it involves private information. Data hosts may therefore want to prospectively gauge which datasets are most beneficial to merge with, without revealing sensitive information. For causal estimation this is particularly challenging, as the value of a merge depends not only on the reduction in epistemic uncertainty but also on the improvement in overlap. To address this challenge, we introduce the first cryptographically secure information-theoretic approach for quantifying the value of a merge in the context of heterogeneous treatment effect estimation. We do this by evaluating the Expected Information Gain (EIG) and utilising multi-party computation to ensure it can be computed securely without revealing any raw data. As we demonstrate, this can be combined with differential privacy (DP) to satisfy privacy requirements whilst preserving more accurate computation than naive DP alone. To the best of our knowledge, this work presents the first privacy-preserving method for dataset acquisition tailored to causal estimation. We demonstrate the effectiveness and reliability of our method on a range of simulated and realistic benchmarks. The code is available anonymously.
{"title":"Is merging worth it? Securely evaluating the information gain for causal dataset acquisition","authors":"Jake Fawkes, Lucile Ter-Minassian, Desi Ivanova, Uri Shalit, Chris Holmes","doi":"arxiv-2409.07215","DOIUrl":"https://doi.org/arxiv-2409.07215","url":null,"abstract":"Merging datasets across institutions is a lengthy and costly procedure,\u0000especially when it involves private information. Data hosts may therefore want\u0000to prospectively gauge which datasets are most beneficial to merge with,\u0000without revealing sensitive information. For causal estimation this is\u0000particularly challenging as the value of a merge will depend not only on the\u0000reduction in epistemic uncertainty but also the improvement in overlap. To\u0000address this challenge, we introduce the first cryptographically secure\u0000information-theoretic approach for quantifying the value of a merge in the\u0000context of heterogeneous treatment effect estimation. We do this by evaluating\u0000the Expected Information Gain (EIG) and utilising multi-party computation to\u0000ensure it can be securely computed without revealing any raw data. As we\u0000demonstrate, this can be used with differential privacy (DP) to ensure privacy\u0000requirements whilst preserving more accurate computation than naive DP alone.\u0000To the best of our knowledge, this work presents the first privacy-preserving\u0000method for dataset acquisition tailored to causal estimation. We demonstrate\u0000the effectiveness and reliability of our method on a range of simulated and\u0000realistic benchmarks. The code is available anonymously.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weather-Informed Probabilistic Forecasting and Scenario Generation in Power Systems
Hanyu Zhang, Reza Zandehshahvar, Mathieu Tanneau, Pascal Van Hentenryck
The integration of renewable energy sources (RES) into power grids presents significant challenges due to their intrinsic stochasticity and uncertainty, necessitating the development of new techniques for reliable and efficient forecasting. This paper proposes a method combining probabilistic forecasting with a Gaussian copula for day-ahead prediction and scenario generation of load, wind, and solar power in high-dimensional contexts. By incorporating weather covariates and restoring spatio-temporal correlations, the proposed method enhances the reliability of probabilistic forecasts for RES. Extensive numerical experiments compare the effectiveness of different time series models, with performance evaluated using comprehensive metrics on a real-world, high-dimensional dataset from the Midcontinent Independent System Operator (MISO). The results highlight the importance of weather information and demonstrate the efficacy of the Gaussian copula in generating realistic scenarios, with the proposed weather-informed Temporal Fusion Transformer (WI-TFT) model showing superior performance.
{"title":"Weather-Informed Probabilistic Forecasting and Scenario Generation in Power Systems","authors":"Hanyu Zhang, Reza Zandehshahvar, Mathieu Tanneau, Pascal Van Hentenryck","doi":"arxiv-2409.07637","DOIUrl":"https://doi.org/arxiv-2409.07637","url":null,"abstract":"The integration of renewable energy sources (RES) into power grids presents\u0000significant challenges due to their intrinsic stochasticity and uncertainty,\u0000necessitating the development of new techniques for reliable and efficient\u0000forecasting. This paper proposes a method combining probabilistic forecasting\u0000and Gaussian copula for day-ahead prediction and scenario generation of load,\u0000wind, and solar power in high-dimensional contexts. By incorporating weather\u0000covariates and restoring spatio-temporal correlations, the proposed method\u0000enhances the reliability of probabilistic forecasts in RES. Extensive numerical\u0000experiments compare the effectiveness of different time series models, with\u0000performance evaluated using comprehensive metrics on a real-world and\u0000high-dimensional dataset from Midcontinent Independent System Operator (MISO).\u0000The results highlight the importance of weather information and demonstrate the\u0000efficacy of the Gaussian copula in generating realistic scenarios, with the\u0000proposed weather-informed Temporal Fusion Transformer (WI-TFT) model showing\u0000superior performance.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"183 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Scalable Algorithm for Active Learning
Youguang Chen, Zheyu Wen, George Biros
FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state of the art in terms of accuracy and robustness, and it comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2 d^2 + n c^2 d)$ storage and $\mathcal{O}(c^3(n d^2 + b d^3 + b n))$ computational complexity, where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + c d^2)$ and a computational complexity of $\mathcal{O}(b n c d^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests reveal no deterioration in accuracy compared to FIRAL. We report strong and weak scaling tests on up to 12 GPUs for a synthetic dataset of three million points.
{"title":"A Scalable Algorithm for Active Learning","authors":"Youguang Chen, Zheyu Wen, George Biros","doi":"arxiv-2409.07392","DOIUrl":"https://doi.org/arxiv-2409.07392","url":null,"abstract":"FIRAL is a recently proposed deterministic active learning algorithm for\u0000multiclass classification using logistic regression. It was shown to outperform\u0000the state-of-the-art in terms of accuracy and robustness and comes with\u0000theoretical performance guarantees. However, its scalability suffers when\u0000dealing with datasets featuring a large number of points $n$, dimensions $d$,\u0000and classes $c$, due to its $mathcal{O}(c^2d^2+nc^2d)$ storage and\u0000$mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity where $b$ is the\u0000number of points to select in active learning. To address these challenges, we\u0000propose an approximate algorithm with storage requirements reduced to\u0000$mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of\u0000$mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on\u0000GPUs. We demonstrate the accuracy and scalability of our approach using MNIST,\u0000CIFAR-10, Caltech101, and ImageNet. The accuracy tests reveal no deterioration\u0000in accuracy compared to FIRAL. We report strong and weak scaling tests on up to\u000012 GPUs, for three million point synthetic dataset.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Deep Kernels for Non-Parametric Independence Testing
Nathaniel Xu, Feng Liu, Danica J. Sutherland
The Hilbert-Schmidt Independence Criterion (HSIC) is a powerful tool for nonparametric detection of dependence between random variables. It crucially depends, however, on the selection of reasonable kernels; commonly-used choices like the Gaussian kernel, or the kernel that yields the distance covariance, are sufficient only for amply sized samples from data distributions with relatively simple forms of dependence. We propose a scheme for selecting the kernels used in an HSIC-based independence test, based on maximizing an estimate of the asymptotic test power. We prove that maximizing this estimate indeed approximately maximizes the true power of the test, and demonstrate that our learned kernels can identify forms of structured dependence between random variables in various experiments.
{"title":"Learning Deep Kernels for Non-Parametric Independence Testing","authors":"Nathaniel Xu, Feng Liu, Danica J. Sutherland","doi":"arxiv-2409.06890","DOIUrl":"https://doi.org/arxiv-2409.06890","url":null,"abstract":"The Hilbert-Schmidt Independence Criterion (HSIC) is a powerful tool for\u0000nonparametric detection of dependence between random variables. It crucially\u0000depends, however, on the selection of reasonable kernels; commonly-used choices\u0000like the Gaussian kernel, or the kernel that yields the distance covariance,\u0000are sufficient only for amply sized samples from data distributions with\u0000relatively simple forms of dependence. We propose a scheme for selecting the\u0000kernels used in an HSIC-based independence test, based on maximizing an\u0000estimate of the asymptotic test power. We prove that maximizing this estimate\u0000indeed approximately maximizes the true power of the test, and demonstrate that\u0000our learned kernels can identify forms of structured dependence between random\u0000variables in various experiments.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"100 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geometry of the Space of Partitioned Networks: A Unified Theoretical and Computational Framework
Stephen Y Zhang, Fangfei Lan, Youjia Zhou, Agnese Barbensi, Michael P H Stumpf, Bei Wang, Tom Needham
Interactions and relations between objects may be pairwise or higher-order in nature, and so network-valued data are ubiquitous in the real world. The "space of networks", however, has a complex structure that cannot be adequately described using conventional statistical tools. We introduce a measure-theoretic formalism for modeling generalized network structures such as graphs, hypergraphs, or graphs whose nodes come with a partition into categorical classes. We then propose a metric that extends the Gromov-Wasserstein distance between graphs and the co-optimal transport distance between hypergraphs. We characterize the geometry of this space, thereby providing a unified theoretical treatment of generalized networks that encompasses both pairwise and higher-order relations. In particular, we show that the resulting metric space is an Alexandrov space of non-negative curvature, and we leverage this structure to define gradients for certain functionals commonly arising in geometric data analysis tasks. We extend our analysis to the setting where vertices carry additional label information, and derive efficient computational schemes for use in practice. Equipped with these theoretical and computational tools, we demonstrate the utility of our framework in a suite of applications, including hypergraph alignment, clustering and dictionary learning from ensemble data, multi-omics alignment, and multiscale network alignment.
{"title":"Geometry of the Space of Partitioned Networks: A Unified Theoretical and Computational Framework","authors":"Stephen Y Zhang, Fangfei Lan, Youjia Zhou, Agnese Barbensi, Michael P H Stumpf, Bei Wang, Tom Needham","doi":"arxiv-2409.06302","DOIUrl":"https://doi.org/arxiv-2409.06302","url":null,"abstract":"Interactions and relations between objects may be pairwise or higher-order in\u0000nature, and so network-valued data are ubiquitous in the real world. The \"space\u0000of networks\", however, has a complex structure that cannot be adequately\u0000described using conventional statistical tools. We introduce a\u0000measure-theoretic formalism for modeling generalized network structures such as\u0000graphs, hypergraphs, or graphs whose nodes come with a partition into\u0000categorical classes. We then propose a metric that extends the\u0000Gromov-Wasserstein distance between graphs and the co-optimal transport\u0000distance between hypergraphs. We characterize the geometry of this space,\u0000thereby providing a unified theoretical treatment of generalized networks that\u0000encompasses the cases of pairwise, as well as higher-order, relations. In\u0000particular, we show that our metric is an Alexandrov space of non-negative\u0000curvature, and leverage this structure to define gradients for certain\u0000functionals commonly arising in geometric data analysis tasks. We extend our\u0000analysis to the setting where vertices have additional label information, and\u0000derive efficient computational schemes to use in practice. Equipped with these\u0000theoretical and computational tools, we demonstrate the utility of our\u0000framework in a suite of applications, including hypergraph alignment,\u0000clustering and dictionary learning from ensemble data, multi-omics alignment,\u0000as well as multiscale network alignment.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Limit Order Book Simulation and Trade Evaluation with $K$-Nearest-Neighbor Resampling
Michael Giegrich, Roel Oomen, Christoph Reisinger
In this paper, we show how $K$-nearest neighbor ($K$-NN) resampling, an off-policy evaluation method proposed in Giegrich et al. (2023), can be applied to simulate limit order book (LOB) markets and how it can be used to evaluate and calibrate trading strategies. Using historical LOB data, we demonstrate that our simulation method is capable of recreating realistic LOB dynamics and that synthetic trading within the simulation leads to a market impact in line with the corresponding literature. Compared to other statistical LOB simulation methods, our algorithm has theoretical convergence guarantees under general conditions, does not require optimization, and is easy to implement and computationally efficient. Furthermore, we show that in a benchmark comparison our method outperforms a deep learning-based algorithm for several key statistics. In the context of a LOB with pro-rata type matching, we demonstrate how our algorithm can calibrate the size of limit orders for a liquidation strategy. Finally, we describe how $K$-NN resampling can be modified for higher-dimensional state spaces.
{"title":"Limit Order Book Simulation and Trade Evaluation with $K$-Nearest-Neighbor Resampling","authors":"Michael Giegrich, Roel Oomen, Christoph Reisinger","doi":"arxiv-2409.06514","DOIUrl":"https://doi.org/arxiv-2409.06514","url":null,"abstract":"In this paper, we show how $K$-nearest neighbor ($K$-NN) resampling, an\u0000off-policy evaluation method proposed in cite{giegrich2023k}, can be applied\u0000to simulate limit order book (LOB) markets and how it can be used to evaluate\u0000and calibrate trading strategies. Using historical LOB data, we demonstrate\u0000that our simulation method is capable of recreating realistic LOB dynamics and\u0000that synthetic trading within the simulation leads to a market impact in line\u0000with the corresponding literature. Compared to other statistical LOB simulation\u0000methods, our algorithm has theoretical convergence guarantees under general\u0000conditions, does not require optimization, is easy to implement and\u0000computationally efficient. Furthermore, we show that in a benchmark comparison\u0000our method outperforms a deep learning-based algorithm for several key\u0000statistics. In the context of a LOB with pro-rata type matching, we demonstrate\u0000how our algorithm can calibrate the size of limit orders for a liquidation\u0000strategy. Finally, we describe how $K$-NN resampling can be modified for\u0000choices of higher dimensional state spaces.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"95 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust semi-parametric signal detection in particle physics with classifiers decorrelated via optimal transport
Purvasha Chakravarti, Lucas Kania, Olaf Behnke, Mikael Kuusela, Larry Wasserman
Searches for new signals in particle physics are usually done by training a supervised classifier to separate a signal model from the known Standard Model physics (also called the background model). However, even when the signal model is correct, systematic errors in the background model can influence supervised classifiers and might adversely affect the signal detection procedure. To tackle this problem, one approach is to use the (possibly misspecified) classifier only to perform a preliminary signal-enrichment step and then to carry out a bump hunt on the signal-rich sample using only the real experimental data. For this procedure to work, we need a classifier constrained to be decorrelated with one or more protected variables used for the signal detection step. We do this by considering an optimal transport map of the classifier output that makes it independent of the protected variable(s) for the background. We then fit a semi-parametric mixture model to the distribution of the protected variable after making cuts on the transformed classifier to detect the presence of a signal. We compare and contrast this decorrelation method with previous approaches, show that the decorrelation procedure is robust to moderate background misspecification, and analyse the power of the signal detection test as a function of the cut on the classifier.
{"title":"Robust semi-parametric signal detection in particle physics with classifiers decorrelated via optimal transport","authors":"Purvasha Chakravarti, Lucas Kania, Olaf Behnke, Mikael Kuusela, Larry Wasserman","doi":"arxiv-2409.06399","DOIUrl":"https://doi.org/arxiv-2409.06399","url":null,"abstract":"Searches of new signals in particle physics are usually done by training a\u0000supervised classifier to separate a signal model from the known Standard Model\u0000physics (also called the background model). However, even when the signal model\u0000is correct, systematic errors in the background model can influence supervised\u0000classifiers and might adversely affect the signal detection procedure. To\u0000tackle this problem, one approach is to use the (possibly misspecified)\u0000classifier only to perform a preliminary signal-enrichment step and then to\u0000carry out a bump hunt on the signal-rich sample using only the real\u0000experimental data. For this procedure to work, we need a classifier constrained\u0000to be decorrelated with one or more protected variables used for the signal\u0000detection step. We do this by considering an optimal transport map of the\u0000classifier output that makes it independent of the protected variable(s) for\u0000the background. We then fit a semi-parametric mixture model to the distribution\u0000of the protected variable after making cuts on the transformed classifier to\u0000detect the presence of a signal. We compare and contrast this decorrelation\u0000method with previous approaches, show that the decorrelation procedure is\u0000robust to moderate background misspecification, and analyse the power of the\u0000signal detection test as a function of the cut on the classifier.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancing Causal Inference: A Nonparametric Approach to ATE and CATE Estimation with Continuous Treatments
Hugo Gobato Souto, Francisco Louzada Neto
This paper introduces a generalized ps-BART model for the estimation of the Average Treatment Effect (ATE) and the Conditional Average Treatment Effect (CATE) under continuous treatments, addressing limitations of the Bayesian Causal Forest (BCF) model. The ps-BART model's nonparametric nature allows it to flexibly capture nonlinear relationships between treatment and outcome variables. Across three distinct sets of Data Generating Processes (DGPs), the ps-BART model consistently outperforms the BCF model, particularly in highly nonlinear settings. Its robustness in uncertainty estimation and its accuracy in both point-wise and probabilistic estimation demonstrate its utility for real-world applications. This research fills a gap in the causal inference literature by providing a tool better suited to nonlinear treatment-outcome relationships and opening avenues for further exploration in continuous treatment effect estimation.
{"title":"Advancing Causal Inference: A Nonparametric Approach to ATE and CATE Estimation with Continuous Treatments","authors":"Hugo Gobato Souto, Francisco Louzada Neto","doi":"arxiv-2409.06593","DOIUrl":"https://doi.org/arxiv-2409.06593","url":null,"abstract":"This paper introduces a generalized ps-BART model for the estimation of\u0000Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE)\u0000in continuous treatments, addressing limitations of the Bayesian Causal Forest\u0000(BCF) model. The ps-BART model's nonparametric nature allows for flexibility in\u0000capturing nonlinear relationships between treatment and outcome variables.\u0000Across three distinct sets of Data Generating Processes (DGPs), the ps-BART\u0000model consistently outperforms the BCF model, particularly in highly nonlinear\u0000settings. The ps-BART model's robustness in uncertainty estimation and accuracy\u0000in both point-wise and probabilistic estimation demonstrate its utility for\u0000real-world applications. This research fills a crucial gap in causal inference\u0000literature, providing a tool better suited for nonlinear treatment-outcome\u0000relationships and opening avenues for further exploration in the domain of\u0000continuous treatment effect estimation.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems
Huaqing Zhang, Lesi Chen, Jing Xu, Jingzhao Zhang
This paper studies simple bilevel problems, in which a convex upper-level function is minimized over the set of optimal solutions of a convex lower-level problem. We first establish a fundamental hardness result: the approximate optimal value of such problems cannot be obtained by first-order zero-respecting algorithms. We therefore follow recent work and pursue weak approximate solutions. For this goal, we propose novel near-optimal methods for smooth and nonsmooth problems by reformulating them as functionally constrained problems.
{"title":"Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems","authors":"Huaqing Zhang, Lesi Chen, Jing Xu, Jingzhao Zhang","doi":"arxiv-2409.06530","DOIUrl":"https://doi.org/arxiv-2409.06530","url":null,"abstract":"This paper studies simple bilevel problems, where a convex upper-level\u0000function is minimized over the optimal solutions of a convex lower-level\u0000problem. We first show the fundamental difficulty of simple bilevel problems,\u0000that the approximate optimal value of such problems is not obtainable by\u0000first-order zero-respecting algorithms. Then we follow recent works to pursue\u0000the weak approximate solutions. For this goal, we propose novel near-optimal\u0000methods for smooth and nonsmooth problems by reformulating them into\u0000functionally constrained problems.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}