We consider the problem of decomposing a higher-order tensor with binary entries. Such data problems arise frequently in applications such as neuroimaging, recommendation system, topic modeling, and sensor network localization. We propose a multilinear Bernoulli model, develop a rank-constrained likelihood-based estimation method, and obtain the theoretical accuracy guarantees. In contrast to continuous-valued problems, the binary tensor problem exhibits an interesting phase transition phenomenon according to the signal-to-noise ratio. The error bound for the parameter tensor estimation is established, and we show that the obtained rate is minimax optimal under the considered model. Furthermore, we develop an alternating optimization algorithm with convergence guarantees. The efficacy of our approach is demonstrated through both simulations and analyses of multiple data sets on the tasks of tensor completion and clustering.
{"title":"Learning from Binary Multiway Data: Probabilistic Tensor Decomposition and its Statistical Optimality.","authors":"Miaoyan Wang, Lexin Li","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the problem of decomposing a higher-order tensor with binary entries. Such data problems arise frequently in applications such as neuroimaging, recommendation system, topic modeling, and sensor network localization. We propose a multilinear Bernoulli model, develop a rank-constrained likelihood-based estimation method, and obtain the theoretical accuracy guarantees. In contrast to continuous-valued problems, the binary tensor problem exhibits an interesting phase transition phenomenon according to the signal-to-noise ratio. The error bound for the parameter tensor estimation is established, and we show that the obtained rate is minimax optimal under the considered model. Furthermore, we develop an alternating optimization algorithm with convergence guarantees. The efficacy of our approach is demonstrated through both simulations and analyses of multiple data sets on the tasks of tensor completion and clustering.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8457422/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39465843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nilabja Guha, Veera Baladandayuthapani, Bani K Mallick
Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics data dataset where-in multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.
{"title":"Quantile Graphical Models: Bayesian Approaches.","authors":"Nilabja Guha, Veera Baladandayuthapani, Bani K Mallick","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics data dataset where-in multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 79","pages":"1-47"},"PeriodicalIF":6.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8297664/pdf/nihms-1636569.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39223529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The individualized treatment recommendation (ITR) is an important analytic framework for precision medicine. The goal of ITR is to assign the best treatments to patients based on their individual characteristics. From the machine learning perspective, the solution to the ITR problem can be formulated as a weighted classification problem to maximize the mean benefit from the recommended treatments given patients' characteristics. Several ITR methods have been proposed in both the binary setting and the multicategory setting. In practice, one may prefer a more flexible recommendation that includes multiple treatment options. This motivates us to develop methods to obtain a set of near-optimal individualized treatment recommendations alternative to each other, called alternative individualized treatment recommendations (A-ITR). We propose two methods to estimate the optimal A-ITR within the outcome weighted learning (OWL) framework. Simulation studies and a real data analysis for Type 2 diabetic patients with injectable antidiabetic treatments are conducted to show the usefulness of the proposed A-ITR framework. We also show the consistency of these methods and obtain an upper bound for the risk between the theoretically optimal recommendation and the estimated one. An R package aitr has been developed, found at https://github.com/menghaomiao/aitr.
{"title":"Near-optimal Individualized Treatment Recommendations.","authors":"Haomiao Meng, Ying-Qi Zhao, Haoda Fu, Xingye Qiao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The individualized treatment recommendation (ITR) is an important analytic framework for precision medicine. The goal of ITR is to assign the best treatments to patients based on their individual characteristics. From the machine learning perspective, the solution to the ITR problem can be formulated as a weighted classification problem to maximize the mean benefit from the recommended treatments given patients' characteristics. Several ITR methods have been proposed in both the binary setting and the multicategory setting. In practice, one may prefer a more flexible recommendation that includes multiple treatment options. This motivates us to develop methods to obtain a set of near-optimal individualized treatment recommendations alternative to each other, called alternative individualized treatment recommendations (A-ITR). We propose two methods to estimate the optimal A-ITR within the outcome weighted learning (OWL) framework. Simulation studies and a real data analysis for Type 2 diabetic patients with injectable antidiabetic treatments are conducted to show the usefulness of the proposed A-ITR framework. We also show the consistency of these methods and obtain an upper bound for the risk between the theoretically optimal recommendation and the estimated one. An R package aitr has been developed, found at https://github.com/menghaomiao/aitr.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324003/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39264728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artin Spiridonoff, Alex Olshevsky, Ioannis Ch Paschalidis
We consider the standard model of distributed optimization of a sum of functions , where node i in a network holds the function fi (z). We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node i is capable of generating gradients of its function fi (z) corrupted by zero-mean bounded-support additive noise at each step, (ii) F(z) is strongly convex, and (iii) each fi (z) has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions f1(z), …, fn (z) at each step.
我们考虑函数和的分布式优化的标准模型F (z) =∑i = 1 n F i (z),其中网络中的节点i保存函数fi (z)。我们允许一个苛刻的网络模型,其特征是异步更新,消息延迟,不可预测的消息丢失和节点之间的定向通信。在此设置中,我们分析了用于分布式优化的Gradient-Push方法的修改,假设(i)节点i能够生成其函数fi (z)的梯度,该函数在每一步都被零均值有界支持加性噪声破坏,(ii) F(z)是强凸的,以及(iii)每个fi (z)具有Lipschitz梯度。我们表明,我们提出的方法在集中梯度下降上的渐近性能与最佳边界一样好,该方法在每一步都朝着所有函数f1 (z),…,fn (z)的噪声梯度之和的方向采取步骤。
{"title":"Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions.","authors":"Artin Spiridonoff, Alex Olshevsky, Ioannis Ch Paschalidis","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the standard model of distributed optimization of a sum of functions <math><mrow><mi>F</mi> <mrow><mo>(</mo> <mi>z</mi> <mo>)</mo></mrow> <mo>=</mo> <msubsup><mo>∑</mo> <mrow><mi>i</mi> <mo>=</mo> <mn>1</mn></mrow> <mi>n</mi></msubsup> <mrow><msub><mi>f</mi> <mi>i</mi></msub> <mrow><mo>(</mo> <mi>z</mi> <mo>)</mo></mrow> </mrow> </mrow> </math> , where node <i>i</i> in a network holds the function <i>f<sub>i</sub></i> (<b>z</b>). We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node <i>i</i> is capable of generating gradients of its function <i>f<sub>i</sub></i> (<b>z</b>) corrupted by zero-mean bounded-support additive noise at each step, (ii) <i>F</i>(<b>z</b>) is strongly convex, and (iii) each <i>f<sub>i</sub></i> (<b>z</b>) has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions <i>f</i> <sub>1</sub>(<b>z</b>), …, <i>f<sub>n</sub></i> (<b>z</b>) at each step.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7520166/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38434192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. We also put R package "aispu" implementing the proposed test on GitHub.
{"title":"A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models.","authors":"Chong Wu, Gongjun Xu, Xiaotong Shen, Wei Pan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its <i>p</i>-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. We also put R package \"<i>aispu</i>\" implementing the proposed test on GitHub.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7425805/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38270305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing the hypothesis of parallelism is a fundamental statistical problem arising from many applied sciences. In this paper, we develop a nonparametric parallelism test for inferring whether the trends are parallel in treatment and control groups. In particular, the proposed nonparametric parallelism test is a Wald type test based on a smoothing spline ANOVA (SSANOVA) model which can characterize the complex patterns of the data. We derive that the asymptotic null distribution of the test statistic is a Chi-square distribution, unveiling a new version of Wilks phenomenon. Notably, we establish the minimax sharp lower bound of the distinguishable rate for the nonparametric parallelism test by using the information theory, and further prove that the proposed test is minimax optimal. Simulation studies are conducted to investigate the empirical performance of the proposed test. DNA methylation and neuroimaging studies are presented to illustrate potential applications of the test. The software is available at https://github.com/BioAlgs/Parallelism.
{"title":"Minimax Nonparametric Parallelism Test.","authors":"Xin Xing, Meimei Liu, Ping Ma, Wenxuan Zhong","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Testing the hypothesis of parallelism is a fundamental statistical problem arising from many applied sciences. In this paper, we develop a nonparametric parallelism test for inferring whether the trends are parallel in treatment and control groups. In particular, the proposed nonparametric parallelism test is a Wald type test based on a smoothing spline ANOVA (SSANOVA) model which can characterize the complex patterns of the data. We derive that the asymptotic null distribution of the test statistic is a Chi-square distribution, unveiling a new version of Wilks phenomenon. Notably, we establish the minimax sharp lower bound of the distinguishable rate for the nonparametric parallelism test by using the information theory, and further prove that the proposed test is minimax optimal. Simulation studies are conducted to investigate the empirical performance of the proposed test. DNA methylation and neuroimaging studies are presented to illustrate potential applications of the test. The software is available at https://github.com/BioAlgs/Parallelism.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11086968/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140912390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eric C Chi, Brian R Gaines, Will Wei Sun, Hua Zhou, Jian Yang
Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.
{"title":"Provable Convex Co-clustering of Tensors.","authors":"Eric C Chi, Brian R Gaines, Will Wei Sun, Hua Zhou, Jian Yang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising \"blessing of dimensionality\" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"21 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731944/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38706545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If f(x) is the loss function, and C is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss and following the solution xρ to its limit as ρ tends to ∞. At each iteration the squared Euclidean distance dist(x,C)2 is majorized by the spherical quadratic ‖x- PC (xk )‖2, where PC (xk ) denotes the projection of the current iterate xk onto C. The minimum of the surrogate function is given by the proximal map prox ρ -1f [PC (xk )]. The next iterate xk+1 automatically decreases the original penalized loss for fixed ρ. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the simple nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, proximal distance algorithms reduce to proximal gradient algorithms and therefore enjoy well understood convergence properties. For nonconvex problems, one can attack convergence by invoking Zangwill's theorem. Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM). Source code for the numerical examples can be found at https://github.com/klkeys/proxdist.
{"title":"Proximal Distance Algorithms: Theory and Practice.","authors":"Kevin L Keys, Hua Zhou, Kenneth Lange","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If <i>f</i>(<i>x</i>) is the loss function, and <i>C</i> is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss <math><mrow><mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>+</mo> <mfrac><mi>ρ</mi> <mn>2</mn></mfrac> <mtext>dist</mtext> <msup> <mrow><mrow><mo>(</mo> <mrow><mi>x</mi> <mo>,</mo> <mi>C</mi></mrow> <mo>)</mo></mrow> </mrow> <mn>2</mn></msup> </mrow> </math> and following the solution <i>x</i> <sub><i>ρ</i></sub> to its limit as <i>ρ</i> tends to ∞. At each iteration the squared Euclidean distance dist(<i>x,C</i>)<sup>2</sup> is majorized by the spherical quadratic ‖<i>x</i>- <i>P</i> <sub><i>C</i></sub> (<i>x</i> <sub><i>k</i></sub> )‖<sup>2</sup>, where <i>P</i> <sub><i>C</i></sub> (<i>x</i> <sub><i>k</i></sub> ) denotes the projection of the current iterate <i>x</i> <sub><i>k</i></sub> onto <i>C</i>. The minimum of the surrogate function <math><mrow><mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>+</mo> <mfrac><mi>ρ</mi> <mn>2</mn></mfrac> <mo>‖</mo> <mi>x</mi> <mo>-</mo> <msub><mi>P</mi> <mi>C</mi></msub> <mrow><mo>(</mo> <mrow><msub><mi>x</mi> <mi>k</mi></msub> </mrow> <mo>)</mo></mrow> <msup><mo>‖</mo> <mn>2</mn></msup> </mrow> </math> is given by the proximal map prox <sub><i>ρ</i></sub> -<sub>1<i>f</i></sub> [<i>P</i> <sub><i>C</i></sub> (<i>x</i> <sub><i>k</i></sub> )]. The next iterate <i>x</i> <sub><i>k</i>+1</sub> automatically decreases the original penalized loss for fixed <i>ρ</i>. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the simple nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, proximal distance algorithms reduce to proximal gradient algorithms and therefore enjoy well understood convergence properties. For nonconvex problems, one can attack convergence by invoking Zangwill's theorem. Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM). Source code for the numerical examples can be found at https://github.com/klkeys/proxdist.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"20 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812563/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41219016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the task of estimating a high-dimensional directed acyclic graph, given observations from a linear structural equation model with arbitrary noise distribution. By exploiting properties of common random graphs, we develop a new algorithm that requires conditioning only on small sets of variables. The proposed algorithm, which is essentially a modified version of the PC-Algorithm, offers significant gains in both computational complexity and estimation accuracy. In particular, it results in more efficient and accurate estimation in large networks containing hub nodes, which are common in biological systems. We prove the consistency of the proposed algorithm, and show that it also requires a less stringent faithfulness assumption than the PC-Algorithm. Simulations in low and high-dimensional settings are used to illustrate these findings. An application to gene expression data suggests that the proposed algorithm can identify a greater number of clinically relevant genes than current methods.
{"title":"The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks.","authors":"Arjun Sondhi, Ali Shojaie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the task of estimating a high-dimensional directed acyclic graph, given observations from a linear structural equation model with arbitrary noise distribution. By exploiting properties of common random graphs, we develop a new algorithm that requires conditioning only on small sets of variables. The proposed algorithm, which is essentially a modified version of the PC-Algorithm, offers significant gains in both computational complexity and estimation accuracy. In particular, it results in more efficient and accurate estimation in large networks containing hub nodes, which are common in biological systems. We prove the consistency of the proposed algorithm, and show that it also requires a less stringent faithfulness assumption than the PC-Algorithm. Simulations in low and high-dimensional settings are used to illustrate these findings. An application to gene expression data suggests that the proposed algorithm can identify a greater number of clinically relevant genes than current methods.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"20 164","pages":""},"PeriodicalIF":6.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552884/pdf/nihms-1885649.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41105823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steven M Hill, Chris J Oates, Duncan A Blythe, Sach Mukherjee
This paper frames causal structure estimation as a machine learning task. The idea is to treat indicators of causal relationships between variables as 'labels' and to exploit available data on the variables of interest to provide features for the labelling task. Background scientific knowledge or any available interventional data provide labels on some causal relationships and the remainder are treated as unlabelled. To illustrate the key ideas, we develop a distance-based approach (based on bivariate histograms) within a manifold regularization framework. We present empirical results on three different biological data sets (including examples where causal effects can be verified by experimental intervention), that together demonstrate the efficacy and general nature of the approach as well as its simplicity from a user's point of view.
{"title":"Causal Learning via Manifold Regularization.","authors":"Steven M Hill, Chris J Oates, Duncan A Blythe, Sach Mukherjee","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This paper frames causal structure estimation as a machine learning task. The idea is to treat indicators of causal relationships between variables as 'labels' and to exploit available data on the variables of interest to provide features for the labelling task. Background scientific knowledge or any available interventional data provide labels on some causal relationships and the remainder are treated as unlabelled. To illustrate the key ideas, we develop a distance-based approach (based on bivariate histograms) within a manifold regularization framework. We present empirical results on three different biological data sets (including examples where causal effects can be verified by experimental intervention), that together demonstrate the efficacy and general nature of the approach as well as its simplicity from a user's point of view.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"20 ","pages":"127"},"PeriodicalIF":4.3,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9142095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}