David Hallac, Christopher Wong, Steven Diamond, Abhijit Sharang, Rok Sosič, Stephen Boyd, Jure Leskovec
SnapVX is a high-performance solver for convex optimization problems defined on networks. For problems of this form, SnapVX provides a fast and scalable solution with guaranteed global convergence. It combines the capabilities of two open source software packages: Snap.py and CVXPY. Snap.py is a large scale graph processing library, and CVXPY provides a general modeling framework for small-scale subproblems. SnapVX offers a customizable yet easy-to-use Python interface with "out-of-the-box" functionality. Based on the Alternating Direction Method of Multipliers (ADMM), it is able to efficiently store, analyze, parallelize, and solve large optimization problems from a variety of different applications. Documentation, examples, and more can be found on the SnapVX website at http://snap.stanford.edu/snapvx.
{"title":"SnapVX: A Network-Based Convex Optimization Solver.","authors":"David Hallac, Christopher Wong, Steven Diamond, Abhijit Sharang, Rok Sosič, Stephen Boyd, Jure Leskovec","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>SnapVX is a high-performance solver for convex optimization problems defined on networks. For problems of this form, SnapVX provides a fast and scalable solution with guaranteed global convergence. It combines the capabilities of two open source software packages: Snap.py and CVXPY. Snap.py is a large scale graph processing library, and CVXPY provides a general modeling framework for small-scale subproblems. SnapVX offers a customizable yet easy-to-use Python interface with \"out-of-the-box\" functionality. Based on the Alternating Direction Method of Multipliers (ADMM), it is able to efficiently store, analyze, parallelize, and solve large optimization problems from a variety of different applications. Documentation, examples, and more can be found on the SnapVX website at http://snap.stanford.edu/snapvx.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870756/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35960855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper takes a close look at the important commonalities and subtle differences between the well-established field of supervised learning and the much younger one of cooptimization. It explains...
本文仔细研究了建立良好的监督学习领域与更年轻的协同优化领域之间的重要共同点和细微差异。它解释了……
{"title":"Bridging supervised learning and test-based co-optimization","authors":"PopoviciElena","doi":"10.5555/3122009.3122047","DOIUrl":"https://doi.org/10.5555/3122009.3122047","url":null,"abstract":"This paper takes a close look at the important commonalities and subtle differences between the well-established field of supervised learning and the much younger one of cooptimization. It explains...","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71140012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PadillaOscar Hernan Madrid, SharpnackJames, G. ScottJames
The fused lasso, also known as (anisotropic) total variation denoising, is widely used for piecewise constant signal estimation with respect to a given undirected graph. The fused lasso estimate is...
融合套索,也称为(各向异性)全变分去噪,广泛用于对给定无向图的分段常数信号估计。融合套索估计是…
{"title":"The DFS fused lasso","authors":"PadillaOscar Hernan Madrid, SharpnackJames, G. ScottJames","doi":"10.5555/3122009.3242033","DOIUrl":"https://doi.org/10.5555/3122009.3242033","url":null,"abstract":"The fused lasso, also known as (anisotropic) total variation denoising, is widely used for piecewise constant signal estimation with respect to a given undirected graph. The fused lasso estimate is...","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71139798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Fan, Yirong Wu, Ming Yuan, David Page, Jie Liu, Irene M Ong, Peggy Peissig, Elizabeth Burnside
Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electronic health records. We conducted a retrospective case-control study, garnering 49 mammography descriptors and 77 high-frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. Structured mammography reports and breast imaging features have long been part of a standard electronic health record (EHR), and genetic markers likely will be in the near future. Lasso and its variants are widely used approaches to integrated learning and feature selection, and our methodological contribution is to incorporate the dependence structure among the features into these approaches. More specifically, we propose a new methodology by combining group penalty and [Formula: see text] (1 ≤ p ≤ 2) fusion penalty to improve breast cancer risk prediction, taking into account structure information in mammography descriptors and SNPs. We demonstrate that our method provides benefits that are both statistically significant and potentially significant to people's lives.
{"title":"Structure-Leveraged Methods in Breast Cancer Risk Prediction.","authors":"Jun Fan, Yirong Wu, Ming Yuan, David Page, Jie Liu, Irene M Ong, Peggy Peissig, Elizabeth Burnside","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electronic health records. We conducted a retrospective case-control study, garnering 49 mammography descriptors and 77 high-frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. Structured mammography reports and breast imaging features have long been part of a standard electronic health record (EHR), and genetic markers likely will be in the near future. Lasso and its variants are widely used approaches to integrated learning and feature selection, and our methodological contribution is to incorporate the dependence structure among the features into these approaches. More specifically, we propose a new methodology by combining group penalty and [Formula: see text] (1 ≤ <i>p</i> ≤ 2) fusion penalty to improve breast cancer risk prediction, taking into account structure information in mammography descriptors and SNPs. We demonstrate that our method provides benefits that are both statistically significant and potentially significant to people's lives.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446896/pdf/nihms-826646.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35042470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crowdsourcing has gained immense popularity in machine learning applications for obtaining large amounts of labeled data. Crowdsourcing is cheap and fast, but suffers from the problem of low-qualit...
众包在机器学习应用中获得大量标记数据获得了极大的普及。众包成本低、速度快,但存在质量低下的问题……
{"title":"Double or Nothing","authors":"Carol Sutton","doi":"10.5555/2946645.3053447","DOIUrl":"https://doi.org/10.5555/2946645.3053447","url":null,"abstract":"Crowdsourcing has gained immense popularity in machine learning applications for obtaining large amounts of labeled data. Crowdsourcing is cheap and fast, but suffers from the problem of low-qualit...","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71138965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
{"title":"Convex Regression with Interpretable Sharp Partitions.","authors":"Ashley Petersen, Noah Simon, Daniela Witten","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose <i>convex regression with interpretable sharp partitions</i> (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5021451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140208103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is known that for a certain class of single index models (SIMs) [Formula: see text], support recovery is impossible when X ~ 𝒩(0, 𝕀 p×p ) and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design X comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with L1 penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on f and ε compared to the SIR based algorithms. Furthermore, we show more generally, that LASSO succeeds in recovering the signed support of β0 if X ~ 𝒩 (0, Σ), and the covariance Σ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model, to a more general class of SIMs.
众所周知,对于某一类单指标模型(SIMs)[公式:见正文],当 X ~ 𝒩(0, 𝕀 p×p ) 和模型复杂度调整样本量低于临界阈值时,支持恢复是不可能的。最近,有人提出了基于切片反回归(SIR)的最优算法。这些算法是在设计 X 来自 i.i.d. 高斯分布的假设条件下证明有效的。在本文中,我们分析了基于协方差筛选和 L1 惩罚最小二乘法(即 LASSO)的算法,并证明它们在支持恢复方面也能获得最佳(达到标量)重标样本大小,尽管与基于 SIR 的算法相比,对 f 和 ε 的假设略有不同。此外,我们还更广泛地表明,如果 X ~ 𝒩 (0, Σ),并且协方差 Σ 满足不可呈现条件,那么 LASSO 就能成功地恢复 β0 的有符号支持。我们的工作将现有的线性模型 LASSO 支持恢复结果扩展到了更一般的 SIMs 类别。
{"title":"<i>L</i><sub>1</sub>-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs.","authors":"Matey Neykov, Jun S Liu, Tianxi Cai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>It is known that for a certain class of single index models (SIMs) [Formula: see text], support recovery is impossible when <b><i>X</i></b> ~ 𝒩(0, 𝕀 <i><sub>p</sub></i><sub>×</sub><i><sub>p</sub></i> ) and a <i>model complexity adjusted sample size</i> is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design <b><i>X</i></b> comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with <i>L</i><sub>1</sub> penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on <i>f</i> and <i>ε</i> compared to the SIR based algorithms. Furthermore, we show more generally, that LASSO succeeds in recovering the signed support of <b><i>β</i></b><sub>0</sub> if <b><i>X</i></b> ~ 𝒩 (0, <b>Σ</b>), and the covariance <b>Σ</b> satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model, to a more general class of SIMs.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5426818/pdf/nihms851690.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34994441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CVXPY is a domain-specific language for convex optimization embedded in Python. It allows the user to express convex optimization problems in a natural syntax that follows the math, rather than in the restrictive standard form required by solvers. CVXPY makes it easy to combine convex optimization with high-level features of Python such as parallelism and object-oriented design. CVXPY is available at http://www.cvxpy.org/ under the GPL license, along with documentation and examples.
{"title":"CVXPY: A Python-Embedded Modeling Language for Convex Optimization.","authors":"Steven Diamond, Stephen Boyd","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>CVXPY is a domain-specific language for convex optimization embedded in Python. It allows the user to express convex optimization problems in a natural syntax that follows the math, rather than in the restrictive standard form required by solvers. CVXPY makes it easy to combine convex optimization with high-level features of Python such as parallelism and object-oriented design. CVXPY is available at http://www.cvxpy.org/ under the GPL license, along with documentation and examples.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927437/pdf/nihms772320.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34633294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a Gibbs sampler for structure learning in directed acyclic graph (DAG) models. The standard Markov chain Monte Carlo algorithms used for learning DAGs are random-walk Metropolis-Hastings samplers. These samplers are guaranteed to converge asymptotically but often mix slowly when exploring the large graph spaces that arise in structure learning. In each step, the sampler we propose draws entire sets of parents for multiple nodes from the appropriate conditional distribution. This provides an efficient way to make large moves in graph space, permitting faster mixing whilst retaining asymptotic guarantees of convergence. The conditional distribution is related to variable selection with candidate parents playing the role of covariates or inputs. We empirically examine the performance of the sampler using several simulated and real data examples. The proposed method gives robust results in diverse settings, outperforming several existing Bayesian and frequentist methods. In addition, our empirical results shed some light on the relative merits of Bayesian and constraint-based methods for structure learning.
我们提出了一种用于有向无环图(DAG)模型结构学习的吉布斯采样器。用于 DAG 学习的标准马尔可夫链蒙特卡罗算法是随机漫步 Metropolis-Hastings 采样器。这些采样器保证渐进收敛,但在探索结构学习中出现的大型图空间时,往往混合缓慢。在每一步中,我们提出的采样器都会从适当的条件分布中为多个节点抽取整套父节点。这提供了一种在图空间中进行大规模移动的有效方法,既能加快混合速度,又能保持渐进保证的收敛性。条件分布与变量选择有关,候选父节点扮演协变量或输入的角色。我们利用几个模拟和真实数据实例对采样器的性能进行了实证检验。所提出的方法在不同的环境下都能提供稳健的结果,其性能优于现有的几种贝叶斯和频数方法。此外,我们的实证结果还揭示了贝叶斯方法和基于约束的结构学习方法的相对优势。
{"title":"A Gibbs Sampler for Learning DAGs.","authors":"Robert J B Goudie, Sach Mukherjee","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose a Gibbs sampler for structure learning in directed acyclic graph (DAG) models. The standard Markov chain Monte Carlo algorithms used for learning DAGs are random-walk Metropolis-Hastings samplers. These samplers are guaranteed to converge asymptotically but often mix slowly when exploring the large graph spaces that arise in structure learning. In each step, the sampler we propose draws entire sets of parents for multiple nodes from the appropriate conditional distribution. This provides an efficient way to make large moves in graph space, permitting faster mixing whilst retaining asymptotic guarantees of convergence. The conditional distribution is related to variable selection with candidate parents playing the role of covariates or inputs. We empirically examine the performance of the sampler using several simulated and real data examples. The proposed method gives robust results in diverse settings, outperforming several existing Bayesian and frequentist methods. In addition, our empirical results shed some light on the relative merits of Bayesian and constraint-based methods for structure learning.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5358773/pdf/emss-67582.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34845238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint.
{"title":"On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint.","authors":"Chong Zhang, Yufeng Liu, Yichao Wu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850041/pdf/nihms729829.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34446126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}