Journal of Machine Learning Research: Latest Articles

Provable Convex Co-clustering of Tensors.
IF 4.3 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2020-01-01
Eric C Chi, Brian R Gaines, Will Wei Sun, Hua Zhou, Jian Yang

Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.

Journal of Machine Learning Research, vol. 21. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731944/pdf/
Citations: 0
Proximal Distance Algorithms: Theory and Practice.
IF 6 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2019-04-01
Kevin L Keys, Hua Zhou, Kenneth Lange

Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If $f(x)$ is the loss function and $C$ is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss $f(x) + \frac{\rho}{2}\operatorname{dist}(x, C)^2$ and following the solution $x_\rho$ to its limit as $\rho$ tends to $\infty$. At each iteration the squared Euclidean distance $\operatorname{dist}(x, C)^2$ is majorized by the spherical quadratic $\|x - P_C(x_k)\|^2$, where $P_C(x_k)$ denotes the projection of the current iterate $x_k$ onto $C$. The minimum of the surrogate function $f(x) + \frac{\rho}{2}\|x - P_C(x_k)\|^2$ is given by the proximal map $\operatorname{prox}_{\rho^{-1} f}[P_C(x_k)]$. The next iterate $x_{k+1}$ automatically decreases the original penalized loss for fixed $\rho$. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the simple nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, proximal distance algorithms reduce to proximal gradient algorithms and therefore enjoy well understood convergence properties. For nonconvex problems, one can attack convergence by invoking Zangwill's theorem. Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM). Source code for the numerical examples can be found at https://github.com/klkeys/proxdist.
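
The iteration described in this abstract can be sketched on a toy problem. The example below is an illustration only and is not taken from the authors' repository: the problem (a weighted least-squares loss constrained to the Euclidean unit ball), the function names, and the schedule for increasing rho are all assumptions chosen so that both the projection and the proximal map have closed forms.

```python
import math

def project_unit_ball(x):
    """Euclidean projection P_C onto C = {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]

def objective(x, y, w):
    """Loss f(x) = 0.5 * sum_i w_i * (x_i - y_i)^2 (hypothetical example loss)."""
    return 0.5 * sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def proximal_distance(y, w, rho=1.0, growth=1.2, rho_max=1e6, inner_iters=100):
    """Approximately minimize f(x) subject to ||x||_2 <= 1.

    The rho schedule (start, growth factor, cap) is an assumption of this sketch.
    """
    x = list(y)
    while rho <= rho_max:
        for _ in range(inner_iters):
            p = project_unit_ball(x)  # P_C(x_k)
            # Minimizer of the surrogate f(x) + (rho / 2) * ||x - P_C(x_k)||^2,
            # i.e. the proximal map of f / rho evaluated at P_C(x_k).
            # It is available coordinate-wise because f is a separable quadratic.
            x = [(wi * yi + rho * pi) / (wi + rho) for wi, yi, pi in zip(w, y, p)]
        rho *= growth  # follow the penalized solution as rho grows
    return project_unit_ball(x)  # final projection guarantees feasibility

if __name__ == "__main__":
    y, w = [2.0, 2.0], [1.0, 4.0]
    x = proximal_distance(y, w)
    print("approximate solution:", x)
    print("loss at solution:", objective(x, y, w))
    print("loss at plain projection of y:", objective(project_unit_ball(y), y, w))
```

With large rho each inner update moves the iterate only slightly, which is consistent with the abstract's remark that many cheap iterations may be needed; the sketch therefore returns an approximate, not exact, constrained minimizer.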

Journal of Machine Learning Research, vol. 20. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812563/pdf/
Citations: 0
The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks.
IF 6 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2019-01-01
Arjun Sondhi, Ali Shojaie

We consider the task of estimating a high-dimensional directed acyclic graph, given observations from a linear structural equation model with arbitrary noise distribution. By exploiting properties of common random graphs, we develop a new algorithm that requires conditioning only on small sets of variables. The proposed algorithm, which is essentially a modified version of the PC-Algorithm, offers significant gains in both computational complexity and estimation accuracy. In particular, it results in more efficient and accurate estimation in large networks containing hub nodes, which are common in biological systems. We prove the consistency of the proposed algorithm, and show that it also requires a less stringent faithfulness assumption than the PC-Algorithm. Simulations in low and high-dimensional settings are used to illustrate these findings. An application to gene expression data suggests that the proposed algorithm can identify a greater number of clinically relevant genes than current methods.

Journal of Machine Learning Research, vol. 20 (164). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552884/pdf/nihms-1885649.pdf
Citations: 0
Causal Learning via Manifold Regularization.
IF 4.3 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2019-01-01
Steven M Hill, Chris J Oates, Duncan A Blythe, Sach Mukherjee

This paper frames causal structure estimation as a machine learning task. The idea is to treat indicators of causal relationships between variables as 'labels' and to exploit available data on the variables of interest to provide features for the labelling task. Background scientific knowledge or any available interventional data provide labels on some causal relationships and the remainder are treated as unlabelled. To illustrate the key ideas, we develop a distance-based approach (based on bivariate histograms) within a manifold regularization framework. We present empirical results on three different biological data sets (including examples where causal effects can be verified by experimental intervention), that together demonstrate the efficacy and general nature of the approach as well as its simplicity from a user's point of view.

Journal of Machine Learning Research, vol. 20, pages 127. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986916/pdf/
Citations: 0
Efficient augmentation and relaxation learning for individualized treatment rules using observational data.
IF 6 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2019-01-01
Ying-Qi Zhao, Eric B Laber, Yang Ning, Sumona Saha, Bruce E Sands

Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.

Journal of Machine Learning Research, vol. 20. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6705615/pdf/
Citations: 0
Sparse concordance-assisted learning for optimal treatment decision.
IF 6 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2018-04-01
Shuhan Liang, Wenbin Lu, Rui Song, Lan Wang

To find the optimal decision rule, Fan et al. (2016) proposed an innovative concordance-assisted learning algorithm based on the maximum rank correlation estimator. It makes better use of the available information through pairwise comparison. However, the objective function is discontinuous and computationally hard to optimize. In this paper, we consider a convex surrogate loss function to solve this problem. In addition, our algorithm ensures sparsity of the decision rule and renders it easy to interpret. We derive the $L_2$ error bound of the estimated coefficients under ultra-high dimension. Simulation results under various settings and an application to STAR*D both illustrate that the proposed method can still estimate the optimal treatment regime successfully when the number of covariates is large.

Journal of Machine Learning Research, vol. 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6226264/pdf/nihms-987205.pdf
Citations: 0
Saturating Splines and Feature Selection.
IF 4.3 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2018-04-01
Nicholas Boyd, Trevor Hastie, Stephen Boyd, Benjamin Recht, Michael I Jordan

We extend the adaptive regression spline model by incorporating saturation, the natural requirement that a function extend as a constant outside a certain range. We fit saturating splines to data via a convex optimization problem over a space of measures, which we solve using an efficient algorithm based on the conditional gradient method. Unlike many existing approaches, our algorithm solves the original infinite-dimensional (for splines of degree at least two) optimization problem without pre-specified knot locations. We then adapt our algorithm to fit generalized additive models with saturating splines as coordinate functions and show that the saturation requirement allows our model to simultaneously perform feature selection and nonlinear function fitting. Finally, we briefly sketch how the method can be extended to higher order splines and to different requirements on the extension outside the data range.

Journal of Machine Learning Research, vol. 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6474379/pdf/
Citations: 0
Significance-based community detection in weighted networks.
IF 6 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2018-04-01
John Palowitch, Shankar Bhamidi, Andrew B Nobel

Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for un-weighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null for weighted networks called the continuous configuration model. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework involving the null to plant "background" nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods, particularly when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems.

Journal of Machine Learning Research, vol. 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6402789/pdf/nihms970916.pdf
Citations: 0
An $\ell_\infty$ Eigenvector Perturbation Bound and Its Application to Robust Covariance Estimation.
IF 4.3 | Zone 3 (Computer Science) | Q1 AUTOMATION & CONTROL SYSTEMS | Pub Date: 2018-04-01
Jianqing Fan, Weichen Wang, Yiqiao Zhong

In statistics and machine learning, we are interested in the eigenvectors (or singular vectors) of certain matrices (e.g. covariance matrices, data matrices, etc.). However, those matrices are usually perturbed by noise or statistical errors, either from random sampling or structural patterns. The Davis-Kahan $\sin\theta$ theorem is often used to bound the difference between the eigenvectors of a matrix $A$ and those of a perturbed matrix $\tilde{A} = A + E$, in terms of the $\ell_2$ norm. In this paper, we prove that when $A$ is a low-rank and incoherent matrix, the $\ell_\infty$ norm perturbation bound of singular vectors (or eigenvectors in the symmetric case) is smaller by a factor of $\sqrt{d_1}$ or $\sqrt{d_2}$ for left and right vectors, where $d_1$ and $d_2$ are the matrix dimensions. The power of this new perturbation result is shown in robust covariance estimation, particularly when random variables have heavy tails. There, we propose new robust covariance estimators and establish their asymptotic properties using the newly developed perturbation bound. Our theoretical results are verified through extensive numerical experiments.
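
To make the comparison between the two norms concrete, the following sketch measures the $\ell_2$ and $\ell_\infty$ errors of the leading eigenvector of a perturbed matrix. It is a numerical illustration under assumptions of this note, not the paper's experiment: the matrix is a symmetric rank-one matrix with a maximally incoherent eigenvector (all entries equal), the noise is uniform, and the function names, dimension, signal strength, and noise level are hypothetical choices.

```python
import math
import random

def top_eigenvector(M, iters=200):
    """Leading eigenvector of a symmetric matrix via power iteration (unit l2 norm)."""
    d = len(M)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def perturbation_errors(d=50, signal=10.0, noise=0.01, seed=0):
    """Return (l2 error, l_inf error) of the leading eigenvector of A + E."""
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(d)] * d  # incoherent unit eigenvector of A
    A = [[signal * u[i] * u[j] for j in range(d)] for i in range(d)]
    # Symmetric noise matrix E with entries drawn uniformly from [-noise, noise].
    E = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            E[i][j] = E[j][i] = rng.uniform(-noise, noise)
    A_tilde = [[A[i][j] + E[i][j] for j in range(d)] for i in range(d)]
    v = top_eigenvector(A_tilde)  # power iteration starts at u, so the sign matches
    diff = [vi - ui for vi, ui in zip(v, u)]
    l2 = math.sqrt(sum(x * x for x in diff))
    linf = max(abs(x) for x in diff)
    return l2, linf

if __name__ == "__main__":
    l2, linf = perturbation_errors()
    print("l2 error:", l2)
    print("l_inf error:", linf)
    print("ratio l2 / l_inf:", l2 / linf, "(compare with sqrt(d) =", math.sqrt(50), ")")
```

The $\ell_\infty$ error can never exceed the $\ell_2$ error; the point of the abstract is that, for low-rank incoherent matrices, it can be smaller by a factor on the order of the square root of the dimension, and the printed ratio gives a rough sense of that gap in this single simulated instance.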

Journal of Machine Learning Research, vol. 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6867801/pdf/
Citations: 0
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models.
IF 6 Zone 3 Computer Science Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2018-04-01
Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng

We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations. Unlike most previous approaches which assume that the cluster structure is given in advance, an appealing feature of our method is to learn cluster structure while estimating heterogeneous graphical models. This is achieved via a high dimensional version of Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). A joint graphical lasso penalty is imposed on the conditional maximization step to extract both homogeneity and heterogeneity components across all clusters. Our algorithm is computationally efficient due to fast sparse learning routines and can be implemented without unsupervised learning knowledge. The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity). Such a result gives a theoretical guideline in terminating our ECM iterations.

Journal of Machine Learning Research, vol. 18. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6338433/pdf/
Citations: 0