
Latest articles from the Journal of Machine Learning Research

Minimax Nonparametric Parallelism Test.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2020-01-01
Xin Xing, Meimei Liu, Ping Ma, Wenxuan Zhong

Testing the hypothesis of parallelism is a fundamental statistical problem arising in many applied sciences. In this paper, we develop a nonparametric parallelism test for inferring whether the trends in treatment and control groups are parallel. In particular, the proposed test is a Wald-type test based on a smoothing spline ANOVA (SSANOVA) model, which can characterize complex patterns in the data. We derive that the asymptotic null distribution of the test statistic is chi-square, unveiling a new version of the Wilks phenomenon. Notably, we establish the minimax sharp lower bound on the distinguishable rate for the nonparametric parallelism test using information theory, and further prove that the proposed test is minimax optimal. Simulation studies are conducted to investigate the empirical performance of the proposed test. DNA methylation and neuroimaging studies are presented to illustrate potential applications of the test. The software is available at https://github.com/BioAlgs/Parallelism.
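As a toy illustration of the Wald-type construction (not the paper's SSANOVA estimator), the sketch below tests parallelism of two linear trends: under the null of a shared slope, the squared, standardized slope difference is asymptotically chi-square with one degree of freedom. The function name and simulated data are illustrative.

```python
import numpy as np

def parallelism_wald_linear(x, y_treat, y_ctrl):
    """Toy Wald-type parallelism test for two linear trends.

    A simplified linear-model analogue of an SSANOVA-based test:
    under H0 the two groups share one slope, and the squared
    standardized slope difference is asymptotically chi2(1).
    """
    def slope_and_var(y):
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        n, p = X.shape
        sigma2 = np.sum((y - X @ beta) ** 2) / (n - p)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        return beta[1], cov[1, 1]

    b1, v1 = slope_and_var(y_treat)
    b0, v0 = slope_and_var(y_ctrl)
    return (b1 - b0) ** 2 / (v1 + v0)   # ~ chi2(1) under H0

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y_ctrl = 1.0 + 2.0 * x + rng.normal(0, 0.1, x.size)
y_par = 3.0 + 2.0 * x + rng.normal(0, 0.1, x.size)      # parallel shift
y_nonpar = 1.0 + 4.0 * x + rng.normal(0, 0.1, x.size)   # different slope

w_par = parallelism_wald_linear(x, y_par, y_ctrl)       # small statistic
w_nonpar = parallelism_wald_linear(x, y_nonpar, y_ctrl) # large statistic
```

Comparing the statistic against a chi2(1) quantile then accepts the parallel pair and rejects the non-parallel one.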

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11086968/pdf/
Citations: 0
Provable Convex Co-clustering of Tensors.
IF 4.3 | CAS Tier 3, Computer Science | Q1, Automation & Control Systems | Pub Date: 2020-01-01
Eric C Chi, Brian R Gaines, Will Wei Sun, Hua Zhou, Jian Yang

Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulation studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.
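A minimal scalar sketch of the convexity idea behind CoCo (the paper's estimator operates on general-order tensors with a structured fusion penalty): subgradient descent on the convex clustering objective fuses nearby points into shared centroids. The function name, step sizes, and data are illustrative assumptions.

```python
import numpy as np

def convex_clustering_1d(x, gamma, step=0.02, iters=2000):
    """Subgradient descent on the convex clustering objective
    0.5 * sum_i (u_i - x_i)^2 + gamma * sum_{i<j} |u_i - u_j|.

    The objective is convex, so the fused solution is a global
    minimizer -- the property CoCo extends to tensor co-clustering.
    """
    u = x.astype(float).copy()
    for _ in range(iters):
        # subgradient of the pairwise fusion penalty at u_i
        diff_sign = np.sign(u[:, None] - u[None, :])
        grad = (u - x) + gamma * diff_sign.sum(axis=1)
        u -= step * grad
    return u

x = np.array([0.0, 0.1, 5.0, 5.1])   # two obvious groups
u = convex_clustering_1d(x, gamma=0.2)
# within-group estimates fuse, across-group estimates stay apart
```

With a larger `gamma` all four estimates would eventually fuse into one centroid, tracing out the full clustering path.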

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731944/pdf/
Citations: 0
Generalized Score Matching for Non-Negative Data.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-04-01
Shiqing Yu, Mathias Drton, Ali Shojaie

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvärinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over R^m. Hyvärinen (2007) extended the approach to distributions supported on the non-negative orthant, R_+^m. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. As an example, we consider a general class of pairwise interaction models. Addressing an overlooked inexistence problem, we generalize the regularized score matching method of Lin et al. (2016) and improve its theoretical guarantees for non-negative Gaussian graphical models.
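A hedged sketch of the original Hyvärinen (2005) objective on a plain Gaussian, to show why score matching sidesteps the normalizing constant (the paper's generalized, non-negative version modifies this objective). The score ψ(x) = d/dx log p(x) never involves the constant, and for a Gaussian the minimizers are available in closed form. Names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, 5000)   # data from N(2, 1.5^2)

def sm_objective(x, mu, sigma2):
    """Empirical Hyvarinen score matching objective for N(mu, sigma2):
    J = E[0.5 * psi(x)^2 + psi'(x)],  psi(x) = -(x - mu)/sigma2.
    The normalizing constant of the density never appears."""
    psi = -(x - mu) / sigma2
    dpsi = -1.0 / sigma2
    return np.mean(0.5 * psi ** 2 + dpsi)

# Closed-form minimizers of J for the Gaussian family:
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)
```

Evaluating `sm_objective` at `(mu_hat, sigma2_hat)` gives a strictly lower value than at a mis-specified point such as `(0, 1)`, matching the population minimum at the true parameters.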

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8291733/pdf/
Citations: 0
Proximal Distance Algorithms: Theory and Practice.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-04-01
Kevin L Keys, Hua Zhou, Kenneth Lange

Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If f(x) is the loss function and C is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss f(x) + (ρ/2) dist(x, C)^2 and following the solution x_ρ to its limit as ρ tends to ∞. At each iteration the squared Euclidean distance dist(x, C)^2 is majorized by the spherical quadratic ‖x − P_C(x_k)‖^2, where P_C(x_k) denotes the projection of the current iterate x_k onto C. The minimum of the surrogate function f(x) + (ρ/2)‖x − P_C(x_k)‖^2 is given by the proximal map prox_{ρ⁻¹f}[P_C(x_k)]. The next iterate x_{k+1} automatically decreases the original penalized loss for fixed ρ. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the simple nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, proximal distance algorithms reduce to proximal gradient algorithms and therefore enjoy well-understood convergence properties. For nonconvex problems, one can attack convergence by invoking Zangwill's theorem. Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. In each case the proximal distance algorithm is competitive with or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM).
Source code for the numerical examples can be found at https://github.com/klkeys/proxdist.
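The iteration described above is concrete enough to sketch. Below is a minimal, assumption-laden implementation for non-negative least squares, where C is the non-negative orthant: P_C is coordinate-wise clipping, and prox_{ρ⁻¹f} for f(x) = 0.5‖Ax − b‖² solves a ridge-type linear system. The ρ schedule and all names are illustrative choices, not the authors' code (their examples are at the URL above).

```python
import numpy as np

def proximal_distance_nnls(A, b, rho=1.0, rho_max=1e6, inc=1.05, iters=500):
    """Proximal distance sketch for min 0.5*||Ax - b||^2 s.t. x >= 0.

    Each step is x_{k+1} = prox_{rho^-1 f}(P_C(x_k)):
    P_C is coordinate-wise clipping to the non-negative orthant, and
    the proximal map solves (A'A + rho*I) x = A'b + rho*v.
    rho is slowly increased so iterates are driven toward feasibility.
    """
    n = A.shape[1]
    x = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(iters):
        v = np.clip(x, 0.0, None)                             # P_C(x_k)
        x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * v)
        rho = min(rho * inc, rho_max)                         # anneal rho
    return np.clip(x, 0.0, None)

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 5))
x_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
b = A @ x_true + rng.normal(0, 0.01, 50)
x_hat = proximal_distance_nnls(A, b)   # close to x_true, all coords >= 0
```

Swapping in a different projection (onto a simplex, a sparsity set, a cone) changes the constraint while the rest of the loop is untouched, which is the flexibility the abstract emphasizes.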

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812563/pdf/
Citations: 0
The Reduced PC-Algorithm: Improved Causal Structure Learning in Large Random Networks.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-01-01
Arjun Sondhi, Ali Shojaie

We consider the task of estimating a high-dimensional directed acyclic graph, given observations from a linear structural equation model with arbitrary noise distribution. By exploiting properties of common random graphs, we develop a new algorithm that requires conditioning only on small sets of variables. The proposed algorithm, which is essentially a modified version of the PC-Algorithm, offers significant gains in both computational complexity and estimation accuracy. In particular, it results in more efficient and accurate estimation in large networks containing hub nodes, which are common in biological systems. We prove the consistency of the proposed algorithm, and show that it also requires a less stringent faithfulness assumption than the PC-Algorithm. Simulations in low- and high-dimensional settings are used to illustrate these findings. An application to gene expression data suggests that the proposed algorithm can identify a greater number of clinically relevant genes than current methods.
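A small sketch of the conditional-independence test underlying PC-style algorithms (the paper's contribution is keeping the conditioning sets small): partial correlation computed by regressing out the conditioning set. On a chain x → y → z, x and z are marginally dependent but independent given y. Data and names are illustrative.

```python
import numpy as np

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given columns in `cond`,
    computed by regressing out the conditioning set -- the basic
    conditional-independence test used in PC-style edge removal."""
    def residual(k):
        y = data[:, k]
        if not cond:
            return y - y.mean()
        X = np.column_stack([np.ones(len(y))] + [data[:, c] for c in cond])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta
    ri, rj = residual(i), residual(j)
    return float(np.corrcoef(ri, rj)[0, 1])

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # x -> y
z = -1.5 * y + rng.normal(size=n)     # y -> z
data = np.column_stack([x, y, z])

r_xz = partial_corr(data, 0, 2, [])      # large: marginally dependent
r_xz_y = partial_corr(data, 0, 2, [1])   # near zero: independent given y
```

The edge x–z is deleted as soon as some conditioning set renders the partial correlation negligible; restricting the search to small sets is what makes the reduced algorithm fast.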

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552884/pdf/nihms-1885649.pdf
Citations: 0
Causal Learning via Manifold Regularization.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-01-01
Steven M Hill, Chris J Oates, Duncan A Blythe, Sach Mukherjee

This paper frames causal structure estimation as a machine learning task. The idea is to treat indicators of causal relationships between variables as 'labels' and to exploit available data on the variables of interest to provide features for the labelling task. Background scientific knowledge or any available interventional data provide labels on some causal relationships, and the remainder are treated as unlabelled. To illustrate the key ideas, we develop a distance-based approach (based on bivariate histograms) within a manifold regularization framework. We present empirical results on three different biological data sets (including examples where causal effects can be verified by experimental intervention) that together demonstrate the efficacy and general nature of the approach, as well as its simplicity from a user's point of view.
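A hedged sketch of the bivariate-histogram features mentioned above; the manifold-regularized classifier built on top of them is omitted. The point is that the feature for the ordered pair (x, y) differs from that for (y, x), so a classifier over these features can learn directional labels. The bin count and names are assumptions.

```python
import numpy as np

def pair_histogram_feature(xi, xj, bins=8):
    """Flattened, normalized 2-D histogram of an ordered variable pair --
    a distance-friendly feature vector for the causal labelling task."""
    h, _, _ = np.histogram2d(xi, xj, bins=bins)
    return (h / h.sum()).ravel()

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = x ** 2 + 0.1 * rng.normal(size=1000)   # y depends on x, asymmetrically

f_xy = pair_histogram_feature(x, y)   # feature for candidate edge x -> y
f_yx = pair_histogram_feature(y, x)   # transposed histogram: y -> x
```

Because the joint distribution here is asymmetric, `f_xy` and `f_yx` differ, which is exactly the signal a semi-supervised (manifold-regularized) classifier can exploit when only some edges carry interventional labels.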

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986916/pdf/
Citations: 0
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-01-01
Aaron Fisher, Cynthia Rudin, Francesca Dominici

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model f(x) = xᵀβ with a fixed coefficient vector β) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
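A minimal sketch of permutation-based variable importance, with MCR illustrated as the range of importances across a tiny, hand-picked set of "well-performing" models. Everything here (model class, data, names) is illustrative rather than the paper's formal estimator.

```python
import numpy as np

def permutation_importance(predict, X, y, col, rng, n_rep=20):
    """Model-reliance estimate: mean-squared-error increase when
    column `col` is permuted, breaking its link with the outcome."""
    base = np.mean((predict(X) - y) ** 2)
    losses = []
    for _ in range(n_rep):
        Xp = X.copy()
        Xp[:, col] = rng.permutation(Xp[:, col])
        losses.append(np.mean((predict(Xp) - y) ** 2))
    return np.mean(losses) - base

rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 2))
y = 1.0 * X[:, 0] + 0.1 * X[:, 1] + 0.05 * rng.normal(size=n)

# Two near-equivalent linear predictors; MCR is the range of VI
# over all well-performing models (here just two, for illustration).
models = [lambda X: X @ np.array([1.0, 0.1]),
          lambda X: X @ np.array([0.95, 0.15])]
vi_x1 = [permutation_importance(m, X, y, 0, rng) for m in models]
mcr_range = (min(vi_x1), max(vi_x1))   # [MCR-, MCR+] for variable 0
```

The paper's contribution is computing this lower and upper reliance over an entire prespecified class (e.g. all models within ε of the best loss), rather than enumerating a handful of fits.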

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323609/pdf/nihms-1670270.pdf
Citations: 0
Determining the Number of Latent Factors in Statistical Multi-Relational Learning.
IF 6.0 | CAS Tier 3, Computer Science | Q1, Mathematics | Pub Date: 2019-01-01
Chengchun Shi, Wenbin Lu, Rui Song

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed the RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish its rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistencies when the number of relations is either bounded or diverges at a proper rate relative to the number of entities. Simulations and real data examples show that our proposed information criteria have good finite-sample properties.
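The rank-selection question has a simple matrix analogue that can be sketched: score a BIC-style criterion across candidate ranks of a truncated SVD and pick the minimizer. This illustrates the information-criterion idea only; it is not the paper's pseudometric-based theory, and the penalty form is an assumption.

```python
import numpy as np

def bic_rank_selection(M, max_rank, n_obs):
    """BIC-style criterion over candidate ranks: truncated-SVD fit
    error plus a penalty on the number of free parameters -- a matrix
    analogue of choosing the latent dimension s in RESCAL."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    n, m = M.shape
    scores = {}
    for r in range(1, max_rank + 1):
        approx = (U[:, :r] * S[:r]) @ Vt[:r, :]
        rss = np.sum((M - approx) ** 2)
        n_params = r * (n + m)
        scores[r] = n_obs * np.log(rss / n_obs) + n_params * np.log(n_obs)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(6)
n, m, true_rank = 60, 60, 3
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, m))
M += 0.01 * rng.normal(size=(n, m))          # rank-3 signal plus noise
r_hat, _ = bic_rank_selection(M, max_rank=8, n_obs=n * m)
```

Underfitting (r < 3) leaves large residual error, while overfitting (r > 3) buys only noise at the cost of the penalty, so the criterion bottoms out at the true rank.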

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980192/pdf/
引用次数: 0
Efficient augmentation and relaxation learning for individualized treatment rules using observational data. 基于观察数据的个性化治疗规则的高效增广与松弛学习。
IF 6 3区 计算机科学 Q1 Mathematics Pub Date : 2019-01-01
Ying-Qi Zhao, Eric B Laber, Yang Ning, Sumona Saha, Bruce E Sands

Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.

个性化治疗规则旨在确定是否、何时、采用何种治疗以及对谁进行治疗。全球人口老龄化、医疗成本上升以及患者层面数据可得性的提高,使人们迫切需要可应用于观察数据的高质量个性化治疗规则估计方法。最近一条有前景的研究路线将估计最优治疗规则的问题重新表述为加权分类问题。我们考虑一类类似于凸大间隔分类器的最优治疗规则估计量。所提出的这类估计量适用于观察数据,并且具有双重稳健性:只要倾向得分模型或结果模型之一设定正确,就能一致地估计最优个性化治疗规则。利用半参数效率理论中的技术,我们推导了所提估计量的收敛速度,并用其刻画基于分类方法估计个性化治疗规则时的偏差-方差权衡。基于这些结果的模拟实验表明,在所提框架内可以构造出显著优于现有方法的新估计量。我们使用来自一项就业培训计划和一项炎症性肠综合征研究的数据来说明所提出的方法。
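The weighted-classification recipe described in the abstract can be sketched with a doubly-robust (AIPW) contrast estimate feeding a convex surrogate classifier. The data-generating model, the squared surrogate loss, and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
e_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))           # confounded treatment
Atr = rng.binomial(1, e_true)
Y = 1.0 + X[:, 0] + Atr * X[:, 1] + rng.normal(size=n)  # optimal rule: X2 > 0

D = np.column_stack([np.ones(n), X])

# Propensity model: logistic regression fit by a few Newton steps.
beta = np.zeros(D.shape[1])
for _ in range(8):
    p = 1.0 / (1.0 + np.exp(-D @ beta))
    Wp = p * (1.0 - p)
    beta += np.linalg.solve(D.T @ (Wp[:, None] * D) + 1e-8 * np.eye(4),
                            D.T @ (Atr - p))
e_hat = 1.0 / (1.0 + np.exp(-D @ beta))

# Outcome model: OLS with treatment interactions, giving m1(x) and m0(x).
Z = np.column_stack([D, Atr[:, None] * D])
g = np.linalg.lstsq(Z, Y, rcond=None)[0]
m0 = D @ g[:4]
m1 = m0 + D @ g[4:]

# AIPW (doubly robust) estimate of each individual treatment contrast.
delta = (m1 - m0) + Atr * (Y - m1) / e_hat - (1 - Atr) * (Y - m0) / (1 - e_hat)

# Weighted classification: labels sign(delta), weights |delta|,
# fit by weighted least squares as a convex surrogate loss.
w = np.abs(delta)
lab = np.sign(delta)
coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * lab))
rule = (D @ coef > 0)

agreement = np.mean(rule == (X[:, 1] > 0))
print(round(float(agreement), 3))
```

Because either the propensity fit or the outcome fit being correct keeps the contrast estimate consistent, the learned linear rule closely tracks the true optimal rule X2 > 0 in this simulation.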
引用次数: 0
Nonuniformity of P-values Can Occur Early in Diverging Dimensions. p值的非均匀性在维数发散时可能很早出现。
IF 6 3区 计算机科学 Q1 Mathematics Pub Date : 2019-01-01
Yingying Fan, Emre Demirkaya, Jinchi Lv

Evaluating the joint significance of covariates is of fundamental importance in a wide range of applications. To this end, p-values are frequently employed and produced by algorithms that are powered by classical large-sample asymptotic theory. It is well known that the conventional p-values in the Gaussian linear model are valid even when the dimensionality is a non-vanishing fraction of the sample size, but can break down when the design matrix becomes singular in higher dimensions or when the error distribution deviates from Gaussianity. A natural question is when the conventional p-values in generalized linear models become invalid as the dimensionality diverges. We establish that such a breakdown can occur early in nonlinear models. Our theoretical characterizations are confirmed by simulation studies.

评估协变量的联合显著性在众多应用中具有根本重要性。为此,人们经常使用由经典大样本渐近理论支撑的算法来计算p值。众所周知,在高斯线性模型中,即使维数占样本量的非消失比例,传统p值仍然有效;但当设计矩阵在更高维数下变得奇异,或误差分布偏离高斯性时,p值就会失效。一个自然的问题是:广义线性模型中的传统p值在维数发散时何时失效。我们证明这种失效在非线性模型中可能很早就发生。我们的理论刻画得到了模拟研究的证实。
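The breakdown the abstract describes can be seen in a small simulation: under the global null, classical Wald p-values from logistic regression stay near nominal size when p is small but become anticonservative when p grows with n. The dimensions, replication count, and significance level below are illustrative choices, not the paper's exact setup.

```python
import math
import numpy as np

def wald_rejection_rate(n, p, reps, level=0.05, seed=2):
    """Empirical size of the classical Wald test for one coefficient
    in logistic regression under the global null beta = 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = rng.binomial(1, 0.5, size=n)          # global null: beta = 0
        beta = np.zeros(p)
        for _ in range(15):                       # Newton-Raphson MLE
            mu = 1.0 / (1.0 + np.exp(-X @ beta))
            W = mu * (1.0 - mu)
            H = X.T @ (W[:, None] * X) + 1e-8 * np.eye(p)
            beta += np.linalg.solve(H, X.T @ (y - mu))
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        W = mu * (1.0 - mu)
        cov = np.linalg.inv(X.T @ (W[:, None] * X) + 1e-8 * np.eye(p))
        z = beta[0] / math.sqrt(cov[0, 0])
        pval = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        rejections += pval < level
    return rejections / reps

size_small = wald_rejection_rate(n=200, p=2, reps=300)    # p fixed and small
size_large = wald_rejection_rate(n=200, p=60, reps=300)   # p a fraction of n
print(size_small, size_large)
```

With p = 2 the rejection rate hovers around the nominal 5%; with p = 60 the classical standard errors understate the variability of the MLE and the test over-rejects, i.e., the null p-values are no longer uniform.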
引用次数: 0