
Latest Publications in the Journal of Machine Learning Research

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-11-01
Łukasz Kidziński, Francis K C Hui, David I Warton, Trevor Hastie

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable Models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method to a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
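
To make the fitting strategy concrete, here is a minimal sketch of alternating penalized Fisher-scoring updates for a rank-2 Poisson factor model in NumPy. The simulation setup, ridge penalty, clipping, and iteration count are illustrative assumptions, not the paper's actual algorithm, which handles general exponential-family responses, covariates, and far larger arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 200, 50, 2                       # units, responses, latent dimensions

# Simulate Poisson responses with a rank-d log-mean: Y ~ Poisson(exp(U V^T)).
U_true = rng.normal(0, 0.5, (n, d))
V_true = rng.normal(0, 0.5, (m, d))
Y = rng.poisson(np.exp(U_true @ V_true.T))

U = rng.normal(0, 0.1, (n, d))
V = rng.normal(0, 0.1, (m, d))
lam = 1.0                                  # ridge penalty standing in for the quasi-likelihood regularizer

def newton_rows(A, B, Y, lam):
    """One Fisher-scoring step on each row of A with B held fixed.
    For a Poisson GLM with log link, the score for row i is
    B^T (y_i - mu_i) - lam * a_i and the Fisher information is
    B^T diag(mu_i) B + lam * I."""
    A_new = A.copy()
    for i in range(A.shape[0]):
        eta = np.clip(B @ A[i], -10, 10)   # linear predictor, clipped for safety
        mu = np.exp(eta)
        score = B.T @ (Y[i] - mu) - lam * A[i]
        info = B.T @ (B * mu[:, None]) + lam * np.eye(A.shape[1])
        A_new[i] = A[i] + np.linalg.solve(info, score)
    return A_new

for it in range(30):                       # alternate scoring sweeps over U and V
    U = newton_rows(U, V, Y, lam)
    V = newton_rows(V, U, Y.T, lam)

eta_hat = U @ V.T
print("Poisson deviance part:", float(np.sum(np.exp(eta_hat) - Y * eta_hat)))
```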

{"title":"Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.","authors":"Łukasz Kidziński, Francis K C Hui, David I Warton, Trevor Hastie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129058/pdf/nihms-1843577.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9391635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Tree-based Node Aggregation in Sparse Graphical Models.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 (Automation & Control Systems) · Pub Date: 2022-09-01
Ines Wilms, Jacob Bien

High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes. We provide an efficient implementation of the tag-lasso by using the locally adaptive alternating direction method of multipliers and illustrate our proposal's practical advantages in simulation and in applications in finance and biology.
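
The aggregation mechanism is easy to see in isolation: re-express each node's parameter as the sum of contributions along its root-to-leaf path, so that zeroing an entire subtree's contributions (which is what the group-lasso penalty encourages) collapses its leaves to a shared value. A toy sketch with a hypothetical four-leaf tree follows; it shows only this reparameterization, not the full tag-lasso estimator or its ADMM solver.

```python
import numpy as np

# Hypothetical tree over 4 leaves (graph nodes):
#        root
#       /    \
#      a      b
#     / \    / \
#    1   2  3   4
paths = {0: ["root", "a", "leaf1"],
         1: ["root", "a", "leaf2"],
         2: ["root", "b", "leaf3"],
         3: ["root", "b", "leaf4"]}
tree_nodes = ["root", "a", "b", "leaf1", "leaf2", "leaf3", "leaf4"]

# Expansion matrix A: each leaf parameter is the sum of the gamma's on its
# root-to-leaf path, i.e. theta = A @ gamma.
A = np.zeros((4, len(tree_nodes)))
for leaf, path in paths.items():
    for node in path:
        A[leaf, tree_nodes.index(node)] = 1.0

# If the penalty zeroes every gamma below node "a" (here leaf1 and leaf2),
# leaves 1 and 2 are forced to the common value gamma_root + gamma_a.
gamma = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.3, -0.2])
print(A @ gamma)   # -> [1.5, 1.5, 1.3, 0.8]: leaves 1 and 2 aggregated
```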

{"title":"Tree-based Node Aggregation in Sparse Graphical Models.","authors":"Ines Wilms, Jacob Bien","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the <i>tree-aggregated graphical lasso</i> or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes. We provide an efficient implementation of the tag-lasso by using the locally adaptive alternating direction method of multipliers and illustrate our proposal's practical advantages in simulation and in applications in finance and biology.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10805464/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139543530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Reinforcement Learning Algorithm for Mixed Mean Field Control Games
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-05-04 · DOI: 10.4208/jml.220915
Andrea Angiuli, Nils Detering, J. Fouque, M. Laurière, Jimin Lin
We present a new combined mean field control game (MFCG) problem, which can be interpreted as a competitive game between collaborating groups, with its solution a Nash equilibrium between groups. Players coordinate their strategies within each group. An example is a modification of the classical trader's problem. Groups of traders maximize their wealth. They face costs for their transactions, for their own terminal positions, and for the average holding within their group. The asset price is impacted by the trades of all agents. We propose a three-timescale reinforcement learning algorithm to approximate the solution of such MFCG problems. We test the algorithm on benchmark linear-quadratic specifications for which we provide analytic solutions.
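
The following is a schematic, not the paper's algorithm: a three-timescale tabular loop in which the Q-function is updated at the fastest rate while two state distributions (one standing in for the population interaction, one for the group interaction) are tracked at slower rates. The toy dynamics, reward coupling, and rate choices are invented for illustration; the paper works with linear-quadratic benchmarks and specific rate orderings.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA = 2, 2
Q = np.zeros((nS, nA))
mu = np.ones(nS) / nS             # population distribution (game interaction)
nu = np.ones(nS) / nS             # group distribution (control interaction)
onehot = np.eye(nS)

# Illustrative three timescales: Q fastest, nu slower, mu slowest.
lr_Q, lr_nu, lr_mu = 0.1, 0.01, 0.001
gamma, eps = 0.9, 0.1

def step(s, a, mu, nu):
    """Toy dynamics and a reward coupling the agent to both distributions."""
    s_next = int(rng.integers(nS)) if rng.random() < 0.3 else (s + a) % nS
    r = -abs(s - a) - mu[s] - 0.5 * nu[s]
    return s_next, r

s = 0
for t in range(20000):
    a = int(rng.integers(nA)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a, mu, nu)
    # Fast timescale: tabular Q-learning update.
    Q[s, a] += lr_Q * (r + gamma * Q[s_next].max() - Q[s, a])
    # Slower timescales: drift each distribution toward the visited state.
    nu += lr_nu * (onehot[s_next] - nu)
    mu += lr_mu * (onehot[s_next] - mu)
    s = s_next

print("Q:\n", Q.round(2), "\nmu:", mu.round(3), " nu:", nu.round(3))
```
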
{"title":"Reinforcement Learning Algorithm for Mixed Mean Field Control Games","authors":"Andrea Angiuli, Nils Detering, J. Fouque, M. Laurière, Jimin Lin","doi":"10.4208/jml.220915","DOIUrl":"https://doi.org/10.4208/jml.220915","url":null,"abstract":"We present a new combined textit{mean field control game} (MFCG) problem which can be interpreted as a competitive game between collaborating groups and its solution as a Nash equilibrium between groups. Players coordinate their strategies within each group. An example is a modification of the classical trader's problem. Groups of traders maximize their wealth. They face cost for their transactions, for their own terminal positions, and for the average holding within their group. The asset price is impacted by the trades of all agents. We propose a three-timescale reinforcement learning algorithm to approximate the solution of such MFCG problems. We test the algorithm on benchmark linear-quadratic specifications for which we provide analytic solutions.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89925607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Beyond the Quadratic Approximation: The Multiscale Structure of Neural Network Loss Landscapes
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-04-24 · DOI: 10.4208/jml.220404
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
A quadratic approximation of neural network loss landscapes has been used extensively to study the optimization process of these networks. Though it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implications for optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss clearly shows several separate scales. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay through simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of the training data are among the causes. By constructing a two-layer neural network problem, we show that training data of different magnitudes give rise to different scales in the loss function, producing subquadratic growth and multiple separate scales.
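
The subquadratic-growth claim can be probed numerically in a few lines: train a toy network to a minimum, then measure how the loss grows along a random direction at logarithmically spaced step sizes and fit the log-log slope (2.0 would be exactly quadratic). This sketch, with network, data, and hyperparameters all invented, is a simplified stand-in for the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 64, 16
x = rng.normal(size=n)
y = np.sin(2 * x)

w = rng.normal(0, 0.5, size=2 * h)           # [input weights | output weights]

def loss_grad(w):
    """Loss and analytic gradient of a tiny 1-d two-layer tanh network."""
    a = np.tanh(np.outer(x, w[:h]))          # (n, h) hidden activations
    r = a @ w[h:] - y                        # residuals
    g_out = 2 * a.T @ r / n
    g_in = 2 * w[h:] * ((1 - a ** 2).T @ (r * x)) / n
    return np.mean(r ** 2), np.concatenate([g_in, g_out])

for _ in range(5000):                        # plain gradient descent to a minimum
    L, g = loss_grad(w)
    w -= 0.05 * g

# Probe L(w + t*d) - L(w) along a random direction at multiple scales t.
d = rng.normal(size=w.size); d /= np.linalg.norm(d)
L0 = loss_grad(w)[0]
ts = np.logspace(-3, 0, 12)
gaps = np.array([loss_grad(w + t * d)[0] - L0 for t in ts])
slope = np.polyfit(np.log(ts), np.log(np.maximum(gaps, 1e-14)), 1)[0]
print(f"log-log growth exponent ≈ {slope:.2f}  (2.0 would be exactly quadratic)")
```
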
{"title":"Beyond the Quadratic Approximation: The Multiscale Structure of Neural Network Loss Landscapes","authors":"Chao Ma, D. Kunin, Lei Wu, Lexing Ying","doi":"10.4208/jml.220404","DOIUrl":"https://doi.org/10.4208/jml.220404","url":null,"abstract":"A quadratic approximation of neural network loss landscapes has been extensively used to study the optimization process of these networks. Though, it usually holds in a very small neighborhood of the minimum, it cannot explain many phenomena observed during the optimization process. In this work, we study the structure of neural network loss functions and its implication on optimization in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possesses a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss shows several separate scales clearly. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay by simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of training data is one of the causes. By constructing a two-layer neural network problem we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth and multiple separate scales.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88018799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Tree-Values: Selective Inference for Regression Trees.
IF 4.3 · CAS Tier 3 (Computer Science) · Q1 (Automation & Control Systems) · Pub Date: 2022-01-01
Anna C Neufeld, Lucy L Gao, Daniela M Witten

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
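
A small simulation makes the motivation vivid: under a global null, fit a depth-one CART tree and run an ordinary two-sample t-test between its two terminal nodes. Because the split was chosen to maximize the difference, the naive test rejects far more often than the nominal level; the tree-values framework corrects this by conditioning on the estimated tree. The sketch below (settings invented, scikit-learn assumed) demonstrates only the problem, not the paper's corrected test.

```python
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
trials, alpha, rejections = 500, 0.05, 0

for _ in range(trials):
    X = rng.normal(size=(100, 3))
    y = rng.normal(size=100)                  # global null: no true signal
    tree = DecisionTreeRegressor(max_depth=1, min_samples_leaf=10).fit(X, y)
    leaf = tree.apply(X)                      # terminal-node membership
    g1, g2 = y[leaf == leaf.min()], y[leaf == leaf.max()]
    _, pval = stats.ttest_ind(g1, g2)         # naive test: ignores the selection
    rejections += pval < alpha

print(f"naive Type 1 error rate ≈ {rejections / trials:.2f} (nominal level {alpha})")
```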

{"title":"Tree-Values: Selective Inference for Regression Trees.","authors":"Anna C Neufeld, Lucy L Gao, Daniela M Witten","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":4.3,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10933572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140121229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Extensions to the Proximal Distance Method of Constrained Optimization.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-01-01
Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange

The current paper studies the problem of minimizing a loss f(x) subject to constraints of the form Dx ∈ S, where S is a closed set, convex or not, and D is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives f(x) + (ρ/2) dist(Dx, S)², involving large tuning constants ρ and the squared Euclidean distance of Dx from S. The next iterate x_{n+1} of the corresponding proximal distance algorithm is constructed from the current iterate x_n by minimizing the majorizing surrogate function f(x) + (ρ/2) ‖Dx − 𝒫_S(Dx_n)‖². For fixed ρ, a subanalytic loss f(x), and a subanalytic constraint set S, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare their results to those delivered by the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems. Julia code to replicate all of our experiments can be found at https://github.com/alanderos91/ProximalDistanceAlgorithms.jl.
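
A self-contained toy instance of the iteration above, assuming D = I, the least-squares loss f(x) = ½‖x − b‖², and S the (nonconvex) set of k-sparse vectors, whose projection is hard thresholding; each surrogate minimization then has a closed form. The authors' replication code is in Julia; this sketch uses Python for consistency with the other examples here and anneals ρ upward, as proximal distance methods do.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
b = rng.normal(size=n)

def project_sparse(x, k):
    """Projection onto the nonconvex set S of k-sparse vectors:
    keep the k largest-magnitude entries and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

# Proximal distance iteration for f(x) = 0.5*||x - b||^2 with D = I:
#   x_{n+1} = argmin f(x) + (rho/2)*||x - P_S(x_n)||^2
#           = (b + rho * P_S(x_n)) / (1 + rho).
x, rho = b.copy(), 1.0
for _ in range(100):
    x = (b + rho * project_sparse(x, k)) / (1 + rho)
    rho *= 1.2          # anneal rho upward to enforce the constraint in the limit

print("nonzero entries after rounding:", np.count_nonzero(np.round(x, 6)))
print("distance to constraint set:", np.linalg.norm(x - project_sparse(x, k)))
```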

{"title":"Extensions to the Proximal Distance Method of Constrained Optimization.","authors":"Alfonso Landeros,&nbsp;Oscar Hernan Madrid Padilla,&nbsp;Hua Zhou,&nbsp;Kenneth Lange","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The current paper studies the problem of minimizing a loss <i>f</i>(<b><i>x</i></b>) subject to constraints of the form <b><i>Dx</i></b> ∈ <i>S</i>, where <i>S</i> is a closed set, convex or not, and <i><b>D</b></i> is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method of optimization with the proximal distance principle. The latter is driven by minimization of penalized objectives <math><mrow><mi>f</mi><mo>(</mo><mstyle><mi>x</mi></mstyle><mo>)</mo><mo>+</mo><mfrac><mi>ρ</mi><mn>2</mn></mfrac><mtext>dist</mtext><msup><mrow><mo>(</mo><mstyle><mi>D</mi><mi>x</mi></mstyle><mo>,</mo><mi>S</mi><mo>)</mo></mrow><mn>2</mn></msup></mrow></math> involving large tuning constants <i>ρ</i> and the squared Euclidean distance of <b><i>Dx</i></b> from <i>S</i>. The next iterate <b><i>x</i></b><sub><i>n</i>+1</sub> of the corresponding proximal distance algorithm is constructed from the current iterate <b><i>x</i></b><sub><i>n</i></sub> by minimizing the majorizing surrogate function <math><mrow><mi>f</mi><mo>(</mo><mstyle><mi>x</mi></mstyle><mo>)</mo><mo>+</mo><mfrac><mi>ρ</mi><mn>2</mn></mfrac><msup><mrow><mrow><mo>‖</mo><mrow><mstyle><mi>D</mi><mi>x</mi></mstyle><mo>-</mo><msub><mi>𝒫</mi><mi>S</mi></msub><mrow><mo>(</mo><mrow><mstyle><mi>D</mi></mstyle><msub><mstyle><mi>x</mi></mstyle><mi>n</mi></msub></mrow><mo>)</mo></mrow></mrow><mo>‖</mo></mrow></mrow><mn>2</mn></msup></mrow></math>. For fixed <i>ρ</i> and a subanalytic loss <i>f</i>(<b><i>x</i></b>) and a subanalytic constraint set <i>S</i>, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare their results to those delivered by the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest variant on high-dimensional problems. Julia code to replicate all of our experiments can be found at https://github.com/alanderos91/ProximalDistanceAlgorithms.jl.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9875590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Importance of Being Correlated: Implications of Dependence in Joint Spectral Inference across Multiple Networks.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-01-01
Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski

Spectral inference on multiple networks is a rapidly-developing subfield of graph statistics. Recent work has demonstrated that joint, or simultaneous, spectral embedding of multiple independent networks can deliver more accurate estimation than individual spectral decompositions of those same networks. Such inference procedures typically rely heavily on independence assumptions across the multiple network realizations, and even in this case, little attention has been paid to the induced network correlation that can be a consequence of such joint embeddings. In this paper, we present a generalized omnibus embedding methodology and we provide a detailed analysis of this embedding across both independent and correlated networks, the latter of which significantly extends the reach of such procedures, and we describe how this omnibus embedding can itself induce correlation. This leads us to distinguish between inherent correlation-that is, the correlation that arises naturally in multisample network data-and induced correlation, which is an artifice of the joint embedding methodology. We show that the generalized omnibus embedding procedure is flexible and robust, and we prove both consistency and a central limit theorem for the embedded points. We examine how induced and inherent correlation can impact inference for network time series data, and we provide network analogues of classical questions such as the effective sample size for more generally correlated data. Further, we show how an appropriately calibrated generalized omnibus embedding can detect changes in real biological networks that previous embedding procedures could not discern, confirming that the effect of inherent and induced correlation can be subtle and transformative. By allowing for and deconstructing both forms of correlation, our methodology widens the scope of spectral techniques for network inference, with import in theory and practice.
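
For orientation, here is the classical two-graph omnibus construction that this paper generalizes: stack the adjacency matrices into a block matrix whose off-diagonal blocks are their average, then take a low-rank adjacency spectral embedding of the result. The graph model and rank below are illustrative assumptions; the paper's generalized omnibus replaces the fixed averaged blocks with weighted combinations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1

def sample_graph(n, p):
    """Symmetric adjacency matrix of an Erdos-Renyi-style graph (illustrative)."""
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return A + A.T

A1, A2 = sample_graph(n, 0.5), sample_graph(n, 0.5)

# Omnibus matrix: graphs on the diagonal, their average off the diagonal.
M = np.block([[A1, (A1 + A2) / 2],
              [(A1 + A2) / 2, A2]])

# Rank-d adjacency spectral embedding of the omnibus matrix.
vals, vecs = np.linalg.eigh(M)
top = np.argsort(np.abs(vals))[-d:]
Z = vecs[:, top] * np.sqrt(np.abs(vals[top]))   # (2n, d) joint embedding
X1, X2 = Z[:n], Z[n:]                           # aligned per-graph embeddings

print("mean cross-copy embedding distance:",
      float(np.linalg.norm(X1 - X2, axis=1).mean()))
```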

{"title":"The Importance of Being Correlated: Implications of Dependence in Joint Spectral Inference across Multiple Networks.","authors":"Konstantinos Pantazis, Avanti Athreya, Jesús Arroyo, William N Frost, Evan S Hill, Vince Lyzinski","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Spectral inference on multiple networks is a rapidly-developing subfield of graph statistics. Recent work has demonstrated that joint, or simultaneous, spectral embedding of multiple independent networks can deliver more accurate estimation than individual spectral decompositions of those same networks. Such inference procedures typically rely heavily on independence assumptions across the multiple network realizations, and even in this case, little attention has been paid to the induced network correlation that can be a consequence of such joint embeddings. In this paper, we present a <i>generalized omnibus</i> embedding methodology and we provide a detailed analysis of this embedding across both independent and correlated networks, the latter of which significantly extends the reach of such procedures, and we describe how this omnibus embedding can itself induce correlation. This leads us to distinguish between <i>inherent</i> correlation-that is, the correlation that arises naturally in multisample network data-and <i>induced</i> correlation, which is an artifice of the joint embedding methodology. We show that the generalized omnibus embedding procedure is flexible and robust, and we prove both consistency and a central limit theorem for the embedded points. We examine how induced and inherent correlation can impact inference for network time series data, and we provide network analogues of classical questions such as the effective sample size for more generally correlated data. Further, we show how an appropriately calibrated generalized omnibus embedding can detect changes in real biological networks that previous embedding procedures could not discern, confirming that the effect of inherent and induced correlation can be subtle and transformative. By allowing for and deconstructing both forms of correlation, our methodology widens the scope of spectral techniques for network inference, with import in theory and practice.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10465120/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10127031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Generalized Sparse Additive Models.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-01-01
Asad Haris, Noah Simon, Ali Shojaie

We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.
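
A minimal member of this class, sketched under simplifying assumptions: a sparse additive model with a polynomial basis per feature, a plain group-lasso penalty across each feature's coefficients, and a proximal gradient solver. The paper's framework covers far more general structure-plus-sparsity penalties and supplies the theory; this shows only the computational shape.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 200, 10, 4                       # samples, features, basis functions per feature

X = rng.uniform(-1, 1, (n, p))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, n)
y = y - y.mean()                           # center the response (no intercept below)

# Per-feature polynomial basis; column block j*K:(j+1)*K belongs to feature j.
B = np.stack([X[:, j] ** (k + 1) for j in range(p) for k in range(K)], axis=1)
B = (B - B.mean(0)) / B.std(0)

beta = np.zeros(p * K)
lam = 0.1                                  # group-lasso penalty level (illustrative)
step = n / np.linalg.norm(B, 2) ** 2       # 1 / Lipschitz constant of the gradient

def prox_group(b, t):
    """Group soft-thresholding: each feature's K coefficients form one group."""
    out = b.copy()
    for j in range(p):
        g = slice(j * K, (j + 1) * K)
        nrm = np.linalg.norm(b[g])
        out[g] = 0.0 if nrm <= t else (1 - t / nrm) * b[g]
    return out

for _ in range(500):                       # proximal gradient descent
    grad = B.T @ (B @ beta - y) / n
    beta = prox_group(beta - step * grad, step * lam)

active = [j for j in range(p) if np.linalg.norm(beta[j * K:(j + 1) * K]) > 0]
print("features with nonzero additive components:", active)
```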

{"title":"Generalized Sparse Additive Models.","authors":"Asad Haris, Noah Simon, Ali Shojaie","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593424/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49693499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-01-01
Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai

Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite this potential, EHR data are currently underutilized for discovery research because of a major limitation: the lack of precise phenotype information. To overcome this difficulty, recent efforts have been devoted to developing supervised algorithms that accurately predict phenotypes based on relatively small training datasets with gold-standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, p, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label Y and the feature set X are observed) and a much larger, weakly-labeled dataset in which the feature set X is accompanied only by a surrogate label S that is available for all patients. Under a working prior assumption that S is related to X only through Y, and allowing this assumption to hold only approximately, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.
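
The shrinkage idea can be caricatured in a few lines under invented working models: derive a coefficient direction from the surrogate S on the large weakly-labeled set, estimate the same coefficients on the small labeled set, and shrink the latter toward the former. Everything below (the simulation, the least-squares working model, the fixed shrinkage weight) is an illustrative assumption; the actual PASS estimator chooses the shrinkage adaptively and remains valid when the prior holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_lab, n_unlab = 10, 100, 10000
beta_true = np.array([1.0, -1.0] + [0.0] * (p - 2))

def simulate(n):
    X = rng.normal(size=(n, p))
    y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)   # true phenotype
    # Surrogate S depends on X only through Y (the working prior assumption),
    # but is noisy: it disagrees with Y 15% of the time.
    s = np.where(rng.random(n) < 0.85, y, 1 - y)
    return X, y, s

def ls_coef(X, t):                          # least-squares working model (sketch)
    return np.linalg.lstsq(X, t, rcond=None)[0]

X_l, y_l, _ = simulate(n_lab)               # small labeled set: (X, Y) observed
X_u, _, s_u = simulate(n_unlab)             # large set: only (X, S) used

beta_prior = ls_coef(X_u, s_u)              # direction implied by the surrogate
beta_label = ls_coef(X_l, y_l)              # noisy supervised estimate
w = 0.7                                     # fixed shrinkage weight; PASS tunes this
beta_pass = (1 - w) * beta_label + w * beta_prior

for name, b in [("labeled-only", beta_label), ("PASS-like shrinkage", beta_pass)]:
    print(f"{name:20s} correlation with truth: {np.corrcoef(b, beta_true)[0, 1]:.3f}")
```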

{"title":"Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.","authors":"Yichi Zhang, Molei Liu, Matey Neykov, Tianxi Cai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, <math><mi>p</mi></math>, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label <math><mi>Y</mi></math> and the feature set <math><mi>X</mi></math> are observed) and a much larger, weakly-labeled dataset in which the feature set <math><mi>X</mi></math> is accompanied only by a surrogate label <math><mi>S</mi></math> that is available to all patients. Under a <i>working</i> prior assumption that <math><mi>S</mi></math> is related to <math><mi>X</mi></math> only through <math><mi>Y</mi></math> and allowing it to hold <i>approximately</i>, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653017/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136400046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score.
IF 6.0 · CAS Tier 3 (Computer Science) · Q1 (Mathematics) · Pub Date: 2022-01-01
Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao

With the increasing adoption of electronic health records, there is growing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational datasets. However, valid inference procedures are lacking for rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data. We propose a split-and-pooled de-correlated score to construct hypothesis tests and confidence intervals. Our proposal adopts data splitting to overcome the slow convergence rates of nuisance parameter estimates, such as those from non-parametric methods for outcome regression or propensity models. We establish the limiting distributions of the split-and-pooled de-correlated score test and the corresponding one-step estimator in the high-dimensional setting. Simulation and real data analyses are conducted to demonstrate the superiority of the proposed method.
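
The split-and-pooled construction is easiest to see in a generic de-correlated (Neyman-orthogonal) score problem rather than the paper's treatment-rule setting: split the data in two, fit high-dimensional nuisances on one fold with the lasso, evaluate residual-based scores on the other, and pool the score pieces across folds. The sketch below (scikit-learn assumed, linear model invented) illustrates that template only.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.normal(size=(n, p))
theta_true = 1.0                            # target coefficient (of X[:, 0])
beta = np.zeros(p - 1); beta[:3] = 0.5      # sparse high-dimensional nuisance
y = theta_true * X[:, 0] + X[:, 1:] @ beta + rng.normal(size=n)

d, Z = X[:, 0], X[:, 1:]                    # coordinate of interest vs. controls
folds = np.array_split(rng.permutation(n), 2)
num = den = 0.0

for k in (0, 1):
    train, test = folds[k], folds[1 - k]
    # Nuisances fit on one split via lasso; scores evaluated on the other.
    m_hat = LassoCV(cv=3).fit(Z[train], y[train])   # regression of y on controls
    g_hat = LassoCV(cv=3).fit(Z[train], d[train])   # regression of d on controls
    y_res = y[test] - m_hat.predict(Z[test])
    d_res = d[test] - g_hat.predict(Z[test])
    num += d_res @ y_res                    # pool de-correlated score pieces
    den += d_res @ d_res

theta_hat = num / den
se = np.sqrt(1.0 / den)                     # sketch SE, assuming unit noise variance
print(f"theta_hat = {theta_hat:.3f} ± {1.96 * se:.3f}  (truth {theta_true})")
```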

{"title":"Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score.","authors":"Muxuan Liang, Young-Geun Choi, Yang Ning, Maureen A Smith, Ying-Qi Zhao","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for such rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data. We propose a split-and-pooled de-correlated score to construct hypothesis tests and confidence intervals. Our proposal adopts the data splitting to conquer the slow convergence rate of nuisance parameter estimations, such as non-parametric methods for outcome regression or propensity models. We establish the limiting distributions of the split-and-pooled de-correlated score test and the corresponding one-step estimator in high-dimensional setting. Simulation and real data analysis are conducted to demonstrate the superiority of the proposed method.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720606/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138811858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0