A Difference of Convex Functions Approach to Large-Scale Log-Linear Model Estimation

IEEE Transactions on Audio, Speech, and Language Processing. Pub Date: 2013-11-01. DOI: 10.1109/TASL.2013.2271592

Theodoros Tsiligkaridis, E. Marcheret, V. Goel

Volume 21, pages 2255-2266

Citations: 7

Abstract

We introduce a new class of parameter estimation methods for log-linear models. Our approach relies on the fact that minimizing a rational function of mixtures of exponentials is equivalent to minimizing a difference of convex functions. This allows us to construct convex auxiliary functions by applying the concave-convex procedure (CCCP). We consider a modification of CCCP where a proximal term is added (ProxCCCP), and extend it further by introducing an ℓ1 penalty. For solving the 'convex + ℓ1' auxiliary problem, we propose an approach called SeqGPSR that is based on sequential application of the GPSR procedure. We present convergence analysis of the algorithms, including sufficient conditions for convergence to a critical point of the objective function. We propose an adaptive procedure for varying the strength of the proximal regularization term in each ProxCCCP iteration, and show that this procedure (AProxCCCP) is effective in practice and stable under some mild conditions. The CCCP procedure and the proposed variants are applied to the task of optimizing the cross-entropy objective function for an audio frame classification problem. Class posteriors are modeled using log-linear models consisting of approximately 6 million parameters. Our results show that the CCCP variants achieve a much better cross-entropy objective value than direct optimization of the objective function by a first-order gradient-based approach, stochastic gradient descent, or the L-BFGS procedure.
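The CCCP and ProxCCCP iterations described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's log-linear estimator: it assumes a hypothetical one-dimensional objective f(x) = u(x) - v(x) with u(x) = x^4 and v(x) = 2x^2 (both convex), a constant proximal weight `c` rather than the adaptive schedule of AProxCCCP, and plain bisection for the inner convex problem. All function names are illustrative. Each step linearizes v at the current iterate and minimizes the resulting convex auxiliary function, optionally with the proximal term (c/2)(x - x_k)^2.

```python
def f(x):
    """Toy difference-of-convex objective: u(x) - v(x) = x**4 - 2*x**2."""
    return x**4 - 2.0 * x**2


def grad_v(x):
    """Gradient of the convex part that is subtracted, v(x) = 2*x**2."""
    return 4.0 * x


def prox_cccp_step(x_k, c):
    """One (Prox)CCCP step on the toy problem.

    Minimizes the convex auxiliary function
        u(x) - grad_v(x_k) * x + (c / 2) * (x - x_k)**2
    by finding the root of its derivative, which is strictly increasing.
    Setting c = 0 gives the plain CCCP step.
    """
    slope = grad_v(x_k)

    def d_aux(x):
        # Derivative of the auxiliary function with respect to x.
        return 4.0 * x**3 - slope + c * (x - x_k)

    # For this toy problem the root always lies in [-hi, hi].
    hi = max(1.0, abs(x_k))
    lo = -hi
    for _ in range(200):  # bisection; far more steps than float precision needs
        mid = 0.5 * (lo + hi)
        if d_aux(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def prox_cccp(x0, c=0.0, iters=100):
    """Run the iteration from x0 and return the list of all iterates."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_cccp_step(xs[-1], c))
    return xs


if __name__ == "__main__":
    for c in (0.0, 1.0):
        xs = prox_cccp(0.5, c=c)
        print(f"c={c}: final x = {xs[-1]:.6f}, f = {f(xs[-1]):.6f}")
```

Starting from a positive point, both settings approach the critical point x = 1 of the toy objective, and the objective value does not increase from one iterate to the next. A larger `c` keeps each iterate closer to the previous one, so the steps are more conservative.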




Source journal

IEEE Transactions on Audio, Speech and Language Processing (Engineering and Technology - Engineering: Electrical and Electronic)

Self-citation rate: 0.00%

Review time: 24.0 months

Journal description: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.