{"title":"A Difference of Convex Functions Approach to Large-Scale Log-Linear Model Estimation","authors":"Theodoros Tsiligkaridis, E. Marcheret, V. Goel","doi":"10.1109/TASL.2013.2271592","DOIUrl":null,"url":null,"abstract":"We introduce a new class of parameter estimation methods for log-linear models. Our approach relies on the fact that minimizing a rational function of mixtures of exponentials is equivalent to minimizing a difference of convex functions. This allows us to construct convex auxiliary functions by applying the concave-convex procedure (CCCP). We consider a modification of CCCP where a proximal term is added (ProxCCCP), and extend it further by introducing an ℓ1 penalty. For solving the ` convex + ℓ1' auxiliary problem, we propose an approach called SeqGPSR that is based on sequential application of the GPSR procedure. We present convergence analysis of the algorithms, including sufficient conditions for convergence to a critical point of the objective function. We propose an adaptive procedure for varying the strength of the proximal regularization term in each ProxCCCP iteration, and show this procedure (AProxCCCP) is effective in practice and stable under some mild conditions. The CCCP procedure and proposed variants are applied to the task of optimizing the cross-entropy objective function for an audio frame classification problem. Class posteriors are modeled using log-linear models consisting of approximately 6 million parameters. Our results show that CCCP variants achieve a much better cross-entropy objective value as compared to direct optimization of the objective function by a first order gradient based approach, stochastic gradient descent or the L-BFGS procedure.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2271592","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Audio Speech and Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASL.2013.2271592","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
We introduce a new class of parameter estimation methods for log-linear models. Our approach relies on the fact that minimizing a rational function of mixtures of exponentials is equivalent to minimizing a difference of convex functions. This allows us to construct convex auxiliary functions by applying the concave-convex procedure (CCCP). We consider a modification of CCCP where a proximal term is added (ProxCCCP), and extend it further by introducing an ℓ1 penalty. For solving the ` convex + ℓ1' auxiliary problem, we propose an approach called SeqGPSR that is based on sequential application of the GPSR procedure. We present convergence analysis of the algorithms, including sufficient conditions for convergence to a critical point of the objective function. We propose an adaptive procedure for varying the strength of the proximal regularization term in each ProxCCCP iteration, and show this procedure (AProxCCCP) is effective in practice and stable under some mild conditions. The CCCP procedure and proposed variants are applied to the task of optimizing the cross-entropy objective function for an audio frame classification problem. Class posteriors are modeled using log-linear models consisting of approximately 6 million parameters. Our results show that CCCP variants achieve a much better cross-entropy objective value as compared to direct optimization of the objective function by a first order gradient based approach, stochastic gradient descent or the L-BFGS procedure.
期刊介绍:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.