{"title":"利用机器学习分类器并行化 MCMC 及其基于库尔贝-莱布勒发散的标准","authors":"Tomoki Matsumoto, Yuichiro Kanazawa","doi":"arxiv-2406.11246","DOIUrl":null,"url":null,"abstract":"In the era of Big Data, analyzing high-dimensional and large datasets\npresents significant computational challenges. Although Bayesian statistics is\nwell-suited for these complex data structures, Markov chain Monte Carlo (MCMC)\nmethod, which are essential for Bayesian estimation, suffers from computation\ncost because of its sequential nature. For faster and more effective\ncomputation, this paper introduces an algorithm to enhance a parallelizing MCMC\nmethod to handle this computation problem. We highlight the critical role of\nthe overlapped area of posterior distributions after data partitioning, and\npropose a method using a machine learning classifier to effectively identify\nand extract MCMC draws from the area to approximate the actual posterior\ndistribution. Our main contribution is the development of a Kullback-Leibler\n(KL) divergence-based criterion that simplifies hyperparameter tuning in\ntraining a classifier and makes the method nearly hyperparameter-free.\nSimulation studies validate the efficacy of our proposed methods.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"173 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence\",\"authors\":\"Tomoki Matsumoto, Yuichiro Kanazawa\",\"doi\":\"arxiv-2406.11246\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the era of Big Data, analyzing high-dimensional and large datasets\\npresents significant computational challenges. Although Bayesian statistics is\\nwell-suited for these complex data structures, Markov chain Monte Carlo (MCMC)\\nmethod, which are essential for Bayesian estimation, suffers from computation\\ncost because of its sequential nature. For faster and more effective\\ncomputation, this paper introduces an algorithm to enhance a parallelizing MCMC\\nmethod to handle this computation problem. We highlight the critical role of\\nthe overlapped area of posterior distributions after data partitioning, and\\npropose a method using a machine learning classifier to effectively identify\\nand extract MCMC draws from the area to approximate the actual posterior\\ndistribution. 
Our main contribution is the development of a Kullback-Leibler\\n(KL) divergence-based criterion that simplifies hyperparameter tuning in\\ntraining a classifier and makes the method nearly hyperparameter-free.\\nSimulation studies validate the efficacy of our proposed methods.\",\"PeriodicalId\":501215,\"journal\":{\"name\":\"arXiv - STAT - Computation\",\"volume\":\"173 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2406.11246\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.11246","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence
In the era of Big Data, analyzing high-dimensional and large datasets presents significant computational challenges. Although Bayesian statistics is well suited to these complex data structures, the Markov chain Monte Carlo (MCMC) methods essential for Bayesian estimation suffer from high computational cost because of their sequential nature. For faster and more effective computation, this paper introduces an algorithm that enhances a parallelized MCMC method to address this problem. We highlight the critical role of the overlapping area of the posterior distributions obtained after data partitioning, and propose a method that uses a machine learning classifier to identify and extract the MCMC draws lying in that area, which are then used to approximate the actual posterior distribution. Our main contribution is a Kullback-Leibler (KL) divergence-based criterion that simplifies hyperparameter tuning when training the classifier and makes the method nearly hyperparameter-free. Simulation studies validate the efficacy of the proposed methods.
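The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea under strong simplifying assumptions: a conjugate Gaussian model with draws taken directly from each subposterior in place of a real MCMC run, a logistic-regression classifier with an ad hoc uncertainty threshold standing in for the paper's extraction rule, and a closed-form Gaussian KL divergence as the kind of quantity a KL-based criterion could be built on. Every name, threshold, and modeling choice below is an illustrative assumption, not the authors' method.

```python
# Hedged sketch of classifier-assisted parallel MCMC as described at a high
# level in the abstract. NOT the authors' algorithm: the overlap rule, the
# threshold, and the KL check are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: y_i ~ N(mu, 1) with known unit variance and a flat prior,
# partitioned into shards.
mu_true, n, n_shards = 2.0, 4000, 2
y = rng.normal(mu_true, 1.0, size=n)
shards = np.array_split(y, n_shards)

# "Subposterior" draws per shard. With a flat prior and known variance each
# subposterior is N(mean(y_k), 1/len(y_k)); we sample it directly here purely
# to keep the sketch short -- in practice these would be MCMC draws.
n_draws = 5000
sub_draws = [rng.normal(s.mean(), np.sqrt(1.0 / len(s)), size=n_draws)
             for s in shards]

# Classifier-based overlap extraction (one plausible rule, assumed here):
# train a classifier to predict which shard a draw came from; draws it cannot
# separate (predicted probability near 1/n_shards) are treated as lying in
# the overlapping area of the subposteriors.
X = np.concatenate(sub_draws).reshape(-1, 1)
z = np.repeat(np.arange(n_shards), n_draws)
clf = LogisticRegression(C=1.0).fit(X, z)   # C: a tunable hyperparameter
p = clf.predict_proba(X)
uncertainty = 1.0 - p.max(axis=1)           # high where shards are indistinguishable
overlap = X[uncertainty > np.quantile(uncertainty, 0.5), 0]  # ad hoc cutoff

# KL divergence in closed form for two univariate Gaussians, the building
# block a KL-based criterion could use.
def kl_gauss(m0, s0, m1, s1):
    """KL( N(m0, s0^2) || N(m1, s1^2) )."""
    return np.log(s1 / s0) + (s0**2 + (m0 - m1) ** 2) / (2 * s1**2) - 0.5

# In this simulated setting the full-data posterior N(mean(y), 1/n) is known,
# so we can crudely check how close a Gaussian fit to the overlap draws comes.
m_full, s_full = y.mean(), np.sqrt(1.0 / n)
print("KL(Gaussian fit to overlap draws || full-data posterior):",
      kl_gauss(overlap.mean(), overlap.std(ddof=1), m_full, s_full))
```

In the paper, the KL-based criterion is used to guide hyperparameter tuning when training the classifier (in this sketch, the regularization strength C and the uncertainty cutoff would play that role), presumably without requiring the full-data posterior, which is available here only because the example is simulated.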