{"title":"Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence","authors":"Tomoki Matsumoto, Yuichiro Kanazawa","doi":"arxiv-2406.11246","DOIUrl":null,"url":null,"abstract":"In the era of Big Data, analyzing high-dimensional and large datasets\npresents significant computational challenges. Although Bayesian statistics is\nwell-suited for these complex data structures, Markov chain Monte Carlo (MCMC)\nmethod, which are essential for Bayesian estimation, suffers from computation\ncost because of its sequential nature. For faster and more effective\ncomputation, this paper introduces an algorithm to enhance a parallelizing MCMC\nmethod to handle this computation problem. We highlight the critical role of\nthe overlapped area of posterior distributions after data partitioning, and\npropose a method using a machine learning classifier to effectively identify\nand extract MCMC draws from the area to approximate the actual posterior\ndistribution. Our main contribution is the development of a Kullback-Leibler\n(KL) divergence-based criterion that simplifies hyperparameter tuning in\ntraining a classifier and makes the method nearly hyperparameter-free.\nSimulation studies validate the efficacy of our proposed methods.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"173 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.11246","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In the era of Big Data, analyzing high-dimensional and large datasets presents
significant computational challenges. Although Bayesian statistics is well
suited to such complex data structures, the Markov chain Monte Carlo (MCMC)
methods essential for Bayesian estimation suffer from high computational cost
because of their sequential nature. For faster and more effective computation,
this paper introduces an algorithm that enhances a parallelized MCMC method to
address this problem. We highlight the critical role of the region where the
posterior distributions overlap after data partitioning, and propose a method
that uses a machine learning classifier to identify and extract the MCMC draws
lying in this region in order to approximate the full-data posterior
distribution. Our main contribution is a Kullback-Leibler (KL) divergence-based
criterion that simplifies hyperparameter tuning when training the classifier
and makes the method nearly hyperparameter-free. Simulation studies validate
the efficacy of the proposed methods.
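To make the abstract's pipeline concrete, the following is a minimal, hypothetical sketch of how a classifier might be used to flag draws in the overlap region of two subposteriors, together with a simple KL-divergence estimate that could serve as a tuning criterion. The synthetic draws, the logistic-regression classifier, the `margin` hyperparameter, and the KDE-based KL estimator are all illustrative assumptions, not the authors' exact construction.

```python
# Hypothetical sketch (not the paper's algorithm): classifier-based extraction
# of overlap-region draws from two subposteriors, plus a KDE-based KL estimate.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for MCMC draws from two subposteriors obtained after partitioning
# the data; in practice these would come from parallel MCMC runs.
draws_a = rng.normal(loc=-0.5, scale=1.0, size=(4000, 1))
draws_b = rng.normal(loc=+0.5, scale=1.0, size=(4000, 1))

# Train a classifier to distinguish the two subposteriors; draws it cannot
# separate confidently are treated as lying in the overlapping region.
X = np.vstack([draws_a, draws_b])
y = np.concatenate([np.zeros(len(draws_a)), np.ones(len(draws_b))])
clf = LogisticRegression().fit(X, y)

def overlap_draws(draws, clf, margin=0.1):
    """Keep draws whose predicted class probability is close to 0.5."""
    p = clf.predict_proba(draws)[:, 1]
    return draws[np.abs(p - 0.5) < margin]

def kl_estimate(sample_p, sample_q):
    """Monte Carlo estimate of KL(p || q) using Gaussian KDEs of both samples."""
    p_kde = gaussian_kde(sample_p.T)
    q_kde = gaussian_kde(sample_q.T)
    log_ratio = np.log(p_kde(sample_p.T)) - np.log(q_kde(sample_p.T))
    return float(np.mean(log_ratio))

# In the spirit of the paper's criterion, one could scan the classifier-related
# hyperparameter (here the ad hoc margin) and keep the setting that minimizes
# the estimated divergence from a reference sample (here the pooled draws).
reference = np.vstack([draws_a, draws_b])
for margin in (0.05, 0.1, 0.2):
    cand = np.vstack([overlap_draws(draws_a, clf, margin),
                      overlap_draws(draws_b, clf, margin)])
    print(f"margin={margin}: KL estimate = {kl_estimate(cand, reference):.4f}")
```

The margin-based cutoff and the pooled-draw reference are placeholders for whichever extraction rule and target density the paper actually uses; the sketch is meant only to show where a KL-based criterion could replace manual hyperparameter tuning.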