决策树上的单MCMC链并行化

IF 1.2 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Annals of Mathematics and Artificial Intelligence Pub Date : 2023-07-02 DOI:10.1007/s10472-023-09876-9

Efthyvoulos Drousiotis, Paul Spirakis

{"title":"决策树上的单MCMC链并行化","authors":"Efthyvoulos Drousiotis, Paul Spirakis","doi":"10.1007/s10472-023-09876-9","DOIUrl":null,"url":null,"abstract":"<p>Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.</p>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"84 2","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Single MCMC chain parallelisation on decision trees\",\"authors\":\"Efthyvoulos Drousiotis, Paul Spirakis\",\"doi\":\"10.1007/s10472-023-09876-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.</p>\",\"PeriodicalId\":7971,\"journal\":{\"name\":\"Annals of Mathematics and Artificial Intelligence\",\"volume\":\"84 2\",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Mathematics and Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10472-023-09876-9\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Mathematics and Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10472-023-09876-9","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

决策树(DT)在机器学习中非常有名，通常可以获得最先进的性能。尽管如此，CART、ID3、随机森林和增强树等众所周知的变体都错过了一个概率版本，该版本对树结构的先验假设进行编码，并在节点参数之间共享统计强度。现有的关于贝叶斯DT的工作依赖于马尔可夫链蒙特卡罗(MCMC)，它的计算速度很慢，特别是在高维数据和昂贵的建议上。在本研究中，我们提出了一种在普通笔记本电脑或个人计算机上并行化单个MCMC DT链的方法，该方法使我们能够通过多核处理减少其运行时间，同时结果在统计上与传统的顺序实现相同。我们还计算了理论和实际的运行时间减少，可以利用我们的方法在多处理器架构上获得。实验表明，在串行实现和并行实现统计相同的情况下，我们可以将运行时间提高18倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Single MCMC chain parallelisation on decision trees

Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Annals of Mathematics and Artificial Intelligence 工程技术-计算机：人工智能

CiteScore

3.00

自引率

8.30%

发文量

审稿时长

>12 weeks

期刊介绍： Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning. The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors. Annals of Mathematics and Artificial Intelligence hopes to influence the spawning of new areas of applied mathematics and strengthen the scientific underpinnings of Artificial Intelligence.

期刊最新文献

Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates Conformal test martingales for hypergraphical models Costly information providing in binary contests Tumato 2.0 - a constraint-based planning approach for safe and robust robot behavior Calibration methods in imbalanced binary classification