决策树上的单MCMC链并行化

IF 1.2 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Annals of Mathematics and Artificial Intelligence Pub Date : 2023-07-02 DOI:10.1007/s10472-023-09876-9
Efthyvoulos Drousiotis, Paul Spirakis
{"title":"决策树上的单MCMC链并行化","authors":"Efthyvoulos Drousiotis, Paul Spirakis","doi":"10.1007/s10472-023-09876-9","DOIUrl":null,"url":null,"abstract":"<p>Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.</p>","PeriodicalId":7971,"journal":{"name":"Annals of Mathematics and Artificial Intelligence","volume":"84 2","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Single MCMC chain parallelisation on decision trees\",\"authors\":\"Efthyvoulos Drousiotis, Paul Spirakis\",\"doi\":\"10.1007/s10472-023-09876-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.</p>\",\"PeriodicalId\":7971,\"journal\":{\"name\":\"Annals of Mathematics and Artificial Intelligence\",\"volume\":\"84 2\",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Mathematics and Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10472-023-09876-9\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Mathematics and Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10472-023-09876-9","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

决策树(DT)在机器学习中非常有名,通常可以获得最先进的性能。尽管如此,CART、ID3、随机森林和增强树等众所周知的变体都错过了一个概率版本,该版本对树结构的先验假设进行编码,并在节点参数之间共享统计强度。现有的关于贝叶斯DT的工作依赖于马尔可夫链蒙特卡罗(MCMC),它的计算速度很慢,特别是在高维数据和昂贵的建议上。在本研究中,我们提出了一种在普通笔记本电脑或个人计算机上并行化单个MCMC DT链的方法,该方法使我们能够通过多核处理减少其运行时间,同时结果在统计上与传统的顺序实现相同。我们还计算了理论和实际的运行时间减少,可以利用我们的方法在多处理器架构上获得。实验表明,在串行实现和并行实现统计相同的情况下,我们可以将运行时间提高18倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Single MCMC chain parallelisation on decision trees

Decision trees (DT) are highly famous in machine learning and usually acquire state-of-the-art performance. Despite that, well-known variants like CART, ID3, random forest, and boosted trees miss a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT depends on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially on high dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer that enables us to reduce its run-time through multi-core processing while the results are statistically identical to conventional sequential implementation. We also calculate the theoretical and practical reduction in run time, which can be obtained utilising our method on multi-processor architectures. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Annals of Mathematics and Artificial Intelligence
Annals of Mathematics and Artificial Intelligence 工程技术-计算机:人工智能
CiteScore
3.00
自引率
8.30%
发文量
37
审稿时长
>12 weeks
期刊介绍: Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning. The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors. Annals of Mathematics and Artificial Intelligence hopes to influence the spawning of new areas of applied mathematics and strengthen the scientific underpinnings of Artificial Intelligence.
期刊最新文献
Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates Conformal test martingales for hypergraphical models Costly information providing in binary contests Tumato 2.0 - a constraint-based planning approach for safe and robust robot behavior Calibration methods in imbalanced binary classification
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1