Scalable Hardware Architecture for Fast Gradient Boosted Tree Training

Tamon Sadasue, Takuya Tanaka, Ryosuke Kasahara, Arief Darmawan, T. Isshiki
DOI: 10.2197/ipsjtsldm.14.11 (https://doi.org/10.2197/ipsjtsldm.14.11)
Published: 2021-01-01 (Journal Article)
Citations: 0

Abstract

Gradient Boosted Tree is a powerful machine learning method that supports both classification and regression, and is widely used in fields requiring high-precision prediction, particularly for various types of tabular data sets. Owing to the recent increase in data size, the number of attributes, and the demand for frequent model updates, fast and efficient training is required. FPGA is suitable for power-efficient acceleration because it can realize a domain-specific hardware architecture; however, it is necessary to flexibly support many hyper-parameters to adapt to various dataset sizes, dataset properties, and system limitations such as memory capacity and logic capacity. We introduce a fully pipelined hardware implementation of Gradient Boosted Tree training and a design framework that enables a versatile hardware system description with high performance and flexibility to realize highly parameterized machine learning models. Experimental results show that our FPGA implementation achieves 11 to 33 times faster performance and more than 300 times higher power efficiency than a state-of-the-art GPU-accelerated software implementation.
Journal: IPSJ Transactions on System LSI Design Methodology (Engineering: Electrical and Electronic Engineering)