Heterogeneous system implementation of deep learning neural network for object detection in OpenCL framework

Shuai Li, Yukui Luo, K. Sun, K. Choi
DOI: 10.23919/ELINFOCOM.2018.8330645 (https://doi.org/10.23919/ELINFOCOM.2018.8330645)
Published in: 2018 International Conference on Electronics, Information, and Communication (ICEIC)
Citations: 6

Abstract

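One of today's major challenges is how to implement an up-to-date object detection algorithm on a heterogeneous system. Deep learning neural network (DNN) algorithms have achieved very satisfactory performance, as in the 2012 Visual Object Classes (VOC) Challenge [1], but they depend on the CUDA [2] GPU framework and can only be applied on NVIDIA accelerators. We prefer a more generic acceleration framework, and OpenCL [3] is the key to meeting this requirement. Whereas CUDA targets NVIDIA GPUs only, OpenCL can be applied to heterogeneous systems that include CPUs, GPUs, DSPs, FPGAs, and so on. Heterogeneous systems are more flexible: some are designed for portable devices, and others for low-power parallel computation. These special devices play a very important role in modern life. In this paper, we present an OpenCL-based heterogeneous system implementation and apply a DNN framework to two typical heterogeneous systems: a portable system and an FPGA system. Our contributions are as follows: (1) We implement a generic OpenCL-based DNN object recognition framework that can be executed on general GPUs (AMD, NVIDIA, etc.). (2) We implement our framework on the Odroid XU4 [4] embedded system using multiple GPUs and improve processing time by 25.8%. (3) We implement our framework on an FPGA system and reduce power consumption by 84.3% compared with a Titan X GPU.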
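The two percentages in the abstract can be turned into ratios with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, no absolute wattage or timing is assumed, and because the abstract does not say whether the 25.8% figure is a cut in processing time or a gain in throughput, both readings are computed.

```python
def remaining_fraction(reduction_pct):
    """Fraction of the baseline that is left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def baseline_ratio(reduction_pct):
    """How many times larger the baseline is than the reduced value."""
    return 1.0 / remaining_fraction(reduction_pct)


# Power: 84.3% lower than the Titan X baseline (figure from the abstract).
fpga_power_fraction = remaining_fraction(84.3)  # about 0.157 of the baseline
titan_over_fpga = baseline_ratio(84.3)          # baseline draws about 6.37x more

# Processing time: 25.8% (figure from the abstract), under two readings.
# Reading A (assumption): processing time falls by 25.8%.
speedup_if_time_cut = baseline_ratio(25.8)
# Reading B (assumption): throughput rises by 25.8%.
speedup_if_throughput_gain = 1.0 + 25.8 / 100.0

print(f"FPGA power as a fraction of baseline: {fpga_power_fraction:.3f}")
print(f"Baseline power relative to FPGA: {titan_over_fpga:.2f}x")
print(f"Speedup if time is cut by 25.8%: {speedup_if_time_cut:.3f}x")
print(f"Speedup if throughput gains 25.8%: {speedup_if_throughput_gain:.3f}x")
```

Under either reading the multi-GPU speedup is modest (roughly 1.26x to 1.35x), while the power figure corresponds to the FPGA system drawing under one sixth of the baseline GPU's power.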