Hardware and software performance in deep learning

Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg
{"title":"Hardware and software performance in deep learning","authors":"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg","doi":"10.1049/pbpc022e_ch6","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Many-Core Computing: Hardware and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/pbpc022e_ch6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs, such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic, and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.
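One of the approaches the abstract mentions, reusing existing high-performance linear algebra libraries, is commonly realised by lowering convolution to matrix multiplication (the "im2col + GEMM" technique), so that a highly tuned GEMM routine does the heavy lifting. The sketch below is an illustration of that general idea, not code from the chapter; the function names and the NumPy-based GEMM stand in for a vendor BLAS call.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: input of shape (C, H, W). Unroll every (kh x kw) patch,
    # across all C channels, into one column of the output matrix.
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_gemm(x, w):
    # w: filters of shape (M, C, kh, kw). Flatten each filter to a row,
    # then one matrix multiply computes all M output feature maps.
    M, C, kh, kw = w.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)          # (C*kh*kw, out_h*out_w)
    y = w.reshape(M, -1) @ cols       # the single large GEMM
    return y.reshape(M, out_h, out_w)
```

The trade-off this illustrates is the one the chapter's performance discussion turns on: the im2col matrix duplicates each input element up to kh*kw times (extra memory and memory traffic), in exchange for funnelling all the arithmetic through one cache- and vector-optimised GEMM.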