Hardware and software performance in deep learning

Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg
{"title":"Hardware and software performance in deep learning","authors":"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg","doi":"10.1049/pbpc022e_ch6","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Many-Core Computing: Hardware and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/pbpc022e_ch6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs, such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic, and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.
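One of the approaches the abstract mentions, reusing existing high-performance linear algebra libraries, is commonly realised by lowering convolution to matrix multiplication (the "im2col + GEMM" technique), so that a highly tuned GEMM routine does the heavy lifting. The sketch below is an illustration of that general idea, not code from the chapter; the function names and the NumPy-based GEMM stand in for a vendor BLAS call.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: input of shape (C, H, W). Unroll every (kh x kw) patch,
    # across all C channels, into one column of the output matrix.
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_gemm(x, w):
    # w: filters of shape (M, C, kh, kw). Flatten each filter to a row,
    # then one matrix multiply computes all M output feature maps.
    M, C, kh, kw = w.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)          # (C*kh*kw, out_h*out_w)
    y = w.reshape(M, -1) @ cols       # the single large GEMM
    return y.reshape(M, out_h, out_w)
```

The trade-off this illustrates is the one the chapter's performance discussion turns on: the im2col matrix duplicates each input element up to kh*kw times (extra memory and memory traffic), in exchange for funnelling all the arithmetic through one cache- and vector-optimised GEMM.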