Latency and throughput characterization of convolutional neural networks for mobile computer vision

Proceedings of the 9th ACM Multimedia Systems Conference. Pub Date: 2018-03-26. DOI: 10.1145/3204949.3204975

Jussi Hanhirova, Teemu Kämäräinen, S. Seppälä, M. Siekkinen, V. Hirvisalo, Antti Ylä-Jääski


Citations: 79

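Abstract

We study performance characteristics of convolutional neural networks (CNNs) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implementing such systems. However, system performance depends largely on the utilization of hardware accelerators, which can speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is a performance characterization of multiple CNN-based models for object recognition and detection on several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency-throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior.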
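To make the latency-throughput trade-off mentioned in the abstract concrete, here is a minimal illustrative sketch. It assumes a hypothetical linear cost model in which each inference batch pays a fixed overhead plus a per-image cost. The function names, the parameters `fixed_overhead_ms` and `per_image_ms`, and their values are assumptions made for illustration; they are not measurements or methods from the paper.

```python
# Illustrative only: a hypothetical linear cost model for batched inference.
# The default constants are made-up values, not results from the paper.

def batch_latency_ms(batch_size, fixed_overhead_ms=5.0, per_image_ms=1.0):
    """Time to finish one batch: fixed overhead plus a per-image cost."""
    return fixed_overhead_ms + per_image_ms * batch_size


def throughput_ips(batch_size, fixed_overhead_ms=5.0, per_image_ms=1.0):
    """Images per second when batches are processed back to back."""
    latency_s = batch_latency_ms(batch_size, fixed_overhead_ms, per_image_ms) / 1000.0
    return batch_size / latency_s


if __name__ == "__main__":
    # Larger batches amortize the fixed overhead (higher throughput),
    # but every image in the batch waits for the whole batch (higher latency).
    for b in (1, 4, 16, 64):
        print(f"batch={b:3d}  latency={batch_latency_ms(b):6.1f} ms  "
              f"throughput={throughput_ips(b):7.1f} img/s")
```

Under this simplified model, both latency and throughput grow with batch size, which is the basic tension between the two metrics. The abstract stresses that the measured behavior is very complex, so real platforms should not be expected to follow such a simple curve.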


