VIP: Virtualizing IP chains on handheld platforms

N. Nachiappan, Haibo Zhang, Jihyun Ryoo, N. Soundararajan, A. Sivasubramaniam, M. Kandemir, Ravishankar R. Iyer, C. Das
{"title":"VIP: Virtualizing IP chains on handheld platforms","authors":"N. Nachiappan, Haibo Zhang, Jihyun Ryoo, N. Soundararajan, A. Sivasubramaniam, M. Kandemir, Ravishankar R. Iyer, C. Das","doi":"10.1145/2749469.2750382","DOIUrl":null,"url":null,"abstract":"Energy-efficient user-interactive and display-oriented applications on handhelds rely heavily on multiple accelerators (termed IP cores) to meet their periodic frame processing needs. Further, these platforms are starting to host multiple applications concurrently on the multiple CPU cores. Unfortunately, today's hardware exposes an interface that forces the host software (Android drivers) to treat each IP core as an isolated device. Consequently, the host CPU has to get involved in the (i) processing of each frame, (ii) scheduling them to ensure timely progress through the IP cores to meet their QoS needs, and (iii) explicitly having to move data from one IP core to the next, with main memory serving as the common staging area. We show in this paper through measurements on a Nexus 7 platform that the frequent invocation of the CPU for processing these frames and the involvement of main memory as a data flow conduit, are serious limitations. Instead, we propose a novel IP virtualization framework (VIP), involving three key ideas that allow several IPs to be chained together and made to appear to the software as a single device. First, chaining of IPs avoids data transfer through the memory system, enhancing the throughput of flows through the IPs. Second, by using a burst-mode, the CPU can initiate the processing of several frames through the virtual IP chain, without getting involved (and interrupted) for each frame, thereby allowing better energy saving and utilization opportunities. Removing the CPU from this loop, requires alternate orchestration of frame flows to ensure QoS guarantees for each frame of each application. Our third enhancement in VIP creates several virtual paths, one for each flow, through these IP chains with the hardware scheduling the frames to enforce QoS guarantees despite any contention for resources along the way. Our experimental evaluations demonstrate the effectiveness of VIP on energy consumption and QoS for multiple applications.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"3 1","pages":"655-667"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2749469.2750382","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

Abstract

Energy-efficient user-interactive and display-oriented applications on handhelds rely heavily on multiple accelerators (termed IP cores) to meet their periodic frame-processing needs. Further, these platforms are starting to host multiple applications concurrently on multiple CPU cores. Unfortunately, today's hardware exposes an interface that forces the host software (Android drivers) to treat each IP core as an isolated device. Consequently, the host CPU has to get involved in (i) processing each frame, (ii) scheduling frames to ensure timely progress through the IP cores and meet their QoS needs, and (iii) explicitly moving data from one IP core to the next, with main memory serving as the common staging area. We show in this paper, through measurements on a Nexus 7 platform, that the frequent invocation of the CPU for processing these frames and the use of main memory as a data-flow conduit are serious limitations. Instead, we propose a novel IP virtualization framework (VIP), built on three key ideas that allow several IPs to be chained together and made to appear to the software as a single device. First, chaining of IPs avoids data transfer through the memory system, enhancing the throughput of flows through the IPs. Second, by using a burst mode, the CPU can initiate the processing of several frames through the virtual IP chain without getting involved (and interrupted) for each frame, thereby allowing better energy-saving and utilization opportunities. Removing the CPU from this loop requires an alternative way of orchestrating frame flows to ensure QoS guarantees for each frame of each application. Our third enhancement in VIP creates several virtual paths, one for each flow, through these IP chains, with the hardware scheduling the frames to enforce QoS guarantees despite any contention for resources along the way. Our experimental evaluations demonstrate the effectiveness of VIP in improving energy consumption and QoS for multiple applications.
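To make the contrast concrete, the sketch below gives a purely hypothetical host-side view of the two submission models the abstract describes: the conventional per-frame, per-IP flow that stages data through main memory and wakes the CPU at every hop, versus a single burst submission to a chained virtual IP device. None of the identifiers (ip_core, vip_chain, vip_submit_burst, etc.) come from the paper or from any real Android driver API; they only illustrate the idea that the CPU programs a chain once and then hands off several frames without per-frame interrupts.

/* Hypothetical sketch (not the paper's API): contrasting per-frame, per-IP
 * submission with a single burst submission to a virtualized IP chain. */
#include <stdio.h>
#include <stddef.h>

typedef struct { int id; } ip_core;                    /* one accelerator (IP core) */
typedef struct { const char *name; } frame;            /* one frame to be processed */
typedef struct { ip_core *ips; size_t n; } vip_chain;  /* virtual chained device    */

/* Conventional model: the CPU drives every frame through every IP and uses
 * main memory as the staging area between IPs (one CPU involvement per hop). */
static void per_frame_per_ip(ip_core *ips, size_t n_ips, frame *f)
{
    for (size_t i = 0; i < n_ips; i++) {
        printf("CPU: submit %s to IP%d, wait for interrupt, copy via DRAM\n",
               f->name, ips[i].id);
    }
}

/* VIP-style model (as sketched here): the chain is programmed once and a burst
 * of frames flows IP-to-IP in hardware without waking the CPU per frame. */
static void vip_submit_burst(vip_chain *c, frame *frames, size_t n_frames)
{
    (void)frames;
    printf("CPU: program chain of %zu IPs, enqueue burst of %zu frames, sleep\n",
           c->n, n_frames);
    /* Hardware would now move data directly between IPs and enforce per-flow
     * QoS; the CPU is interrupted only when the whole burst completes. */
}

int main(void)
{
    ip_core ips[3]  = { {0}, {1}, {2} };
    frame frames[2] = { { "frame0" }, { "frame1" } };

    for (size_t i = 0; i < 2; i++)
        per_frame_per_ip(ips, 3, &frames[i]);   /* 6 CPU involvements */

    vip_chain chain = { ips, 3 };
    vip_submit_burst(&chain, frames, 2);        /* 1 CPU involvement  */
    return 0;
}

Under these assumptions, the interesting design question the paper raises is what replaces the CPU as the per-frame orchestrator: the abstract's third idea answers it with per-flow virtual paths scheduled in hardware so QoS holds even under contention.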