Short-Circuiting Memory Traffic in Handheld Platforms

Praveen Yedlapalli, N. Nachiappan, N. Soundararajan, A. Sivasubramaniam, M. Kandemir, C. Das
Published in: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 166-177. Publication date: 2014-12-13. DOI: 10.1109/MICRO.2014.60
Citations: 18

Abstract

Handheld devices are ubiquitous in today's world. With their advent, we also see a tremendous increase in device-user interactivity and real-time data processing needs. Media (audio/video/camera) and gaming use-cases are gaining substantial user attention and are defining product successes. The combination of increasing demand from these use-cases and the need to run them at low power (from a battery) means that architects have to carefully study the applications and optimize the hardware and software stack together to obtain significant improvements. In this work, we study workloads from these domains and identify the memory subsystem (system agent) as a critical bottleneck to performance scaling. We characterize the lifetime of the "frame-based" data used in these workloads as it moves through the system and show that communicating at frame granularity misses significant performance optimization opportunities because of large IP-to-IP data reuse distances. By carefully breaking these frames into sub-frames while maintaining correctness, we demonstrate substantial gains with limited hardware requirements. Specifically, we evaluate two techniques, flow-buffering and IP-IP short-circuiting, and show that they bring both power-performance benefits and an enhanced user experience.
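The abstract's core argument is that handing data from one IP to another at whole-frame granularity forces a full frame to sit in storage (typically DRAM) between producer and consumer, whereas a sub-frame handoff keeps the in-flight footprint small. The toy model below illustrates only that intuition; it is an assumption-laden sketch, not the paper's flow-buffering or IP-IP short-circuiting design. It assumes a producer IP writes a frame in fixed-size chunks and a consumer IP reads each chunk only once it is complete, overlapping with the producer's next write (double buffering), and it reports the peak number of bytes written but not yet consumed. The function name `peak_inflight_bytes` and the 1080p, 4-bytes-per-pixel frame are hypothetical choices made for illustration.

```python
from collections import deque


def peak_inflight_bytes(frame_bytes: int, chunk_bytes: int) -> int:
    """Toy model of a producer IP handing one frame to a consumer IP.

    Illustrative assumptions (not from the paper): the producer writes the
    frame in chunks of `chunk_bytes`; the consumer reads a chunk only after
    it has been fully written, overlapping with the producer's next write
    (double buffering). Returns the peak number of bytes that have been
    written but not yet consumed, i.e. the storage the handoff needs.
    """
    if frame_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("frame_bytes and chunk_bytes must be positive")

    pending = deque()  # sizes of chunks written but not yet read
    inflight = 0
    peak = 0
    remaining = frame_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)  # the last chunk may be partial
        remaining -= size
        # Producer finishes writing the next chunk.
        pending.append(size)
        inflight += size
        peak = max(peak, inflight)
        # Meanwhile the consumer has finished reading the previous chunk.
        if len(pending) > 1:
            inflight -= pending.popleft()
    # The consumer then drains the final chunk, which cannot raise the peak.
    return peak


if __name__ == "__main__":
    FRAME = 1920 * 1080 * 4  # one 1080p frame at 4 bytes/pixel (assumed format)
    for chunk in (FRAME, 256 * 1024, 64 * 1024):
        print(f"chunk={chunk:>8} B  peak in-flight={peak_inflight_bytes(FRAME, chunk):>8} B")
    # Peaks printed: 8294400, 524288, and 131072 bytes respectively.
```

Under these assumptions the in-flight footprint shrinks from a whole frame to about two sub-frames, which is small enough to plausibly stay in a modest on-chip buffer instead of making a round trip through memory. The buffer sizes, sub-frame sizes, and synchronization policies the authors actually evaluate are not stated in this abstract.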