A Study of Core Utilization and Residency in Heterogeneous Smart Phone Architectures

Joseph Whitehouse, Qinzhe Wu, Shuang Song, E. John, A. Gerstlauer, L. John
{"title":"A Study of Core Utilization and Residency in Heterogeneous Smart Phone Architectures","authors":"Joseph Whitehouse, Qinzhe Wu, Shuang Song, E. John, A. Gerstlauer, L. John","doi":"10.1145/3297663.3310304","DOIUrl":null,"url":null,"abstract":"In recent years, the smart phone platform has seen a rise in the number of cores and the use of heterogeneous clusters as in the Qualcomm Snapdragon, Apple A10 and the Samsung Exynos processors. This paper attempts to understand characteristics of mobile workloads, with measurements on heterogeneous multicore phone platforms with big and little cores. It answers questions such as the following: (i) Do smart phones need multiple cores of different types (eg: big or little)? (ii) Is it energy-efficient to operate with more cores (with less time) or fewer cores even if it might take longer? (iii)What are the best frequencies to operate the cores considering energy efficiency? (iv) Do mobile applications need out-of-order speculative execution cores with complex branch prediction? (v) Is IPC a good performance indicator for early design tradeoff evaluation while working on mobile processor design? Using Geekbench and more than 3 dozen Android applications, and the Workload Automation tool from ARM, we measure core utilization, frequency residencies, and energy efficiency characteristics on two leading edge smart phones. Many characteristics of smartphone platforms are presented, and architectural implications of the observations as well as design considerations for future mobile processors are discussed. A key insight is that multiple big and complex cores are beneficial both from a performance as well as an energy point of view in certain scenarios. It is seen that 4 big cores are utilized during application launch and update phases of applications. Similarly, reboot using all 4 cores at maximum performance provides latency advantages. However, it consumes higher power and energy, and reboot with 2 cores was seen to be more energy efficient than reboot with 1 or 4 cores. Furthermore, inaccurate branch prediction is seen to result in up to 40% mis-speculated instructions in many applications, suggesting that it is important to improve the accuracy of branch predictors in mobile processors. While absolute IPCs are observed to be a poor predictor of benchmark scores, relative IPCs are useful for estimating the impact of microarchitectural changes on benchmark scores.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3297663.3310304","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In recent years, the smart phone platform has seen a rise in the number of cores and the use of heterogeneous clusters as in the Qualcomm Snapdragon, Apple A10 and the Samsung Exynos processors. This paper attempts to understand characteristics of mobile workloads, with measurements on heterogeneous multicore phone platforms with big and little cores. It answers questions such as the following: (i) Do smart phones need multiple cores of different types (eg: big or little)? (ii) Is it energy-efficient to operate with more cores (with less time) or fewer cores even if it might take longer? (iii)What are the best frequencies to operate the cores considering energy efficiency? (iv) Do mobile applications need out-of-order speculative execution cores with complex branch prediction? (v) Is IPC a good performance indicator for early design tradeoff evaluation while working on mobile processor design? Using Geekbench and more than 3 dozen Android applications, and the Workload Automation tool from ARM, we measure core utilization, frequency residencies, and energy efficiency characteristics on two leading edge smart phones. Many characteristics of smartphone platforms are presented, and architectural implications of the observations as well as design considerations for future mobile processors are discussed. A key insight is that multiple big and complex cores are beneficial both from a performance as well as an energy point of view in certain scenarios. It is seen that 4 big cores are utilized during application launch and update phases of applications. Similarly, reboot using all 4 cores at maximum performance provides latency advantages. However, it consumes higher power and energy, and reboot with 2 cores was seen to be more energy efficient than reboot with 1 or 4 cores. Furthermore, inaccurate branch prediction is seen to result in up to 40% mis-speculated instructions in many applications, suggesting that it is important to improve the accuracy of branch predictors in mobile processors. While absolute IPCs are observed to be a poor predictor of benchmark scores, relative IPCs are useful for estimating the impact of microarchitectural changes on benchmark scores.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
异构智能手机架构中的核心利用与驻留研究
近年来,智能手机平台的核心数量和异构集群的使用有所增加,如高通骁龙、苹果A10和三星Exynos处理器。本文试图通过对大核和小核异构多核手机平台的测量来了解移动工作负载的特征。它回答了以下问题:(i)智能手机是否需要不同类型的多个内核(例如:大的还是小的)?(ii)使用更多核(更少时间)或使用更少核(即使可能需要更长的时间)是否更节能?(iii)考虑到能源效率,运行核心的最佳频率是什么?(iv)移动应用是否需要无序的推测执行核心和复杂的分支预测?(v)在移动处理器设计时,IPC是否是早期设计权衡评估的良好性能指标?使用Geekbench和30多个Android应用程序,以及ARM的Workload Automation工具,我们测量了两款前沿智能手机的核心利用率、频率驻留和能效特征。介绍了智能手机平台的许多特征,并讨论了观察结果的架构含义以及对未来移动处理器的设计考虑。一个关键的见解是,在某些情况下,从性能和能量的角度来看,多个大而复杂的核心都是有益的。可以看出,在应用程序的启动和更新阶段,使用了4个大内核。类似地,在最大性能下使用所有4个核心进行重启,可以提供延迟优势。然而,它消耗更高的功率和能量,2核重启被认为比1核或4核重启更节能。此外,在许多应用中,不准确的分支预测会导致高达40%的错误推测指令,这表明提高移动处理器中分支预测器的准确性非常重要。虽然绝对ipc不能很好地预测基准分数,但相对ipc对于估计微架构更改对基准分数的影响是有用的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Performance Evaluation of Multi-Path TCP for Data Center and Cloud Workloads Cachematic - Automatic Invalidation in Application-Level Caching Systems Memory Centric Characterization and Analysis of SPEC CPU2017 Suite Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects Yardstick: A Benchmark for Minecraft-like Services
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1