Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors

P. Alonso, Sandra Catalán, J. Herrero, E. S. Quintana‐Ortí, Rafael Rodríguez-Sánchez
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores. Published 2017-02-04. DOI: https://doi.org/10.1145/3026937.3026938
Citations: 1

Abstract

Asymmetric multicore processors (AMPs), such as those based on ARM big.LITTLE technology, have been proposed as a means to address the end of Dennard scaling. The idea behind these architectures is to activate only the type (and number) of cores needed to satisfy the quality of service requested by the running application(s) while delivering high energy efficiency. For dense linear algebra problems, though, performance is of paramount importance, which calls for an efficient use of all computational resources in the AMP. In response, we investigate how to exploit the asymmetric cores of an ARMv7 big.LITTLE AMP in order to attain high performance for the reduction to tridiagonal form, an essential step towards the solution of dense symmetric eigenvalue problems. The LAPACK routine for this purpose is especially challenging, since half of its floating-point arithmetic operations (flops) are cast in terms of compute-bound kernels while the remaining half correspond to memory-bound kernels. To deal with this scenario: 1) we leverage a tuned implementation of the compute-bound kernels for AMPs; 2) we develop and parallelize new architecture-aware micro-kernels for the memory-bound kernels; and 3) we carefully adjust the type and number of cores to use at each step of the reduction procedure.
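To make the two kinds of kernel mentioned in the abstract concrete, here is a minimal pure-Python sketch of the textbook unblocked Householder reduction of a symmetric matrix to tridiagonal form. It is an illustration only: it is not the authors' code and not LAPACK's blocked routine, and the function name `tridiagonalize` is hypothetical. Each step performs a symmetric matrix-vector product (the memory-bound part) followed by a symmetric rank-2 update of the trailing block. In a blocked implementation, several such updates are typically accumulated into one matrix-matrix operation, which is where the compute-bound half of the flops comes from.

```python
import math


def tridiagonalize(a):
    """Reduce a symmetric matrix (list of lists) to tridiagonal form.

    Returns (d, e): the diagonal and sub-diagonal of T = Q^T A Q.
    Illustrative unblocked variant; the input is not modified.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        m = n - k - 1  # order of the trailing block A22
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already reduced
        # Householder vector v such that (I - tau v v^T) x = alpha e_1.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        tau = 2.0 / sum(t * t for t in v)
        # Memory-bound step: p = tau * A22 * v (symmetric matrix-vector product).
        p = [tau * sum(a[k + 1 + i][k + 1 + j] * v[j] for j in range(m))
             for i in range(m)]
        # w = p - (tau / 2) (p^T v) v
        c = 0.5 * tau * sum(pi * vi for pi, vi in zip(p, v))
        w = [pi - c * vi for pi, vi in zip(p, v)]
        # Symmetric rank-2 update: A22 <- A22 - v w^T - w v^T.
        for i in range(m):
            for j in range(m):
                a[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]
        # Column k now has a single sub-diagonal entry, alpha.
        a[k + 1][k] = a[k][k + 1] = alpha
        for i in range(k + 2, n):
            a[i][k] = a[k][i] = 0.0
    d = [a[i][i] for i in range(n)]
    e = [a[i + 1][i] for i in range(n - 1)]
    return d, e


if __name__ == "__main__":
    example = [[4.0, 1.0, -2.0, 2.0],
               [1.0, 2.0, 0.0, 1.0],
               [-2.0, 0.0, 3.0, -2.0],
               [2.0, 1.0, -2.0, -1.0]]
    diag, sub = tridiagonalize(example)
    print("diagonal:", diag)
    print("sub-diagonal:", sub)
```

Because the transformation is an orthogonal similarity, the trace and the Frobenius norm of the input matrix are preserved, which gives a cheap sanity check on the result.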