Design of Trace-Based Split Array Caches for Embedded Applications

A. Tokarnia, Marina Tachibana
Published in: 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools
Publication date: 2010-09-01
DOI: 10.1109/DSD.2010.33 (https://doi.org/10.1109/DSD.2010.33)
Citations: 1

Abstract

Since many embedded systems execute a predefined set of programs, many design techniques tune system components to the application programs and data in order to optimize performance and power consumption. In this paper, we propose a method that analyzes accesses to vectors, arrays, and other complex data structures to design a size-constrained two-partition array cache. The method reorganizes the ways of a set-associative array cache into partitions with different line sizes and defines array-partition mappings so as to minimize the average memory access energy-delay product. Experimental results show that these split array caches have a lower average memory access energy-delay product than unified set-associative array caches of the same size. For an MPEG-2 decoder, even with no parallel accesses to cache partitions, the average memory access energy-delay product of an 8K-byte trace-based split array cache is 50% lower than that of the unified set-associative array cache with the lowest energy-delay product. If 25% of the accesses occur in pairs, there is an additional reduction of 9%.
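
The idea in the abstract (split the ways of a set-associative cache into two partitions with different line sizes, map each array to one partition, and pick the configuration with the lowest average energy-delay product over a trace) can be illustrated with a toy sketch. This is not the authors' algorithm or cost model: the class `Partition`, the functions `average_edp` and `best_split`, the energy and delay constants, the definition of the average energy-delay product as mean energy times mean delay, and the exhaustive search are all assumptions made here for illustration only.

```python
from collections import OrderedDict
from itertools import permutations, product

# Hypothetical cost constants (placeholders, not measured values).
HIT_ENERGY_PER_WAY = 1.0      # every way of a partition is probed on each access
HIT_DELAY = 1.0               # per access
MISS_ENERGY_PER_BYTE = 0.5    # per byte refilled on a miss
MISS_DELAY_PER_BYTE = 0.25    # per byte refilled on a miss


class Partition:
    """Set-associative cache partition with LRU replacement."""

    def __init__(self, ways, way_size, line_size):
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(way_size // line_size)]
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        lines = self.sets[block % len(self.sets)]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
            self.hits += 1
        else:
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict least recently used
            lines[block] = True
            self.misses += 1

    def cost(self):
        """Total (energy, delay) accumulated by this partition."""
        accesses = self.hits + self.misses
        refill_bytes = self.misses * self.line_size
        energy = accesses * HIT_ENERGY_PER_WAY * self.ways \
            + refill_bytes * MISS_ENERGY_PER_BYTE
        delay = accesses * HIT_DELAY + refill_bytes * MISS_DELAY_PER_BYTE
        return energy, delay


def average_edp(trace, mapping, ways, line_sizes, way_size):
    """Assumed metric: (mean energy per access) * (mean delay per access)."""
    parts = [Partition(w, way_size, ls) for w, ls in zip(ways, line_sizes)]
    for array, addr in trace:
        parts[mapping[array]].access(addr)
    costs = [p.cost() for p in parts]
    energy = sum(c[0] for c in costs)
    delay = sum(c[1] for c in costs)
    n = len(trace)
    return (energy / n) * (delay / n)


def best_split(trace, total_size=256, total_ways=4, line_choices=(8, 16, 32)):
    """Exhaustively try way splits, line-size pairs, and array mappings."""
    arrays = sorted({array for array, _ in trace})
    way_size = total_size // total_ways    # total cache size stays fixed
    best = None
    for w0 in range(1, total_ways):
        ways = (w0, total_ways - w0)
        for sizes in permutations(line_choices, 2):   # different line sizes
            for bits in product((0, 1), repeat=len(arrays)):
                mapping = dict(zip(arrays, bits))
                edp = average_edp(trace, mapping, ways, sizes, way_size)
                if best is None or edp < best[0]:
                    best = (edp, mapping, ways, sizes)
    return best


# Toy trace: one sequentially scanned array and one strided lookup table.
trace = []
for i in range(64):
    trace.append(("stream", 0x1000 + i))
    trace.append(("table", 0x8000 + (i * 40) % 256))

edp, mapping, ways, sizes = best_split(trace)
print(edp, mapping, ways, sizes)
```

The search space here is tiny, so brute force is enough; a realistic trace with many arrays would need a smarter search, and the paper's own procedure may differ in every respect from this sketch.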