HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs

Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen
2019 IEEE International Conference on Networking, Architecture and Storage (NAS), August 2019. DOI: 10.1109/NAS.2019.8834726

Abstract — Many current studies focus on new methods of accelerating memory accesses between the memory controller and the memory modules. However, the absence of an accelerator for memory accesses between the CPU and the memory controller wastes the performance benefits of these new methods. We therefore propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared with the conventional approach of holding outstanding memory access requests in miss status holding registers (MSHRs), HCMA exploits the scratchpad memory available in FPGAs or SoCs to circumvent the limit on MSHR entries, so the concurrency of requests is bounded only by the capacity of the scratchpad memory. Moreover, to avoid the higher latency of searching a larger number of entries, we design an efficient coordination mechanism based on circular queues. We evaluate HCMA on an MP-SoC FPGA platform. Compared with conventional MSHR-based methods, HCMA supports more than ten times as many concurrent memory accesses (from 10 to 128 entries on our evaluation platform). HCMA achieves up to 2.72× higher memory bandwidth utilization for applications that issue massive fine-grained random requests, and up to 3.46× higher utilization for stream-based memory accesses. For real applications such as CG, our method improves performance by 29.87%.
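The abstract's key data-structure argument is that a circular queue supports O(1) enqueue and dequeue of outstanding requests, whereas MSHRs require an associative search whose latency grows with the number of entries. The following is a minimal sketch of that idea, not the paper's implementation: all names, the entry layout, and the 128-entry depth (borrowed from the evaluation platform figure in the abstract) are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of a circular queue for outstanding memory
 * requests. Field names and sizes are assumptions, not taken from
 * the paper; 128 mirrors the entry count cited in the abstract. */
#define QUEUE_DEPTH 128

typedef struct {
    uint64_t addr;  /* target memory address of the request */
    uint32_t tag;   /* request identifier */
} Request;

typedef struct {
    Request entries[QUEUE_DEPTH];
    uint32_t head;   /* index of the oldest in-flight request */
    uint32_t tail;   /* index of the next free slot */
    uint32_t count;  /* number of in-flight requests */
} RequestQueue;

static void rq_init(RequestQueue *q) {
    q->head = q->tail = q->count = 0;
}

/* Issue a request: O(1) append at the tail; no associative
 * search over existing entries as an MSHR lookup would need. */
static bool rq_issue(RequestQueue *q, Request r) {
    if (q->count == QUEUE_DEPTH) return false;  /* queue full: stall */
    q->entries[q->tail] = r;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Retire the oldest request: O(1) pop from the head. */
static bool rq_retire(RequestQueue *q, Request *out) {
    if (q->count == 0) return false;  /* nothing outstanding */
    *out = q->entries[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

Because capacity is set by the backing storage rather than by register count, the same structure scales to whatever the scratchpad memory can hold, which is the scaling property the abstract claims for HCMA.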