{"title":"关于预取和重用在减少L1数据缓存流量方面的有效性:Snort的案例研究","authors":"G. Surendra, Subhasish Banerjee, S. Nandy","doi":"10.1145/1054943.1054955","DOIUrl":null,"url":null,"abstract":"Reducing the number of data cache accesses improves performance, port efficiency, bandwidth and motivates the use of single ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques - (i) prefetching data from the cache into local buffers in the processor core and (ii) load Instruction Reuse (IR) - in reducing data cache traffic. The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor with the aim of determining if the above techniques are viable for a programmable pattern matching engine found in many network processors. We find that IR is the most generic and efficient technique which reduces cache traffic by up to 60%. However, a combination of prefetching and IR with application specific tuning performs as well as and sometimes better than IR alone.","PeriodicalId":249099,"journal":{"name":"Workshop on Memory Performance Issues","volume":"201202 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort\",\"authors\":\"G. Surendra, Subhasish Banerjee, S. Nandy\",\"doi\":\"10.1145/1054943.1054955\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reducing the number of data cache accesses improves performance, port efficiency, bandwidth and motivates the use of single ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques - (i) prefetching data from the cache into local buffers in the processor core and (ii) load Instruction Reuse (IR) - in reducing data cache traffic. The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor with the aim of determining if the above techniques are viable for a programmable pattern matching engine found in many network processors. We find that IR is the most generic and efficient technique which reduces cache traffic by up to 60%. 
However, a combination of prefetching and IR with application specific tuning performs as well as and sometimes better than IR alone.\",\"PeriodicalId\":249099,\"journal\":{\"name\":\"Workshop on Memory Performance Issues\",\"volume\":\"201202 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Memory Performance Issues\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1054943.1054955\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Memory Performance Issues","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1054943.1054955","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort
Reducing the number of data cache accesses improves performance, port efficiency, and bandwidth, and motivates the use of single-ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as the target application and study the effectiveness of two techniques in reducing data cache traffic: (i) prefetching data from the cache into local buffers in the processor core, and (ii) load instruction reuse (IR). The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor, with the aim of determining whether the above techniques are viable for the programmable pattern matching engines found in many network processors. We find that IR is the most generic and efficient technique, reducing cache traffic by up to 60%. However, a combination of prefetching and IR with application-specific tuning performs as well as, and sometimes better than, IR alone.
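
To make the load instruction reuse idea concrete, the sketch below models a small reuse buffer placed in front of the L1 data cache: a load whose PC and effective address match a valid entry returns the buffered value without accessing the cache, and stores invalidate matching entries. This is a minimal illustration only, not the mechanism evaluated in the paper; the buffer size (RB_ENTRIES), the direct-mapped indexing by load PC, the word-granularity invalidation, and the helper names (do_load, do_store, fake_l1_read) are assumptions made for the example.

    /*
     * Minimal sketch (not the paper's design) of a load reuse buffer that
     * filters L1 data cache accesses. All sizing and indexing choices here
     * are illustrative assumptions.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RB_ENTRIES 64                 /* assumed reuse-buffer size */

    typedef struct {
        bool     valid;
        uint64_t pc;                      /* PC of the load that filled the entry */
        uint64_t addr;                    /* effective address of that load       */
        uint64_t value;                   /* value returned by the earlier load   */
    } rb_entry_t;

    static rb_entry_t rb[RB_ENTRIES];
    static unsigned long l1_accesses, reused;

    /* A load first probes the reuse buffer; on a hit the L1 access is skipped. */
    static uint64_t do_load(uint64_t pc, uint64_t addr,
                            uint64_t (*l1_read)(uint64_t))
    {
        rb_entry_t *e = &rb[(pc >> 2) % RB_ENTRIES];   /* direct-mapped by PC */

        if (e->valid && e->pc == pc && e->addr == addr) {
            reused++;                                  /* reuse: no cache access */
            return e->value;
        }
        l1_accesses++;                                 /* buffer miss: go to L1 */
        uint64_t v = l1_read(addr);
        *e = (rb_entry_t){ .valid = true, .pc = pc, .addr = addr, .value = v };
        return v;
    }

    /* Stores invalidate any reuse-buffer entry holding the same address. */
    static void do_store(uint64_t addr)
    {
        for (int i = 0; i < RB_ENTRIES; i++)
            if (rb[i].valid && rb[i].addr == addr)
                rb[i].valid = false;
        l1_accesses++;                                 /* the store still reaches L1 */
    }

    /* Stand-in for the L1 data cache, so the sketch is self-contained. */
    static uint64_t mem[1024];
    static uint64_t fake_l1_read(uint64_t addr) { return mem[(addr / 8) % 1024]; }

    int main(void)
    {
        /* Re-executing the same load (e.g., a table lookup in a matching loop)
           hits in the reuse buffer and never reaches the L1 data cache. */
        for (int i = 0; i < 1000; i++)
            do_load(0x400100, 0x8000, fake_l1_read);
        do_store(0x8000);                  /* invalidates the buffered entry */
        do_load(0x400100, 0x8000, fake_l1_read);

        printf("L1 accesses: %lu, reused: %lu\n", l1_accesses, reused);
        return 0;
    }

In a pattern matching engine such as Snort's, loads that repeatedly fetch unchanged automaton or rule-table state are the kind of accesses such a buffer can filter, which is consistent with the substantial traffic reduction the paper reports for IR.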