{"title":"关于预取和重用在减少L1数据缓存流量方面的有效性:Snort的案例研究","authors":"G. Surendra, Subhasish Banerjee, S. Nandy","doi":"10.1145/1054943.1054955","DOIUrl":null,"url":null,"abstract":"Reducing the number of data cache accesses improves performance, port efficiency, bandwidth and motivates the use of single ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques - (i) prefetching data from the cache into local buffers in the processor core and (ii) load Instruction Reuse (IR) - in reducing data cache traffic. The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor with the aim of determining if the above techniques are viable for a programmable pattern matching engine found in many network processors. We find that IR is the most generic and efficient technique which reduces cache traffic by up to 60%. However, a combination of prefetching and IR with application specific tuning performs as well as and sometimes better than IR alone.","PeriodicalId":249099,"journal":{"name":"Workshop on Memory Performance Issues","volume":"201202 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort\",\"authors\":\"G. Surendra, Subhasish Banerjee, S. Nandy\",\"doi\":\"10.1145/1054943.1054955\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reducing the number of data cache accesses improves performance, port efficiency, bandwidth and motivates the use of single ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as a target application and study the effectiveness of two techniques - (i) prefetching data from the cache into local buffers in the processor core and (ii) load Instruction Reuse (IR) - in reducing data cache traffic. The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor with the aim of determining if the above techniques are viable for a programmable pattern matching engine found in many network processors. We find that IR is the most generic and efficient technique which reduces cache traffic by up to 60%. 
However, a combination of prefetching and IR with application specific tuning performs as well as and sometimes better than IR alone.\",\"PeriodicalId\":249099,\"journal\":{\"name\":\"Workshop on Memory Performance Issues\",\"volume\":\"201202 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Memory Performance Issues\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1054943.1054955\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Memory Performance Issues","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1054943.1054955","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort
Reducing the number of data cache accesses improves performance, port efficiency, and bandwidth, and motivates the use of single-ported caches instead of complex and expensive multi-ported ones. In this paper we consider an intrusion detection system as the target application and study the effectiveness of two techniques in reducing data cache traffic: (i) prefetching data from the cache into local buffers in the processor core, and (ii) load instruction reuse (IR). The analysis is carried out using a microarchitecture and instruction set representative of a programmable processor, with the aim of determining whether the above techniques are viable for the programmable pattern matching engines found in many network processors. We find that IR is the most generic and efficient technique, reducing cache traffic by up to 60%. However, a combination of prefetching and IR with application-specific tuning performs as well as, and sometimes better than, IR alone.
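
To make the load instruction reuse idea concrete, the sketch below models a small reuse buffer placed in front of the L1 data cache: a load whose PC and effective address match a valid entry returns the buffered value without accessing the cache, and stores invalidate matching entries. This is a minimal illustration only, not the mechanism evaluated in the paper; the buffer size (RB_ENTRIES), the direct-mapped indexing by load PC, the word-granularity invalidation, and the helper names (do_load, do_store, fake_l1_read) are assumptions made for the example.

    /*
     * Minimal sketch (not the paper's design) of a load reuse buffer that
     * filters L1 data cache accesses. All sizing and indexing choices here
     * are illustrative assumptions.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RB_ENTRIES 64                 /* assumed reuse-buffer size */

    typedef struct {
        bool     valid;
        uint64_t pc;                      /* PC of the load that filled the entry */
        uint64_t addr;                    /* effective address of that load       */
        uint64_t value;                   /* value returned by the earlier load   */
    } rb_entry_t;

    static rb_entry_t rb[RB_ENTRIES];
    static unsigned long l1_accesses, reused;

    /* A load first probes the reuse buffer; on a hit the L1 access is skipped. */
    static uint64_t do_load(uint64_t pc, uint64_t addr,
                            uint64_t (*l1_read)(uint64_t))
    {
        rb_entry_t *e = &rb[(pc >> 2) % RB_ENTRIES];   /* direct-mapped by PC */

        if (e->valid && e->pc == pc && e->addr == addr) {
            reused++;                                  /* reuse: no cache access */
            return e->value;
        }
        l1_accesses++;                                 /* buffer miss: go to L1 */
        uint64_t v = l1_read(addr);
        *e = (rb_entry_t){ .valid = true, .pc = pc, .addr = addr, .value = v };
        return v;
    }

    /* Stores invalidate any reuse-buffer entry holding the same address. */
    static void do_store(uint64_t addr)
    {
        for (int i = 0; i < RB_ENTRIES; i++)
            if (rb[i].valid && rb[i].addr == addr)
                rb[i].valid = false;
        l1_accesses++;                                 /* the store still reaches L1 */
    }

    /* Stand-in for the L1 data cache, so the sketch is self-contained. */
    static uint64_t mem[1024];
    static uint64_t fake_l1_read(uint64_t addr) { return mem[(addr / 8) % 1024]; }

    int main(void)
    {
        /* Re-executing the same load (e.g., a table lookup in a matching loop)
           hits in the reuse buffer and never reaches the L1 data cache. */
        for (int i = 0; i < 1000; i++)
            do_load(0x400100, 0x8000, fake_l1_read);
        do_store(0x8000);                  /* invalidates the buffered entry */
        do_load(0x400100, 0x8000, fake_l1_read);

        printf("L1 accesses: %lu, reused: %lu\n", l1_accesses, reused);
        return 0;
    }

In a pattern matching engine such as Snort's, loads that repeatedly fetch unchanged automaton or rule-table state are the kind of accesses such a buffer can filter, which is consistent with the substantial traffic reduction the paper reports for IR.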