Pub Date: 2002-09-04; DOI: 10.1109/DSD.2002.1115364
D. Cheresiz, B. Juurlink, S. Vassiliadis, H. Wijshoff
The Complex Streamed Instruction (CSI) set is an ISA extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory, performing sectioning, data alignment, and conversion between different packed data types entirely in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MMX and VIS. This paper presents a detailed design of a unit that can execute CSI instructions, under the assumption that the unit is interfaced with the L1 data cache. In particular, it is shown that the complex two-dimensional address-generation calculations can be pipelined and implemented as a three-stage pipeline with acceptable delay and hardware cost.
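To make the idea of two-dimensional address generation concrete, the following is a minimal software sketch, not the unit described in the paper. The parameter names (`base`, `row_len`, `num_rows`, `h_stride`, `v_stride`) are assumptions chosen for illustration and are not taken from the CSI specification. The sketch uses only additions per element, which is the kind of incremental computation that lends itself to pipelining.

```python
from typing import Iterator


def stream_addresses(base: int, row_len: int, num_rows: int,
                     h_stride: int, v_stride: int) -> Iterator[int]:
    """Yield the byte addresses of a 2D stream in row-major order.

    All parameters are hypothetical illustration names:
      base     - address of the first element
      row_len  - number of elements per row
      num_rows - number of rows in the stream
      h_stride - byte distance between consecutive elements in a row
      v_stride - byte distance between the starts of consecutive rows
    """
    row_start = base
    for _ in range(num_rows):
        addr = row_start
        for _ in range(row_len):
            yield addr
            addr += h_stride      # step to the next element in the row
        row_start += v_stride     # step to the start of the next row


# Example: a 4x2 block of 1-byte pixels inside an image whose rows
# are 16 bytes apart in memory.
block = list(stream_addresses(base=100, row_len=4, num_rows=2,
                              h_stride=1, v_stride=16))
print(block)  # [100, 101, 102, 103, 116, 117, 118, 119]
```

A hardware unit would typically produce several such addresses per cycle and align them to cache lines; that part is omitted here.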
Title: Implementation of a streaming execution unit
Venue: Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools
Pub Date: 2002-09-04; DOI: 10.1109/DSD.2002.1115372
Daniel Piso Fernandez, José-Alejandro Piñeiro, J. Bruguera
This paper analyzes the impact of different methods for the double-precision computation of division and square root on the performance of a superscalar processor. The analysis combines the SimpleScalar toolset, estimates of the latency and throughput of the compared methods, and a set of benchmarks with features typical of compute-intensive applications. Simulation results show the importance of having an efficient unit for these operations, since changes of less than 1% in the density of division and square root operations lead to performance changes of around 20%.
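A back-of-the-envelope model helps show why a small density of long-latency operations can have a disproportionate effect. The sketch below is a simplified analytical model, not the SimpleScalar-based methodology used in the paper, and every number in it is an illustrative assumption rather than a result from the paper. It treats each division or square root as adding a fixed number of exposed stall cycles on top of a baseline CPI.

```python
import math


def relative_slowdown(base_cpi: float, density: float,
                      exposed_latency: float) -> float:
    """Return execution time relative to a run with no div/sqrt operations.

    Hypothetical model: CPI = base_cpi + density * exposed_latency, where
      base_cpi        - cycles per instruction without div/sqrt stalls
      density         - fraction of instructions that are div/sqrt
      exposed_latency - stall cycles per div/sqrt not hidden by the pipeline
    """
    cpi = base_cpi + density * exposed_latency
    return cpi / base_cpi


# Illustrative values only: a wide-issue core (CPI 0.5), 0.3% div/sqrt
# instructions, and 30 exposed stall cycles per operation.
slowdown = relative_slowdown(base_cpi=0.5, density=0.003, exposed_latency=30)
print(f"{(slowdown - 1) * 100:.0f}% longer execution time")
assert math.isclose(slowdown, 1.18)
```

Under these assumed values, a density well below 1% lengthens execution time by nearly a fifth, because the lower the baseline CPI, the more each exposed stall cycle costs in relative terms.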
Title: Analysis of the impact of different methods for division/square root computation in the performance of a superscalar microprocessor
Venue: Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools
Pub Date: 2002-09-04; DOI: 10.1109/DSD.2002.1115356
M. Boden, Jörg Schneider, K. Feske, Steffen Rülke
This paper addresses design methods for SoC-based HW/SW systems using reconfigurable architectures. The emphasis is on developing a method that enhances the reusability of HW and SW in the co-design process using proven languages such as ANSI-C and VHDL. We distinguish between three abstraction layers for design modules consisting of both HW and SW. This approach promotes the reuse of both HW and SW sources across different applications and on different devices. We use the reconfigurable SoC Atmel FPSLIC for experimental tests and obtain a significant reuse ratio.
Title: Enhanced reusability for SoC-based HW/SW co-design
Venue: Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools