D. Cheresiz, B. Juurlink, S. Vassiliadis, H. Wijshoff
Implementation of a streaming execution unit
Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools, 2002-09-04
DOI: 10.1109/DSD.2002.1115364 (https://doi.org/10.1109/DSD.2002.1115364)
Citations: 7
Abstract
The Complex Streamed Instruction (CSI) set is an ISA extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory, performing sectioning, data alignment and conversion between different packed data types all in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MMX and VIS. This paper presents a detailed design of a unit that can execute CSI instructions under the assumption that the unit is interfaced with the L1 data cache. In particular it is shown that the complex, two-dimensional, address-generation calculations can be performed in a pipelined fashion and implemented using a three-stage pipeline with acceptable delay and hardware cost.
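To make the idea of two-dimensional address generation concrete, here is a minimal software sketch of how the element addresses of a two-dimensional stream could be enumerated. It is an illustration only and does not describe the CSI stream format or the paper's three-stage pipeline: the function name `stream_addresses` and the parameters `base`, `row_length`, `num_rows`, `h_stride` and `v_stride` are hypothetical, chosen for this example.

```python
from typing import Iterator


def stream_addresses(base: int, row_length: int, num_rows: int,
                     h_stride: int, v_stride: int) -> Iterator[int]:
    """Yield the byte address of every element of a 2D stream, row by row.

    All parameter names are assumptions made for this sketch:
      base       -- address of the first element
      row_length -- number of elements per row
      num_rows   -- number of rows in the stream
      h_stride   -- byte distance between consecutive elements in a row
      v_stride   -- byte distance between the starts of consecutive rows
    """
    for row in range(num_rows):
        # Start of the current row: one multiply-add on the row index.
        row_start = base + row * v_stride
        for col in range(row_length):
            # Element address within the row: one multiply-add on the column index.
            yield row_start + col * h_stride


if __name__ == "__main__":
    # A 2-row, 4-element-wide block of bytes inside an image with 16-byte rows.
    for addr in stream_addresses(0x1000, 4, 2, 1, 16):
        print(hex(addr))
```

In this sketch each address comes from two nested multiply-add steps. A hardware unit would more plausibly update running row and column addresses incrementally, which is the kind of calculation that can be split across pipeline stages.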