Implementation and trade-offs of a DCT architecture using high-level synthesis
E. Torbey, J. Knight
Proceedings Eleventh Annual IEEE International ASIC Conference (Cat. No.98TH8372), 1998-09-13
DOI: 10.1109/ASIC.1998.722897
https://doi.org/10.1109/ASIC.1998.722897
Citations: 2
Abstract
This paper presents the architectural trade-offs of a time-shared implementation of a modified fast discrete cosine transform (DCT) algorithm built with a high-level synthesis tool. The architecture allows operators to be time-shared across different stages, with minimal overhead in control and multiplexing. A full implementation of an 8×8 2-D DCT outperforms both the original pipelined architecture and a hand-crafted time-shared architecture, reducing the required area by up to 50% and improving latency by up to 70%. These improvements are achieved while maintaining throughput, in exchange for a 5% decrease in the required critical path timing. The 2-D DCT used is more complex than the traditional benchmarks for high-level synthesis, and the results show the effectiveness of the synthesis tool on large, practical algorithms.
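As a rough software illustration of the 8×8 2-D DCT mentioned in the abstract, the sketch below uses the standard row-column decomposition: one 1-D DCT routine is applied to every row and then reused for every column. This is only an analogy for time-sharing an operator across stages. It is not the paper's modified fast algorithm or its synthesized architecture, it uses the direct O(N²) orthonormal DCT-II rather than a fast factorization, and the names `dct_1d`, `transpose`, and `dct_2d` are hypothetical.

```python
import math

N = 8  # block size assumed from the 8x8 2-D DCT in the abstract


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, direct form (not a fast algorithm)."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def transpose(m):
    """Swap rows and columns of a square list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT by row-column decomposition, reusing one 1-D routine."""
    row_pass = [dct_1d(row) for row in block]                # pass 1: rows
    col_pass = [dct_1d(col) for col in transpose(row_pass)]  # pass 2: columns
    return transpose(col_pass)


# A constant block has all of its energy in the DC coefficient.
flat = [[1.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

Because both passes call the same `dct_1d`, a hardware analogue could in principle share a single 1-D datapath between the row and column stages, trading latency for area. How the paper's tool actually schedules and shares operators is not described in the abstract.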