A hardware accelerator IP for EBCOT tier-1 coding in JPEG2000 standard
Authors: Tien-Wei Hsieh, Y. Lin
Venue: 2nd Workshop on Embedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.
Published: 2004-11-22
DOI: https://doi.org/10.1109/ESTMED.2004.1359713
Citations: 3
Abstract
We propose a hardware accelerator IP for the Tier-1 portion of Embedded Block Coding with Optimal Truncation (EBCOT), the block coder used in the JPEG2000 next-generation image compression standard. EBCOT Tier-1 accounts for more than 70% of encoding time because of its extensive bit-level processing. Our architecture consists of a 16-way parallel context formation module and a 3-stage pipelined arithmetic encoder. We reduce power consumption by selectively shutting down parts of the circuit. Compared with the best known design, we reduce the cycle count by 17% and come within 5% of the theoretical lower bound. We have implemented the design in synthesizable Verilog RTL with an AMBA-AHB interface for SoC integration. We have demonstrated the design on an FPGA prototype and achieved a substantial speedup.
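
To make the bit-level workload of context formation concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes the standard JPEG2000 stripe-oriented scan (stripes of four rows, visited column by column) and the usual counts of significant horizontal, vertical and diagonal neighbours that feed context selection. The names `stripe_scan_order` and `neighbor_counts` are hypothetical, and the abstract does not say how the 16 parallel lanes map onto this scan.

```python
from typing import Iterator, List, Tuple

STRIPE_HEIGHT = 4  # JPEG2000 scans each code block in stripes of four rows


def stripe_scan_order(height: int, width: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) positions stripe by stripe, column by column,
    top to bottom within each column of a stripe."""
    for top in range(0, height, STRIPE_HEIGHT):
        for col in range(width):
            # The last stripe may be shorter than four rows.
            for row in range(top, min(top + STRIPE_HEIGHT, height)):
                yield row, col


def neighbor_counts(sig: List[List[int]], row: int, col: int) -> Tuple[int, int, int]:
    """Return (H, V, D): the number of significant horizontal, vertical and
    diagonal neighbours of (row, col). Positions outside the code block are
    treated as insignificant."""
    height, width = len(sig), len(sig[0])

    def s(r: int, c: int) -> int:
        return sig[r][c] if 0 <= r < height and 0 <= c < width else 0

    h = s(row, col - 1) + s(row, col + 1)
    v = s(row - 1, col) + s(row + 1, col)
    d = (s(row - 1, col - 1) + s(row - 1, col + 1)
         + s(row + 1, col - 1) + s(row + 1, col + 1))
    return h, v, d
```

A sequential coder repeats this eight-neighbour lookup for every sample of every bit plane, which illustrates why Tier-1 dominates encoding time and why evaluating several samples in parallel is attractive.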