Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957356
'. StephawMietens, P. H. N. de, Christian Hentsche
The applicability of MPEG video coding can be improved by scaling the algorithmic complexity and resource usage to the desired application and device. This paper presents a new DCT computation technique of which the quality and amount of computations is maximized for a limited number of operations. For halved computing resources, about 2-4 SNR dB improvement was obtained when compared to a diagonally oriented computation of coefficients, matching with the conventional MPEG scanning.
{"title":"New scalable DCT computation for resource-constrained systems","authors":"'. StephawMietens, P. H. N. de, Christian Hentsche","doi":"10.1109/SIPS.2001.957356","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957356","url":null,"abstract":"The applicability of MPEG video coding can be improved by scaling the algorithmic complexity and resource usage to the desired application and device. This paper presents a new DCT computation technique of which the quality and amount of computations is maximized for a limited number of operations. For halved computing resources, about 2-4 SNR dB improvement was obtained when compared to a diagonally oriented computation of coefficients, matching with the conventional MPEG scanning.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121891899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957335
L. Ting, C. Cowan, Roger Francis Woods, P. R. Cork, C. L. Sprigings
The initial frequency pass-band of the LMS filter remains whilst tracking a non-stationary chirped signal. This past memory effect causes unwanted white noise to leak through the initial residual pass-band of the adaptive filter. A leakage term is applied to the LMS algorithm to remove the memory effect of the tracking filter which leads to a reduction in the noise power at the output of the adaptive filter. This reduced noise power is reflected in an improved SNR (signal-to-noise ratio) of a low SNR chirped signal compared to the standard LMS algorithm.
{"title":"Tracking performance of leakage LMS for chirped signals","authors":"L. Ting, C. Cowan, Roger Francis Woods, P. R. Cork, C. L. Sprigings","doi":"10.1109/SIPS.2001.957335","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957335","url":null,"abstract":"The initial frequency pass-band of the LMS filter remains whilst tracking a non-stationary chirped signal. This past memory effect causes unwanted white noise to leak through the initial residual pass-band of the adaptive filter. A leakage term is applied to the LMS algorithm to remove the memory effect of the tracking filter which leads to a reduction in the noise power at the output of the adaptive filter. This reduced noise power is reflected in an improved SNR (signal-to-noise ratio) of a low SNR chirped signal compared to the standard LMS algorithm.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127176776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957352
T. Glokler, H. Meyr
Application specific instruction set processors (ASIPs) are an excellent architecture for mixed control/data-flow oriented tasks with medium to low data rate and high complexity. The main advantage of ASIPs is the higher flexibility due to programmability compared to dedicated hardware. A drawback of this design style is an increase in power consumption. The current case study focuses on an ASIP design methodology considering the classical parameters computational performance and area as well as energy consumption simultaneously. Several ASIP power optimization options have been applied and evaluated: clock-gating, logic netlist restructuring, ISA optimization, instruction memory power reduction, and use of a dedicated coprocessor. These optimizations are demonstrated with the WORE (ISS-core) ASIP for DVB-T acquisition and tracking algorithms. The results reveal a potential of about one order of magnitude in energy savings for these optimizations.
应用特定指令集处理器(Application specific instruction set processor, asip)是一种面向混合控制/数据流任务的优秀架构,具有中低数据速率和高复杂性。与专用硬件相比,api的主要优点是由于可编程性而具有更高的灵活性。这种设计风格的缺点是增加了功耗。当前的案例研究侧重于同时考虑经典参数、计算性能和面积以及能耗的ASIP设计方法。已经应用和评估了几个ASIP功耗优化选项:时钟门控、逻辑网络列表重构、ISA优化、指令存储器功耗降低和使用专用协处理器。这些优化通过用于DVB-T采集和跟踪算法的wear (ISS-core) ASIP进行了演示。结果显示,这些优化可以节省大约一个数量级的能源。
{"title":"Power reduction for ASIPS: a case study","authors":"T. Glokler, H. Meyr","doi":"10.1109/SIPS.2001.957352","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957352","url":null,"abstract":"Application specific instruction set processors (ASIPs) are an excellent architecture for mixed control/data-flow oriented tasks with medium to low data rate and high complexity. The main advantage of ASIPs is the higher flexibility due to programmability compared to dedicated hardware. A drawback of this design style is an increase in power consumption. The current case study focuses on an ASIP design methodology considering the classical parameters computational performance and area as well as energy consumption simultaneously. Several ASIP power optimization options have been applied and evaluated: clock-gating, logic netlist restructuring, ISA optimization, instruction memory power reduction, and use of a dedicated coprocessor. These optimizations are demonstrated with the WORE (ISS-core) ASIP for DVB-T acquisition and tracking algorithms. The results reveal a potential of about one order of magnitude in energy savings for these optimizations.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"92 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120886205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957341
J. Hezavei, N. Vijaykrishnan, M. J. Irwin, M. Kandemir, D. Duarte
An input sensitive table based power estimation technique is proposed. The proposed technique has been applied to different circuits and validated using circuit-level simulation for 0.25 /spl mu/m, 2.5 V CMOS technology. It is observed that the proposed scheme achieves an average error margin of 3.2% as compared to HSPICE, while running 27 times faster.
提出了一种基于输入敏感表的功率估计技术。该技术已应用于不同的电路中,并通过0.25 /spl mu/m, 2.5 V CMOS技术的电路级仿真进行了验证。与HSPICE相比,该方案的平均误差为3.2%,运行速度提高了27倍。
{"title":"Input sensitive high-level power analysis","authors":"J. Hezavei, N. Vijaykrishnan, M. J. Irwin, M. Kandemir, D. Duarte","doi":"10.1109/SIPS.2001.957341","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957341","url":null,"abstract":"An input sensitive table based power estimation technique is proposed. The proposed technique has been applied to different circuits and validated using circuit-level simulation for 0.25 /spl mu/m, 2.5 V CMOS technology. It is observed that the proposed scheme achieves an average error margin of 3.2% as compared to HSPICE, while running 27 times faster.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128727184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957342
W. Gass
Summary form only given. Architecture enhancements to the C6000 architecture have improved performance, reduced code size, lowered power, and increased compiler efficiency. Benchmarks of DSP kernels and typical DSP applications are used to compare commercially available DSP in terms of cycle count, power, and compiler efficiency. The C6000 VLIW family is an 8-issue instruction architecture that has four execution units for each of the two register banks. The C62x, first-generation processor runs at 300 MHz, has 2 multipliers, and dual 32-bit read/write ports. The 64x, second-generation processor extends the performance by increasing the speed to 600 MHz, adding 2 more multipliers and increasing the load/store width to 64-bits. In addition, the 64x adds SIMD operations to support packed data operations. The 62x is an excellent compiler target due to deterministic order and time of instruction execution, a general purpose 32-word register file, simple independent instructions, and no special modes or status bits. The 64x has improved the compiler efficiency by increasing the register file to 64 words, increasing the number of common instructions that will execute on each unit, and providing for non-aligned loads of packed data. The 64x reduces code size by decreasing the number of NOP with non-aligned program memory fetches and by adding complex instructions that combine several RISC instructions into one 32-bit opcode. The 64x reduces power by adding a 2-level on-chip cache, thereby enabling most of the memory accesses to hit the smaller first level cache. In addition, a reduction in code size decreases the number of first-level instruction fetches and the larger register file decreases the number of data memory accesses. The second-generation processor has been optimized for image, graphics, and telecommunication applications. For 2D algorithms such as 30 correlation, median filtering, motion estimation and polyphase filter, the cycle count improvements for the kernels range from 2.3x to 7.6x. For communication algorithms such as Reed Solomon decoding, Viterbi decoding and FFT, the cycle count improvements of the kernels range from 2.1 x to 3.5x.
{"title":"Higher performance and lower power enhancements to VLIW architectures","authors":"W. Gass","doi":"10.1109/SIPS.2001.957342","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957342","url":null,"abstract":"Summary form only given. Architecture enhancements to the C6000 architecture have improved performance, reduced code size, lowered power, and increased compiler efficiency. Benchmarks of DSP kernels and typical DSP applications are used to compare commercially available DSP in terms of cycle count, power, and compiler efficiency. The C6000 VLIW family is an 8-issue instruction architecture that has four execution units for each of the two register banks. The C62x, first-generation processor runs at 300 MHz, has 2 multipliers, and dual 32-bit read/write ports. The 64x, second-generation processor extends the performance by increasing the speed to 600 MHz, adding 2 more multipliers and increasing the load/store width to 64-bits. In addition, the 64x adds SIMD operations to support packed data operations. The 62x is an excellent compiler target due to deterministic order and time of instruction execution, a general purpose 32-word register file, simple independent instructions, and no special modes or status bits. The 64x has improved the compiler efficiency by increasing the register file to 64 words, increasing the number of common instructions that will execute on each unit, and providing for non-aligned loads of packed data. The 64x reduces code size by decreasing the number of NOP with non-aligned program memory fetches and by adding complex instructions that combine several RISC instructions into one 32-bit opcode. The 64x reduces power by adding a 2-level on-chip cache, thereby enabling most of the memory accesses to hit the smaller first level cache. In addition, a reduction in code size decreases the number of first-level instruction fetches and the larger register file decreases the number of data memory accesses. The second-generation processor has been optimized for image, graphics, and telecommunication applications. For 2D algorithms such as 30 correlation, median filtering, motion estimation and polyphase filter, the cycle count improvements for the kernels range from 2.3x to 7.6x. For communication algorithms such as Reed Solomon decoding, Viterbi decoding and FFT, the cycle count improvements of the kernels range from 2.1 x to 3.5x.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129890453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957332
J. Ramírez, P. G. Fernández, U. Meyer-Base, F. Taylor, A. García, A. Lloris
The design of high-performance, high-precision, real-time digital signal processing (DSP) systems, such as those associated with wavelet signal processing, is a challenging problem. This paper reports on the innovative use of the residue number system (RNS) for implementing high-end wavelet filter banks. The disclosed system uses an enhanced index-transformation defined over Galois fields to efficiently support different wavelet filter instantiations without adding any extra cost or additional lookup tables (LUT). An exhaustive comparison against existing two's complement (2C) designs for different custom IC technologies was carried out. These structures have been demonstrated to be well suited for field programmable logic (FPL) assimilation as well as for CBIC (cell-based integrated circuit) technologies.
{"title":"Index-based RNS DWT architectures for custom IC designs","authors":"J. Ramírez, P. G. Fernández, U. Meyer-Base, F. Taylor, A. García, A. Lloris","doi":"10.1109/SIPS.2001.957332","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957332","url":null,"abstract":"The design of high-performance, high-precision, real-time digital signal processing (DSP) systems, such as those associated with wavelet signal processing, is a challenging problem. This paper reports on the innovative use of the residue number system (RNS) for implementing high-end wavelet filter banks. The disclosed system uses an enhanced index-transformation defined over Galois fields to efficiently support different wavelet filter instantiations without adding any extra cost or additional lookup tables (LUT). An exhaustive comparison against existing two's complement (2C) designs for different custom IC technologies was carried out. These structures have been demonstrated to be well suited for field programmable logic (FPL) assimilation as well as for CBIC (cell-based integrated circuit) technologies.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116459391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957368
V. Lappalainen, A. Hallapuro, T. Hamalainen
Two optimized implementations of the emerging ITU-T H.26L video encoder are described. The first, medium-optimized version, is implemented in C and the latter, highly optimized version, utilizes MMX assembly instructions. Comparisons to a correspondingly optimized H.263/H.263+ implementation are given with the spatial and temporal video quality fixed and the bit rate and complexity varied. On a 733 Pentium III processor, a real-time encoding speed of 10 fps for QCIF (quarter common intermediate format) sequences is achieved with a 29% reduction in bit rate compared to H.263+. The complexity of H.26L is about 3.4 times more than that of H.263+.
{"title":"Optimization of emerging H.26L video encoder","authors":"V. Lappalainen, A. Hallapuro, T. Hamalainen","doi":"10.1109/SIPS.2001.957368","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957368","url":null,"abstract":"Two optimized implementations of the emerging ITU-T H.26L video encoder are described. The first, medium-optimized version, is implemented in C and the latter, highly optimized version, utilizes MMX assembly instructions. Comparisons to a correspondingly optimized H.263/H.263+ implementation are given with the spatial and temporal video quality fixed and the bit rate and complexity varied. On a 733 Pentium III processor, a real-time encoding speed of 10 fps for QCIF (quarter common intermediate format) sequences is achieved with a 29% reduction in bit rate compared to H.263+. The complexity of H.26L is about 3.4 times more than that of H.263+.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132420906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957359
Y. Hwang, Nan-Jung Liu, Ming-Chang Tsai
This paper presents a high quality audio codec design based on a transform-domain weighted interleave vector quantization (Twin-VQ) scheme adopted in the MPEG-4 audio standard. Three novel techniques are employed in this scheme to compress the data, ie, (1) flattening of MDCT coefficients by the spectrum of linear predictive coding (LPC) coefficients; (2) further flattening of MDCT coefficients by the Bark envelope; and (3) weighted interleave vector quantization. This paper examines the related design issues in implementing an efficient Twin-VQ codec. Fast computation algorithms are derived for the computationally intensive modules. Design parameters of each module are determined and the codebooks for weighted interleave vector quantization are constructed. Experimental results show that the designed codec can compress natural audio efficiently and reproduce high quality outputs.
{"title":"An MPEG-4 Twin-VQ based high quality audio codec design","authors":"Y. Hwang, Nan-Jung Liu, Ming-Chang Tsai","doi":"10.1109/SIPS.2001.957359","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957359","url":null,"abstract":"This paper presents a high quality audio codec design based on a transform-domain weighted interleave vector quantization (Twin-VQ) scheme adopted in the MPEG-4 audio standard. Three novel techniques are employed in this scheme to compress the data, ie, (1) flattening of MDCT coefficients by the spectrum of linear predictive coding (LPC) coefficients; (2) further flattening of MDCT coefficients by the Bark envelope; and (3) weighted interleave vector quantization. This paper examines the related design issues in implementing an efficient Twin-VQ codec. Fast computation algorithms are derived for the computationally intensive modules. Design parameters of each module are determined and the codebooks for weighted interleave vector quantization are constructed. Experimental results show that the designed codec can compress natural audio efficiently and reproduce high quality outputs.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126741517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957338
Jiyang Kang, Wonyong Sung
A new instruction caching scheme that utilizes the block priority information is proposed mainly targeted for embedded multimedia processors. The block priority information is obtained by profiling application programs. The goal of this caching scheme is to keep more important code blocks longer using the block priority information, which programmers provide by analyzing the profiling results of multimedia applications. In addition to a new caching scheme, algorithms for determining the priority of each code block statically are also developed and their performances are evaluated using an H.263 video encoder. The experimental results show that the cache miss ratio can be reduced up to nearly a half of that of the normal least recently used (LRU) replacement scheme although the improvement depends on the cache size. The effects of varying cache size, associativity, and line size on the performance of proposed prioritization methods are also investigated. Moreover, the performance gain that can be achieved by employing more than two priority levels is also discussed.
{"title":"A multi-level block priority based instruction caching scheme for multimedia processors","authors":"Jiyang Kang, Wonyong Sung","doi":"10.1109/SIPS.2001.957338","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957338","url":null,"abstract":"A new instruction caching scheme that utilizes the block priority information is proposed mainly targeted for embedded multimedia processors. The block priority information is obtained by profiling application programs. The goal of this caching scheme is to keep more important code blocks longer using the block priority information, which programmers provide by analyzing the profiling results of multimedia applications. In addition to a new caching scheme, algorithms for determining the priority of each code block statically are also developed and their performances are evaluated using an H.263 video encoder. The experimental results show that the cache miss ratio can be reduced up to nearly a half of that of the normal least recently used (LRU) replacement scheme although the improvement depends on the cache size. The effects of varying cache size, associativity, and line size on the performance of proposed prioritization methods are also investigated. Moreover, the performance gain that can be achieved by employing more than two priority levels is also discussed.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114516596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2001-09-26DOI: 10.1109/SIPS.2001.957363
MGre McLoone, J. McCanny
An FPGA Rijndael encryption design is presented, which utilizes look-up tables to implement the entire Rijndael Round function. A comparison is provided between this design and similar existing implementations. Hardware implementations of encryption algorithms prove much faster than equivalent software implementations and since there is a need to perform encryption on data in real time, speed is very important. In particular, field programmable gate arrays (FPGAs) are well suited to encryption implementations due to their flexibility and an architecture, which can be exploited to accommodate typical encryption transformations. A look-up table based Rijndael design achieves a speed of 12 Gbits/sec, which is a factor 1.2 times faster than an alternative design in which look-up tables are utilized to implement only one of the Round function transformations, and 6 times faster than other previous implementations.
{"title":"Rijndael FPGA implementation utilizing look-up tables","authors":"MGre McLoone, J. McCanny","doi":"10.1109/SIPS.2001.957363","DOIUrl":"https://doi.org/10.1109/SIPS.2001.957363","url":null,"abstract":"An FPGA Rijndael encryption design is presented, which utilizes look-up tables to implement the entire Rijndael Round function. A comparison is provided between this design and similar existing implementations. Hardware implementations of encryption algorithms prove much faster than equivalent software implementations and since there is a need to perform encryption on data in real time, speed is very important. In particular, field programmable gate arrays (FPGAs) are well suited to encryption implementations due to their flexibility and an architecture, which can be exploited to accommodate typical encryption transformations. A look-up table based Rijndael design achieves a speed of 12 Gbits/sec, which is a factor 1.2 times faster than an alternative design in which look-up tables are utilized to implement only one of the Round function transformations, and 6 times faster than other previous implementations.","PeriodicalId":246898,"journal":{"name":"2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127451824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}