Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957334
L. Imbert, G. Jullien
We investigate efficiencies that may be introduced into the fault-tolerant modulus replication RNS (MRRNS) system by restricting the data sample polynomials to be even. We refer to this as the symmetrical MRRNS (SMRRNS) technique.
Title: Efficient fault-tolerant arithmetic using a symmetrical modulus replication RNS
Published in: 2001 IEEE Workshop on Signal Processing Systems. SiPS 2001. Design and Implementation (Cat. No.01TH8578)
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957353
Chuxiang Li, Jianhua Lu, Jun Gu, Ming L. Liou
An error-resilient scheme integrated with the MPEG-2 standard is developed to support robust video transmission in digital terrestrial TV broadcasting (DTTB) systems. In particular, a novel concealment algorithm based on temporal error concealment and block matching is proposed. This algorithm achieves effective concealment while keeping computational complexity low by using a small search window for block matching. Likewise, an effective reception scheme for isolated I-pictures is developed. Moreover, by combining these techniques with an efficient detection of spatial/temporal activity, an adaptive error concealment scheme is devised. Extensive simulations confirm that the proposed error-resilient schemes can achieve efficient and robust video transmission in a DTTB system even at a very high packet error rate.
Title: Error resilience schemes for digital terrestrial TV broadcasting system
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957325
Ching-wen Wang, Yun-Nan Chang
A novel Viterbi (1965) decoder design based on in-place state metric update and hybrid survivor path management is presented. For Viterbi applications with a large constraint length, the proposed design methodology yields high-speed, modular architectures by exploiting the in-place computation feature of the Viterbi algorithm. This feature is applied not only to the design of highly regular add-compare-select (ACS) units but also, for the first time, to the design of trace-back units. The proposed hybrid survivor path management, which combines the register-exchange and trace-back schemes, reduces both the number of memory operations and the size of the memory required. Compared with the general hybrid trace-back structure, the overhead of the register-exchange circuit in our architecture is significantly lower. The proposed architecture is therefore promising for digital communication systems that require high-speed Viterbi decoders with a large number of states.
Title: Design of Viterbi decoders with in-place state metric update and hybrid traceback processing
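The add-compare-select and trace-back steps named in the abstract can be illustrated with a small software sketch. This is a textbook hard-decision Viterbi decoder, not the authors' in-place or hybrid register-exchange architecture: the rate-1/2, constraint-length-3 code with generators (7, 5), the state numbering, and all function names are assumptions chosen for illustration, and the state metrics are written to a fresh list rather than updated in place.

```python
def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def branch_output(prev_state, bit):
    """Encoder output pair for leaving prev_state with input bit (generators 7 and 5)."""
    reg = (prev_state << 1) | bit  # 3-bit shift register, newest bit in the LSB
    return parity(reg & 0b111), parity(reg & 0b101)


def encode(bits):
    """Convolutionally encode a bit list, starting from state 0."""
    state, symbols = 0, []
    for b in bits:
        symbols.append(branch_output(state, b))
        state = ((state << 1) | b) & 0b11
    return symbols


def acs_step(metrics, received):
    """One add-compare-select stage: returns new metrics and per-state decisions."""
    n = len(metrics)
    new_metrics, decisions = [0] * n, [0] * n
    for target in range(n):
        bit = target & 1  # input bit that leads into this state
        best = None
        for sel in (0, 1):
            prev = (target >> 1) | (sel * (n >> 1))  # the two predecessor states
            expected = branch_output(prev, bit)
            # add: path metric plus Hamming branch metric
            cand = metrics[prev] + (expected[0] != received[0]) + (expected[1] != received[1])
            # compare and select: keep the smaller candidate
            if best is None or cand < best:
                best, decisions[target] = cand, sel
        new_metrics[target] = best
    return new_metrics, decisions


def decode(symbols):
    """Run ACS over all symbols, then trace back from the best final state."""
    metrics = [0, float("inf"), float("inf"), float("inf")]
    history = []
    for sym in symbols:
        metrics, dec = acs_step(metrics, sym)
        history.append(dec)
    state = min(range(len(metrics)), key=metrics.__getitem__)
    bits = []
    for dec in reversed(history):
        bits.append(state & 1)
        state = (state >> 1) | (dec[state] * 2)
    return bits[::-1]


message = [1, 0, 1, 1, 0, 0, 0]
clean = encode(message)
noisy = list(clean)
noisy[2] = (noisy[2][0] ^ 1, noisy[2][1])  # inject a single channel bit error
decoded_clean = decode(clean)
decoded_noisy = decode(noisy)
```

Storing one decision bit per state per stage, as `history` does here, corresponds to the pure trace-back style of survivor management; a register-exchange design would instead carry the decoded bit sequences along with each state.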
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957363
Máire McLoone, J. McCanny
An FPGA Rijndael encryption design is presented that uses look-up tables to implement the entire Rijndael Round function. A comparison is provided between this design and similar existing implementations. Hardware implementations of encryption algorithms are much faster than equivalent software implementations, and speed is very important because data must be encrypted in real time. In particular, field programmable gate arrays (FPGAs) are well suited to encryption implementations because of their flexibility and an architecture that can be exploited to accommodate typical encryption transformations. The look-up table based Rijndael design achieves a speed of 12 Gbits/sec, which is 1.2 times faster than an alternative design in which look-up tables implement only one of the Round function transformations, and 6 times faster than other previous implementations.
Title: Rijndael FPGA implementation utilizing look-up tables
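To make the idea of a table-driven Round function concrete, the sketch below builds the Rijndael S-box and one combined table in the style of the well-known software "T-table" formulation, where a single look-up yields the SubBytes result already multiplied by the MixColumns coefficients. This is an illustration under assumptions: the abstract does not state how the FPGA tables are organised, and the function names here are hypothetical.

```python
def xtime(a):
    """Multiply a by x (i.e. by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Compute the S-box: multiplicative inverse followed by the affine transform."""
    sbox = [0] * 256
    for x in range(256):
        # Brute-force inverse; 0 maps to 0 by convention.
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF  # XOR with inv rotated left by i
        sbox[x] = s ^ 0x63
    return sbox


def build_t0(sbox):
    """Combined table: each entry packs (2*S[x], S[x], S[x], 3*S[x]) into 32 bits."""
    return [
        (gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3)
        for s in sbox
    ]


SBOX = build_sbox()
T0 = build_t0(SBOX)
```

In the software formulation the other three tables are byte rotations of `T0`, so one output column of a round reduces to four look-ups and a few XORs; whether the hardware design shares tables in this way is not stated in the abstract.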
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957339
Yun-Nan Chang, Yan-Sheng Li
This paper presents a design methodology for the implementation of high-performance 2-D discrete wavelet transform (DWT) and 2-D inverse DWT (IDWT). By exploiting the multi-rate feature inherent in the algorithms, an effective schedule is proposed that interleaves all the row-wise and column-wise computations of different octaves onto three fundamental convolutional filters. Based on this computation schedule, highly efficient architectures can be synthesized. The resulting architectures not only achieve fast computation at lower silicon cost thanks to nearly full hardware utilization, but are also simple and modular, making them very suitable for VLSI implementation. Furthermore, the proposed design methodology enables the design of a configurable architecture that can process both DWT and IDWT.
Title: Design of highly efficient VLSI architectures for 2-D DWT and 2-D IDWT
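The row-wise and column-wise computations over several octaves can be sketched in software as follows. The sketch assumes the unnormalised Haar filter pair (average and half-difference) and a square image whose side is a power of two; the paper targets general convolutional filters and an interleaved hardware schedule, neither of which is modelled here, and all names are hypothetical.

```python
def haar_1d(signal):
    """One filtering-and-decimation pass: low-pass half followed by high-pass half."""
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def haar_2d_level(block):
    """One octave: filter every row, then every column."""
    rows = [haar_1d(row) for row in block]
    cols = [haar_1d(col) for col in transpose(rows)]
    return transpose(cols)


def haar_2d(image, octaves):
    """Apply the given number of octaves, each on the previous LL sub-band."""
    out = [list(row) for row in image]
    size = len(out)
    for _ in range(octaves):
        sub = haar_2d_level([row[:size] for row in out[:size]])
        for y in range(size):
            out[y][:size] = sub[y]
        size //= 2  # the next octave works on a quarter of the data
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coeffs = haar_2d(image, octaves=2)
```

Each octave processes a quarter of the samples of the previous one, which is the multi-rate property the abstract refers to: the later octaves leave idle slots that a schedule can fill by interleaving.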
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957358
Quynh-Lien Nguyen-Phuc, C.M. Sorolla
This paper presents the architecture of a hardware block supporting the real-time rendering of all 2D natural or synthetic visual objects defined by the MPEG-4 standard, as well as sprite decoding. It is compliant with the main profile, Level 3, and with the hybrid visual profile. A software model allows us to validate the visual quality of the rendered scene. The complexity of this architecture is evaluated, and the architectural choices are validated by simulating a behavioral model.
Title: VLSI architecture of a MPEG-4 visual renderer
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957346
V. Moshnyaga
A new architectural technique to reduce the energy dissipation of frame memory is proposed. Unlike existing approaches, the technique exploits the pixel correlation in video sequences, dynamically adjusting the memory bit-width to the number of bits changed per pixel. Instead of treating the data bits independently, we group the most significant bits together and activate the corresponding group of bit-lines adaptively, according to data variation. The method is neither restricted to specific bit patterns nor dependent on the storage phase. It works equally well on read and write accesses, as well as during precharging. Simulation results show that this method can reduce the total energy consumption of frame memory by 20% without affecting the picture quality.
Title: Adaptive bit-width compression for low-energy frame memory design
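The grouping idea can be illustrated with a toy activity count. The sketch below assumes 8-bit pixels, a most-significant group of 4 bits, and a comparison against the previously accessed pixel; these parameters, the function names, and the use of "active bit-lines" as a stand-in for energy are all assumptions made for illustration, not figures from the paper.

```python
def msb_group_changed(prev, cur, msb_bits=4, width=8):
    """True when the most significant bit group differs between two pixels."""
    shift = width - msb_bits
    return (prev >> shift) != (cur >> shift)


def active_bitline_fraction(pixels, msb_bits=4, width=8):
    """Fraction of bit-lines activated when the MSB group is enabled only on change."""
    active = 0
    prev = 0  # assumed initial content of the comparison register
    for cur in pixels:
        lines = width - msb_bits  # the low-order bit-lines are always driven
        if msb_group_changed(prev, cur, msb_bits, width):
            lines += msb_bits  # enable the MSB group only when it carries new data
        active += lines
        prev = cur
    return active / (width * len(pixels))


# Neighbouring pixels in natural video tend to share their upper bits.
scanline = [0x80, 0x82, 0x85, 0x91, 0x90]
fraction = active_bitline_fraction(scanline)
```

For this short scanline the upper nibble changes on two of the five accesses, so the MSB group stays idle for the other three; real savings depend on the memory circuit and the video content.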
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957370
N. Roma, L. Sousa
A new class of fully parameterizable multiple array architectures for motion estimation (ME) in video sequences, based on the full search block matching (FSBM) algorithm, is proposed in this paper. This class is based on a new and efficient AB2 single array architecture with minimum latency, maximum throughput and full utilization of the hardware resources. It provides the ability to configure the target processors according to the setup parameters and the specified limits on processing time and circuit area. For this purpose, a software configuration tool has been implemented that determines the set of possible configurations fulfilling the requirements of the video coder and automatically generates the VHDL description of the selected configuration. The implementation of a single array processor configuration on a single chip is presented. Experimental results demonstrate the ability to estimate motion vectors in real time with this configuration.
Title: Parameterizable hardware architectures for automatic synthesis of motion estimation processors
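For readers unfamiliar with full search block matching, a minimal software sketch follows. It assumes the sum of absolute differences (SAD) as the matching criterion and a square search range of plus or minus `p` pixels; the function names and the synthetic test frames are hypothetical, and the code says nothing about the array architecture itself.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside ref; return (dx, dy, sad) for the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic frames: cur is ref shifted by two pixels horizontally and one vertically.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, p=2)
```

The cost grows as (2p + 1)^2 block comparisons per block, each with n^2 absolute differences, which is why the regular, exhaustive structure of FSBM is usually mapped onto processor arrays in hardware.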
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957344
Tim Kogel, Andreas Wieferink, H. Meyr, A. Kroll
We propose a system-level design and refinement methodology based on the SystemC class library. We address design space exploration and performance profiling at the highest possible level of abstraction. System-level design starts with the initial functional specification and validation of the system behavior in SystemC. The refinement methodology covers architecture exploration and results in an executable system architecture model, which generates the relevant profiling data and verifies whether the chosen architecture meets the performance requirements. We have applied this methodology to a 100 million gate design of a 3D graphic processor. We were able to demonstrate the feasibility and define the final system architecture within 2 months. This 3D processor implements the ray-tracing rendering paradigm on one chip, allowing real-time rendering of 3D scenes with photo-realistic quality. Based on the results of this case study, we present the benefits of our methodology for successively defining a feasible system architecture that copes with the processing and memory bandwidth requirements.
Title: SystemC based architecture exploration of a 3D graphic processor
Pub Date: 2001-09-26 DOI: 10.1109/SIPS.2001.957350
W. Wolf, I. Burak Ozer
This paper describes a smart camera system under development at Princeton University. This smart camera is designed for use in a smart room in which the camera detects the presence of a person in its visual field and determines when various gestures are made by the person. As a first step toward a VLSI implementation, we use Trimedia processors hosted by a PC. This paper describes the relationship between the algorithms used for human activity detection and the architectures required to perform these tasks in real time.
Title: A smart camera for real-time human activity recognition