Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363073
A. Erdogan, T. Arslan, R. Lai
This paper presents the implementation of the decorrelating (DECOR) transformation technique for low power FIR filtering cores. The technique was introduced previously, but its area, delay and power performance were never fully evaluated. Early evaluations did not consider the whole implementation and were based only on analytical methods or high-level simulation models. This paper presents the complete VLSI implementation of the technique and a study of its area, delay and power performance for different orders of coefficient differences and various multiplier types. We show that although the technique achieves up to 47% power saving in the multiplier unit, the overall power saving is up to 25%, with up to a 24% increase in area.
{"title":"Implementation of the decorrelating transformation for low power FIR filters","authors":"A. Erdogan, T. Arslan, R. Lai","doi":"10.1109/SIPS.2004.1363073","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363073","url":null,"abstract":"This paper presents the implementation of the decorrelating (DECOR) transformation technique for low power FIR filtering cores. The technique was introduced in the past, but was not fully evaluated for its area, delay and power performance. Early evaluations did not consider the whole implementation and were merely based on either some analytical methods or high level simulation models. This paper presents the complete VLSI implementation of the technique and a study of its area, delay and power performance with different order of coefficient differences and various multiplier types. We show that although the technique achieves up to 47% power saving in the multiplier unit, the overall power saving is up to 25% with up to 24% increase in area.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122014460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
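As a rough illustration of the idea behind the DECOR abstract above, the sketch below filters with differences of adjacent coefficients and then undoes the differencing with a running sum. This is the textbook form of the transformation (multiplying and dividing the transfer function by powers of 1 - z^-1), not the authors' VLSI architecture. The function names and the example coefficients are assumptions made for illustration.

```python
def fir_direct(coeffs, samples):
    """Reference FIR filter: y[n] = sum_k c[k] * x[n-k], zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def decor_coeffs(coeffs, order=1):
    """Multiply the coefficient polynomial by (1 - z^-1), `order` times."""
    d = list(coeffs)
    for _ in range(order):
        padded = d + [0]  # the product is one tap longer
        d = [padded[k] - (padded[k - 1] if k > 0 else 0)
             for k in range(len(padded))]
    return d


def fir_decor(coeffs, samples, order=1):
    """Filter with difference coefficients, then integrate `order` times."""
    y = fir_direct(decor_coeffs(coeffs, order), samples)
    for _ in range(order):
        total = 0
        integrated = []
        for v in y:
            total += v  # running sum implements 1 / (1 - z^-1)
            integrated.append(total)
        y = integrated
    return y


# Hypothetical smooth low-pass coefficients and input samples.
COEFFS = [10, 12, 13, 12, 10]
SAMPLES = [3, -1, 4, 1, -5, 9, 2, -6]
assert fir_decor(COEFFS, SAMPLES, order=1) == fir_direct(COEFFS, SAMPLES)
```

When neighbouring coefficients are close in value, most difference coefficients are smaller in magnitude than the originals, which is the usual source of the multiplier savings. The extra tap and the accumulator are one plausible reason the whole-filter saving is smaller than the multiplier-only saving.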
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363044
S. Mamagkakis, David Atienza Alonso, C. Poucet, F. Catthoor, D. Soudris, J. Mendias
In this paper, we propose a new approach to designing convenient dynamic memory management subsystems that exploit multiple memory levels. The approach analyzes the logical phases of modern dynamic applications to distribute dynamically allocated data effectively across the multi-level memory hierarchies present in embedded devices. We assess its effectiveness on three representative real-life case studies from the new dynamic application domains (i.e., network and 3D rendering applications) ported to embedded systems. The results show a significant reduction in energy consumption (up to 40%) over state-of-the-art solutions for dynamic memory management on embedded systems with typical cache-main memory architectures, while respecting the real-time requirements of these applications.
{"title":"Custom design of multi-level dynamic memory management subsystem for embedded systems","authors":"S. Mamagkakis, David Atienza Alonso, C. Poucet, F. Catthoor, D. Soudris, J. Mendias","doi":"10.1109/SIPS.2004.1363044","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363044","url":null,"abstract":"In this paper, we propose a new approach to design convenient dynamic memory management subsystems, profiting from the multiple memory levels. It analyzes the logical phases involved in modem dynamic applications to effectively distribute the dynamically allocated data among the multi-level memory hierarchies present in embedded devices. We assess the effectiveness of the proposed approach for three representative real-life case studies of the new dynamic application domains (i.e., network and 3D rendering applications) ported to embedded systems. The results accomplished with our approach show a very significant reduction in energy consumption (up to 40%) over state-of-the-art solutions for dynamic memory management on embedded systems with typical cache-main memory architectures while respecting the real-time requirements of these applications.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115946943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363034
Sang-Min Kim, K. Parhi
In low-density parity-check (LDPC) code decoding with the iterative sum-product algorithm (SPA), due to the randomness of the parity-check matrix, H, the overlapping of the check node processing unit (CNU) and variable node processing unit (VNU) in the same clock cycle is difficult. The paper demonstrates that overlapped decoding can be exploited as long as the LDPC matrix is composed of identity matrices and their cyclic-shifted matrices, i.e., the parity-check matrix, H, belongs to a class of quasi-cyclic LDPC codes. It is shown that the number of clock cycles required for decoding can be reduced by 50% when overlapped decoding is applied to a (3,6)-regular LDPC code decoder.
{"title":"Overlapped decoding for a class of quasi-cyclic LDPC codes","authors":"Sang-Min Kim, K. Parhi","doi":"10.1109/SIPS.2004.1363034","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363034","url":null,"abstract":"In low-density parity-check (LDPC) code decoding with the iterative sum-product algorithm (SPA), due to the randomness of the parity-check matrix, H, the overlapping of the check node processing unit (CNU) and variable node processing unit (VNU) in the same clock cycle is difficult. The paper demonstrates that overlapped decoding can be exploited as long as the LDPC matrix is composed of identity matrices and their cyclic-shifted matrices, i.e., the parity-check matrix, H, belongs to a class of quasi-cyclic LDPC codes. It is shown that the number of clock cycles required for decoding can be reduced by 50% when overlapped decoding is applied to a (3,6)-regular LDPC code decoder.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133354932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
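The structure that the overlapped-decoding abstract above relies on, a parity-check matrix assembled from cyclically shifted identity matrices, can be sketched in a few lines. The shift table and block size below are hypothetical, chosen only to give a small (3,6)-regular example; they are not taken from the paper.

```python
def circulant(p, shift):
    """p x p identity matrix whose ones are cyclically shifted by `shift` columns."""
    return [[1 if (r + shift) % p == c else 0 for c in range(p)]
            for r in range(p)]


def qc_ldpc_matrix(shifts, p):
    """Assemble H from a shift table: one p x p circulant block per entry."""
    H = []
    for shift_row in shifts:
        blocks = [circulant(p, s) for s in shift_row]
        for r in range(p):
            # Row r of this block row is the concatenation of row r of each block.
            H.append([bit for block in blocks for bit in block[r]])
    return H


# Hypothetical shift table: 3 block rows x 6 block columns, block size p = 7.
SHIFTS = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 3, 4, 5],
    [0, 2, 4, 6, 1, 3],
]
H = qc_ldpc_matrix(SHIFTS, 7)

# Every row has weight 6 and every column has weight 3, i.e. (3,6)-regular.
assert all(sum(row) == 6 for row in H)
assert all(sum(col) == 3 for col in zip(*H))
```

Because each block is a shifted identity, the column touched by row r of a block is simply (r + shift) mod p. That regular addressing is what makes it plausible to schedule check-node and variable-node updates so that they overlap.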
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363024
M. Hosemann, G. Cichon, P. Robelly, H. Seidel, Thorsten Dräger, T. Richter, M. Bronzel, G. Fettweis
Terrestrial digital video broadcasting (DVB-T) is currently being introduced in many European countries and is planned to supplement or replace current analogue broadcasting schemes in a large part of the world. It is also being considered as an additional downlink medium for third generation UMTS mobile telephones, for which a special variant, DVB-H, is under development. Current DVB-T receivers are still built upon dedicated application-specific integrated circuits (ASICs). However, designing ASICs is a tedious and expensive task. We show that it is possible to implement a DVB-T receiver in software on an application-specific digital signal processor (AS-DSP). We analyze the computational requirements of a DVB-T receiver and investigate its potential for parallelization. Further, we present our AS-DSP, the M5-DSP, which is based on a novel architecture and design methodology, and report on implementing the core algorithms of a DVB-T receiver on it.
{"title":"Implementing a receiver for terrestrial digital video broadcasting in software on an application-specific DSP","authors":"M. Hosemann, G. Cichon, P. Robelly, H. Seidel, Thorsten Dräger, T. Richter, M. Bronzel, G. Fettweis","doi":"10.1109/SIPS.2004.1363024","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363024","url":null,"abstract":"Terrestrial digital video broadcasting (DVB-T) is currently being introduced in many European countries and planned to supplement or replace current analogue broadcasting schemes in a large part of the world. It is also considered as an additional downlink medium for third generation UMTS mobile telephones, where a special variant, DVB-H, is under development. Current DVB-T receivers still are built upon dedicated application specific integrated circuits (ASIC). However, designing ASIC is a tedious and expensive task. We show that it is possible to implement a DVB-T receiver in software on an application-specific digital signal processor (AS-DSP). We analyze the computational requirements of a DVB-T receiver and investigate its potential for parallelization. Further, we present our AS-DSP, the M5-DSP, which is based on a novel architecture and design methodology, and report on implementing the core algorithms of a DVB-T receiver on it.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121165732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363026
A. Wellig
Interleaving is a key component of many digital communication systems, where the encoded data is reshuffled prior to transmission to protect against burst errors. Coupled with multiplexing schemes, such multi-stage subsystems achieve the necessary quality and flexibility to support a variety of different services. In 3GPP, a 2-stage multiplexing channel interleaver network is adopted. Its state-of-the-art implementation is both memory- and control-intensive, since the deinterleaving is done explicitly, implying dedicated storage and processing units at each stage. In this paper, we show that the C-fold decimation property which characterizes typical block interleavers is preserved in 2-stage interleaving networks. Thus, the underlying architecture not only results in significant memory size and access rate reductions but also greatly simplifies control processing. A decline in memory size of up to 31% and in access energy of up to 54% has been observed for STMicroelectronics' 0.13 μm CMOS technology for various 3GPP capability classes.
{"title":"Two-stage interleaving network analysis to design area- and energy-efficient 3GPP-compliant receiver architectures","authors":"A. Wellig","doi":"10.1109/SIPS.2004.1363026","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363026","url":null,"abstract":"Interleaving is a key component of many digital communication systems where the encoded data is reshuffled prior to transmission to protect against burst errors. Coupled with multiplexing schemes such multi-stage subsystems achieve the necessary quality and flexibility to support a variety of different services. In 3GPP, a 2-stage multiplexing channel interleaver network is adopted. Its state-of-the-art implementation is both memory- and control-intensive, since the deinterleaving is done explicitly implying dedicated storage and processing units at each stage. In this paper, we show that the C-fold decimation property which characterizes typical block interleavers is preserved in 2-stage interleaving networks. Thus, the underlying architecture not only results in significant memory size and access rate reductions but also greatly simplifies control processing. A decline in memory size of up to 31% and in access energy of up to 54% has been observed for STMicroelectronics' 0.13 /spl mu/m CMOS technology for various 3GPP capability classes.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116065398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
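The C-fold decimation property mentioned in the interleaving abstract above can be seen in a single block interleaver: writing row by row into C columns and reading column by column emits, for each column, every C-th input symbol. The sketch below is a minimal single-stage illustration, assuming the data length is a multiple of C. The column permutation is an illustrative parameter, and the two-stage 3GPP network analyzed in the paper is not modelled.

```python
def block_interleave(data, num_cols, col_perm=None):
    """Write row-wise into `num_cols` columns, read column-wise in `col_perm` order."""
    if len(data) % num_cols != 0:
        raise ValueError("length must be a multiple of num_cols (no padding here)")
    perm = list(range(num_cols)) if col_perm is None else col_perm
    out = []
    for c in perm:
        # Column c is exactly the C-fold decimation of the input with offset c.
        out.extend(data[c::num_cols])
    return out


def source_index(i, num_rows, num_cols, col_perm=None):
    """Input position of the i-th interleaved symbol, computed without a buffer."""
    perm = list(range(num_cols)) if col_perm is None else col_perm
    return perm[i // num_rows] + num_cols * (i % num_rows)


# Hypothetical example: 12 symbols, 4 columns, columns read in the order 0, 2, 1, 3.
DATA = list(range(12))
PERM = [0, 2, 1, 3]
INTERLEAVED = block_interleave(DATA, 4, PERM)
assert INTERLEAVED == [0, 4, 8, 2, 6, 10, 1, 5, 9, 3, 7, 11]
assert all(source_index(i, 3, 4, PERM) == INTERLEAVED[i] for i in range(12))
```

Because the source address of each symbol follows from simple arithmetic, a receiver can in principle read symbols in deinterleaved order directly instead of storing and reshuffling a full block. That is the kind of saving the abstract attributes to implicit deinterleaving.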
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363025
S. Im, E. Powers
In this paper, we consider an application of an iterative decorrelating receiver to direct sequence ultra wideband (DS-UWB) multiple access systems, which utilize biphase modulation. As the number of users in a DS-UWB system increases, multiple access interference becomes a dominant source of performance degradation. To suppress multiple access interference efficiently, a multiuser receiver is required. The high computational complexity of the optimal multiuser receiver prohibits its application. The iterative decorrelating receiver approximates the conventional decorrelating receiver with lower computational complexity. According to the simulation results, the proposed decorrelating receiver clearly improves the system performance. In addition, the convergence characteristics of the proposed iterative decorrelator are investigated in terms of the optimal convergence constant and the error bound.
{"title":"An iterative decorrelating receiver for DS-UWB multiple access systems using biphase modulation","authors":"S. Im, E. Powers","doi":"10.1109/SIPS.2004.1363025","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363025","url":null,"abstract":"In this paper, we consider an application of an iterative decorrelating receiver to direct sequence ultra wideband (DS-UWB) multiple access systems, which utilize biphase modulation. As the number of users increases in the DS-UWB system, multiple access interference becomes a dominant source to degrade system performance. In order to efficiently suppress multiple access interference, a multiuser receiver is required. The high computational complexity of the optimal multiuser receiver prohibits its application. The iterative decorrelating receiver approximates the conventional decorrelating receiver with lower computational complexity. According to the simulation results, the proposed decorrelating receiver clearly improves the system performance. In addition, the convergence characteristics of the proposed iterative decorrelator are investigated in terms of the optimal convergence constant and the error bound.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125293409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
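The abstract above does not state which iteration the receiver uses. As a stand-in, the sketch below approximates the decorrelator output, the solution of R x = y for the user cross-correlation matrix R and matched-filter outputs y, with a first-order stationary (Richardson) iteration governed by a convergence constant `mu`. The matrix, the bits and the value of `mu` are hypothetical, and the authors' scheme may differ.

```python
def mat_vec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def iterative_decorrelator(R, y, mu, iterations):
    """Approximate R^-1 y without inverting R: x <- x + mu * (y - R x)."""
    x = [0.0] * len(y)
    for _ in range(iterations):
        residual = [yi - ri for yi, ri in zip(y, mat_vec(R, x))]
        x = [xi + mu * ei for xi, ei in zip(x, residual)]
    return x


# Hypothetical 3-user cross-correlation matrix and biphase (+1 / -1) bits.
R = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.1],
    [0.2, 0.1, 1.0],
]
BITS = [1, -1, 1]
Y = mat_vec(R, BITS)  # noiseless matched-filter outputs

estimate = iterative_decorrelator(R, Y, mu=1.0, iterations=50)
decisions = [1 if v >= 0 else -1 for v in estimate]
assert decisions == BITS
```

For a symmetric positive definite R, this iteration converges when `mu` lies between 0 and 2 divided by the largest eigenvalue of R. Each pass costs one matrix-vector product, which is where the complexity advantage over an explicit inverse would come from.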
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363071
Sangjin Hong, Xiaoyao Liang, P. Djurić
This paper presents a reconfigurable particle filter design that can select a single particle filter from multiple particle filter realizations. The execution of the design is based on block-level pipelining, where data transfer between processing blocks is controlled by autonomous controllers. With a simple switching mechanism that allows transformation of the dataflow structure, in addition to an autonomous buffer controller, any desired particle filter can be performed. Two target particle filters, based on SIRF and GPF, are realized. From the execution characteristics obtained from the FPGA implementation, the overall controller structure is derived according to the methodology and verified using Verilog and SystemC.
{"title":"Reconfigurable particle filter design using dataflow structure translation","authors":"Sangjin Hong, Xiaoyao Liang, P. Djurić","doi":"10.1109/SIPS.2004.1363071","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363071","url":null,"abstract":"This paper presents reconfigurable particle filter design, which provides a capability of selecting a single particle filter from multiple particle filter realizations. The execution of the design is based on block level pipelining where data transfer between processing blocks is effectively controlled by autonomous controllers. With a simple switching mechanism that allows transformation of dataflow structure in addition to autonomous buffer controller, any desired particle filter can be performed. Two target particle filters, based on SIRF and GPF, are realized. From the execution characteristics obtained from the FPGA implementation, overall controller structure is derived according to the methodology and verified using Verilog and SystemC.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126850238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363048
Neil Smyth, M. McLoone, J. McCanny
A novel wireless local area network (WLAN) security processor is described in this paper. This processor is capable of offloading all security encapsulation in an IEEE 802.11i compliant medium access control (MAC) layer to a reconfigurable hardware accelerator. Embedded software provides flexible support for many other RC4- and AES-based security protocols, such as those relevant to Internet protocol security (IPSec). The unique design is primarily targeted at WLAN applications, and as such is capable of performing wired equivalent privacy (WEP), temporal key integrity protocol (TKIP), counter mode with CBC-MAC protocol (CCMP), and wireless robust authentication protocol (WRAP). The use of dedicated instructions designed for WLAN applications results in reduced instruction code footprints in comparison to general-purpose processors, and provides the high throughput necessary for 54 Mbps IEEE 802.11a/g.
{"title":"Reconfigurable hardware acceleration of WLAN security","authors":"Neil Smyth, M. McLoone, J. McCanny","doi":"10.1109/SIPS.2004.1363048","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363048","url":null,"abstract":"A novel wireless local area network (WLAN) security processor is described in this paper. This processor is capable of offloading all security encapsulation in an IEEE 802.11i compliant medium access control (MAC) layer to a reconfigurable hardware accelerator. Embedded software provides flexible support for many other RC4 and AES based security protocols, such as those relevant to Internet protocol security (IPSec). The unique design is primarily targeted at WLAN applications, and as such is capable of performing wired equivalent privacy (WEP), temporal key integrity protocol (TKIP), counter mode with CBC-MAC protocol (CCMP), and wireless robust authentication protocol (WRAP). The use of dedicated instructions designed for WLAN applications results in reduced instruction code footprints in comparison to general-purpose processors, and provides the high throughput necessary for 54 Mbps IEEE 802.11 a/g.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128043599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
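Of the ciphers named in the WLAN security abstract above, RC4 is compact enough to sketch in full. The code below is a plain software rendering of the standard RC4 key schedule and keystream generator, checked against a widely published test vector. It says nothing about how the paper's accelerator implements RC4, and the function names are illustrative.

```python
def rc4_keystream(key, length):
    """Generate `length` RC4 keystream bytes for a non-empty bytes key."""
    # Key-scheduling algorithm: permute S under control of the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation algorithm: one output byte per step.
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def rc4_crypt(key, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


# Widely published RC4 test vector.
assert rc4_crypt(b"Key", b"Plaintext").hex() == "bbf316e8d940af0ad3"
```

In WEP, the RC4 key for each frame is formed by prepending a per-frame initialization vector to the shared secret, so a WEP-style call would look like `rc4_crypt(iv + secret, payload)`. The integrity check value that WEP appends to the payload is omitted from this sketch.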
Pub Date : 2004-12-06 DOI: 10.1109/SIPS.2004.1363028
B. Bougard, S. Pollin, G. Lenoir, L. Van der Perre, F. Catthoor, W. Dehaene
Low power consumption is imperative to enable the deployment of broadband wireless connectivity in portable devices such as PDAs or smart telephones. Next to low power circuit and architecture design, system-level power management is revealed to be a key technology for low power consumption. Recently, "lazy scheduling" has been proposed for system-level power reduction. It has been shown to be very effective and complementary to more traditional shutdown-based approaches. So far, analysis has been carried out from the viewpoint of the medium access control (MAC) and data link control (DLC) layers. Yet, effective power management in radio communication requires consideration of end-to-end cross-layer interactions. In this paper, we analyze the implications of "lazy scheduling" from the transport layer perspective. It is shown that a key trade-off between queuing delay and physical layer energy drives the global trade-off between user throughput and system power. Conditions under which "lazy scheduling" is efficient are established, and important conclusions on effective system-level architecture and cross-layer power management are drawn.
{"title":"Transport level performance-energy trade-off in wireless networks and consequences on the system-level architecture and design paradigm","authors":"B. Bougard, S. Pollin, G. Lenoir, L. Van der Perre, F. Catthoor, W. Dehaene","doi":"10.1109/SIPS.2004.1363028","DOIUrl":"https://doi.org/10.1109/SIPS.2004.1363028","url":null,"abstract":"Low power consumption is imperative to enable the deployment of broadband wireless connectivity in portable devices such as PDA or smart telephones. Next to low power circuit and architecture design, system-level power management is revealed to be a key technology for low power consumption. Recently, \"lazy scheduling\" has been proposed for system level power reduction. It has been shown to be very effective and complementary to more traditional shutdown based approaches. So far, analysis has been carried out from the viewpoint of medium access control (MAC) and data link control (DLC) layers. Yet, effective power management in radio communication requires consideration of end-to-end cross-layer interactions. In this paper, we analyze the implication of \"lazy scheduling\" from the transport layer perspective. It is shown that a key trade-off between queuing delay and physical layer energy drives the global trade-off between user throughput and system power. Conditions under which \"lazy scheduling\" is efficient are established and important conclusions on effective system-level architecture and cross-layer power management are drawn.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117193279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
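The trade-off between queuing delay and physical layer energy described in the lazy scheduling abstract above can be illustrated with an idealized link model. The sketch below assumes an AWGN channel operating at Shannon capacity with unit channel gain, which is a common textbook idealization and not the energy model used in the paper. All names and parameter values are hypothetical.

```python
def transmit_energy(bits, duration, bandwidth=1.0, noise_psd=1.0):
    """Energy needed to send `bits` within `duration` at Shannon capacity.

    Assumed model: rate = bandwidth * log2(1 + power / (noise_psd * bandwidth)),
    solved for power and multiplied by the transmission duration.
    """
    rate = bits / duration
    power = noise_psd * bandwidth * (2 ** (rate / bandwidth) - 1)
    return power * duration


# Stretching the same 8-bit payload over more time lowers the energy,
# at the price of extra delay for whatever is waiting in the queue.
energies = [transmit_energy(8, t) for t in (1, 2, 4, 8)]
assert energies == [255.0, 30.0, 12.0, 8.0]
assert all(a > b for a, b in zip(energies, energies[1:]))
```

Under this model the energy keeps falling as the transmission is slowed down, but with diminishing returns, while the delay seen by upper layers grows linearly. The numbers are illustrative only.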
Pub Date : 2004-12-06 DOI: 10.1093/ietcom/e88-b.12.4667
Jungwoo Lee
A new channel estimator that does not require a separate frequency offset estimator is proposed. The new algorithm has low complexity and low latency compared to the weighted multi-slot averaging algorithm. The simulation results demonstrate the improved resistance to high Doppler frequency and high frequency offset.
{"title":"A novel low complexity channel estimator with frequency offset resistance for CDMA [3G wireless applications]","authors":"Jungwoo Lee","doi":"10.1093/ietcom/e88-b.12.4667","DOIUrl":"https://doi.org/10.1093/ietcom/e88-b.12.4667","url":null,"abstract":"A new channel estimator that does not require a separate frequency offset estimator is proposed. The new algorithm has low complexity and low latency compared to the weighted multi-slot averaging algorithm. The simulation results demonstrate the improved resistance to high Doppler frequency and high frequency offset.","PeriodicalId":384858,"journal":{"name":"IEEE Workshop onSignal Processing Systems, 2004. SIPS 2004.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115818628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}