Sphere decoding has become a popular method for MIMO detection because it offers improved performance at lower hardware complexity. ASIC implementations have proven the feasibility of this method but do not effectively address power efficiency. In this work, we propose an improved architecture that combines a deeper pipeline with single-port read and write memories to increase the energy efficiency (bits/sec/mW) of the implementation. Compared with an unpipelined version of the implementation in 0.18 µm technology, we observe 30% and 80% increases in memory and logic energy efficiency, respectively.
Title: Architecture for Energy Efficient Sphere Decoding | Authors: Ravi Jenkal, W. R. Davis | Venue: 2006 IEEE International SOC Conference | DOI: 10.1145/1283780.1283833 | Pub Date: 2007-08-27
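To make the search that such detectors perform concrete, here is a minimal depth-first sphere decoder sketch in Python. It is not the authors' pipelined architecture: it assumes a real-valued MIMO model that has already been QR-decomposed (so the detector minimizes ||z - R s||^2 with R upper triangular) and a BPSK alphabet {-1, +1}. The names `sphere_decode` and `brute_force_ml` are illustrative.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def sphere_decode(R: Sequence[Sequence[float]], z: Sequence[float],
                  alphabet: Sequence[float] = (-1.0, 1.0)) -> Tuple[List[float], float]:
    """Depth-first sphere decoder minimizing ||z - R s||^2 (R upper triangular).

    Assumes z = Q^T y from a prior QR decomposition of a real-valued channel.
    """
    n = len(z)
    best_s: List[float] = []
    best_d = math.inf  # squared sphere radius; shrinks whenever a better leaf is found
    s = [0.0] * n

    def search(level: int, partial: float) -> None:
        nonlocal best_s, best_d
        if level < 0:  # reached a leaf: a complete candidate vector
            if partial < best_d:
                best_d, best_s = partial, list(s)
            return
        # Contribution of the symbols already fixed at deeper tree levels.
        interference = sum(R[level][j] * s[j] for j in range(level + 1, n))
        for sym in alphabet:
            residual = z[level] - interference - R[level][level] * sym
            d = partial + residual * residual
            if d < best_d:  # prune: partial distances only grow with depth
                s[level] = sym
                search(level - 1, d)

    search(n - 1, 0.0)
    return best_s, best_d


def brute_force_ml(R, z, alphabet=(-1.0, 1.0)):
    """Exhaustive maximum-likelihood search, used only as a reference."""
    n = len(z)
    best = ([], math.inf)
    for cand in itertools.product(alphabet, repeat=n):
        d = sum((z[i] - sum(R[i][j] * cand[j] for j in range(i, n))) ** 2
                for i in range(n))
        if d < best[1]:
            best = (list(cand), d)
    return best
```

The squared radius starts at infinity and shrinks to the best complete candidate found so far, so any branch whose partial distance already exceeds it is pruned without visiting its subtree; `brute_force_ml` is included only to check that pruning never discards the maximum-likelihood answer.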
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283892
N. Mohan, Wilson W. L. Fung, M. Sachdev
Multiple match detection (MMD) circuits and priority encoders (PEs) are employed in ternary content addressable memory (TCAM) chips to detect multiple matches and to resolve the highest-priority match. This paper presents novel PE and MMD circuits. Measurements of the proposed circuits, fabricated in 0.18 µm CMOS technology, show significant speed and energy improvements (up to 70%) over existing designs.
Title: Low-Power Priority Encoder and Multiple Match Detection Circuit for Ternary Content Addressable Memory | Venue: 2006 IEEE International SOC Conference
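A behavioral sketch of what the PE and MMD compute (not the proposed circuits) may help: given the TCAM match lines, report the highest-priority matching entry and flag whether more than one entry matched. It assumes, hypothetically, that entry 0 has the highest priority; the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def priority_encode(match_lines: Sequence[bool]) -> Tuple[Optional[int], bool]:
    """Return (index of highest-priority match or None, multiple-match flag).

    Assumes entry 0 has the highest priority (a common TCAM convention).
    """
    first: Optional[int] = None
    for i, hit in enumerate(match_lines):
        if hit:
            if first is not None:
                return first, True  # a second match is all the MMD needs to see
            first = i
    return first, False


def priority_encode_mask(mask: int) -> Tuple[Optional[int], bool]:
    """Same function on an integer match vector, where bit i = match line i."""
    if mask == 0:
        return None, False
    lowest = mask & -mask  # isolates the lowest set bit (highest priority)
    multiple = (mask & (mask - 1)) != 0  # clear that bit; anything left means >1 match
    return lowest.bit_length() - 1, multiple
```

The mask version mirrors the usual two's-complement trick: `mask & -mask` isolates the lowest set bit, and `mask & (mask - 1)` is non-zero exactly when a second match line is active.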
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283891
S. Suhaib, D. Mathaikutty, S. Shukla
Composing synchronous intellectual property (IP) blocks over asynchronous communication links in a system-on-chip (SoC) design is challenging, especially when ensuring the functional correctness of the overall design. In this paper, we propose a trace-based framework to assist in the validation of globally asynchronous, locally synchronous (GALS) designs. We provide a specific characterization of synchronous IPs in our framework such that a simple barrier synchronization protocol suffices for asynchronous communication between them. We show theoretically that IPs with the single-activation property, when composed asynchronously, are behaviorally equivalent to the same IPs composed synchronously.
Title: A Trace Based Framework for Validation of SoC Designs with GALS Systems | Venue: 2006 IEEE International SOC Conference
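The equivalence claim can be illustrated with a small simulation written under our own assumptions rather than the paper's formal framework. Two hypothetical synchronous IPs (`ip_a`, `ip_b`) are modeled as pure step functions. In the asynchronous version, each IP fires only when all of its input FIFOs hold a token, consuming one token per input and producing one per output per activation, which is our reading of the single-activation property. Regardless of the random firing order, the output trace should match the lockstep synchronous composition.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple


# Two hypothetical synchronous IPs written as pure step functions:
# step(state, inputs...) -> (next_state, output).
def ip_a(state: int, x: int) -> Tuple[int, int]:
    acc = state + x  # running sum of the external input
    return acc, acc


def ip_b(state: int, a: int, y: int) -> Tuple[int, int]:
    out = a * y + state  # combines A's output with a second external input
    return out, out


def run_sync(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Lockstep composition: both IPs react once per global clock tick."""
    sa = sb = 0
    trace = []
    for x, y in zip(xs, ys):
        sa, a = ip_a(sa, x)
        sb, b = ip_b(sb, a, y)
        trace.append(b)
    return trace


def run_async(xs: Sequence[int], ys: Sequence[int], seed: int = 0) -> List[int]:
    """Asynchronous composition over FIFO links with a barrier per IP.

    An IP may fire only when every one of its input FIFOs holds a token; each
    activation consumes one token per input and emits one token per output.
    The order in which enabled IPs fire is chosen at random.
    """
    rng = random.Random(seed)
    x_fifo, y_fifo, ab_fifo = deque(xs), deque(ys), deque()
    sa = sb = 0
    trace = []
    while True:
        enabled = []
        if x_fifo:
            enabled.append("A")
        if ab_fifo and y_fifo:  # barrier: both inputs of B must be present
            enabled.append("B")
        if not enabled:
            return trace
        if rng.choice(enabled) == "A":
            sa, a = ip_a(sa, x_fifo.popleft())
            ab_fifo.append(a)
        else:
            sb, b = ip_b(sb, ab_fifo.popleft(), y_fifo.popleft())
            trace.append(b)
```

With `xs = [1, 2, 3, 4, 5]` and `ys = [2, 0, 1, 3, 1]`, both compositions yield `[2, 2, 8, 38, 53]` for any seed; IP A is free to run ahead of B because the FIFO between them buffers its tokens.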
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283876
S. Srinivasan, Raghavan Ramadoss, N. Vijaykrishnan
Scaling of microprocessors is widening the gap between design and manufacturing expectations. The resulting process variations may produce processor cores whose frequencies are lower or higher than expected. In particular, with the rapid advent of multiprocessor systems-on-chip (MPSoCs), such manufacturing uncertainties may lead to significant variations in the operating frequencies of different processor cores on the same chip. In this work, we demonstrate that traditional load-balanced parallelization schemes need to be revisited to account for such variations. Specifically, we highlight the need to tune the degree of parallelization and to generate non-uniform workloads in order to achieve lower power consumption in next-generation MPSoCs.
Title: Process Variation Aware Parallelization Strategies for MPSoCs | Venue: 2006 IEEE International SOC Conference
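One way to picture non-uniform workload generation is to split work in proportion to each core's post-manufacturing frequency, so that all cores finish together. The sketch below assumes identical per-item cycle counts and no communication overhead; the frequencies and function names are hypothetical, and the paper's actual strategy (including how it tunes the degree of parallelization) may differ.

```python
from typing import List, Sequence


def partition_work(total_items: int, freqs_mhz: Sequence[float]) -> List[int]:
    """Split total_items across cores in proportion to each core's frequency.

    Uses largest-remainder rounding so the shares sum exactly to total_items.
    """
    total_f = sum(freqs_mhz)
    ideal = [total_items * f / total_f for f in freqs_mhz]
    shares = [int(v) for v in ideal]  # floor, since all values are non-negative
    leftover = total_items - sum(shares)
    # Give the remaining items to the cores with the largest fractional parts.
    by_remainder = sorted(range(len(ideal)), key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def finish_time_us(shares: Sequence[int], freqs_mhz: Sequence[float],
                   cycles_per_item: float) -> float:
    """Completion time of the slowest core (cycles / MHz = microseconds)."""
    return max(n * cycles_per_item / f for n, f in zip(shares, freqs_mhz))
```

For 3800 items at 1000 cycles each on cores running at 1000, 900, 1100 and 800 MHz, the proportional split finishes every core at 1000 µs, while a uniform split of 950 items per core leaves the 800 MHz core finishing last at 1187.5 µs. The slack recovered this way is what a power-aware scheme could trade for lower voltage or frequency.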
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283878
Guangyu Chen, M. Kandemir, Mustafa Karaköy
Recent research demonstrates that voltage islands provide the flexibility to reduce power by selectively shutting down different regions of the chip and/or running selected parts of the chip at different voltage/frequency levels. Unlike most prior work on voltage islands, which focused mainly on architecture design and IP placement, this paper studies the software compiler support that voltage islands require. Specifically, we focus on an embedded multiprocessor architecture that supports both voltage islands and control domains within these islands, and we determine how an optimizing compiler can automatically map an embedded application onto this architecture. Our experiments show that the proposed compiler support is very effective in reducing energy consumption.
Title: Compiler Support for Voltage Islands | Venue: 2006 IEEE International SOC Conference
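The kind of decision such compiler support has to make can be sketched with a deliberately simple greedy rule: shut down islands that have no work, and run every other island at the lowest supply voltage whose frequency still meets a common deadline. The energy model (dynamic energy proportional to C·V²·cycles, leakage ignored), the level table, and the function names are hypothetical; this is not the paper's mapping algorithm, which also handles control domains within islands.

```python
from typing import Dict, List, Optional, Tuple

Level = Tuple[float, float]  # (supply voltage in V, clock frequency in MHz)


def assign_island_levels(workloads: Dict[str, int], levels: List[Level],
                         deadline_us: float) -> Dict[str, Optional[Level]]:
    """Pick a (V, f) level per island; None means the island is shut down.

    Simplified model: dynamic energy ~ C * V^2 * cycles, so for a fixed cycle
    count the lowest voltage that still meets the deadline uses the least energy.
    """
    plan: Dict[str, Optional[Level]] = {}
    for island, cycles in workloads.items():
        if cycles == 0:
            plan[island] = None  # idle island: power-gate it
            continue
        # cycles / MHz = microseconds
        feasible = [lv for lv in levels if cycles / lv[1] <= deadline_us]
        if not feasible:
            raise ValueError(f"island {island!r} cannot meet the deadline")
        plan[island] = min(feasible, key=lambda lv: lv[0])
    return plan


def dynamic_energy_j(plan: Dict[str, Optional[Level]], workloads: Dict[str, int],
                     c_eff_farads: float) -> float:
    """Sum of C * V^2 * cycles over the islands that remain powered."""
    return sum(c_eff_farads * lv[0] ** 2 * workloads[name]
               for name, lv in plan.items() if lv is not None)
```

With levels of 0.8 V/200 MHz, 1.0 V/400 MHz and 1.2 V/600 MHz and a 3000 µs deadline, a 1,000,000-cycle island needs 1.0 V, a 100,000-cycle island can drop to 0.8 V, and an idle island is powered off.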
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283909
K. Nagaraj, N. Nayak
Until recently, the vast majority of PLLs were Analog PLLs (APLLs). The block schematic of a commonly used APLL is shown in Fig. 1. Here, divided versions of an input reference clock and of the output of a Voltage Controlled Oscillator (VCO) are compared in a Phase Frequency Detector (PFD), which, together with a Charge Pump and a low-pass loop filter, generates the control signal for the VCO. This establishes phase lock between REFINT and FBCLK, making $f_{out} = \frac{M}{NQ} f_{REF}$. Thus, the output frequency can be programmed by means of M, N, and Q.
Title: Design of Low Power Digital Phase Lock Loops | Venue: 2006 IEEE International SOC Conference
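The lock condition behind the APLL output-frequency relation can be checked numerically. Since Fig. 1 is not reproduced here, the sketch assumes the divider placement that yields $f_{out} = \frac{M}{NQ} f_{REF}$: N divides the reference clock, M divides the VCO output in the feedback path, and Q is an output post-divider. Exact fractions avoid rounding in the frequency arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from typing import Tuple


def apll_frequencies(f_ref_hz: int, m: int, n: int, q: int) -> Tuple[Fraction, Fraction]:
    """Return (f_vco, f_out) in Hz for a locked APLL.

    Assumed divider placement: N divides the reference (REFINT = f_ref / N),
    M divides the VCO output in the feedback path (FBCLK = f_vco / M), and
    Q is an output post-divider (f_out = f_vco / Q).
    """
    if min(m, n, q) <= 0:
        raise ValueError("divider ratios must be positive integers")
    # At lock the PFD drives REFINT and FBCLK to the same frequency:
    # f_ref / N == f_vco / M, hence f_vco = M * f_ref / N.
    f_vco = Fraction(f_ref_hz * m, n)
    f_out = f_vco / q  # equals (M / (N * Q)) * f_ref
    return f_vco, f_out
```

For example, a 25 MHz reference with M = 96, N = 2 and Q = 2 gives a 1.2 GHz VCO frequency and a 600 MHz output.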
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283841
A. Major, Y. Yi, I. Nousias, M. Milward, S. Khawam, T. Arslan
This paper presents a new baseline-profile-compliant H.264 decoder implementation specifically tailored to an ANSI-C programmable, dynamically reconfigurable, instruction-cell-based architecture. We use the ffmpeg libavcodec library as the basis for our decoder and identify the most processor-intensive functions. These functions are tailored in a novel framework that combines established software techniques with several architecture-specific transforms. Initial results demonstrate that our reconfigurable-architecture-based decoder provides a significant performance boost with power figures below those of a microcontroller such as an ARM.
Title: H.264 Decoder Implementation on a Dynamically Reconfigurable Instruction Cell Based Architecture | Venue: 2006 IEEE International SOC Conference
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283850
M. Hansson, A. Alvandpour
This paper presents analysis and measurements of a leakage-current compensation technique aimed at preserving the traditional operation of dynamic flip-flops in nanoscale CMOS. A dynamic transmission-gate flip-flop using the proposed technique showed more than 7.4× larger leakage tolerance. Furthermore, a conditional static keeper ensures robust operation at low frequency and in standby.
Title: A Leakage Compensation Technique for Dynamic Latches and Flip-Flops in Nano-Scale CMOS | Venue: 2006 IEEE International SOC Conference
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283905
M. Yamaoka, H. Onodera
Vth variation has a large impact on SRAM operation. To predict SRAM operating margins in the design phase, Vth window analysis is used. We propose an improved Vth window analysis that considers the relationship between global and local Vth variation, enabling accurate operating-margin prediction. In a 65-nm manufacturing process, this analysis predicts 7.7% larger yield deterioration than the conventional method, giving designers the opportunity to introduce operating-margin enhancement circuits during the design phase.
Title: A Detailed Vth-Variation Analysis for Sub-100-nm Embedded SRAM Design | Venue: 2006 IEEE International SOC Conference
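To illustrate why separating global and local variation matters, the Monte Carlo sketch below draws one global Vth offset per die, shared by all of its cells, and an independent local offset per cell. It simplifies the usual two-dimensional Vth window to a single acceptable Vth range, and every parameter and name is hypothetical; it is not the authors' analysis and does not reproduce their 7.7% figure.

```python
import random
from typing import Tuple


def estimate_yield(n_dies: int, cells_per_die: int, vth_nom: float,
                   sigma_global: float, sigma_local: float,
                   window: Tuple[float, float], seed: int = 0) -> float:
    """Monte Carlo fraction of dies whose every cell has Vth inside the window.

    Each die draws one global Vth offset shared by all of its cells; each cell
    then adds its own independent local offset. Sampling of a die stops at the
    first failing cell, since one failure already fails the die.
    """
    lo, hi = window
    rng = random.Random(seed)
    good = 0
    for _ in range(n_dies):
        vth_die = vth_nom + rng.gauss(0.0, sigma_global)  # global (die-to-die)
        if all(lo <= vth_die + rng.gauss(0.0, sigma_local) <= hi  # local (within-die)
               for _ in range(cells_per_die)):
            good += 1
    return good / n_dies
```

Because the global offset is shared by every cell on a die, failures cluster on dies with a large global shift; a model that treats each cell's total variation as independent would not capture that correlation.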
Pub Date: 2006-09-01 | DOI: 10.1109/SOCC.2006.283836
Dhruba Chandra, U. Pazhayaveetil, P. Franzon
This paper proposes an architecture for real-time, large-vocabulary speech recognition on a mobile embedded device. The speech recognition system is based on Hidden Markov Models (HMMs), which involve complex mathematical operations such as probability estimation and Viterbi decoding. This computational load makes recognition power-hungry, and simply porting software solutions to an embedded device does not achieve real-time recognition. Our system architecture combines a low-power embedded processor with dedicated ASIC units for the complex computations. These units operate at a low frequency of 50 MHz and therefore consume little power. The system uses RAM for intermediate values and flash memory to store the acoustic and language models for speech recognition.
Title: Architecture for Low Power Large Vocabulary Speech Recognition | Venue: 2006 IEEE International SOC Conference
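Viterbi decoding, one of the operations the dedicated ASIC units accelerate, can be sketched in software as follows. The toy two-state HMM used to test it, the state and observation names, and the table-lookup emission scores are hypothetical; in the described system the emission scores would come from the probability-estimation stage instead. Working in the log domain turns probability products into sums, so each step reduces to an add-compare-select.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(obs: Sequence[str], states: Sequence[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            log_emit: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Most likely HMM state sequence for obs, computed in the log domain."""
    # delta[s]: best log-probability of any state path ending in s at time t.
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # Add-compare-select: extend every predecessor, keep the best one.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            bp[s] = best
        backptrs.append(bp)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, delta[last]
```

On the classic two-state Healthy/Fever textbook example (observations normal, cold, dizzy), the decoder recovers the path Healthy, Healthy, Fever with probability 0.01512.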