Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669495
Chandrika Jena, Tim Mason, Tom Chen
This paper presents the power-performance trade-off of three different cache compression algorithms. Cache compression improves performance because compressed data increases the effective cache capacity, which reduces cache misses. The unused memory cells can be put into sleep mode to save static power. The performance gained and the power saved by cache compression must outweigh the delay and the power consumption added by the CODEC (COmpressor and DECompressor) block, respectively. Among the studied algorithms, the power-delay characteristic of Frequent Pattern Compression (FPC) is found to be the most suitable for cache compression.
Title : On power and performance tradeoff of L2 cache compression (NORCHIP 2010)
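To make the capacity argument in the cache-compression abstract above concrete, the following is a minimal sketch of how a frequent-pattern scheme sizes a cache line. It assumes the commonly described FPC encoding (a 3-bit prefix per 32-bit word followed by a pattern-dependent payload) and simplifies zero runs to one word each; the function names are hypothetical, and nothing here reproduces the CODEC evaluated in the paper.

```python
PREFIX_BITS = 3  # assumed per-word pattern tag


def _sign_extends(value: int, low_bits: int, width: int) -> bool:
    """True if `value` (width bits) is the sign extension of its low `low_bits` bits."""
    mask = (1 << width) - 1
    low = value & ((1 << low_bits) - 1)
    if low >> (low_bits - 1):  # sign bit set: fill the upper bits with ones
        low |= mask & ~((1 << low_bits) - 1)
    return low == value


def fpc_word_bits(word: int) -> int:
    """Bits needed to store one 32-bit word under the simplified pattern set."""
    word &= 0xFFFFFFFF
    hi, lo = word >> 16, word & 0xFFFF
    if word == 0:
        return PREFIX_BITS + 3   # zero word (simplification: no run merging)
    if _sign_extends(word, 4, 32):
        return PREFIX_BITS + 4   # 4-bit sign-extended value
    if _sign_extends(word, 8, 32):
        return PREFIX_BITS + 8   # one sign-extended byte
    if _sign_extends(word, 16, 32):
        return PREFIX_BITS + 16  # one sign-extended halfword
    if lo == 0:
        return PREFIX_BITS + 16  # halfword padded with a zero halfword
    if _sign_extends(hi, 8, 16) and _sign_extends(lo, 8, 16):
        return PREFIX_BITS + 16  # two halfwords, each a sign-extended byte
    if len({(word >> s) & 0xFF for s in (0, 8, 16, 24)}) == 1:
        return PREFIX_BITS + 8   # four identical bytes
    return PREFIX_BITS + 32      # no pattern matched: stored uncompressed


def fpc_line_bits(words) -> int:
    """Compressed size in bits of a cache line given as 32-bit words."""
    return sum(fpc_word_bits(w) for w in words)
```

A line that compresses to fewer bits than its raw size frees cells that could hold other data or be put to sleep; whether that pays off still depends on the CODEC delay and power, which this sketch does not model.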
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669432
Thilo Pionteck, C. Osterloh, C. Albrecht
This paper reviews Network-on-Chip architectures that prioritize selected data streams, targeting runtime-reconfigurable manycore systems. The common idea of these architectures is to minimize the latency of selected packet transmissions, either by bypassing or parallelizing processing stages in routers or by using dedicated links that bypass complete routers. Potential classes of selected data streams are latency-critical messages, e.g. cache accesses in multiprocessor systems, and semi-static data streams, i.e. those in systems in which the same components continuously exchange data for a longer period. The review categorizes the diverse architectures and evaluates their pros and cons in terms of universality, hardware efficiency and support for changing traffic patterns.
Title : Latency reduction of selected data streams in Network-on-Chips for adaptive manycore systems (NORCHIP 2010)
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669469
Yashar Hesamiafshar, Sanaz Momeni
This report introduces a new approach for detecting and correcting gain mismatch between ADC sub-channels in time-interleaved ADCs. The technique is based on the discrete Fourier transform and uses a simple approach for gain mismatch correction. MATLAB simulation results are presented for the correction of ±2% gain mismatch in a two-channel time-interleaved ADC, where the proposed approach improves the SFDR by more than 30 dB.
Title : A new DFT based approach for gain mismatch detection and correction in time-interleaved ADCs (NORCHIP 2010)
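The abstract above does not spell out the algorithm, so the following is only an illustrative sketch of one DFT-based way to detect and correct gain mismatch in a two-channel time-interleaved ADC, not the authors' method. It assumes a coherently sampled sine test tone at a known bin `K`, ideal timing and offset, and gains of 1.02 and 0.98 (a ±2% mismatch); all names and values are assumptions.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT bin X[k] of the sequence x (naive evaluation)."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def sfdr_db(x, k):
    """Ratio of the tone at bin k to the gain-mismatch image at bin N/2 - k, in dB."""
    fund = abs(dft_bin(x, k))
    spur = abs(dft_bin(x, len(x) // 2 - k))
    return 20 * math.log10(fund / max(spur, 1e-300))  # guard against log(0)


N, K = 256, 19        # record length and tone bin (assumed, coherent sampling)
GAINS = (1.02, 0.98)  # assumed sub-channel gains

# Interleaved output: even samples come from channel 0, odd samples from channel 1.
x = [GAINS[i % 2] * math.sin(2 * math.pi * K * i / N) for i in range(N)]

# Detection: compare the tone magnitude seen by each sub-channel.
# Each sub-channel is an N/2-point record in which the tone still falls on bin K.
a0 = abs(dft_bin(x[0::2], K))
a1 = abs(dft_bin(x[1::2], K))
ratio = a1 / a0       # estimate of gain(channel 1) / gain(channel 0)

# Correction: rescale channel 1 so that both channels share channel 0's gain.
corrected = [v if i % 2 == 0 else v / ratio for i, v in enumerate(x)]

print(f"estimated gain ratio: {ratio:.6f}")
print(f"SFDR before: {sfdr_db(x, K):.1f} dB, after: {sfdr_db(corrected, K):.1f} dB")
```

In this idealized setting the image spur is caused only by the gain difference, so equalizing the two tone magnitudes removes it almost entirely; with noise, timing skew or a non-coherent tone the improvement would be smaller.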
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669485
J. E. Ramstad, O. Soeraasen
This paper demonstrates how micromechanical on-chip MEMS resonators can be used as higher-order mixer-filters in RF front-ends of WSN nodes. Vibrating Free-free Square Frame Resonators (FFSFRs) connected together can create 4th- and 6th-order mixer-filter responses. The output is further enhanced by an on-chip amplifier, which also reduces stray capacitances. These mixer-filters are fabricated using a CMOS-MEMS approach in which the movable MEMS structure is defined by the metal layers offered by the CMOS foundry and released using a few simple etch steps. The system is implemented in TSMC 0.35 µm CMOS and was post-CMOS processed at NTHU in Taiwan. Detailed modeling, simulation and implementation of the system show the performance of these higher-order MEMS resonator mixer-filters as a potential candidate to replace bulky off-chip transceiver components.
Title : Higher order FFSFR coupled micromechanical mixer-filters integrated in CMOS (NORCHIP 2010)
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669473
W. Ahmad, Qiang Chen, Li-Rong Zheng, H. Tenhunen
On-chip power supply noise has become a bottleneck in 3D ICs because, with limited wire resources, the scaling of the supply network impedance has not kept up with the device densities and operating currents that increase with each technology node. In this paper we propose an efficient and accurate model to estimate the peak-to-peak switching noise caused by simultaneous switching of logic loads along a vertical chain of power distribution TSV pairs in a 3D stack of ICs. The proposed model differs by only 2–3% from the equivalent Ansoft Nexxim 4.1 model, runs 3–4 times faster, and consumes half as much memory. We analyzed the peak-to-peak switching noise along a vertical chain of power distribution TSV pairs by varying the physical dimensions of the TSVs and the value of the decoupling capacitance. We also thoroughly investigated the sensitivity of the peak-to-peak noise to TSV effective inductance and decoupling capacitance.
Title : Modeling of peak-to-peak switching noise along a vertical chain of power distribution TSV pairs in a 3D stack of ICs interconnected through TSVs (NORCHIP 2010)
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669430
Xiaowen Chen, Shuming Chen, Zhonghai Lu, A. Jantsch, Bangjiang Xu, Heng Luo
In this paper, we propose a fast barrier synchronization mechanism targeting Network-on-Chip based many-core architectures. Its salient feature is that, once the barrier condition is reached, the “barrier release” acknowledgement is broadcast to all processor nodes. This saves area, because no source node information has to be stored, and minimizes completion time, because the barrier release is not serialized. We then construct a multi-FPGA platform using Xilinx® Virtex 5 FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism has both area and performance advantages over its barrier synchronization counterpart with unicast barrier releasing.
Title : Multi-FPGA implementation of a Network-on-Chip based many-core architecture with fast barrier synchronization mechanism (NORCHIP 2010)
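The difference between broadcast and unicast barrier release described in the abstract above can be illustrated with a small behavioural sketch. It assumes a single central barrier counter and counts only stored source entries and injected release packets; the class name and the packet representation are hypothetical, and the model says nothing about the actual router hardware or its timing.

```python
class BarrierCounter:
    """Behavioural model of a central barrier with unicast or broadcast release."""

    def __init__(self, num_nodes: int, broadcast: bool):
        self.num_nodes = num_nodes
        self.broadcast = broadcast
        self.count = 0
        self.sources = []       # only used by the unicast variant
        self.peak_storage = 0   # largest number of source entries held at once

    def arrive(self, node_id: int):
        """Register one arrival; return the release packets once all nodes arrived."""
        self.count += 1
        if not self.broadcast:
            # Unicast release must remember whom to acknowledge.
            self.sources.append(node_id)
            self.peak_storage = max(self.peak_storage, len(self.sources))
        if self.count < self.num_nodes:
            return []
        # Barrier condition reached: build the release and reset for the next round.
        packets = ["ALL"] if self.broadcast else list(self.sources)
        self.count = 0
        self.sources = []
        return packets


def run_barrier(num_nodes: int, broadcast: bool):
    """Let every node arrive once; return (release packets injected, peak storage)."""
    barrier = BarrierCounter(num_nodes, broadcast)
    packets = []
    for node in range(num_nodes):
        packets = barrier.arrive(node)
    return len(packets), barrier.peak_storage
```

In this model the unicast variant injects one release packet per node, one after another, and stores every source identifier, whereas the broadcast variant injects a single packet and stores none, which mirrors the area and completion-time argument of the abstract.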
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669480
A. Miremadi, Hassan Faraji Baghtash
A novel and simple structure for improving CMRR is introduced. This structure can be added to circuits such as folded cascode amplifiers, telescopic amplifiers, current buffers, etc. to improve their CMRR. This simple and effective circuit uses a novel common-mode deviating technique to improve the CMRR by at least 12 dB while preserving the CMRR bandwidth. Application of this structure to both current buffer and folded cascode structures is shown. Simulation results in TSMC 0.18 µm CMOS technology with HSPICE are presented to demonstrate the validity of the proposed circuit. In addition, Monte Carlo analysis is performed to simulate fabrication conditions.
Title : A novel simple and high performance structure for improving CMRR: Application to current buffers and folded cascode amplifier (NORCHIP 2010)
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669452
S. M. Yasser Sherazi, J. Rodrigues, Omer Can Akgun, H. Sjoland, P. Nilsson
This paper presents an analysis of the energy dissipation of digital half-band filters operated in the sub-threshold (sub-VT) region under throughput constraints. The degradation of speed in the sub-VT domain is counteracted by unfolding the architectures. A filter is implemented as a basic 12-bit structure and in various unfolded versions. The designs are synthesized in a 65 nm low-leakage high-threshold CMOS technology. A sub-VT energy model is applied to characterize the designs in the sub-VT domain. The results from the energy model show that the unfolded-by-2 architecture is the most energy efficient, dissipating 22% less energy than the original filter implementation at the energy-minimum voltage. The unfolded-by-4 architecture, however, is the best choice for throughput requirements of around 120 ksamples/s to 1 Msamples/s, as it dissipates less energy than any other implementation in this speed range.
Title : Ultra low energy vs throughput design exploration of 65 nm sub-VT CMOS digital filters (NORCHIP 2010)
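Unfolding, which the abstract above uses to recover throughput at low supply voltage, can be sketched in software as follows. The example assumes a generic 7-tap half-band FIR filter with integer coefficients chosen only for illustration (they are not the filter from the paper), and each loop iteration stands in for one clock cycle.

```python
# Assumed 7-tap half-band coefficients (every second tap is zero except the centre).
H = [-1, 0, 9, 16, 9, 0, -1]


def _tap_sum(h, x, n):
    """One output sample y[n] = sum_k h[k] * x[n - k], with x[i] = 0 for i < 0."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)


def fir_serial(h, x):
    """Reference filter: one output sample per cycle. Returns (outputs, cycles)."""
    y = [_tap_sum(h, x, n) for n in range(len(x))]
    return y, len(x)


def fir_unfolded_by_2(h, x):
    """Unfolded by 2: two copies of the datapath produce two outputs per cycle."""
    assert len(x) % 2 == 0, "sketch assumes an even number of samples"
    y = [0] * len(x)
    cycles = 0
    for m in range(len(x) // 2):
        y[2 * m] = _tap_sum(h, x, 2 * m)          # datapath copy 0: even outputs
        y[2 * m + 1] = _tap_sum(h, x, 2 * m + 1)  # datapath copy 1: odd outputs
        cycles += 1
    return y, cycles
```

Both versions compute the same outputs, but the unfolded one needs half as many cycles, so for a fixed sample rate its clock can run at half the frequency. That slack is what allows operation at a lower supply voltage; the energy figures themselves come from the paper's sub-VT model and are not reproduced here.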
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669478
F. Brandner, Viktor Pavlu, A. Krall
Modeling the execution of a processor and its instructions is a challenging problem, in particular in the presence of long pipelines, parallelism, and out-of-order execution. A naive approach based on finite state automata inevitably leads to an explosion in the number of states and is thus only applicable to simple minimalistic processors.
Title : Execution models for processors and instructions (NORCHIP 2010)
Pub Date : 2010-12-17 DOI: 10.1109/NORCHIP.2010.5669475
S. Hietakangas, T. Rahkonen
The purpose of this paper is to study the entire dimensioning space of a parallel-tuned integrated circuit that was designed and implemented earlier. The main parameters were swept while the remaining component values were kept fixed, and performance contours were derived. The main finding was that the traditional sweeps of resistively damped switching amplifiers match poorly if the resonator Q value is low; instead, the effect of the external impedance matching circuit is very significant.
Title : Dimensioning space of a parallel tuned amplifier (NORCHIP 2010)