Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641492
P. Ferguson, T. Arslan, A. Erdogan, Andrew Parmley
This paper presents the contrast limited adaptive histogram equalization (CLAHE) method of contrast enhancement targeted to an FPGA-based embedded platform. A novel approach to constructing the algorithm using FPGA resources is discussed. A comparative accuracy analysis is performed against the equivalent software implementation. FPGA resource usage and operational power consumption are also highlighted as considerations when including effective contrast enhancement in an FPGA video chain.
{"title":"Evaluation of contrast limited adaptive histogram equalization (CLAHE) enhancement on a FPGA","authors":"P. Ferguson, T. Arslan, A. Erdogan, Andrew Parmley","doi":"10.1109/SOCC.2008.4641492","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641492","url":null,"abstract":"This paper presents the CLAHE method of contrast enhancement targeted to a FPGA based embedded platform. A novel approach to constructing the algorithm utilizing FPGA resources is discussed. A comparative accuracy analysis is performed against the equivalent software implementation. The FPGA resources and operational power consumption are also highlighted in the considerations required to include effective contrast enhancement on a FPGA video chain.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124806468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
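The core step that gives CLAHE its "contrast limited" behaviour can be sketched in a few lines. The following is a minimal software illustration for a single image tile, not the FPGA construction described in the paper: the function name, the single-pass redistribution of clipped counts, and the example tile are all assumptions made for illustration. A full CLAHE implementation also blends the lookup tables of neighbouring tiles by bilinear interpolation, which is omitted here.

```python
def clipped_equalization_lut(tile, clip_limit, levels=256):
    """Build a contrast-limited equalization lookup table for one tile.

    tile: flat sequence of integer pixel values in [0, levels).
    clip_limit: maximum count allowed in any histogram bin before clipping.
    """
    # Histogram of the pixel intensities in this tile.
    hist = [0] * levels
    for pixel in tile:
        hist[pixel] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess evenly in one pass (an assumed simplification);
    # the remainder goes to the lowest bins. Bins may end slightly above
    # clip_limit after this step.
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)

    # The scaled cumulative histogram is the intensity mapping.
    total = len(tile)
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append((running * (levels - 1)) // total)
    return lut


# Hypothetical 8x8 tile dominated by one dark grey level.
example_tile = [10] * 60 + [200] * 4
example_lut = clipped_equalization_lut(example_tile, clip_limit=8)
```

Clipping preserves the total pixel count, so the last table entry always maps to the maximum output level, and the table is non-decreasing by construction.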
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641476
Wei Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, A. Erdogan
As multiprocessor system-on-chip (MPSoC) approaches become popular in embedded system designs, simulation tools for modelling these systems are in high demand for evaluating performance and cost at both the hardware design stage and the software development phase. This paper presents a fast, flexible, and cycle-accurate simulation tool for MPSoCs targeting emerging dynamically reconfigurable processors. Using a complex embedded application, WiMAX, a range of test benches have been implemented on the proposed simulation tool to evaluate the impact of a variety of architectural parameters and task mapping strategies on simulation speed. Experimental results demonstrate that up to 60K cycles per second can be achieved.
{"title":"MRPSIM: A TLM based simulation tool for MPSOCS targeting dynamically reconfigurable processors","authors":"Wei Han, Y. Yi, M. Muir, I. Nousias, T. Arslan, A. Erdogan","doi":"10.1109/SOCC.2008.4641476","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641476","url":null,"abstract":"As multiprocessor system-on-chip (MPSoC) approaches become popular in embedded system designs, simulation tools for modelling these systems are highly in demand for evaluating the performance and cost at both hardware design stage and software development phase. This paper presents a fast, flexible, and cycle-accurate simulation tool for MPSoCs targeting emerging dynamically reconfigurable processors. Based on a complex embedded application - WiMAX, a range of test benches have been implemented on the proposed simulation tool for evaluating the impact on simulation speed of a variety of architectural parameters and task mapping strategies. Experimental results demonstrate that up to 60K cycles per second can be achieved.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122778074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641509
Jian Wang, Gang Hua
The emergence of high-definition video into mainstream applications has imposed a challenge on researchers and engineers in the field: achieving an optimal balance between programmability and hardware efficiency. The DM6467 System-on-Chip (SOC) from Texas Instruments provides both DSP-like design flexibility and ASIC-like performance for codec designs. In this paper, we investigate how to efficiently implement an H.264 HD encoder and decoder based on a C64+ DSP core accompanied by the on-chip coprocessor. The overall system provides full programmability for the critical coding toolsets that impact codec quality, such as motion estimation and rate control, while utilizing base functions that are fully hardware accelerated. We provide a performance analysis of such a system in terms of system loading for HD encoders.
{"title":"Implementing high definition video codec on TI DM6467 SOC","authors":"Jian Wang, Gang Hua","doi":"10.1109/SOCC.2008.4641509","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641509","url":null,"abstract":"The emergence of high-definition video into mainstream applications has imposed a challenge to researchers and engineers in the field: achieving optimal balance between programmability and hardware efficiency. The DM6467 System-on-Chip (SOC) from Texas Instruments provides both DSP-like design flexibility and ASIC-like performance at the same time for codec designs. In this paper, we investigate how to efficiently implement H.264 HD encoder and decoder based on a C64+ DSP core accompanied by the on-chip coprocessor. The overall system provides full programmability on critical coding toolsets that impact codec quality, such as motion estimation and rate control, while at the same time, utilize base functions that are fully hardware accelerated. We provide performance analysis based on such a system, in terms of system loading for HD encoders.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114240979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641469
S. Jayaprakash, N. Mahapatra
Narrow-width and multiplexed buses are suitable for underutilized interconnects in microprocessors, reducing area and cost with minimal performance overhead. However, because they interleave uncorrelated traffic, they have higher switching activity and energy dissipation than demultiplexed buses. We demonstrate the effectiveness of energy-optimal bit signaling and ordering for multiplexed buses in significantly reducing this energy overhead across SPEC CPU2k benchmarks.
{"title":"Energy-optimal signaling and ordering of bits for area-constrained interconnects","authors":"S. Jayaprakash, N. Mahapatra","doi":"10.1109/SOCC.2008.4641469","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641469","url":null,"abstract":"Narrow-width and multiplexed buses are suitable for underutilized interconnects in microprocessors to reduce area/cost with minimal performance overheads. However, due to the interleaving of uncorrelated traffic, they have higher switching activity and energy dissipation compared to demultiplexed buses. We demonstrate the effectiveness of energy-optimal bit signaling and ordering for multi-plexed buses in significantly reducing this energy overhead across SPEC CPU2k benchmarks.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127893203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641490
Shaomin Hsu, Yuyu Chang, J. Choma
With the development toward an integrated single-chip multi-standard radio receiver, a direct conversion receiver (DCR) is usually preferable due to its low power consumption and low manufacturing cost. Backward compatibility with narrow-band standards such as GSM/EDGE is also important to avoid possible service interruption. However, the excess flicker noise in modern CMOS technology degrades DCR sensitivity in narrow-band radios. Flicker noise, which has most of its power centered in the low-frequency band, is first introduced to the receiver by the mixer, and the degraded noise figure cannot be restored by any subsequent amplification or processing. Thus, it is critical to reduce the mixer flicker noise for narrow-band applications. In this paper, the flicker noise coupling mechanism in an active CMOS mixer is analyzed, and circuit techniques are proposed to significantly lower the mixer's flicker noise.
{"title":"Design of low flicker noise active CMOS mixer","authors":"Shaomin Hsu, Yuyu Chang, J. Choma","doi":"10.1109/SOCC.2008.4641490","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641490","url":null,"abstract":"With the development toward an integrated single-chip multi-standard radio receiver, a DCR (direct conversion receiver) is usually preferable due to its low power consumption and low manufactory cost. Backward compatibility to narrow band standards such as GSM/EDGE is also important to avoid possible service interruption. However, the excess flicker noise in modern CMOS technology degrades the DCR sensitivity in narrow band radios. Flicker noise which has most of its power centered in the low frequency band is first introduced to the receiver by the mixer and the degraded noise figure can not be restored by any subsequent amplification or processing. Thus, it is critical to reduce the mixer flicker noise for narrow band applications. In this paper, the flicker noise coupling mechanism in an active CMOS mixer is analyzed and circuit techniques are proposed to significantly lower the mixerpsilas flicker noise.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130451511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641507
Jun Zhao, Yong-Bin Kim
In this paper, a low-power, low-jitter 12-bit CMOS digitally controlled oscillator (DCO) design is presented. The CMOS DCO design is based on a ring oscillator implemented with Schmitt-trigger-based inverters. Simulations of the proposed DCO using a 32 nm predictive transistor model (PTM) achieve a controllable frequency range of around 570 MHz to 850 MHz with a wide range of linearity. Monte Carlo simulation demonstrates that the time-period jitter due to random power supply fluctuation is under 75 ps and that the power consumption is 2.3 mW at 800 MHz with a 0.9 V power supply.
{"title":"A low power 32 nanometer CMOS digitally controlled oscillator","authors":"Jun Zhao, Yong-Bin Kim","doi":"10.1109/SOCC.2008.4641507","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641507","url":null,"abstract":"In this paper, a low power and low jitter 12-bit CMOS digitally controlled oscillator (DCO) design is presented. The CMOS DCO design is based on a ring oscillator implemented with Schmitt trigger based inverters. Simulations of the proposed DCO using 32 nm predictive transistor model (PTM) achieve controllable frequency range of around 570 MHz~850 MHz with a wide range of linearity. Monte Carlo simulation demonstrates that the time-period jitter due to random power supply fluctuation is under 75 ps and the power consumption is 2.3 mW at 800 MHz and 0.9 power supply.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134137986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641473
Yibo Chen, J. Ouyang, Yuan Xie
The impact of process variations on circuit timing increases rapidly as technology scales. Consequently, it is important to consider timing variations at the early stages of circuit design. Conventional high-level synthesis relies on worst-case delay analysis to guide design space exploration; however, such worst-case timing analysis can result in overly conservative designs with pessimistic performance estimation. This paper presents a 0-1 integer linear programming (ILP) formulation that aims to reduce the impact of timing variations in high-level synthesis by integrating overall timing yield constraints into scheduling and resource binding. The proposed approach focuses on achieving the maximum performance (minimum latency) under given timing yield constraints with affordable computation time. Experimental results show that significant latency reduction is achieved under different timing yield constraints, compared to the traditional worst-case-based approach.
{"title":"ILP-based scheme for timing variation-aware scheduling and resource binding","authors":"Yibo Chen, J. Ouyang, Yuan Xie","doi":"10.1109/SOCC.2008.4641473","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641473","url":null,"abstract":"The impact of process variations on circuit timing increases rapidly as technology scales. Consequently, it is important to consider timing variations at the early stages of circuit designs. Conventional high level synthesis relies on the worst-case delay analysis to guide the design space exploration, however, such worst-case timing analysis can results in overly conservative designs with pessimistic performance estimation. This paper presents a 0-1 integer linear programming (ILP) formulation that aims at reducing the impact of timing variations in high-level synthesis, by integrating overall timing yield constraints into scheduling and resource binding. The proposed approach focuses on how to achieve the maximum performance (minimum latency) under given timing yield constraints with affordable computation time. Experiment results show that significant latency reduction is achieved under different timing yield constraints, compared to traditional worst-case based approach.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130719196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641491
F. Moradi, D. Wisland, S. Aunet, H. Mahmoodi, T. Cao
In this paper, a new ultra-low-power SRAM cell is proposed. In the proposed SRAM topology, additional circuitry has been added to a standard 6T-SRAM cell to improve the static noise margin (SNM) and the performance. Foundry models for a 65 nm standard CMOS process were used to obtain reliable simulated results. The circuit was simulated for supply voltages from 0.2 V to 0.35 V, verifying the robustness of the proposed circuit across different supply voltages. Compared with the 6T-SRAM cell, the simulations show a significant improvement in SNM and a 4X improvement in read speed while still maintaining a satisfactory write noise margin. The proposed circuit has an area overhead of between 22% and 28% compared with the 6T-SRAM.
{"title":"65NM sub-threshold 11T-SRAM for ultra low voltage applications","authors":"F. Moradi, D. Wisland, S. Aunet, H. Mahmoodi, T. Cao","doi":"10.1109/SOCC.2008.4641491","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641491","url":null,"abstract":"In this paper a new ultra low power SRAM cell is proposed. In the proposed SRAM topology, additional circuitry has been added to a standard 6T-SRAM cell to improve the static noise margin (SNM) and the performance. Foundry models for a 65 nm standard CMOS process were used for obtaining reliable simulated results. The circuit was simulated for supply voltages from 0.2 V to 0.35 V verifying the robustness of the proposed circuit for different supply voltages. The simulations show a significant improvement in SNM and a 4X improvement in read speed still maintaining a satisfactory write noise margin compared with the 6T-SRAM cell. The proposed circuit has an area overhead between 22%-28% compared with the 6T-SRAM.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132136047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641512
Chae-Eun Rhee, Jin-Su Jung, Hyuk-Jae Lee
This paper proposes a novel processing time control algorithm for a hardware-based H.264/AVC encoder. In the proposed speed control, a macroblock processing time budget is allocated adaptively according to the processing time of the other blocks. Then, twelve complexity levels are defined to provide various combinations of processing time and compression efficiency. For a given time budget, the algorithm selects the complexity level that compresses most efficiently among the levels that meet the time budget. Experimental results show that real-time processing is achieved by the speed control with negligible quality degradation, whereas between 31.2% and 50% of macroblocks violate their time budgets without speed control.
{"title":"Speed control for a hardware based H.264/AVC encoder","authors":"Chae-Eun Rhee, Jin-Su Jung, Hyuk-Jae Lee","doi":"10.1109/SOCC.2008.4641512","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641512","url":null,"abstract":"This paper proposes a novel processing time control algorithm for a hardware-based H.264/AVC encoder. In the proposed speed control, a macroblock processing time budget is allocated adaptively according to the processing time of the other blocks. Then, twelve complexity levels are defined to provide various combinations of processing time and compression efficiency. For a given time budget, the algorithm selects the proper complexity level that compresses most efficiently among the levels that meet the time budget. Experimental results show that real-time processing is achieved by the speed control with negligible quality degradation while between 31.2% and 50% macroblocks violates its time budget without speed control.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114324963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
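The selection rule in this abstract (pick the most efficient complexity level that still fits the macroblock's time budget) can be sketched as follows. Everything concrete here is an assumption for illustration: the `Level` fields, the cycle counts and efficiency scores of the twelve levels, the fallback to the fastest level when nothing fits, and the `next_budget` carry-over rule, which is only one plausible reading of "allocated adaptively according to the processing time of the other blocks".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    index: int
    cycles: int        # assumed processing time for one macroblock
    efficiency: float  # assumed relative compression efficiency (higher is better)


def select_level(levels, budget):
    """Return the most efficient level whose processing time fits the budget."""
    feasible = [lv for lv in levels if lv.cycles <= budget]
    if not feasible:
        # Assumed fallback: nothing fits, so use the fastest level.
        return min(levels, key=lambda lv: lv.cycles)
    return max(feasible, key=lambda lv: lv.efficiency)


def next_budget(nominal, budget, spent):
    """Hypothetical adaptation: carry unused (or overrun) time into the next block."""
    return nominal + (budget - spent)


# Twelve hypothetical complexity levels: slower levels compress better.
LEVELS = [Level(index=i, cycles=1000 + 200 * i, efficiency=float(i)) for i in range(12)]

chosen = select_level(LEVELS, budget=2000)
print(chosen.index)  # prints 5
```

With a nominal budget of 2000 cycles, a block that finishes in 1600 cycles leaves `next_budget(2000, 2000, 1600) == 2400` for the following block under this assumed rule, which lets that block use a higher complexity level.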
Pub Date : 2008-10-10 DOI: 10.1109/SOCC.2008.4641496
Hanni Bagnordi, M. Ito
This paper presents an experimental evaluation of the feasibility of using an adaptive clock to enhance the performance of a Fast Fourier Transform (FFT). The FFT is implemented on an FPGA, and results are simulated using commercial EDA tools. Dynamic power consumption and processing speed are compared to those of a standard FFT implementation using a fixed clock. Results show that using a dynamically variable frequency clock offers a potential speed improvement while maintaining energy efficiency.
{"title":"Performance evaluation of a FFT using adpative clocking","authors":"Hanni Bagnordi, M. Ito","doi":"10.1109/SOCC.2008.4641496","DOIUrl":"https://doi.org/10.1109/SOCC.2008.4641496","url":null,"abstract":"This paper presents an experimental evaluation on the feasibility of using an adaptive clock to enhance the performance of a Fast Fourier Transform (FFT). The FFT is implemented on an FPGA and results are simulated using commercial EDA tools. Dynamic power consumption and processing speed are compared to a standard FFT implementation using a fixed clock. Results show that using a dynamically variable frequency clock offers a potential speed improvement while maintaining energy efficiency.","PeriodicalId":368115,"journal":{"name":"2008 IEEE International SOC Conference","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114523002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}