Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398001
Norman Nolte, S. Moch, Markus Kock, P. Pirsch
Decoding high-bitrate video bitstreams has traditionally been the domain of dedicated hardware architectures, since embedded general-purpose processors cannot satisfy the high performance requirements of entropy decoding. We present a fully programmable multi-standard bitstream processor. The proposed bit-granular memory and data path architecture provides efficient processing and storage capabilities for data words of arbitrary length. Running at a 300 MHz clock frequency, the processor can decode, for example, MPEG-2 and VC-1 1080p HDTV bitstreams with a maximum bitrate of 100 Mbit/s.
Title : Memory efficient programmable processor for bitstream processing and entropy decoding of multiple-standard high-bitrate HDTV video bitstreams. Venue : 2009 IEEE International SOC Conference (SOCC).
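To make the idea of handling data words of arbitrary length concrete, here is a minimal software sketch of bit-granular field extraction. It is only an illustration: the `BitReader` class and the `cycles_per_bit` helper are hypothetical names, and nothing here reflects the processor's actual memory or data path design. The helper simply restates the figures from the abstract as an average cycle budget.

```python
class BitReader:
    """Reads fields of arbitrary bit length from a byte string, MSB first."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, counted in bits

    def read(self, nbits: int) -> int:
        """Return the next nbits bits as an unsigned integer."""
        if nbits < 0 or self.pos + nbits > 8 * len(self.data):
            raise ValueError("read past end of bitstream")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]           # byte holding the current bit
            bit = (byte >> (7 - (self.pos & 7))) & 1  # pick the bit, MSB first
            value = (value << 1) | bit
            self.pos += 1
        return value


def cycles_per_bit(clock_hz: float, bitrate_bps: float) -> float:
    """Average number of clock cycles available per bitstream bit."""
    return clock_hz / bitrate_bps


reader = BitReader(bytes([0b10110011, 0b01000000]))
first = reader.read(3)   # a 3-bit field
second = reader.read(7)  # a 7-bit field that crosses a byte boundary
budget = cycles_per_bit(300e6, 100e6)
```

With the numbers quoted in the abstract (300 MHz, 100 Mbit/s), the average budget works out to three clock cycles per bit, which gives a sense of why byte-oriented general-purpose code struggles with this workload.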
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398072
Zhonghai Lu, Dimitris Brachos, A. Jantsch
We have proposed (σ, ρ)-based flow regulation as a design instrument for System-on-Chip (SoC) architects to control quality-of-service and achieve cost-effective communication, where σ bounds the traffic burstiness and ρ the traffic rate. In this paper, we present a hardware implementation of the regulator. We discuss its microarchitecture. Based on this microarchitecture, we design, implement and synthesize a multi-flow regulator for AXI. Our experiments show the effectiveness of such a regulation device on the control of delay, jitter and buffer requirements.
Title : A flow regulator for On-Chip Communication. Venue : 2009 IEEE International SOC Conference (SOCC).
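A (σ, ρ) constraint means that the traffic released in any interval of length t is at most σ + ρt. One common way to enforce such a bound is a token bucket of depth σ refilled at rate ρ. The sketch below is a cycle-based software model under that assumption; the class name `SigmaRhoRegulator` and its `tick` method are hypothetical and are not taken from the paper's microarchitecture or its AXI implementation.

```python
class SigmaRhoRegulator:
    """Token-bucket model of a (sigma, rho) regulator for one flow."""

    def __init__(self, sigma: float, rho: float) -> None:
        self.sigma = sigma           # burstiness bound (bucket depth, in flits)
        self.rho = rho               # sustained rate (flits per cycle)
        self.tokens = float(sigma)   # assumption: the bucket starts full
        self.backlog = 0             # flits waiting in the regulator buffer

    def tick(self, arrivals: int) -> int:
        """Advance one cycle and return the number of flits released."""
        self.backlog += arrivals
        released = min(self.backlog, int(self.tokens))
        self.backlog -= released
        self.tokens -= released
        # Refill after sending, never exceeding the bucket depth sigma.
        self.tokens = min(self.sigma, self.tokens + self.rho)
        return released


reg = SigmaRhoRegulator(sigma=4, rho=0.5)
released_per_cycle = [reg.tick(a) for a in [10, 0, 0, 0, 0, 0]]
```

In this run a burst of 10 flits arrives in the first cycle: 4 leave immediately (the burst allowance σ) and the rest drain at one flit every two cycles (the rate ρ), with the remainder held in the backlog. That backlog is the buffer requirement the abstract refers to.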
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398068
Cristian E. Onete
This paper presents a method for testing a flash A/D converter. The flash A/D converter is first reconfigured as a propagation-type A/D converter and is then tested. The testing method is shown to be suitable for fully automated use, i.e., without the need for external devices.
Title : Comparator testing in a flash A/D converter. Venue : 2009 IEEE International SOC Conference (SOCC).
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398106
J. Lancaster, J. Buhler, R. Chamberlain
Embedded computing platforms have long incorporated non-traditional architectures (e.g., FPGAs, ASICs) to combat the diminishing returns of Moore's Law as applied to traditional processors. These specialized architectures can offer higher performance potential in a smaller space, higher power efficiency, and competitive costs. A price is paid, however, in development difficulty: it becomes harder to determine functional correctness and to understand the performance of such a system. In this paper we focus on improving the task of performance debugging for streaming applications deployed on FPGAs. We describe our runtime performance monitoring infrastructure and report its capabilities and overheads for several different configurations of the monitor. We then employ the monitoring system to study the performance effects of provisioning resources for Mercury BLASTN, an implementation of the BLASTN sequence comparison application on an FPGA-accelerated system.
Title : Efficient runtime performance monitoring of FPGA-based applications. Venue : 2009 IEEE International SOC Conference (SOCC).
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398055
Robert J. Ascott, E. Swartzlander
JavaFlow is a systolic array of heterogeneous processing elements with two interconnection schemas, configured as a two-phase dataflow machine and coupled with a general-purpose processor implementing a Java Virtual Machine (JVM). The processor is described and shown to offer an attractive solution to the challenges of locality, design complexity, power, and reliability in a Java application processor.
Title : JavaFlow - A Java dataflow machine. Venue : 2009 IEEE International SOC Conference (SOCC).
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398026
Wei Wang, Weiqian Liang
This paper presents an efficient and reconfigurable co-processor for calculating the Mahalanobis distance, which is the most computation-intensive part of a GMM (Gaussian Mixture Model)-based classifier. The calculation is divided into three parts (vector-vector subtraction, matrix-vector multiplication, and vector-vector multiplication), which can operate in parallel. The proposed architecture was implemented on a Xilinx XC5VLX110T FPGA. Tested with a 358-state, 3-mixture, 39-feature, 800-word HMM, the co-processor operates at 35 MHz and meets the real-time requirement of speech recognition.
Title : A reconfigurable co-processor for GMM-based classifier. Venue : 2009 IEEE International SOC Conference (SOCC).
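The three-part split named in the abstract maps directly onto the squared Mahalanobis distance (x − μ)ᵀ Σ⁻¹ (x − μ). The following sequential sketch shows the three steps in plain Python. It assumes a precomputed inverse covariance matrix, and the function name `mahalanobis_sq` is hypothetical. It says nothing about how the co-processor schedules or overlaps these stages in hardware.

```python
from typing import List, Sequence


def mahalanobis_sq(x: Sequence[float],
                   mean: Sequence[float],
                   inv_cov: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance, computed in three explicit steps."""
    # 1) Vector-vector subtraction: diff = x - mean
    diff: List[float] = [xi - mi for xi, mi in zip(x, mean)]
    # 2) Matrix-vector multiplication: proj = inv_cov * diff
    proj: List[float] = [sum(a * d for a, d in zip(row, diff)) for row in inv_cov]
    # 3) Vector-vector multiplication (dot product): diff . proj
    return sum(d * p for d, p in zip(diff, proj))


distance = mahalanobis_sq(
    x=[3.0, 4.0],
    mean=[1.0, 1.0],
    inv_cov=[[0.5, 0.0], [0.0, 0.25]],
)
```

Because step 2 consumes the output of step 1 element by element and step 3 consumes the output of step 2 row by row, the stages lend themselves to overlapped execution, which is presumably the parallelism the abstract refers to.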
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398102
P. Lu, Danfeng Chen, Fan Ye, Junyan Ren
A 5.4 GHz phase-locked loop (PLL) based on a multiple-pass ring voltage-controlled oscillator (VCO) is described. Because the active devices in ring oscillators are sensitive to process and temperature variations, an effective automatic frequency calibration scheme is proposed. A new process-independent differential-to-single (DTOS) circuit is used to adjust the control voltage range and loop gain. The chip is implemented in a 0.18-µm CMOS process and achieves a phase noise of −100 dBc/Hz at 1 MHz and a 40% tuning range.
Title : A 5.4GHz wide tuning range CMOS PLL using an auto-calibration multiple-pass ring oscillator. Venue : 2009 IEEE International SOC Conference (SOCC).
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5397998
C. Haubelt
The continuous increase in the size, complexity, and heterogeneity of embedded systems has introduced new challenges in their modeling and implementation. Multi-Processor Systems-on-Chip (MPSoC) design requires high-speed models for early verification and performance evaluation. As a result, electronic system level (ESL) modeling has moved up in abstraction from cycle-accurate RTL to timed and untimed transaction-level models (TLMs). However, an open question remains: how does one get from a high-level system description to a hardware/software implementation? The goal of this tutorial is to answer such questions and to provide system designers and managers with new insight into ESL modeling concepts and synthesis techniques for MPSoCs. We will cover the key concepts and state-of-the-art tools for MPSoC design, and discuss TLM semantics for automatic model generation, methods for automatic design space exploration, and hardware/software synthesis. This tutorial is targeted at embedded software and hardware developers, engineers who use or are interested in using ESL design tools, managers of system designers, and verification engineers.
Title : Designing multi-processor Systems-on-Chip. Venue : 2009 IEEE International SOC Conference (SOCC).
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398019
Lei Wang, M. Olbrich, E. Barke, Thomas Büchner, Markus Bühler
In this paper, we discuss probabilistic simulation techniques used to estimate dynamic power and, in particular, glitch power. We focus on the problem of modeling inertial delay when these techniques are used to estimate switching density at gate level. Inertial delay has a great impact on glitch power due to filtering effects and is almost impossible to model completely. We propose an approximation algorithm that achieves better accuracy than existing approaches. Examples show up to 60% improvement using our solution.
Title : Fast dynamic power estimation considering glitch filtering. Venue : 2009 IEEE International SOC Conference (SOCC).
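The filtering effect of inertial delay can be illustrated with a deliberately simplified, deterministic model: a pulse narrower than a gate's inertial delay never appears at the output, so both of its edges disappear from the transition count. The sketch below applies that rule to a list of transition times. It is not the probabilistic approximation algorithm proposed in the paper, and the function name `filter_glitches` and the single-threshold rule are assumptions made for illustration only.

```python
from typing import List, Sequence


def filter_glitches(transitions: Sequence[float],
                    inertial_delay: float) -> List[float]:
    """Drop pairs of edges that form a pulse shorter than inertial_delay.

    transitions: sorted times at which an ideal (unfiltered) signal toggles.
    Returns the transition times that survive the filtering.
    """
    kept: List[float] = []
    for t in transitions:
        if kept and t - kept[-1] < inertial_delay:
            # The pulse between kept[-1] and t is too short to propagate,
            # so both of its edges are removed.
            kept.pop()
        else:
            kept.append(t)
    return kept


ideal = [0.0, 5.0, 5.3, 9.0]  # contains a 0.3-wide glitch
filtered = filter_glitches(ideal, inertial_delay=1.0)
```

Here the 0.3-wide pulse is swallowed, cutting the number of transitions from four to two. Since dynamic power scales with switching activity, an estimator that ignores this filtering would overcount the glitch contribution for this signal.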
Pub Date : 2009-09-01 DOI: 10.1109/SOCCON.2009.5398087
D. Puschini, F. Clermidy, P. Benoit, G. Sassatelli, L. Torres
In this paper we propose an adaptive technique to reduce the power consumption of Multiprocessor Systems-on-Chip. The method, based on game theory, optimizes the frequencies of local processors while fulfilling the application's real-time constraints. Unlike other approaches, our solution is compatible with reconfigurable Systems-on-Chip. The power savings obtained on a telecommunication test case are between 10% and 25%, while the reaction time to temporal variations due to application reconfiguration is less than 25 μs.
Title : Adaptive energy-aware latency-constrained DVFS policy for MPSoC. Venue : 2009 IEEE International SOC Conference (SOCC).