Title: Non-fractional parallelism in LDPC Decoder implementations
Authors: J. Dielissen, A. Hekstra
DOI: 10.1109/DATE.2007.364614
Pub Date: 2007-04-16

Because of its excellent bit-error-rate performance, the low-density parity-check (LDPC) decoding algorithm is gaining increased attention in communication standards and in the literature. The new Chinese digital video broadcast standard (CDVB-T) also uses LDPC codes. This standard uses a large prime number as the parallelism factor, leading to high area cost. This paper presents a new method that allows fractional dividers to be used. The method depends on the property that consecutive sub-circulants have one memory row in common. Several techniques are shown for ensuring this property, or for resolving memory conflicts, making the method more generally applicable. In fact, the proposed technique is a first step towards a general-purpose LDPC processor. For the CDVB-T decoder implementation, the method leads to a factor-of-3 improvement in area.
Title: Portable Multimedia SoC Design: a Global Challenge
Authors: Maurizio Paganini, Georg Kimmich, Stephane Ducrey, Guilhem Caubit, Vincent Coeffe
DOI: 10.1109/DATE.2007.364395
Pub Date: 2007-04-16

The intrinsic capability brought by each new technology node opens the way to a broad range of system-integration options and continuously enables new applications to be integrated in a single device, to the point that almost everything seems possible. In reality, the difference between a successful design and a failure resides today, more than ever, in the ability of the design team to properly master all the critical design factors at once. In essence, today's system-on-chip designs represent a multidisciplinary challenge that spans from architecture through design to test and, finally, mass production. SoC design for portable applications has to cope with unique constraints that greatly challenge the industrialization capabilities of an organization, and often of an entire company, and push concurrent design to new limits. In the end, only a well-thought-out architecture, followed by best-practice design techniques, a high level of understanding of the manufacturing constraints, and excellent logistics can result in a device that can be produced in the volumes required by the cell-phone industry today. This paper captures how these challenges have been addressed in designing the Nomadik™ family of application processing engines. The paper focuses specifically on the third-generation device, labeled STn8815S22, where the integration capabilities of silicon technology have been paired with those of system-in-package design to provide an extremely compact and effective system on chip for portable multimedia applications. An overview of the main success factors and challenges is presented, leading the reader from the architecture conception through chip industrialization. Both silicon design and packaging design are illustrated, highlighting the techniques that made this product a reality.
Title: Low-Power Warp Processor for Power Efficient High-Performance Embedded Systems
Authors: Roman L. Lysecky
DOI: 10.1109/DATE.2007.364581
Pub Date: 2007-04-16

Researchers previously proposed warp processors, a novel architecture capable of transparently optimizing an executing application by dynamically re-implementing critical kernels within the software as custom hardware circuits in an on-chip FPGA. However, the original warp processor design was primarily performance-driven and did not focus on power consumption, which is becoming an increasingly important design constraint. Focusing on power consumption, we present an alternative low-power warp processor design and methodology that can dynamically and transparently reduce the power consumption of an executing application with no degradation in system performance, achieving an average reduction in power consumption of 74%. We further demonstrate the flexibility of this approach in providing dynamic control between high performance and low power consumption.
Title: Process Variation Tolerant Low Power DCT Architecture
Authors: N. Banerjee, G. Karakonstantis, K. Roy
DOI: 10.1109/DATE.2007.364664
Pub Date: 2007-04-16

The 2D discrete cosine transform (DCT) is widely used as the core of digital image and video compression. In this paper, the authors present a novel DCT architecture that allows aggressive voltage scaling by exploiting the fact that not all intermediate computations are equally important in a DCT system for obtaining "good" image quality with a peak signal-to-noise ratio (PSNR) > 30 dB. This observation has led us to propose a DCT architecture in which the signal paths that contribute less to PSNR improvement are designed to be longer than the paths that contribute more. It should also be noted that robustness with respect to parameter variations and low-power operation typically impose contradictory requirements on architecture design. However, the proposed architecture lends itself to aggressive voltage scaling for low power dissipation even under process parameter variations. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors appear only on the long paths that contribute less to PSNR improvement, providing a large improvement in power dissipation with small PSNR degradation. Results show that even under large process variation and supply voltage scaling (0.8 V), image quality degrades only gradually while considerable power savings (62.8%) are achieved for the proposed architecture compared to existing implementations in a 70 nm process technology.
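The PSNR figure of merit cited above has a standard definition. As an illustration (ours, not the paper's implementation), a minimal PSNR computation for 8-bit pixel data might look like:

```python
import math

def psnr(orig, recon, peak=255.0):
    """PSNR in dB between two equal-length pixel sequences.
    peak is the maximum possible pixel value (255 for 8-bit images)."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# A reconstruction off by 1 LSB everywhere (MSE = 1) gives ~48 dB,
# comfortably above the 30 dB "good quality" threshold cited above.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```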
Title: CMCal: An Accurate Analytical Approach for the Analysis of Process Variations with Non-Gaussian Parameters and Nonlinear Functions
Authors: Min Zhang, M. Olbrich, D. Seider, M. Frerichs, H. Kinzelbach, E. Barke
DOI: 10.1109/DATE.2007.364598
Pub Date: 2007-04-16

As technology rapidly scales, performance variations (delay, power, etc.) arising from process variation are becoming a significant problem. The use of linear models has proven problematic in many of today's applications: even for well-behaved performance functions, linearising approaches, as well as quadratic models, introduce serious errors into the calculation of the expected value, the variance, and higher central moments. This paper presents a novel approach to analysing the impact of process variations with low effort and minimal assumptions. Circuit performance is formulated as a function of the random parameters and approximated by a Taylor expansion up to 4th order. Taking advantage of the knowledge about higher moments, the Taylor series is converted into characteristics of the performance distribution. The experiments show that this approach provides highly accurate results even in strongly non-linear problems with large process variations. Its simplicity, efficiency and accuracy make this approach a promising alternative to the Monte Carlo method in most practical applications.
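As a toy illustration of the idea (ours, not the paper's CMCal method, and restricted to a single Gaussian parameter rather than the non-Gaussian case the paper handles), the mean of a non-linear performance function can be estimated from its Taylor derivatives and the Gaussian central moments E[x²] = σ², E[x³] = 0, E[x⁴] = 3σ⁴, and checked against Monte Carlo:

```python
import math
import random

def taylor_mean(derivs, sigma):
    """Estimate E[f(x)] for x ~ N(0, sigma^2) from the derivatives of f
    at 0, given as (f, f', f'', f''', f''''), using the Gaussian central
    moments E[x] = E[x^3] = 0, E[x^2] = sigma^2, E[x^4] = 3*sigma^4."""
    f0, _f1, f2, _f3, f4 = derivs
    return f0 + f2 * sigma**2 / 2 + f4 * 3 * sigma**4 / 24

def monte_carlo_mean(f, sigma, n=100_000, seed=42):
    """Reference estimate of E[f(x)] by plain Monte Carlo sampling."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(0.0, sigma)) for _ in range(n)) / n

# f(x) = exp(x): every derivative at 0 is 1, and E[exp(x)] = exp(sigma^2/2).
sigma = 0.2
analytic = taylor_mean((1, 1, 1, 1, 1), sigma)
sampled = monte_carlo_mean(math.exp, sigma)
```

The analytic estimate needs only five derivative evaluations, while the Monte Carlo reference needs 100,000 samples to reach comparable accuracy, which is the trade-off the abstract points to.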
Title: Worst-Case Design and Margin for Embedded SRAM
Authors: R. Aitken, Sachin Idgunji
DOI: 10.1145/1266366.1266648
Pub Date: 2007-04-16

An important aspect of design for yield for embedded SRAM is identifying the expected worst-case behavior in order to guarantee that sufficient design margin is present. Previously, this has involved multiple simulation corners and extreme test conditions. It is shown that statistical concerns and device variability now require a different approach, based on work in extreme value theory. This method is used to develop a lower bound for variability-related yield in memories.
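The abstract's point about extreme value theory can be illustrated with a standard first-order result (our example, not the paper's derivation): the expected maximum of n i.i.d. standard-normal deviations grows roughly as sqrt(2 ln n), so the worst cell in a large array lies far beyond a traditional 3-sigma simulation corner:

```python
import math

def expected_worst_case_sigma(n_cells):
    """First-order extreme-value estimate of the expected maximum of
    n_cells i.i.d. standard-normal deviations: sqrt(2 * ln(n))."""
    return math.sqrt(2.0 * math.log(n_cells))

# For a 1 Mb array the expected worst cell sits near 5.3 sigma,
# well outside a 3-sigma corner; larger arrays push it further out.
for mbits in (1, 16, 64):
    n = mbits * 2**20
    print(f"{mbits:3d} Mb: worst cell ~ {expected_worst_case_sigma(n):.2f} sigma")
```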
Title: Performance Analysis of Complex Systems by Integration of Dataflow Graphs and Compositional Performance Analysis
Authors: S. Schliecker, S. Stein, R. Ernst
DOI: 10.1109/DATE.2007.364603
Pub Date: 2007-04-16

In this paper we integrate two established approaches to formal multiprocessor performance analysis, namely synchronous dataflow graphs and compositional performance analysis. The two make different trade-offs between precision and applicability. We show how the strengths of both can be combined to achieve a very precise and adaptive model. We couple these models of completely different paradigms by relying on load descriptions of event streams. The results show superior performance-analysis quality.
Title: An effective AMS Top-Down Methodology Applied to the Design of a Mixed-Signal UWB System-on-Chip
Authors: M. Crepaldi, M. Casu, M. Graziano, M. Zamboni
DOI: 10.1145/1266366.1266677
Pub Date: 2007-04-16

The design of ultra-wideband (UWB) mixed-signal SoCs for localization applications in wireless personal area networks is currently being investigated by several researchers. The complexity of the design calls for effective top-down methodologies. We propose a layered approach based on VHDL-AMS for the first design stages and on an intelligent use of a circuit-level simulator for the transistor-level phase. We apply the latter to just one block at a time and wrap it within the system-level VHDL-AMS description. This method allows us to capture the impact of circuit-level design choices and non-idealities on system performance. To demonstrate the effectiveness of the methodology, we show how the refinement of the design affects specific UWB system parameters such as the bit-error rate and localization estimates.
Title: A Calculator for Pareto Points
Authors: M. Geilen, T. Basten
DOI: 10.1109/DATE.2007.364605
Pub Date: 2007-04-16

This paper presents the Pareto calculator, a tool for the compositional computation of Pareto points, based on the algebra of Pareto points. The tool is a useful instrument for multidimensional optimisation problems, design-space exploration, and the development of quality management and control strategies. Implementations of the algebra's operations and their complexity are discussed. In particular, we discuss a generalisation of the well-known divide-and-conquer algorithm for computing the Pareto points (optimal solutions) from a set of possible configurations, also known as the maximal-vector or skyline problem. The generalisation lies in the fact that we allow partially ordered domains instead of only totally ordered ones. The calculator is available at the following URL: http://www.es.ele.tue.nl/pareto
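The maximal-vector problem mentioned above can be sketched with a naive quadratic-time dominance filter (our illustration; the paper's divide-and-conquer algorithm is the efficient, partially-ordered generalisation):

```python
def dominates(a, b):
    """a dominates b (minimisation) if a is no worse in every dimension and
    the two differ somewhere. With <= drawn from a partial order per
    dimension, the same test covers the paper's generalised setting."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_points(configs):
    """Naive O(n^2) maximal-vector filter: keep the non-dominated configs."""
    return [c for c in configs if not any(dominates(d, c) for d in configs)]

# Three (latency, energy) configurations: (2, 3) is dominated by (1, 3).
print(pareto_points([(1, 3), (2, 3), (3, 1)]))
```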
Title: Heterogeneous Systems on Chip and Systems in Package
Authors: I. O’Connor, B. Courtois, K. Chakrabarty, N. Delorme, M. Hampton, J. Hartung
DOI: 10.1109/DATE.2007.364683
Pub Date: 2007-04-16

This paper discusses several forms of heterogeneity in systems on chip and systems in package. A means to distinguish the various forms of heterogeneity is given, with an estimate of the maturity of design and modeling techniques with respect to the various physical domains. Industry-level MEMS integration and more prospective microfluidic biochip systems are considered at both the technological and the EDA level. Finally, specific flows for signal-abstraction heterogeneity in RF SiPs and for functional co-verification are discussed.