Exponential increases in architectural design complexity threaten to make traditional processor design optimization techniques intractable. Genetically programmed response surfaces (GPRS) address this challenge by transforming the optimization process from a lengthy series of detailed simulations into the tractable formulation and rapid evaluation of a predictive model. We validate the GPRS methodology on realistic processor design spaces and compare it to recently proposed techniques for predictive microarchitectural design space exploration.
{"title":"Predictive design space exploration using genetically programmed response surfaces","authors":"Henry Cook, K. Skadron","doi":"10.1145/1391469.1391711","DOIUrl":"https://doi.org/10.1145/1391469.1391711","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
L. Gao, K. Karuri, S. Kraemer, R. Leupers, G. Ascheid, H. Meyr
With the growing number of programmable processing elements in today's Multiprocessor System-on-Chip (MPSoC) designs, the synergy required between the development of the hardware architecture and that of the software running on it is also increasing. In an MPSoC development environment, changes in the hardware architecture can require extensive re-partitioning or re-parallelization of the software architecture. Fast and accurate functional simulation and performance estimation techniques are needed to cope with this co-design problem in the early phases of MPSoC design space exploration. This paper addresses the issue by introducing a framework that combines hybrid simulation, cache simulation, and online trace-driven replay techniques to accurately predict the performance of programmable elements in an MPSoC environment. The resulting simulation technique can easily cope with the continuous reorganization of software architectures during an Instruction Set Simulator (ISS) based design process. Experimental results show that this framework can improve system simulation speed by 3-5X on average while achieving accuracy closely comparable to traditional ISSes.
{"title":"Multiprocessor performance estimation using hybrid simulation","authors":"L. Gao, K. Karuri, S. Kraemer, R. Leupers, G. Ascheid, H. Meyr","doi":"10.1145/1391469.1391552","DOIUrl":"https://doi.org/10.1145/1391469.1391552","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
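The abstract above names cache simulation driven by an address trace as one ingredient of the framework. As a rough illustration of that general idea only, and not of the authors' implementation, the following sketch replays a list of memory addresses through a direct-mapped cache and counts hits and misses. The function name, the cache geometry, and the example trace are all hypothetical.

```python
def simulate_direct_mapped(trace, num_lines=4, line_bytes=16):
    """Replay an address trace through a direct-mapped cache.

    Returns (hits, misses). The geometry defaults are illustrative
    assumptions, not values taken from the paper.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_bytes     # memory block holding this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fill the line, evicting the old block
    return hits, misses


if __name__ == "__main__":
    # Hypothetical trace: addresses 0 and 64 compete for the same line.
    example_trace = [0, 4, 16, 0, 64, 0]
    print(simulate_direct_mapped(example_trace))
```

In a trace-driven setup of this kind, the hit and miss counts would then be weighted by assumed latencies to estimate memory stall time; how the paper combines these estimates with hybrid simulation is not stated in the abstract.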
In this paper we propose to replace all data and control pads generally present in conventional chips with a new type of ultra-compact, low-power optical interconnect implemented almost entirely in CMOS. The proposed scheme enables optical through-chip buses that could service hundreds of thinned stacked dies. High throughput and communication density could be achieved even within tight power budgets. The core of the optical interconnect is a single-photon avalanche diode operating in pulse position modulation. We demonstrate how throughputs of several gigabits per second may be achieved. We also show a systematic analysis of the system and preliminary results to support its suitability in emerging DSM technologies.
{"title":"Techniques for fully integrated intra-/inter-chip optical communication","authors":"C. Favi, E. Charbon","doi":"10.1145/1391469.1391558","DOIUrl":"https://doi.org/10.1145/1391469.1391558","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
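The abstract above relies on pulse position modulation (PPM), in which each symbol is conveyed by the time slot in which a single pulse arrives. The sketch below shows generic PPM encoding and decoding over idealized, noise-free slots. It is an illustration under stated assumptions and does not model the single-photon avalanche diode or the authors' circuit; the function names and the choice of two bits per symbol are hypothetical.

```python
def ppm_encode(bits, bits_per_symbol=2):
    """Encode bits as PPM frames: each frame has exactly one pulse,
    and the pulse's slot index equals the value of the bit group."""
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    slots_per_symbol = 1 << bits_per_symbol
    frames = []
    for i in range(0, len(bits), bits_per_symbol):
        group = bits[i:i + bits_per_symbol]
        value = int("".join(str(b) for b in group), 2)
        frame = [0] * slots_per_symbol
        frame[value] = 1               # the pulse position carries the data
        frames.append(frame)
    return frames


def ppm_decode(frames, bits_per_symbol=2):
    """Recover the bits from idealized PPM frames (one pulse per frame)."""
    bits = []
    for frame in frames:
        if sum(frame) != 1:
            raise ValueError("each frame must contain exactly one pulse")
        value = frame.index(1)
        bits.extend(int(c) for c in format(value, f"0{bits_per_symbol}b"))
    return bits


if __name__ == "__main__":
    message = [1, 0, 0, 1]
    frames = ppm_encode(message)
    print(frames)
    print(ppm_decode(frames))
```

Because only one pulse is emitted per symbol regardless of how many bits the symbol carries, PPM trades timing resolution for energy per bit, which is consistent with the low-power emphasis of the abstract.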
J. Ceng, J. Castrillón, Weihua Sheng, H. Scharwächter, R. Leupers, G. Ascheid, H. Meyr, T. Isshiki, H. Kunieda
In the past few years, MPSoC has become the most popular solution for embedded computing. However, the challenge of programming MPSoCs is the biggest side effect of this solution. Tool support is mostly unsatisfactory, especially when designers have to deal with legacy C code accumulated over the years. In this paper, we propose an integrated framework, MAPS, which aims at parallelizing C applications for MPSoC platforms. It extracts coarse-grained parallelism at a novel granularity level. A set of tools has been developed for the framework. We introduce the major components and their functionality, and present two case studies that demonstrate the use of MAPS on two different kinds of applications. In both cases, the proposed framework helps the programmer extract parallelism efficiently.
{"title":"MAPS: An integrated framework for MPSoC application parallelization","authors":"J. Ceng, J. Castrillón, Weihua Sheng, H. Scharwächter, R. Leupers, G. Ascheid, H. Meyr, T. Isshiki, H. Kunieda","doi":"10.1145/1391469.1391663","DOIUrl":"https://doi.org/10.1145/1391469.1391663","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
We present a novel computing framework consisting of multiple processing cores that exhibits swarm-like behavior. The conventional parallel processing paradigm typically requires a central controller for job assignment, inter-core communication, and defect tolerance. The proposed system leverages the collective intelligence of a swarm of processing elements to avoid the bottleneck imposed by a central scheduler. Preliminary simulations show promising results for common signal processing applications.
{"title":"Collective computing based on swarm intelligence","authors":"S. Narasimhan, Somnath Paul, S. Bhunia","doi":"10.1145/1391469.1391561","DOIUrl":"https://doi.org/10.1145/1391469.1391561","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
Computer manufacturers spend a huge amount of time, resources, and money designing new systems and newer configurations, and their ability to reduce costs, charge competitive prices, and gain market share depends on how well these systems perform. In this work, we develop predictive models for estimating the performance of systems by using performance numbers from only a small fraction of the overall design space. Specifically, we first develop three models, two based on artificial neural networks and another based on linear regression. Using these models, we analyze the published Standard Performance Evaluation Corporation (SPEC) benchmark results and show that by using the performance numbers of only 2% and 5% of the machines in the design space, we can estimate the performance of all the systems within 9.1% and 4.6% on average, respectively. Then, we show that the performance of future systems can be estimated with an error rate of less than 2.2% on average by using the data of systems from a previous year. We believe that these tools can accelerate design space exploration significantly and aid in reducing the corresponding research/development cost and time-to-market.
{"title":"Efficient system design space exploration using machine learning techniques","authors":"Berkin Özisikyilmaz, G. Memik, A. Choudhary","doi":"10.1145/1391469.1391712","DOIUrl":"https://doi.org/10.1145/1391469.1391712","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
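The abstract above describes fitting a model on a small sample of machines and using it to estimate the performance of the whole design space. The sketch below illustrates that workflow with a single-feature least-squares line. It is a minimal illustration under stated assumptions: the data are synthetic, the single feature (clock frequency) and all names are hypothetical, and the paper's actual models and SPEC data are not reproduced, so the resulting error bears no relation to the figures reported in the abstract.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_pct_error(xs, ys, slope, intercept):
    """Mean absolute percentage error of the fitted line over all points."""
    total = sum(abs((slope * x + intercept - y) / y) for x, y in zip(xs, ys))
    return 100.0 * total / len(xs)


def demo_error_pct(num_machines=200, sample_fraction=0.05, seed=0):
    """Train on a small random sample, then evaluate on every machine.

    All data here are synthetic stand-ins for a design space.
    """
    rng = random.Random(seed)
    clock_ghz = [rng.uniform(1.5, 3.5) for _ in range(num_machines)]
    score = [10.0 * f + 5.0 + rng.gauss(0.0, 0.5) for f in clock_ghz]
    sample = rng.sample(range(num_machines), int(num_machines * sample_fraction))
    slope, intercept = fit_line([clock_ghz[i] for i in sample],
                                [score[i] for i in sample])
    return mean_abs_pct_error(clock_ghz, score, slope, intercept)


if __name__ == "__main__":
    print(f"mean error over the full synthetic space: {demo_error_pct():.2f}%")
```

The key design point is that the evaluation covers the entire space while training touches only the sampled fraction, which mirrors the 2% and 5% sampling experiments mentioned in the abstract.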
P. Franzon, W. R. Davis, M. Steer, S. Lipa, Eun Chu Oh, T. Thorolfsson, S. Melamed, S. Luniya, Tad Doxsee, Stephen Berkeley, Ben Shani, Kurt Obermiller
High-density through-silicon vias (TSVs) can be used to build 3DICs that enable unique applications in computing, signal processing, and memory-intensive systems. This paper presents several case studies that are uniquely enhanced through 3D implementation, including a 3D CAM, an FFT processor, and a SAR processor. The CAD flow used to implement these designs is described. 3DIC design requires higher-fidelity thermal modeling than 2DIC design. The rationale for this requirement is established and a possible solution is presented.
{"title":"Design and CAD for 3D integrated circuits","authors":"P. Franzon, W. R. Davis, M. Steer, S. Lipa, Eun Chu Oh, T. Thorolfsson, S. Melamed, S. Luniya, Tad Doxsee, Stephen Berkeley, Ben Shani, Kurt Obermiller","doi":"10.1145/1391469.1391642","DOIUrl":"https://doi.org/10.1145/1391469.1391642","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
The threshold voltage (Vth) of a nanoscale transistor is severely affected by random dopant fluctuations and line-edge roughness. The analysis of these effects usually requires atomistic simulations that are too computationally expensive for statistical circuit design. In this work, we develop an efficient SPICE simulation method and statistical transistor model that accurately predict threshold variation as a function of dopant fluctuations and gate length change caused by sub-wavelength lithography and the gate etching process. By understanding the physical principles of atomistic simulations, we (a) identify the appropriate method to divide a non-uniform gate into slices in order to map those fluctuations into the device model; (b) extract the variation of Vth from the strong-inversion region instead of the leakage current, benefiting from the linearity of the saturation current with respect to Vth; and (c) propose a compact model of Vth variation that is scalable with gate size and the amount of dopant and gate length fluctuations. The proposed SPICE simulation method is fully validated against atomistic simulation results. Given the post-lithography gate geometry, this approach correctly models the variation of device output current in all operating regions. Based on the new results, we further project the amount of Vth variation at advanced technology nodes, to help shed light on the challenges of future robust circuit design.
{"title":"Statistical modeling and simulation of threshold variation under dopant fluctuations and line-edge roughness","authors":"Y. Ye, Frank Liu, S. Nassif, Yu Cao","doi":"10.1145/1391469.1391698","DOIUrl":"https://doi.org/10.1145/1391469.1391698","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
Ling Zhang, Wenjian Yu, Haikun Zhu, A. Deutsch, G. Katopis, D. Dreps, E. Kuh, Chung-Kuan Cheng
A low-power passive equalizer using an RL terminator is proposed and optimized in this work. The equalizer includes an inductor in series with the resistive terminator, which boosts high-frequency components and therefore improves the interconnect bandwidth with little overhead in power consumption. An analytic estimation method for eye opening and jitter based on the tritonic step response is also introduced in this work, which enables the optimization procedure. Our experimental results show that our estimation method is accurate and that a board-level transmission line with a wire length of 50 cm can achieve a 15 Gb/s data rate. With a 15 GHz frequency input, the power consumption of the equalizer is less than 2.5 mW, and the total power is 5 mW.
{"title":"Low power passive equalizer optimization using tritonic step response","authors":"Ling Zhang, Wenjian Yu, Haikun Zhu, A. Deutsch, G. Katopis, D. Dreps, E. Kuh, Chung-Kuan Cheng","doi":"10.1145/1391469.1391613","DOIUrl":"https://doi.org/10.1145/1391469.1391613","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}
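The abstract above states that an inductor in series with the resistive terminator boosts high-frequency components. One textbook way to see this, sketched below, is that the terminator impedance Z_T = R + j2πfL grows with frequency, so the voltage at the load relative to the incident wave, |1 + Γ| = |2·Z_T / (Z_T + Z0)|, rises above 1 at high frequency when R equals the line impedance Z0. This is an idealized illustration that ignores line loss and is not the paper's tritonic step response method; the component values (50 Ω, 1 nH) and function names are assumptions chosen only for the example.

```python
import math


def terminator_impedance(freq_hz, r_ohm, l_henry):
    """Impedance of a resistor in series with an inductor: R + j*2*pi*f*L."""
    return complex(r_ohm, 2.0 * math.pi * freq_hz * l_henry)


def load_voltage_gain(freq_hz, r_ohm, l_henry, z0_ohm):
    """Magnitude of load voltage over incident-wave voltage, |1 + Gamma|,
    where Gamma = (Z_T - Z0) / (Z_T + Z0) is the reflection coefficient."""
    z_t = terminator_impedance(freq_hz, r_ohm, l_henry)
    return abs(2.0 * z_t / (z_t + z0_ohm))


if __name__ == "__main__":
    # Hypothetical values: matched 50-ohm resistor plus a 1 nH inductor.
    for freq in (0.0, 1e9, 5e9, 15e9):
        gain = load_voltage_gain(freq, r_ohm=50.0, l_henry=1e-9, z0_ohm=50.0)
        print(f"{freq / 1e9:5.1f} GHz: gain {gain:.3f}")
```

Since the equalizer adds only passive elements to the terminator, this frequency-dependent boost comes without an active gain stage, which fits the abstract's claim of little power overhead.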
The Liberty Format is an open source industry standard for library modeling that has seen significant enhancement in recent years to address the challenges introduced by the new smaller technologies at 65 nm and below. Issues associated with modeling timing, power, and noise have seen an explosion in complexity. The paper discusses the challenges that the new high-accuracy models introduce for library providers and techniques to ameliorate them.
{"title":"Addressing library creation challenges from recent liberty extensions","authors":"R. Trihy","doi":"10.1145/1391469.1391591","DOIUrl":"https://doi.org/10.1145/1391469.1391591","journal":{"name":"2008 45th ACM/IEEE Design Automation Conference"},"publicationDate":"2008-06-08"}