With the burgeoning growth of mobile systems, the multiprocessor System-on-Chip (MPSoC) interconnected by a Network-on-Chip (NoC) has become ubiquitous. A typical MPSoC in mobile applications consists of multiple CPU cores of varying capabilities, GPU cores, DSP cores, and crypto accelerators, and these cores differ widely in physical size and bandwidth requirements. Traditional mesh-based NoCs work well for regular structures but do not map well to heterogeneous MPSoCs. In the MPSoC programming model, an application consists of tasks, each representing a unit of work on a core that can be executed asynchronously. The communication between tasks is represented as a directed acyclic graph. The temporal burstiness of data arising from this programming model provides an opportunity to multiplex communication between cores, which may help reduce network size. A task graph often needs to meet a real-time deadline, yet its actual execution time may vary with the application data. This uncertainty in execution time may be modeled by a statistical distribution, which further complicates NoC design. In this paper, we present a synthesis method for the hierarchical design of a NoC for a given task-graph system deadline, optimizing for router area. A two-phase design flow is proposed, consisting of topology generation and statistical analysis in an iterative loop. We adopt the proportion of Monte Carlo test cases that meet the deadline as the goodness metric. The proposed solution is compared against a static design approach, a standard mesh, and simulated annealing (SA) based network generation. On average, it obtained a performance benefit of 10% over SA, 16% over the standard mesh, and 30% over static design, and a total router area benefit of 59% over SA, 48% over mesh, and 55% over static design.
Network-on-Chip Design for Heterogeneous Multiprocessor System-on-Chip. Bharath Phanibhushana, S. Kundu. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.96 (https://doi.org/10.1109/ISVLSI.2014.96)
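The NoC abstract uses the fraction of Monte Carlo test cases that meet the deadline as its goodness metric. A minimal sketch of that metric follows. It rests on several assumptions that are not from the paper:

- Task execution times are normally distributed, with samples clamped at zero.
- Each task-graph edge carries a fixed NoC latency in the hypothetical `comm` table. In the paper's flow, that latency would come from the topology-generation phase.

The helper names `completion_time` and `deadline_yield` are illustrative.

```python
import random


def completion_time(order, exec_time, preds, comm):
    """Finish time of the last task; `order` must be a topological order of the DAG."""
    finish = {}
    for t in order:
        # A task starts once every predecessor has finished and its data has crossed the NoC
        ready = max((finish[p] + comm[(p, t)] for p in preds[t]), default=0.0)
        finish[t] = ready + exec_time[t]
    return max(finish.values())


def deadline_yield(order, dist, preds, comm, deadline, trials=10_000, seed=0):
    """Fraction of Monte Carlo samples whose completion time meets the deadline."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Assumed model: normally distributed execution times, clamped at zero
        sample = {t: max(0.0, rng.gauss(mu, sigma)) for t, (mu, sigma) in dist.items()}
        if completion_time(order, sample, preds, comm) <= deadline:
            hits += 1
    return hits / trials


# Hypothetical diamond-shaped task graph; comm[(src, dst)] is the NoC latency of that edge
order = ["a", "b", "c", "d"]
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
dist = {"a": (2.0, 0.2), "b": (3.0, 0.5), "c": (4.0, 0.5), "d": (1.0, 0.1)}
comm = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "d"): 0.3, ("c", "d"): 0.3}
print(deadline_yield(order, dist, preds, comm, deadline=8.5))
```

In an iterative flow, a candidate topology would change the `comm` latencies. The yield would then be recomputed and compared against a target before accepting the topology.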
A memory-based approach is described for performing basic logic gate functions. CMOS transistors are used in a non-traditional way for multi-level operations and memory manipulation. Sense amplifier circuits drive an array of pass amplifiers in which memory values are set by reference connections. Combining multi-level architectures with matrix algebra principles can yield flexible, modular systems built with standard fabrication methods. The AND, OR, NAND, and NOR functions are implemented in quaternary memory-based architectures. Circuit layouts and functional simulations are presented and compared with those of similar binary circuits. Experimental results from a hardware AND chip are also reported. The approach requires more chip area for basic logic gates, but it becomes increasingly efficient for more complex systems through hardware reuse. The benefits and feasibility of more complex applications are discussed.
Multi-level, Memory-Based Logic Using CMOS Technology. Indira Dugganapally, S. Watkins, B. Cooper. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.91 (https://doi.org/10.1109/ISVLSI.2014.91)
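The memory-based logic abstract describes gate evaluation as a read from stored values, with the gate truth table acting as a small matrix. A minimal sketch of that idea for quaternary (four-level) inputs follows.

The sketch assumes the common multi-valued-logic convention AND = min, OR = max, NOT x = 3 - x. The abstract does not state which definitions the authors use. `TABLES` and `gate` are hypothetical names.

```python
# Quaternary logic levels 0..3. Assumed convention (common in multi-valued logic,
# not confirmed by the abstract): AND = min, OR = max, NOT x = 3 - x.
LEVELS = range(4)


def build_table(op):
    """Precompute a 4x4 'memory' whose entry [a][b] holds op(a, b)."""
    return [[op(a, b) for b in LEVELS] for a in LEVELS]


TABLES = {
    "AND": build_table(min),
    "OR": build_table(max),
    "NAND": build_table(lambda a, b: 3 - min(a, b)),
    "NOR": build_table(lambda a, b: 3 - max(a, b)),
}


def gate(name, a, b):
    # Evaluation is a pure memory read: the inputs select a row and a column
    return TABLES[name][a][b]


print(gate("AND", 2, 3), gate("NAND", 2, 3))  # 2 1
```

The same lookup structure serves every gate; only the stored contents change. This mirrors the hardware-reuse argument in the abstract, although the physical sense-amplifier and pass-amplifier arrangement is not modeled here.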
As prevailing copper interconnect technology approaches its fundamental physical limits, interconnect delay due to ever-increasing wire resistivity has greatly limited circuit miniaturization. Single-walled carbon nanotube (SWCNT) bundle interconnects have emerged as a promising replacement for copper interconnects due to their superior conductivity. Previous work has focused on device and interconnect modeling for bundled SWCNTs, but none of it considers deploying this advanced technology in VLSI physical design. To the best of the authors' knowledge, this paper develops the first physical design technique for interconnect optimization using carbon nanotube interconnects. We propose a timing-driven buffer insertion technique for bundled SWCNTs, in which the standard buffering algorithm is enhanced to accommodate features of SWCNT timing modeling. Experimental results on a set of scaled industrial nets at 22 nm technology demonstrate that, compared with copper buffering, CNT buffering can save over 50% of buffer area under the same timing constraint. In addition, CNT buffering can reduce delay by up to 32%. Further, CNT buffering has a runtime similar to that of copper buffering.
Buffering Single-Walled Carbon Nanotubes Bundle Interconnects for Timing Optimization. Lin Liu, Yuchen Zhou, Shiyan Hu. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.35 (https://doi.org/10.1109/ISVLSI.2014.35)
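The CNT buffering abstract refers to "the standard buffering algorithm" without naming it. The classic choice is van Ginneken's dynamic program, which carries (load capacitance, required arrival time) candidates from sink to source and prunes dominated ones. The sketch below applies it to a uniform two-pin line, under these assumptions:

- Wires use the Elmore delay with a pi model per segment.
- Buffers use a linear delay model: intrinsic delay plus output resistance times load.
- SWCNT-specific features are not modeled. A bundle is represented only by a lower per-unit-length resistance `r`.
- All parameter values are illustrative and normalized.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    cap: float                 # downstream capacitance seen at the current node
    rat: float                 # required arrival time at the current node
    buffers: Tuple[int, ...]   # node indices (0 = source) that received a buffer


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Drop dominated candidates (no smaller cap and no larger RAT than another)."""
    kept, best_rat = [], float("-inf")
    for c in sorted(cands, key=lambda c: (c.cap, -c.rat)):
        if c.rat > best_rat:
            kept.append(c)
            best_rat = c.rat
    return kept


def buffer_line(n_seg, seg_len, r, c, sink_cap, sink_rat,
                buf_r, buf_c, buf_d, drv_r):
    """van Ginneken-style buffering of a uniform line with n_seg segments.

    r and c are wire resistance and capacitance per unit length. A buffer may be
    placed at any internal node. Returns (required time at the driver after its
    output-resistance delay, buffer node indices).
    """
    wire_r, wire_c = r * seg_len, c * seg_len
    cands = [Candidate(sink_cap, sink_rat, ())]
    for step in range(n_seg):
        node = n_seg - 1 - step  # node reached after walking one segment upstream
        # Elmore delay of a pi-modelled segment: R_w * (C_w / 2 + C_downstream)
        cands = [Candidate(x.cap + wire_c,
                           x.rat - wire_r * (wire_c / 2 + x.cap),
                           x.buffers) for x in cands]
        if node > 0:
            # Best downstream candidate to be driven by a new buffer at this node
            best = max(cands, key=lambda x: x.rat - buf_r * x.cap)
            cands.append(Candidate(buf_c, best.rat - buf_d - buf_r * best.cap,
                                   best.buffers + (node,)))
        cands = prune(cands)
    best = max(cands, key=lambda x: x.rat - drv_r * x.cap)
    return best.rat - drv_r * best.cap, best.buffers


# Same line and buffer library; a lower r stands in for the better-conducting bundle
print(buffer_line(20, 1.0, 1.0, 1.0, 1.0, 0.0, 0.1, 0.1, 0.5, 1.0))
print(buffer_line(20, 1.0, 0.3, 1.0, 1.0, 0.0, 0.1, 0.1, 0.5, 1.0))
```

An area-aware version, closer to the reported buffer-area savings, would also carry buffer count in each candidate and prune on (cap, RAT, count).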
The Trusted Platform Module (TPM) has gained popularity as a hardware security approach in computing systems. A TPM provides boot-time security by verifying platform integrity, including both hardware and software. However, once the software is loaded, the TPM can no longer protect software execution. In this work, we propose a dynamic TPM design that performs control-flow checking to protect programs from runtime attacks. The control-flow checker is integrated at the commit stage of the processor pipeline. The program's control flow is verified to defend against attacks such as stack smashing via buffer overflow and code reuse. We implement the proposed dynamic TPM design on an FPGA, which offers high performance, low cost, and the flexibility to upgrade functionality easily. In our design, neither the source code nor the Instruction Set Architecture (ISA) needs to be changed. Benchmark simulations demonstrate a processor performance penalty of less than 1% and effective protection of software against these attacks.
Reconfigurable Dynamic Trusted Platform Module for Control Flow Checking. Sanjeev Das, Wei Zhang, Yang Liu. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.84 (https://doi.org/10.1109/ISVLSI.2014.84)
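The dynamic TPM abstract checks the committed control flow against stack smashing and code reuse. A minimal sketch of such a checker follows. It uses two common control-flow-integrity ideas:

- A shadow stack of return addresses, which catches returns corrupted by a buffer overflow.
- A set of allowed (source, target) edges for calls and indirect jumps, which catches reuse gadgets.

The abstract does not say how the checker represents valid control flow, so both mechanisms are assumptions. The `Commit` event encoding and the fixed 4-byte instruction size are also hypothetical. Checking only committed instructions means squashed speculative paths never trigger alarms.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Commit:
    kind: str    # "call", "ret" or "jmp" (indirect jump); hypothetical event encoding
    pc: int      # address of the committed control-transfer instruction
    target: int  # address the instruction transferred control to


def check_control_flow(trace: Iterable[Commit],
                       allowed: Set[Tuple[int, int]],
                       insn_size: int = 4) -> List[int]:
    """Return indices of committed instructions that violate the expected control flow."""
    shadow_stack: List[int] = []
    violations: List[int] = []
    for i, ev in enumerate(trace):
        if ev.kind == "call":
            if (ev.pc, ev.target) not in allowed:
                violations.append(i)
            # Record the legitimate return address (fixed-width instructions assumed)
            shadow_stack.append(ev.pc + insn_size)
        elif ev.kind == "ret":
            expected = shadow_stack.pop() if shadow_stack else None
            # A smashed stack shows up as a return to an unexpected address
            if ev.target != expected:
                violations.append(i)
        elif ev.kind == "jmp":
            # Code-reuse gadgets typically need indirect transfers outside the allowed set
            if (ev.pc, ev.target) not in allowed:
                violations.append(i)
    return violations


allowed = {(0x100, 0x200)}
benign = [Commit("call", 0x100, 0x200), Commit("ret", 0x210, 0x104)]
attack = [Commit("call", 0x100, 0x200), Commit("ret", 0x210, 0x400)]
print(check_control_flow(benign, allowed))  # []
print(check_control_flow(attack, allowed))  # [1]
```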
Yassine Fkih, P. Vivet, B. Rouzeyre, M. Flottes, G. D. Natale, J. Schlöffel
Design for Test (DFT) of 3D stacked integrated circuits based on Through-Silicon Vias (TSVs) is one of the hot topics in integrated circuit test. This is due to limited test accessibility (especially for upper dies) and to high complexity, since each die can embed hundreds of IPs. In this paper, we propose a DFT architecture based on IEEE P1687 to enable the test of 3D stacked ICs. The proposed test architecture supports testing at all 3D fabrication levels: pre-bond, mid-bond, and post-bond. We present a test pattern retargeting flow using the IEEE P1687 languages ICL (Instrument Connectivity Language) and PDL (Procedural Description Language), which allows easy retargeting from 2D (die level) to 3D (stack level). Compared with an IEEE 1149.1-based 3D test architecture, the proposed 3D test architecture is more flexible and enhances test concurrency without additional area cost.
2D to 3D Test Pattern Retargeting Using IEEE P1687 Based 3D DFT Architectures. Yassine Fkih, P. Vivet, B. Rouzeyre, M. Flottes, G. D. Natale, J. Schlöffel. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.83 (https://doi.org/10.1109/ISVLSI.2014.83)
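The P1687 abstract retargets die-level patterns to the stack level. Conceptually, retargeting turns a write to a named instrument register into a scan bit string along the currently active path of a reconfigurable scan network. The sketch below models that network with Segment Insertion Bits (SIBs).

Wrapping each die's network in a per-die SIB means die-level register names only gain a hierarchical prefix at stack level. This is a simplified model, not ICL/PDL and not the paper's tool flow, and it rests on these assumptions:

- An open SIB splices its children in on the scan-in side of its own bit.
- Unwritten registers are loaded with 0.
- Bits are listed MSB first per register, in scan-in to scan-out order.
- The names `TDR`, `SIB`, `active_path` and `retarget_write` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class TDR:
    """Test data register of an embedded instrument (hypothetical minimal model)."""
    name: str
    length: int


@dataclass
class SIB:
    """Segment Insertion Bit: when open, its child segment joins the scan path."""
    name: str
    children: list = field(default_factory=list)
    is_open: bool = False


Node = Union[TDR, SIB]


def active_path(segment: List[Node], prefix: str = "") -> List[Tuple[str, int, Node]]:
    """(hierarchical name, bit count, node) entries in scan-in to scan-out order."""
    path = []
    for node in segment:
        full = prefix + node.name
        if isinstance(node, TDR):
            path.append((full, node.length, node))
        else:
            # Assumption: an open SIB splices its children in on the scan-in side of its bit
            if node.is_open:
                path.extend(active_path(node.children, full + "."))
            path.append((full, 1, node))
    return path


def retarget_write(network: List[Node], writes: Dict[str, int]) -> str:
    """Translate {hierarchical TDR name: value} into one scan-in bit string."""
    bits = []
    for name, length, node in active_path(network):
        if isinstance(node, SIB):
            bits.append("1" if node.is_open else "0")  # keep every SIB in its current state
        else:
            value = writes.get(name, 0)  # simplification: unwritten TDRs are loaded with 0
            if not 0 <= value < (1 << length):
                raise ValueError(f"value does not fit in {name} ({length} bits)")
            bits.append(format(value, f"0{length}b"))
    return "".join(bits)


die = [SIB("core1", [TDR("r1", 4)], is_open=True), TDR("r2", 2)]
die_writes = {"core1.r1": 0b1010}
print(retarget_write(die, die_writes))  # 1010100
stack = [SIB("die1", die, is_open=True)]
print(retarget_write(stack, {"die1." + k: v for k, v in die_writes.items()}))  # 10101001
```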
The 2013 edition of the International Technology Roadmap for Semiconductors [10] highlights a slowdown of traditional pitch and density scaling in leading-edge patterning technologies. Through the foundry N5/N7 nodes, the roadmap also projects unfavorable scaling of device and interconnect electrical performance (drive vs. leakage, resistivity, capacitive coupling, etc.). IC product value is also challenged by increasingly dominant variability mechanisms ranging from lithography and planarization in manufacturing to dynamic voltage droop and aging in the field. Design teams compensate for variability with margin (guardbanding), but this substantially reduces the value of designs at the next technology node. In this context, it is increasingly critical to deliver design-based equivalent scaling through novel design technologies. This paper reviews recent research directions that seek to improve modeling, margining and tolerance of IC variability. Collectively, these design methods offer new means by which product companies can extract greater value from available technologies, even as traditional scaling slows for patterning, devices and interconnects.
Toward Holistic Modeling, Margining and Tolerance of IC Variability. A. Kahng. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.118 (https://doi.org/10.1109/ISVLSI.2014.118)
This paper discusses modifications to algorithms for computing within a parallel cubing unit. It presents several architectures for operand sizes ranging from 8 to 32 bits. The proposed method separates the cubing partial product matrix into smaller elements and organizes these partial products into repeatable, manageable groups. Consequently, the overall partial product matrix is substantially smaller than in previous methods. An algorithmic analysis is also presented that demonstrates reductions in area and delay for several operand widths, along with implementations on Xilinx Virtex-5 FPGAs and in an IBM 65 nm ASIC standard-cell library.
Experiments with High Speed Parallel Cubing Units. Son Bui, J. Stine, M. Sadeghian. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.97 (https://doi.org/10.1109/ISVLSI.2014.97)
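The cubing abstract shrinks the cubing partial product matrix by reorganizing terms. One standard reason such a reduction is possible is worth showing.

Expanding $x^3 = \sum_{i,j,k} x_i x_j x_k 2^{i+j+k}$ gives $n^3$ ordered bit products. Because operand bits are 0 or 1, $x_i^2 = x_i$, and symmetric products can be merged into a few groups:

- Single-bit terms $x_i$ at weight $2^{3i}$.
- Pair terms $3 x_i x_j$ at weight $2^{2i+j}$.
- Triple terms $6 x_i x_j x_k$ at weight $2^{i+j+k}$.

The sketch below generates and checks this generic reduction. It is not necessarily the grouping the authors use, and the function names are hypothetical.

```python
from itertools import combinations


def cube_partial_products(n):
    """Reduced partial products of x**3 for an n-bit operand x.

    Each term is (coefficient, bit indices, power of two). Bits are 0/1, so
    x_i * x_i = x_i, and symmetric products are merged.
    """
    terms = [(1, (i,), 3 * i) for i in range(n)]             # x_i^3 -> x_i
    terms += [(3, (i, j), 2 * i + j)                          # x_i^2 x_j: 3 orderings
              for i in range(n) for j in range(n) if i != j]
    terms += [(6, (i, j, k), i + j + k)                       # distinct i<j<k: 6 orderings
              for i, j, k in combinations(range(n), 3)]
    return terms


def cube_from_terms(x, n):
    """Sum the reduced partial products; equals x**3 for 0 <= x < 2**n."""
    bits = [(x >> b) & 1 for b in range(n)]
    total = 0
    for coeff, idx, exp in cube_partial_products(n):
        prod = 1
        for b in idx:
            prod &= bits[b]  # AND of the selected operand bits
        total += (coeff * prod) << exp
    return total


print(len(cube_partial_products(8)), 8 ** 3)  # 120 512
```

In hardware, the coefficients 3 and 6 become two shifted copies each (3 = 2 + 1, 6 = 4 + 2). The bit count entering the reduction tree is therefore higher than the term count, but still well below $n^3$.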
We demonstrate the growth of III-Sb buffers on GaAs and silicon substrates using an epitaxial technique in which interfacial misfit dislocation arrays form between the III-Sb alloy and the substrate. The interfacial misfit array results in spontaneous relaxation of the highly mismatched III-Sb semiconductor and provides a platform for realizing high-mobility channels on GaAs and silicon. We use InAs type-II confinement structures for n-type channels and pseudomorphic InGaSb type-I structures for p-type channels.
High Mobility n and p Channels on Gallium Arsenide and Silicon Substrates Using Interfacial Misfit Dislocation Arrays. D. Shima, G. Balakrishnan. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.99 (https://doi.org/10.1109/ISVLSI.2014.99)
With transistor dimensions shrinking due to continued scaling, integrated circuits are increasingly susceptible to radiation upset. This paper presents a systematic methodology for evaluating circuit hardness, along with graph clustering approaches for determining the node separation needed to protect against upsets caused by multiple-node charge collection. The methodology is based on circuit simulation, which makes it efficient and usable by circuit designers. Examples demonstrate the analysis and clustering on real flip-flop designs. Finally, the methodology is used to provide critical node separation for a new hardened flip-flop design that reduces power by 27% and area by 19.5%.
Methodical Design Approaches to Radiation Effects Analysis and Mitigation in Flip-Flop Circuits. L. Clark, Sandeep Shambhulingaiah. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.74 (https://doi.org/10.1109/ISVLSI.2014.74)
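The flip-flop abstract turns simulation results on multiple-node charge collection into node-separation requirements. A minimal sketch of one way to do this follows:

1. Take the node pairs whose simultaneous upset flips the stored state; these would come from circuit simulation.
2. Treat those pairs as graph edges.
3. Assign nodes to groups so that no such pair shares a group. The groups can then be placed physically apart.

The sketch uses greedy Welsh-Powell graph coloring as a stand-in for the paper's clustering approach, which the abstract does not specify. The node names and pairs are hypothetical.

```python
def separation_groups(nodes, sensitive_pairs):
    """Assign each node a group so that no sensitive pair shares a group (greedy coloring)."""
    adj = {n: set() for n in nodes}
    for a, b in sensitive_pairs:
        adj[a].add(b)
        adj[b].add(a)
    group = {}
    # Welsh-Powell heuristic: color high-degree nodes first (ties keep input order)
    for n in sorted(adj, key=lambda n: -len(adj[n])):
        taken = {group[m] for m in adj[n] if m in group}
        g = 0
        while g in taken:
            g += 1
        group[n] = g
    return group


# Hypothetical result: upsetting n1 with n3, or n2 with n4, corrupts the stored bit
print(separation_groups(["n1", "n2", "n3", "n4"], [("n1", "n3"), ("n2", "n4")]))
# {'n1': 0, 'n2': 0, 'n3': 1, 'n4': 1}
```

Nodes that share a group may sit close together. Each pair of groups then needs at least the separation that the charge-collection analysis deems safe.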
One of the primary challenges of fully integrated voltage regulation is maintaining high power efficiency over a wide output current range. A multiphase distributed switched-capacitor (SC) converter is proposed, with a new control method that adaptively turns certain interleaved stages on and off. By controlling the number of active interleaved stages based on the load current, the proposed system achieves higher power efficiency at lower output currents, forcing the active stages to operate at their highest possible power efficiency. Distributing the interleaved stages also yields lower IR and L di/dt drops.
Regulator-Gating Methodology with Distributed Switched Capacitor Voltage Converters. Orhun Aras Uzun, Selçuk Köse. 2014 IEEE Computer Society Annual Symposium on VLSI, 2014-07-09. DOI: 10.1109/ISVLSI.2014.111 (https://doi.org/10.1109/ISVLSI.2014.111)
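The regulator-gating abstract enables only as many interleaved SC stages as the load current requires. A minimal sketch of that decision follows, under a hypothetical loss model:

- Each active stage costs a fixed switching and control loss `p_fixed`.
- Equally shared load current adds a conduction loss of `r_eq` times the per-stage current squared.

The parameter values are illustrative, not taken from the paper. The IR and L di/dt benefits of distributing the stages are not modeled.

```python
def total_loss(n_active, i_load, p_fixed, r_eq):
    """Hypothetical loss: n fixed stage losses plus conduction loss of n stages
    that each carry i_load / n_active, i.e. r_eq * i_load**2 / n_active in total."""
    return n_active * p_fixed + r_eq * i_load ** 2 / n_active


def active_stages(i_load, n_max=8, p_fixed=1e-3, r_eq=0.5):
    """Enable the number of interleaved stages that minimizes total loss."""
    return min(range(1, n_max + 1), key=lambda n: total_loss(n, i_load, p_fixed, r_eq))


def efficiency(v_out, i_load, n_active, p_fixed=1e-3, r_eq=0.5):
    """Output power divided by output power plus modeled losses."""
    p_out = v_out * i_load
    return p_out / (p_out + total_loss(n_active, i_load, p_fixed, r_eq))


for i_load in (0.01, 0.1, 1.0):
    n = active_stages(i_load)
    # Compare the gated configuration with all eight stages always on
    print(i_load, n, round(efficiency(1.0, i_load, n), 3), round(efficiency(1.0, i_load, 8), 3))
```

Under this model, minimizing $n P_{fixed} + R_{eq} I^2 / n$ gives $n^* = I\sqrt{R_{eq}/P_{fixed}}$. The number of enabled stages therefore grows roughly linearly with load current. At light load, gating off the unneeded stages avoids paying their fixed loss.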