Subhendu Roy, D. Pan, Pavlos M. Mattheakis, P. S. Colyer, L. Masse-Navette, Pierre-Olivier Ribet
With aggressive technology scaling in the nanometer regime, a significant fraction of dynamic power is consumed in the clock network due to its high switching activity. Clock networks are typically synthesized and routed to optimize for zero clock skew. However, clock skew optimization is often accompanied by routing overhead, which increases the clock net capacitance and thereby consumes more power. In this paper, we propose a skew-bounded buffer tree resynthesis algorithm to optimize clock net capacitance after the clock network has been synthesized and routed. Our algorithm restricts the skew of the design to within a specified margin of its original skew and does not introduce any additional Design Rule Check (DRC) violations. Experimental results on industrial designs, with clock networks synthesized and routed by an industrial tool, demonstrate that our approach achieves average reductions of 5.6% and 3.5% in clock net capacitance and clock dynamic power, respectively, with a marginal overhead in clock skew.
{"title":"Skew Bounded Buffer Tree Resynthesis For Clock Power Optimization","authors":"Subhendu Roy, D. Pan, Pavlos M. Mattheakis, P. S. Colyer, L. Masse-Navette, Pierre-Olivier Ribet","doi":"10.1145/2742060.2742119","DOIUrl":"https://doi.org/10.1145/2742060.2742119","url":null,"abstract":"With aggressive technology scaling in nanometer regime, a significant fraction of dynamic power is consumed in the clock network due to its high switching activity. Clock networks are typically synthesized and routed to optimize for zero clock skew. However, clock skew optimization is often accompanied with routing overhead which increases the clock net capacitance thereby consuming more power. In this paper, we propose a skew bounded buffer tree resynthesis algorithm to optimize clock net capacitance after the clock network has been synthesized and routed. Our algorithm restricts the skew of the designs within a specified margin from its original skew, and does not introduce any additional Design Rule Check (DRC) violation. Experimental results on industrial designs, with clock networks synthesized and routed by an industrial tool, have demonstrated that our approach can achieve an average reduction of 5.6% and 3.5% in clock net capacitance and clock dynamic power respectively with a marginal overhead in the clock skew.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115484157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: VLSI Design","authors":"E. Salman","doi":"10.1145/3254019","DOIUrl":"https://doi.org/10.1145/3254019","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114307790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy is a critical challenge in computing performance. Due to "word size creep", modern CPUs are inefficient at processing short data elements. We propose and evaluate a new microarchitecture called "Bit-Nibble-Byte" (BnB). We describe our design, which includes both long fixed-point vectors and novel variable-length instructions. Together, these features provide energy and performance benefits on a wide range of applications. We evaluate BnB with a detailed design of five vector sizes (128, 256, 512, 1024, 2048) mapped into 32 nm and 7 nm transistor technologies, in combination with a variety of memory systems (DDR3 and HMC). The evaluation is based on both handwritten and compiled code, with a custom compiler built for BnB. Our results include significant performance (19x-252x) and energy (5.6x-140.7x) benefits both for short bit-field operations typically assumed to require hardwired accelerators and for large-scale applications with compiled code.
{"title":"The Bit-Nibble-Byte MicroEngine (BnB) for Efficient Computing on Short Data","authors":"Dilip P. Vasudevan, A. Chien","doi":"10.1145/2742060.2742106","DOIUrl":"https://doi.org/10.1145/2742060.2742106","url":null,"abstract":"Energy is a critical challenge in computing performance. Due to \"word size creep\" from modern CPUs are inefficient for short-data element processing. We propose and evaluate a new microarchitecture called \"Bit-Nibble-Byte\"(BnB). We describe our design which includes both long fixed point vectors and as well as novel variable length instructions. Together, these features provide energy and performance benefits on a wide range of applications. We evaluate BnB with a detailed design of 5 vector sizes (128,256,512,1024,2048) mapped into 32nm and 7nm transistor technologies, and in combination with a variety of memory systems (DDR3 and HMC). The evaluation is based on both handwritten and compiled code with a custom compiler built for BnB. Our results include significant performance (19x-252x) and energy benefits (5.6x-140.7x) for short bit-field operations typically assumed to require hardwired accelerators and large-scale applications with compiled code.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115063470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
System lifetime reliability is an important design consideration for many real-time embedded systems. Increasing integrated circuit power density and the subsequent rise in chip temperature negatively impact the lifetime reliability of such systems. Although existing thermal-aware methods are effective in reducing temperature, they cannot increase, and may even hamper, the system lifetime reliability. The complicated relationship between temperature and system lifetime requires that reliability be considered explicitly during system design. This paper presents a reliability-aware utilization control framework for homogeneous multicore soft real-time systems. The framework employs a model predictive controller to increase the system lifetime by manipulating the utilization of real-time tasks. An online heuristic algorithm is introduced to adjust the controller's sampling window in order to reduce the effects of thermal cycling on reliability. Simulation results show that the proposed approach can improve the system mean time to failure by at least 43% and as much as 369% compared to existing techniques.
{"title":"Improving Lifetime of Multicore Soft Real-Time Systems through Global Utilization Control","authors":"Yuexi Ma, Thidapat Chantem, X. Hu, R. Dick","doi":"10.1145/2742060.2742113","DOIUrl":"https://doi.org/10.1145/2742060.2742113","url":null,"abstract":"System lifetime reliability is an important design consideration for many real-time embedded systems. Increasing integrated circuit power density and the subsequent rise in chip temperature negatively impact the lifetime reliability of such systems. Although existing thermal-aware methods are effective in reducing temperature, they cannot increase, and may even hamper, the system lifetime reliability. The complicated relationship between temperature and system lifetime requires that reliability be considered explicitly during system design. This paper presents a reliability-aware utilization control framework for homogeneous multicore soft real-time systems. The framework employs a model predictive controller to increase the system lifetime by manipulating the utilization of real-time tasks. An online heuristic algorithm is introduced to adjust the controller's sampling window in order to reduce the effects of thermal cycling on reliability. Simulation results show that the proposed approach can improve the system mean time to failure by at least 43% and as much as 369% compared to existing techniques.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115067897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote II","authors":"Alex K. Jones","doi":"10.1145/3254018","DOIUrl":"https://doi.org/10.1145/3254018","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116962747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For a specified Mean Time Between Failures (MTBF), the metastability resolution time of synchronizers can constrain system performance. To improve metastability resolution time in single low-voltage supply environments, we propose Voltage-Boosted Synchronizers (VBSs), which consist of a basic minimum-sized Jamb latch and a switched-capacitor-based charge pump. The capacitor of the charge pump is sized at 13 times the area of the Jamb latch. Two strategies for powering the charge pump are proposed: Metastability-driven VBS (MVBS) and Clock-driven VBS (CVBS). For a 1-year MTBF specification, MVBS and CVBS show 2.0-2.7 and 5.1-9.8 times the performance of the basic Jamb latch, respectively, without incurring large power consumption.
{"title":"Voltage-Boosted Synchronizers","authors":"Yaoqiang Li, P. Chuang, A. Kennings, M. Sachdev","doi":"10.1145/2742060.2742075","DOIUrl":"https://doi.org/10.1145/2742060.2742075","url":null,"abstract":"With a specified Mean Timing Between Failure (MTBF), the metastability resolution time of synchronizers possibly constrains the system performance. To enhance metastability resolution time under single low-voltage supply environments, Voltage-Boosted Synchronizers (VBSs) consisting of a basic minimum-sized Jamb latch and a switched-capacitor-based charge pump are proposed. The capacitor of the charge pump is sized 13 times the area of the Jamb latch. Two powering strategies of the charge pump, namely Metastability-driven VBS (MVBS) and Clock-driven VBS (CVBS), are proposed. For a 1-year MTBF specification, MVBS and CVBS show 2.0-2.7 and 5.1-9.8 times the performance improvement over the basic Jamb latch, respectively, without incurring large power consumption.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123434943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
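The link between an MTBF target and the resolution time a synchronizer must be given can be illustrated with the standard textbook metastability model, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The sketch below is not taken from the paper; the function names and all numeric values (tau, T_w, and the two frequencies) are hypothetical and chosen only for illustration.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def mtbf_seconds(t_r, tau, t_w, f_clk, f_data):
    """Textbook synchronizer MTBF: exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_r / tau) / (t_w * f_clk * f_data)


def required_resolution_time(mtbf, tau, t_w, f_clk, f_data):
    """Invert the MTBF model to get the resolution time needed for a target MTBF."""
    return tau * math.log(mtbf * t_w * f_clk * f_data)


# Hypothetical parameters (assumptions, not values from the paper).
tau = 20e-12      # resolution time constant of the latch, seconds
t_w = 10e-12      # metastability window, seconds
f_clk = 1e9       # sampling clock frequency, Hz
f_data = 1e8      # data transition rate, Hz

t_r = required_resolution_time(SECONDS_PER_YEAR, tau, t_w, f_clk, f_data)
print(f"resolution time for a 1-year MTBF: {t_r * 1e12:.1f} ps")
```

Under this model the required resolution time scales linearly with tau, so a latch that resolves faster (smaller tau, for example through a boosted supply) leaves more of the clock period available for useful work at the same MTBF.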
Synthetic gene networks enable the programming of living cells to perform novel behaviors. Mammalian synthetic gene networks have largely been used as research tools to probe cellular function and, more recently, to engineer therapeutic capabilities. To create these networks, researchers have developed a vast array of DNA-encoded parts that can serve as sensors, computational regulators, and actuators. Many of these gene circuit components have varying temporal characteristics and sensitivities, making them well-suited for engineering systems that can act on different time scales and at different molecular concentrations. These components have been combined to create increasingly complex gene circuits. Major challenges for engineering mammalian synthetic gene networks include further improving scalability and predictability. Recent technological advancements in site-directed genome engineering and programmable DNA binding domains will likely aid in addressing these issues. Other important future directions will include incorporating new regulators that act at the levels of chromatin remodeling and DNA methylation and the division of computational loads among different cell types with population-based computing.
{"title":"Mammalian Synthetic Gene Networks","authors":"J. Lohmueller","doi":"10.1145/2742060.2743764","DOIUrl":"https://doi.org/10.1145/2742060.2743764","url":null,"abstract":"Synthetic gene networks enable the programming of living cells to perform novel behaviors. Mammalian synthetic gene networks have largely been used as research tools to probe cellular function and more recently to engineer therapeutic capabilities. To create these networks researchers have developed a vast array of DNA -encoded parts that can serve as sensors, computational regulators, and actuators. Many of these gene circuit components have varying temporal characteristics and sensitivities making them well - suited for engineering systems that can act on different time scales and at different molecular concentrations. These components have been combined to create increasingly complex gene circuits. Major challenges for engineering mammalian synthetic gene networks include further improving scalability and predictability. Recent technological advancements in site - directed genome engineering and programmable DNA binding domains will likely aid in addressing these issues. Other important future directions will include incorporating new regulators that act at the levels of chromatin remodeling and DNA methylation and the division of computational loads among different cell types with population - based computing.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123681294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Najvirt, U. Schmid, M. Hofbauer, Matthias Függer, Thomas Nowak, K. Schweiger
Fast digital timing simulations based on continuous-time, digital-value circuit models are an attractive and heavily used alternative to analog simulations. Models based on analytic delay formulas are particularly interesting here, as they also facilitate formal verification and delay bound synthesis of complex circuits. Recently, Függer et al. (arXiv:1406.2544 [cs.OH]) proposed a circuit model based on so-called involution channels. It is the first binary circuit model that realistically captures solvability of short-pulse filtration, a non-trivial glitch propagation problem related to building one-shot inertial delays. In this work, we address the question of whether involution channels also accurately model the delay of real circuits. Using both Spice simulations and physical measurements, we confirm that modeling an inverter chain by involution channels accurately describes reality. We also demonstrate that transitions in vanishing pulse trains are accurately predicted by the involution model. For our Spice simulations, we used both UMC-90 and UMC-65 technology, with varying supply voltages from nominal down to near sub-threshold range. The measurements were performed on a special-purpose UMC-90 ASIC that combines an inverter chain with low-intrusive high-speed on-chip analog amplifiers.
{"title":"Experimental Validation of a Faithful Binary Circuit Model","authors":"Robert Najvirt, U. Schmid, M. Hofbauer, Matthias Függer, Thomas Nowak, K. Schweiger","doi":"10.1145/2742060.2742081","DOIUrl":"https://doi.org/10.1145/2742060.2742081","url":null,"abstract":"Fast digital timing simulations based on continuous-time, digital-value circuit models are an attractive and heavily used alternative to analog simulations. Models based on analytic delay formulas are particularly interesting here, as they also facilitate formal verification and delay bound synthesis of complex circuits. Recently, Függer et al. (arXiv:1406.2544 [cs.OH]) proposed a circuit model based on so-called involution channels. It is the first binary circuit model that realistically captures solvability of short-pulse filtration, a non-trivial glitch propagation problem related to building one-shot inertial delays. In this work, we address the question of whether involution channels also accurately model the delay of real circuits. Using both Spice simulations and physical measurements, we confirm that modeling an inverter chain by involution channels accurately describes reality. We also demonstrate that transitions in vanishing pulse trains are accurately predicted by the involution model. For our Spice simulations, we used both UMC-90 and UMC-65 technology, with varying supply voltages from nominal down to near sub-threshold range. The measurements were performed on a special-purpose UMC-90 ASIC that combines an inverter chain with low-intrusive high-speed on-chip analog amplifiers.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127421400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes Flip-Mirror-Rotate (FMR), an architecture for bit-write reduction and endurance enhancement in emerging non-volatile memories (NVMs). FMR comprises three components: adaptive Flip-N-Write (aFNW), Mirror-N-Write (MNW), and Rotate-N-Write (RNW). aFNW and MNW focus on word-level bit-write reduction, which reduces NVM dynamic energy while also improving endurance. RNW is an intra-word wear leveling scheme that operates at cache line granularity. The proposed FMR architecture is integrated with frequent pattern compression (FPC) to simultaneously reduce bit-writes and wear in NVMs. Trace-based simulations of the SPEC CPU2006 benchmarks show that for the same memory overhead and < 1% loss in memory bandwidth, FMR reduces bit-writes (dynamic energy) by 48% (29%) in comparison to classical read-modify-write (DCW), 39% (13%) in comparison to Flip-N-Write (FNW), and 21% (14%) in comparison to FPC. Simultaneously, FMR also reduces the peak bit-writes per cell by 47% in comparison to DCW, 34% in comparison to FNW, and 47% in comparison to FPC, improving NVM endurance.
{"title":"Flip-Mirror-Rotate: An Architecture for Bit-write Reduction and Wear Leveling in Non-volatile Memories","authors":"Poovaiah M. Palangappa, K. Mohanram","doi":"10.1145/2742060.2742110","DOIUrl":"https://doi.org/10.1145/2742060.2742110","url":null,"abstract":"This paper proposes Flip-Mirror-Rotate (FMR), an architecture for bit-write reduction and endurance enhancement in emerging non-volatile memories (NVMs). FMR comprises three components: adaptive Flip-N-Write (aFNW), Mirror-N-Write (MNW), and Rotate-N-Write (RNW). aFNW and MNW focus on word-level bit-write reduction, which reduces NVM dynamic energy while also improving endurance. RNW is an intra-word wear leveling scheme that operates at cache line granularity. The proposed FMR architecture is integrated with frequent pattern compression (FPC) to simultaneously reduce bit-writes and wear in NVMs. Trace-based simulations of the SPEC CPU2006 benchmarks show that for the same memory overhead and < 1% loss in memory bandwidth, FMR reduces bit-writes (dynamic energy) by 48% (29%) in comparison to classical read-modify-write (DCW), 39% (13%) in comparison to Flip-N-Write (FNW), and 21% (14%) in comparison to FPC. Simultaneously, FMR also reduces the peak bit-writes per cell by 47% in comparison to DCW, 34% in comparison to FNW, and 47% in comparison to FPC, improving NVM endurance.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123362385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
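The Flip-N-Write (FNW) baseline named in the abstract above can be sketched as follows. This is a minimal illustration of classical FNW as commonly described in the literature, not the paper's aFNW, MNW, or RNW schemes, whose details the abstract does not give. The function names, the 8-bit default word width, and the one-flag-bit-per-word layout are assumptions made for the example.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def flip_n_write(stored: int, flag: int, new: int, width: int = 8):
    """Choose between writing `new` or its complement, whichever flips fewer cells.

    `stored` is the current cell contents and `flag` the current flip bit.
    Returns (cells_to_store, new_flag, bit_writes), where bit_writes counts
    changed data cells plus the flag cell if it changes.
    """
    mask = (1 << width) - 1
    new &= mask
    flipped = ~new & mask

    plain_cost = hamming(stored, new) + (flag != 0)
    flip_cost = hamming(stored, flipped) + (flag != 1)

    if flip_cost < plain_cost:
        return flipped, 1, flip_cost
    return new, 0, plain_cost


def read_word(stored: int, flag: int, width: int = 8) -> int:
    """Recover the logical value: undo the inversion when the flag is set."""
    mask = (1 << width) - 1
    return (~stored & mask) if flag else stored


# Writing 0b11111110 over an all-zero word: storing the complement plus the
# flag costs 2 bit-writes instead of 7.
cells, flag, cost = flip_n_write(0b00000000, 0, 0b11111110)
print(cells, flag, cost, read_word(cells, flag))
```

Because the two candidates are bitwise complements, their data-cell costs always sum to the word width, which is why picking the cheaper one bounds the number of bit-writes per word.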
Rosario Distefano, F. Fummi, C. Laudanna, N. Bombieri, R. Giugno
Signal transduction is a class of cellular biological processes that are commonly represented as highly concurrent reactive systems. In the Systems Biology community, modelling and simulation of signal transduction require overcoming issues such as discrete event-based execution of complex systems, description from building blocks through composition and encapsulation, description at different levels of granularity, and methods for abstraction and refinement. This paper presents a signal transduction modelling and simulation platform based on SystemC and shows how the platform handles system complexity by modelling at different abstraction levels. The paper reports the results obtained by applying the platform to model the intracellular signalling network that controls integrin activation, which mediates leukocyte recruitment from the blood into the tissues.
{"title":"A SystemC Platform for Signal Transduction Modelling and Simulation in Systems Biology","authors":"Rosario Distefano, F. Fummi, C. Laudanna, N. Bombieri, R. Giugno","doi":"10.1145/2742060.2742115","DOIUrl":"https://doi.org/10.1145/2742060.2742115","url":null,"abstract":"Signal transduction is a class of cell's biological processes, which are commonly represented as highly concurrent reactive systems. In the Systems Biology community, modelling and simulation of signal transduction require overcoming issues like discrete event-based execution of complex systems, description from building blocks through composition and encapsulation, description at different levels of granularity, methods for abstraction and refinement. This paper presents a signal transduction modelling and simulation platform based on SystemC, and shows how the platform allows handling the system complexity by modelling it at different abstraction levels. The paper reports the results obtained by applying the platform to model the intracellular signalling network controlling integrin activation mediating leukocyte recruitment from the blood into the tissues.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123379850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}