{"title":"Session details: Reliability, Resiliency, Robustness II","authors":"S. Cotofana","doi":"10.1145/3254010","DOIUrl":"https://doi.org/10.1145/3254010","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123119424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. G. Rizzo, S. Miryala, A. Calimera, E. Macii, M. Poncino
Electrostatically controlled graphene p-n junctions are devices built on single-layer graphene sheets whose in-to-out resistance can be dynamically tuned through external voltage potentials. While several recent works have focused mainly on the possibility of using these devices as a new logic primitive for digital circuits, in this paper we address a complementary problem: how to efficiently implement Analog-to-Digital Converters (ADCs) that can be integrated into future all-graphene flexible ICs. The contribution of this work is threefold: (i) we introduce a new ADC architecture that perfectly matches the main characteristics of graphene p-n junctions; (ii) we give a first, yet detailed, parametric characterization of the proposed ADC architecture so as to validate its functionality and quantify its figures of merit; (iii) we provide a fully automated design flow that, given the design specs as input (i.e., input voltage range, voltage resolution, and sampling rate), returns optimally sized ADC circuitry. A few case studies also demonstrate that p-n junction based graphene ADCs have characteristics in line with those offered by today's CMOS ADCs.
{"title":"Design and Characterization of Analog-to-Digital Converters using Graphene P-N Junctions","authors":"R. G. Rizzo, S. Miryala, A. Calimera, E. Macii, M. Poncino","doi":"10.1145/2742060.2742099","DOIUrl":"https://doi.org/10.1145/2742060.2742099","url":null,"abstract":"Electrostatically controlled graphene p-n junctions are devices built on single-layer graphene sheets whose in-to-out resistance can be dynamically tuned through external voltage potentials. While several recent works have focused mainly on the possibility of using these devices as a new logic primitive for digital circuits, in this paper we address a complementary problem: how to efficiently implement Analog-to-Digital Converters (ADCs) that can be integrated into future all-graphene flexible ICs. The contribution of this work is threefold: (i) we introduce a new ADC architecture that perfectly matches the main characteristics of graphene p-n junctions; (ii) we give a first, yet detailed, parametric characterization of the proposed ADC architecture so as to validate its functionality and quantify its figures of merit; (iii) we provide a fully automated design flow that, given the design specs as input (i.e., input voltage range, voltage resolution, and sampling rate), returns optimally sized ADC circuitry. A few case studies also demonstrate that p-n junction based graphene ADCs have characteristics in line with those offered by today's CMOS ADCs.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125849328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Redundancy is now routinely allocated in circuits, microarchitectural structures, or at the system level, to mitigate mounting manufacturing yield losses. In this paper, we propose spare lane sharing, which reduces the cost of multi-core SIMT systems by allowing one of two neighboring cores to make use of a redundant lane if necessary. We have evaluated the performance-cost trade-offs of core-, lane-, and shared-lane-sparing under a variety of benchmarks, and found that for nearly all applications shared-lane-sparing outperforms lane-sparing, reducing cost by up to 20%.
{"title":"Yield-aware Performance-Cost Characterization for Multi-Core SIMT","authors":"S. H. Mozafari, B. Meyer, K. Skadron","doi":"10.1145/2742060.2742112","DOIUrl":"https://doi.org/10.1145/2742060.2742112","url":null,"abstract":"Redundancy is now routinely allocated in circuits, microarchitectural structures, or at the system level, to mitigate mounting manufacturing yield losses. In this paper, we propose spare lane sharing, which reduces the cost of multi-core SIMT systems by allowing one of two neighboring cores to make use of a redundant lane if necessary. We have evaluated the performance-cost trade-offs of core-, lane-, and shared-lane-sparing under a variety of benchmarks, and found that for nearly all applications shared-lane-sparing outperforms lane-sparing, reducing cost by up to 20%.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125881428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
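To make the lane-sparing versus shared-lane-sparing comparison concrete, here is a minimal yield sketch under a deliberately simplified model that is an assumption, not the authors' cost model: lane defects are independent, each lane is good with probability `p_lane`, only lanes can fail, and a shared spare can replace a bad lane in either neighboring core. The function names are hypothetical.

```python
from math import comb


def prob_at_most_k_failures(total_lanes: int, k: int, p_lane: float) -> float:
    """Probability that at most k of total_lanes lanes are defective, assuming
    each lane is independently defect-free with probability p_lane."""
    q = 1.0 - p_lane
    return sum(
        comb(total_lanes, f) * q**f * p_lane ** (total_lanes - f)
        for f in range(k + 1)
    )


def pair_yield_lane_sparing(n_lanes: int, p_lane: float) -> float:
    """Lane sparing: each core owns n_lanes + 1 lanes and tolerates one bad lane.
    Returns the probability that both cores of a neighboring pair are usable."""
    per_core = prob_at_most_k_failures(n_lanes + 1, 1, p_lane)
    return per_core**2


def pair_yield_shared_spare(n_lanes: int, p_lane: float) -> float:
    """Shared lane sparing: two cores with n_lanes each share ONE spare lane.
    Under this model both cores are usable iff at most one of the
    2 * n_lanes + 1 lanes is defective."""
    return prob_at_most_k_failures(2 * n_lanes + 1, 1, p_lane)


if __name__ == "__main__":
    n, p = 32, 0.999  # hypothetical lane count and per-lane yield
    print("lane sparing (2 spares/pair):", pair_yield_lane_sparing(n, p))
    print("shared spare (1 spare/pair): ", pair_yield_shared_spare(n, p))
```

In this toy model, sharing saves one spare lane per pair but cannot survive one defective lane in each core at the same time; how that trade-off plays out in cost and benchmark performance is what the paper evaluates, and this sketch does not capture it.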
{"title":"Session details: Emerging Technologies","authors":"Yiran Chen","doi":"10.1145/3254014","DOIUrl":"https://doi.org/10.1145/3254014","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116611231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graphene has been studied extensively for its properties in the electrical, mechanical, and optical domains. Graphene's flexible, transparent, and bio-compatible characteristics extend its reach from electrical applications to biological applications. Here, we present graphene neural sensors that enable next-generation in vivo imaging and optogenetics, owing to their transparency over a broad wavelength spectrum and their exceptional mechanical flexibility. Neural sensors implanted on the brain surface of rodents demonstrate these unique abilities, including see-through in vivo imaging via fluorescence microscopy and 3D optical coherence tomography, as well as performance in advanced optogenetic experiments. The study is expected to deliver key information regarding the use of graphene in biological environments, specifically the brain. Consequently, the study will have a strong impact on a wide spectrum of research areas spanning electrical engineering, neuroscience, and neural engineering.
{"title":"Graphene Neural Sensors for Next Generation In Vivo Imaging and Optogenetics","authors":"Z. Ma","doi":"10.1145/2742060.2745702","DOIUrl":"https://doi.org/10.1145/2742060.2745702","url":null,"abstract":"Graphene has been studied extensively for its properties in the electrical, mechanical, and optical domains. Graphene's flexible, transparent, and bio-compatible characteristics extend its reach from electrical applications to biological applications. Here, we present graphene neural sensors that enable next-generation in vivo imaging and optogenetics, owing to their transparency over a broad wavelength spectrum and their exceptional mechanical flexibility. Neural sensors implanted on the brain surface of rodents demonstrate these unique abilities, including see-through in vivo imaging via fluorescence microscopy and 3D optical coherence tomography, as well as performance in advanced optogenetic experiments. The study is expected to deliver key information regarding the use of graphene in biological environments, specifically the brain. Consequently, the study will have a strong impact on a wide spectrum of research areas spanning electrical engineering, neuroscience, and neural engineering.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123983034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georgios Zervakis, Kostas Tsoumanis, S. Xydis, N. Axelos, K. Pekmestzi
Approximate computing has received significant attention as a promising strategy to decrease the power consumption of inherently error-tolerant applications. Hardware approximation mainly targets arithmetic units, e.g., adders and multipliers. In this paper, we design new approximate hardware multipliers and propose the Partial Product Perforation technique, which omits a number of consecutive partial products by perforating their generation. Through extensive experimental evaluation, we apply the partial product perforation method to different multiplier architectures and identify the optimal configurations for different error values. We show that partial product perforation delivers reductions of up to 50% in power consumption, 45% in area, and 35% in critical delay. The product perforation method is also compared with state-of-the-art approximate computing works that use the Voltage Over-Scaling (VOS) and logic approximation (i.e., design of approximate compressors) techniques, outperforming them in terms of power dissipation by up to 17% and 20% on average, respectively. Finally, with respect to the aforementioned gains, the error value delivered by the proposed product perforation method is 70% and 99% smaller than that of the VOS and logic approximation methods, respectively.
{"title":"Approximate Multiplier Architectures Through Partial Product Perforation: Power-Area Tradeoffs Analysis","authors":"Georgios Zervakis, Kostas Tsoumanis, S. Xydis, N. Axelos, K. Pekmestzi","doi":"10.1145/2742060.2742109","DOIUrl":"https://doi.org/10.1145/2742060.2742109","url":null,"abstract":"Approximate computing has received significant attention as a promising strategy to decrease the power consumption of inherently error-tolerant applications. Hardware approximation mainly targets arithmetic units, e.g., adders and multipliers. In this paper, we design new approximate hardware multipliers and propose the Partial Product Perforation technique, which omits a number of consecutive partial products by perforating their generation. Through extensive experimental evaluation, we apply the partial product perforation method to different multiplier architectures and identify the optimal configurations for different error values. We show that partial product perforation delivers reductions of up to 50% in power consumption, 45% in area, and 35% in critical delay. The product perforation method is also compared with state-of-the-art approximate computing works that use the Voltage Over-Scaling (VOS) and logic approximation (i.e., design of approximate compressors) techniques, outperforming them in terms of power dissipation by up to 17% and 20% on average, respectively. Finally, with respect to the aforementioned gains, the error value delivered by the proposed product perforation method is 70% and 99% smaller than that of the VOS and logic approximation methods, respectively.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123667250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
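The perforation idea can be illustrated at the bit level. In an unsigned multiplier, the product of `a` and `b` is the sum of partial products a·b_i·2^i; perforating k consecutive partial products starting at position j simply never generates them. The sketch below is a behavioral model under that reading of the abstract, not the authors' hardware; the names `perforated_multiply` and `worst_case_error` and the `width` parameter are hypothetical. Because the omitted terms are all non-negative, the approximate result never exceeds the exact one, and the error is bounded by (2^width - 1)·(2^k - 1)·2^j.

```python
def perforated_multiply(a: int, b: int, j: int, k: int, width: int = 8) -> int:
    """Approximate unsigned product of a and b (each `width` bits) in which the
    k consecutive partial products a * b_i * 2**i for i in [j, j + k) are
    omitted (perforated), as if the multiplier never generated them."""
    if not (0 <= a < 2**width and 0 <= b < 2**width):
        raise ValueError("operands must fit in `width` unsigned bits")
    if j < 0 or k < 0 or j + k > width:
        raise ValueError("perforated range must lie within the multiplier bits")
    result = 0
    for i in range(width):
        if j <= i < j + k:
            continue  # perforated: this partial product is never generated
        if (b >> i) & 1:
            result += a << i  # partial product a * b_i * 2**i
    return result


def worst_case_error(j: int, k: int, width: int = 8) -> int:
    """Upper bound on exact - approximate: a maximal and all perforated bits of b set."""
    return (2**width - 1) * (2**k - 1) * 2**j
```

For example, `perforated_multiply(13, 11, j=0, k=2)` drops the two least-significant partial products and returns 104 instead of the exact 143. Choosing (j, k) trades this bounded error against the adder rows removed from the multiplier.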
Songwei Pei, Jingdong Zhang, Yu Jin, Song Jin, Jun Liu, Weizhi Xu
Various types of defects are prone to occur inside through-silicon vias (TSVs) during the manufacturing and bonding steps, severely impacting the yield of 3D-stacked ICs. Moreover, several types of TSV defects are latent and may easily escape detection during manufacturing test. These latent defective TSVs, however, are prone to degrade during field operation and may eventually become faulty, destroying the entire 3D-stacked IC. To tackle these problems, in this paper we present an effective TSV self-repair scheme for 3D-stacked ICs. By incorporating redundant TSVs and a TSV self-repair architecture, the proposed scheme can effectively repair faulty TSVs detected by manufacturing test, improving the yield of 3D-stacked ICs. Moreover, latent TSVs that fail and are then detected during in-field operation can also be self-repaired, thereby improving the quality and reliability of 3D ICs. Experimental results are presented to validate the proposed method.
{"title":"An Effective TSV Self-Repair Scheme for 3D-Stacked ICs","authors":"Songwei Pei, Jingdong Zhang, Yu Jin, Song Jin, Jun Liu, Weizhi Xu","doi":"10.1145/2742060.2742071","DOIUrl":"https://doi.org/10.1145/2742060.2742071","url":null,"abstract":"Various types of defects are prone to occur inside through-silicon vias (TSVs) during the manufacturing and bonding steps, severely impacting the yield of 3D-stacked ICs. Moreover, several types of TSV defects are latent and may easily escape detection during manufacturing test. These latent defective TSVs, however, are prone to degrade during field operation and may eventually become faulty, destroying the entire 3D-stacked IC. To tackle these problems, in this paper we present an effective TSV self-repair scheme for 3D-stacked ICs. By incorporating redundant TSVs and a TSV self-repair architecture, the proposed scheme can effectively repair faulty TSVs detected by manufacturing test, improving the yield of 3D-stacked ICs. Moreover, latent TSVs that fail and are then detected during in-field operation can also be self-repaired, thereby improving the quality and reliability of 3D ICs. Experimental results are presented to validate the proposed method.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121182485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
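The abstract does not detail the self-repair architecture, so the following is only a generic illustration of TSV redundancy: a shift-based remapping in which each logical signal is routed to the next healthy physical TSV. The same remapping can be recomputed when an in-field test flags a latent TSV that has failed. The function `repair_mapping` and the index layout are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Set


def repair_mapping(n_signals: int, n_spares: int, faulty: Set[int]) -> Optional[List[int]]:
    """Shift-based repair: assign logical signal i to the i-th healthy physical TSV.

    Physical TSVs are indexed 0 .. n_signals + n_spares - 1. Returns the list of
    physical TSV indices used by signals 0 .. n_signals - 1, or None when more
    TSVs are faulty than there are spares."""
    total = n_signals + n_spares
    if any(t < 0 or t >= total for t in faulty):
        raise ValueError("faulty TSV index out of range")
    healthy = [t for t in range(total) if t not in faulty]
    if len(healthy) < n_signals:
        return None  # not repairable with the available spare TSVs
    return healthy[:n_signals]


if __name__ == "__main__":
    # Hypothetical scenario: manufacturing test flags TSV 1, and a later
    # in-field test flags TSV 3 after a latent defect degrades.
    after_mfg_test = repair_mapping(4, 2, {1})
    after_field_fail = repair_mapping(4, 2, {1, 3})
    print(after_mfg_test, after_field_fail)
```

With four signals and two spares, the mapping becomes [0, 2, 3, 4] after the manufacturing repair and [0, 2, 4, 5] after the in-field failure. A third fault exhausts the spares and the function returns None.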
Katayoun Neshatpour, H. Homayoun, A. Djahromi, W. Burleson
As CMOS technology scales down into the nanometer regime and the supply voltage approaches the threshold voltage, an increase in operating temperature results in increased circuit current, which in turn reduces circuit propagation delay. This paper exploits this new phenomenon, known as inverse thermal dependence (ITD), for power, performance, and temperature optimization in processor architecture. ITD changes the maximum achievable operating frequency of the processor at high temperatures. Dynamic thermal management techniques such as activity migration, dynamic voltage and frequency scaling, and throttling are revisited in this paper, with a focus on the effect of ITD. Results are obtained using the predictive technology models of the 7nm, 10nm, 14nm, and 20nm technology nodes and with extensive architectural and circuit simulations. The results show that, depending on the design goals, various design corners should be re-investigated for power, performance, and energy-efficiency optimization. Architectural simulations for a multi-core processor across standard benchmarks show that utilizing ITD-aware schemes for thermal management improves the performance of the processor in terms of speed and energy-delay product by 8.55% and 4.4%, respectively.
{"title":"Revisiting Dynamic Thermal Management Exploiting Inverse Thermal Dependence","authors":"Katayoun Neshatpour, H. Homayoun, A. Djahromi, W. Burleson","doi":"10.1145/2742060.2742086","DOIUrl":"https://doi.org/10.1145/2742060.2742086","url":null,"abstract":"As CMOS technology scales down into the nanometer regime and the supply voltage approaches the threshold voltage, an increase in operating temperature results in increased circuit current, which in turn reduces circuit propagation delay. This paper exploits this new phenomenon, known as inverse thermal dependence (ITD), for power, performance, and temperature optimization in processor architecture. ITD changes the maximum achievable operating frequency of the processor at high temperatures. Dynamic thermal management techniques such as activity migration, dynamic voltage and frequency scaling, and throttling are revisited in this paper, with a focus on the effect of ITD. Results are obtained using the predictive technology models of the 7nm, 10nm, 14nm, and 20nm technology nodes and with extensive architectural and circuit simulations. The results show that, depending on the design goals, various design corners should be re-investigated for power, performance, and energy-efficiency optimization. Architectural simulations for a multi-core processor across standard benchmarks show that utilizing ITD-aware schemes for thermal management improves the performance of the processor in terms of speed and energy-delay product by 8.55% and 4.4%, respectively.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132388251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote IV","authors":"M. Margala","doi":"10.1145/3254021","DOIUrl":"https://doi.org/10.1145/3254021","url":null,"abstract":"","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"133 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131658193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic Computing (SC) is an attractive solution for implementing Low-Density Parity-Check (LDPC) code decoders due to its fault tolerance and low hardware requirements. However, in practical implementations, SC efficiency is limited by the Stochastic Bitstream (SB) length and by computation inaccuracies due to non-unique SB representations. In this paper, rather than using a statically fixed SB length, we propose a Dynamic Bitstream Length Scaling (DBLS) technique, which adjusts the SB length on the fly such that Quality of Service requirements for energy-efficient LDPC decoding are fulfilled. In this way, depending on the communication channel condition, different SB lengths are adaptively utilized such that the best decoding performance vs. energy consumption tradeoff is achieved. To evaluate the practical implications of DBLS, we selected a (1296,648) LDPC code with dv=3 and dc=6 and implemented both our approach and the best state-of-the-art stochastic LDPC decoder with 64-bit edge memory on a Virtex-7 FPGA. Experimental results indicate that our proposal requires 9% more FFs and 3% more LUTs while reducing energy consumption by 31-80% and providing 1.5-5.1x higher throughput.
{"title":"Dynamic Bitstream Length Scaling Energy Effective Stochastic LDPC Decoding","authors":"T. Marconi, S. Cotofana","doi":"10.1145/2742060.2742117","DOIUrl":"https://doi.org/10.1145/2742060.2742117","url":null,"abstract":"Stochastic Computing (SC) is an attractive solution for implementing Low-Density Parity-Check (LDPC) code decoders due to its fault tolerance and low hardware requirements. However, in practical implementations, SC efficiency is limited by the Stochastic Bitstream (SB) length and by computation inaccuracies due to non-unique SB representations. In this paper, rather than using a statically fixed SB length, we propose a Dynamic Bitstream Length Scaling (DBLS) technique, which adjusts the SB length on the fly such that Quality of Service requirements for energy-efficient LDPC decoding are fulfilled. In this way, depending on the communication channel condition, different SB lengths are adaptively utilized such that the best decoding performance vs. energy consumption tradeoff is achieved. To evaluate the practical implications of DBLS, we selected a (1296,648) LDPC code with dv=3 and dc=6 and implemented both our approach and the best state-of-the-art stochastic LDPC decoder with 64-bit edge memory on a Virtex-7 FPGA. Experimental results indicate that our proposal requires 9% more FFs and 3% more LUTs while reducing energy consumption by 31-80% and providing 1.5-5.1x higher throughput.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130766258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
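The accuracy-versus-length trade-off that DBLS tunes can be seen in the basic stochastic computing primitive. In unipolar SC, a value p in [0, 1] is encoded as a bitstream whose bits are 1 with probability p, and ANDing two independent streams yields a stream encoding the product. The estimate's standard error shrinks roughly as sqrt(p(1 - p)/L) for stream length L, so longer streams improve accuracy at the cost of more clock cycles and energy. The sketch below shows only that primitive. `pick_length` is a hypothetical stand-in for a length-selection rule and is not the channel-condition-driven policy of the paper. The LDPC variable-node and check-node logic are not modeled.

```python
import math
import random
from typing import List, Optional, Sequence


def to_bitstream(p: float, length: int, rng: random.Random) -> List[int]:
    """Unipolar stochastic encoding: each bit is 1 with probability p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stochastic_multiply(p1: float, p2: float, length: int, seed: int = 0) -> float:
    """Multiply two probabilities by ANDing two independent bitstreams and
    counting ones; accuracy improves as the bitstream gets longer."""
    rng = random.Random(seed)
    s1 = to_bitstream(p1, length, rng)
    s2 = to_bitstream(p2, length, rng)
    ones = sum(b1 & b2 for b1, b2 in zip(s1, s2))
    return ones / length


def pick_length(p: float, tolerance: float, candidates: Sequence[int]) -> Optional[int]:
    """Hypothetical length-selection rule: the shortest candidate length whose
    binomial standard error sqrt(p * (1 - p) / L) is within tolerance."""
    for length in sorted(candidates):
        if math.sqrt(p * (1.0 - p) / length) <= tolerance:
            return length
    return None
```

For a product near 0.25, `pick_length(0.25, 0.05, [64, 256, 1024, 4096])` selects 256, while tightening the tolerance to 0.01 pushes it to 4096. A real decoder would instead derive its accuracy target from the channel condition and decoding QoS requirements.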