This paper presents a novel evolutionary algorithm (EA)-driven design space exploration (DSE) of an optimized hardware-Trojan-secured datapath under user power-delay constraints during high-level synthesis (HLS). Little work has addressed hardware-Trojan-secured datapath generation during HLS, and no prior work has addressed DSE of a Trojan-secured datapath optimized for user multi-objective (MO) constraints. The problem deserves attention because producing a Trojan-secured datapath is not trivial. Even Trojan detection alone is less straightforward than concurrent error detection (CED) of transient faults, since it relies on multiple third-party intellectual property (3PIP) vendors, let alone exploring a user-optimized Trojan-secured datapath under MO constraints. The proposed DSE for hardware Trojan detection includes a novel problem-encoding technique that enables exploration of efficient distinct-vendor allocation as well as of an optimized Trojan-secured datapath structure. The exploration backbone of the approach is the bacterial foraging optimization algorithm (BFOA), which is known for its adaptive behavior (tumbling/swimming) and simple model. Comparison with a recent approach showed an average improvement in quality of results (QoR) of more than 14.1%.
Title: Untrusted Third Party Digital IP Cores: Power-Delay Trade-off Driven Exploration of Hardware Trojan Secured Datapath during High Level Synthesis; Authors: A. Sengupta, Saumya Bhadauria; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742061
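The abstract credits BFOA's tumbling/swimming behavior as the reason for choosing it. The sketch below shows only the chemotactic phase (tumble, then swim while the cost improves) for a single bacterium, following the textbook formulation. Reproduction, elimination-dispersal, swarming, and the paper's vendor-allocation encoding are all omitted. The `sphere` cost function is a hypothetical stand-in for a power-delay fitness function, and every name and parameter here is an assumption, not the authors' code.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def chemotaxis(cost: Callable[[Vector], float], x0: Vector, step: float = 0.1,
               n_chem: int = 200, n_swim: int = 4, seed: int = 0) -> Tuple[Vector, float]:
    """Chemotactic phase of BFOA for a single bacterium (minimization).

    Returns the lowest-cost position visited and its cost.
    """
    rng = random.Random(seed)
    x = list(x0)
    j_last = cost(x)
    best_x, best_j = list(x), j_last

    for _ in range(n_chem):
        # Tumble: draw a random direction, normalize it, and always take one step.
        delta = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = math.sqrt(sum(d * d for d in delta)) or 1.0
        unit = [d / norm for d in delta]
        x = [xi + step * ui for xi, ui in zip(x, unit)]
        j = cost(x)
        if j < best_j:
            best_x, best_j = list(x), j
        # Swim: keep stepping in the same direction while the cost improves.
        for _ in range(n_swim):
            if j >= j_last:
                break
            j_last = j
            x = [xi + step * ui for xi, ui in zip(x, unit)]
            j = cost(x)
            if j < best_j:
                best_x, best_j = list(x), j
        j_last = j

    return best_x, best_j


def sphere(v: Vector) -> float:
    """Hypothetical toy cost standing in for a power-delay fitness function."""
    return sum(t * t for t in v)


best_x, best_j = chemotaxis(sphere, [2.0, 2.0])
```

Swims are long in improving directions and short otherwise. That asymmetry biases the random walk toward lower cost, which is the adaptive behavior the abstract refers to.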
In this paper, a power density analysis is presented for the 7nm FinFET technology node, based on both shorted-gate (SG) and independent-gate (IG) standard cells operating in multiple supply voltage regimes. A Liberty-formatted standard cell library is established by selecting the appropriate number of fins for the pull-up and pull-down networks of each logic cell. The layouts of both shorted-gate and independent-gate standard cells are then characterized according to lambda-based layout design rules for FinFET devices. Finally, the power density of the 7nm FinFET technology node is analyzed and compared with that of the 45nm CMOS technology node for different circuits. Experimental results show that, under the spacer-defined technology, the power density of each 7nm FinFET circuit is 3-20 times larger than that of the corresponding 45nm CMOS circuit. The results also show that the back-gate signal enables better control of power consumption in independent-gate FinFETs.
Title: Layout Characterization and Power Density Analysis for Shorted-Gate and Independent-Gate 7nm FinFET Standard Cells; Authors: Tiansong Cui, Bowen Chen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742093
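Power density is total power divided by layout area. It can rise at a newer node even when absolute power falls, provided area shrinks faster. The sketch below encodes that definition together with the standard first-order switching-power model. The numbers are hypothetical and do not come from the paper's Liberty characterization.

```python
def dynamic_power(alpha: float, c_load_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order switching power alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_load_f * vdd_v ** 2 * freq_hz


def power_density(p_dynamic_w: float, p_leakage_w: float, area_um2: float) -> float:
    """Total (dynamic + leakage) power per unit layout area, in W/um^2."""
    return (p_dynamic_w + p_leakage_w) / area_um2


def density_ratio(p_new_w: float, area_new_um2: float,
                  p_old_w: float, area_old_um2: float) -> float:
    """Factor by which the new circuit's power density exceeds the old one's."""
    return (p_new_w / area_new_um2) / (p_old_w / area_old_um2)


# Hypothetical numbers: total power halves while layout area shrinks 10x,
# so power density still rises about 5x.
ratio = density_ratio(0.5, 0.1, 1.0, 1.0)
```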
S. Miryala, V. Tenace, A. Calimera, E. Macii, M. Poncino, L. Amarù, G. Micheli, P. Gaillardon
In response to new electronics market demands, the semiconductor industry is looking for new materials, process technologies, and alternative design solutions that could replace silicon in the VLSI domain. The recent introduction of graphene, together with the option of electrostatically controlling its doping profile, has shown a possible way to implement fast and power-efficient Reconfigurable Gates (RGs). Most importantly for this work, graphene RGs have higher expressive power: they implement more complex functions, such as Majority, MUX, and XOR, in less area than their CMOS counterparts. Unfortunately, state-of-the-art synthesis tools, which are customized for standard NAND/NOR CMOS gates, do not exploit this feature of graphene RGs. In this paper, we present a post-synthesis tool that translates the gate-level netlist obtained from commercial synthesis tools into a more optimized netlist that efficiently integrates graphene RGs. Experiments on a set of open-source benchmarks show that the proposed strategy improves area and performance by 17% and 8.17% on average, respectively.
Title: Exploiting the Expressive Power of Graphene Reconfigurable Gates via Post-Synthesis Optimization; Authors: S. Miryala, V. Tenace, A. Calimera, E. Macii, M. Poncino, L. Amarù, G. Micheli, P. Gaillardon; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742098
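One way a post-synthesis pass could recognize where a single RG can replace several CMOS gates is Boolean matching. In this approach, a small multi-input cone of the netlist is reduced to a truth table and compared against the truth tables of the functions an RG can realize, under every input permutation. The sketch below illustrates that idea. The `RG_LIBRARY` contents and all function names are hypothetical, and the paper's actual tool may work quite differently.

```python
from itertools import permutations
from typing import Callable, Dict, Optional, Tuple

BoolFn = Callable[..., int]


def truth_table(f: BoolFn, n: int = 3) -> int:
    """Pack the truth table of an n-input Boolean function into an integer."""
    tt = 0
    for row in range(2 ** n):
        bits = [(row >> (n - 1 - k)) & 1 for k in range(n)]
        tt |= (int(f(*bits)) & 1) << row
    return tt


# Hypothetical library of functions a single graphene RG is assumed to realize.
RG_LIBRARY: Dict[str, BoolFn] = {
    "MAJ3": lambda a, b, c: (a & b) | (a & c) | (b & c),
    "MUX": lambda s, x, y: x if s else y,
    "XOR3": lambda a, b, c: a ^ b ^ c,
}


def _permuted(gate: BoolFn, perm: Tuple[int, ...]) -> BoolFn:
    # Gate input i is wired to cone input perm[i].
    return lambda *v: gate(*(v[k] for k in perm))


def match_rg(cone: BoolFn, n: int = 3) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Return (gate name, input permutation) if the cone equals one RG, else None."""
    target = truth_table(cone, n)
    for name, gate in RG_LIBRARY.items():
        for perm in permutations(range(n)):
            if truth_table(_permuted(gate, perm), n) == target:
                return name, perm
    return None


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def mux_from_nands(s: int, x: int, y: int) -> int:
    """A 2:1 MUX as a NAND-based synthesis flow might emit it: four NAND2 gates."""
    return nand(nand(s, x), nand(nand(s, s), y))
```

For example, `match_rg(mux_from_nands)` identifies the four-NAND cone as a single `MUX` with the identity input order. Brute-force permutation matching is only practical for small cones like these.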
In this paper, we present a circuit-level analysis of deep voltage-scaled FPGAs, which operate from full supply down to sub-threshold voltages. Both the logic and the interconnect of the FPGA are modeled at the circuit level, and their relative contributions to the delay, power, and energy of the FPGA are studied through circuit simulations. Three representative designs are studied to explore these design trade-offs. We conclude that the energy- and delay-minimal FPGA design is one in which neither the interconnect nor the logic is scaled below a fixed voltage (about 550mV in our experiments). If power is the more important design factor (at the cost of delay), it is beneficial to operate both the logic and the interconnect between 300mV and 800mV.
Title: Delay, Power and Energy Tradeoffs in Deep Voltage-scaled FPGAs; Authors: M. Abusultan, S. Khatri; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742120
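The existence of an energy-optimal supply voltage follows from a simple first-order model. Switching energy falls as V^2, but delay grows rapidly near and below threshold, so leakage energy per operation (leakage current times V times delay) eventually dominates. The sketch sweeps the supply with a normalized EKV-style drive-current model and returns both the minimum-energy and minimum energy-delay-product voltages. All parameters (`vt`, `n`, `phi_t`, and a lumped constant `i_leak` for every leaking device) are hypothetical. The sketch does not reproduce the paper's 550mV result, which comes from circuit simulation.

```python
import math


def on_current(v: float, vt: float = 0.35, n: float = 1.5, phi_t: float = 0.026) -> float:
    """Normalized EKV-style drive current, smooth from sub- to super-threshold."""
    x = (v - vt) / (2 * n * phi_t)
    return math.log1p(math.exp(x)) ** 2


def delay(v: float, c: float = 1.0) -> float:
    """Gate delay ~ C * V / I_on (arbitrary units)."""
    return c * v / on_current(v)


def energy_per_op(v: float, c: float = 1.0, i_leak: float = 0.05) -> float:
    """Switching energy C*V^2 plus leakage energy I_leak * V * delay."""
    return c * v * v + i_leak * v * delay(v, c)


def sweep(v_min: float = 0.15, v_max: float = 1.0, step: float = 0.01):
    """Return (minimum-energy voltage, minimum energy-delay-product voltage)."""
    count = int(round((v_max - v_min) / step)) + 1
    volts = [round(v_min + k * step, 4) for k in range(count)]
    v_energy = min(volts, key=energy_per_op)
    v_edp = min(volts, key=lambda v: energy_per_op(v) * delay(v))
    return v_energy, v_edp


v_energy, v_edp = sweep()
```

Delay decreases monotonically with V in this model, so the energy-delay optimum can never sit below the pure energy optimum. This mirrors the abstract's distinction between energy/delay-minimal and power-driven operating ranges.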
Process variation presents a practical challenge to the performance of analog and mixed-signal (AMS) circuits. This paper proposes a Monte Carlo-Jackknife (MC-JK) technique, a variant of the Monte Carlo method, to verify how process variation affects the performance and functionality of AMS designs. We use a behavioral model that incorporates the device variations of a 65nm technology process. Next, we conduct hypothesis testing based on the MC-JK technique combined with Latin hypercube sampling in a statistical run-time verification environment. Experimental results demonstrate the robustness of our approach in verifying AMS circuits.
Title: Statistically Validating the Impact of Process Variations on Analog and Mixed Signal Designs; Authors: Ibtissem Seghaier, M. Zaki, S. Tahar; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742122
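The sketch below combines the three ingredients the abstract names, each in its textbook form:

- Latin hypercube sampling, with one sample per equal-probability stratum in every dimension.
- The jackknife's bias-corrected estimate and standard error from leave-one-out recomputation.
- A one-sided z-test of whether a performance metric exceeds its specification.

The `gain_db` behavioral model, its coefficients, the 38 dB spec, and the 1.645 critical value are hypothetical. This is not the authors' MC-JK flow.

```python
import math
import random
from statistics import NormalDist, fmean


def latin_hypercube(n, dims, seed=0):
    """n points in [0,1)^dims with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dims):
        col = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return list(zip(*columns))


def jackknife(samples, stat):
    """Bias-corrected jackknife estimate of stat and its standard error."""
    n = len(samples)
    full = stat(samples)
    loo = [stat(samples[:i] + samples[i + 1:]) for i in range(n)]
    loo_mean = fmean(loo)
    estimate = full - (n - 1) * (loo_mean - full)
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return estimate, math.sqrt(variance)


def gain_db(u):
    # Hypothetical behavioral model: gain depends linearly on two normalized
    # device deviations (e.g. threshold-voltage and mobility shifts).
    z = [NormalDist().inv_cdf(min(max(p, 1e-9), 1 - 1e-9)) for p in u]
    return 40.0 - 1.5 * z[0] + 0.8 * z[1]


def meets_spec(n=200, spec_db=38.0, z_alpha=1.645, seed=0):
    """One-sided test of H1: mean gain > spec, using the jackknife standard error."""
    samples = [gain_db(u) for u in latin_hypercube(n, 2, seed)]
    est, se = jackknife(samples, fmean)
    return (est - spec_db) / se > z_alpha, est, se


ok, est, se = meets_spec()
```

For the sample mean, the jackknife reduces to the classical estimate and standard error. The technique earns its cost on nonlinear statistics such as a standard deviation or a yield estimate, where it also corrects first-order bias.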
Michail Mavropoulos, G. Keramidas, Grigorios Adamopoulos, D. Nikolos
Processor caches play a critical role in the performance of today's computer systems. As technology scales, manufacturing defects and process variations are expected to make a large number of cache cells faulty. The number of faulty cells varies from die to die and, in the field, depends on the operating conditions (e.g., supply voltage, frequency). Several techniques have been proposed to tolerate faults in caches. A drawback of redundancy-based techniques is that the amount of redundancy is fixed at design time to target a maximum number of faults, so when there are few faults (e.g., at the nominal supply voltage in a system with DVS), only part of the redundant resources is used. In this paper we propose a new reconfigurable, self-adaptive fault-tolerant cache scheme. Its unique characteristic is that it uses its resources to reduce both the misses caused by faulty blocks and conflict misses, depending on the number of faults, their distribution in the cache, and the running application. Our experimental results for a wide range of scientific applications and a plethora of fault maps with different SRAM failure probabilities show that our proposal can achieve significant benefits.
Title: Reconfigurable: Self Adaptive Fault Tolerant Cache Memory for DVS enabled Systems; Authors: Michail Mavropoulos, G. Keramidas, Grigorios Adamopoulos, D. Nikolos; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742091
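The key idea in the abstract is that one pool of spare resources serves two purposes: covering faulty blocks and reducing conflict misses. The sketch below models that dual use with a hypothetical design, not the authors' scheme. A direct-mapped cache is paired with a small fully associative LRU buffer. The buffer holds blocks whose home set is faulty, and otherwise behaves as a victim cache for conflict evictions. Class and method names are illustrative only.

```python
from collections import OrderedDict
from typing import Iterable, List, Optional


class SpareBufferCache:
    """Direct-mapped cache with a small fully associative LRU spare buffer.

    The buffer is shared between two uses: it holds blocks whose home set
    is marked faulty, and it catches blocks evicted by conflicts in healthy
    sets (victim-cache behaviour). Block addresses are used directly as tags.
    """

    def __init__(self, n_sets: int, buffer_entries: int, faulty_sets: Iterable[int] = ()):
        self.n_sets = n_sets
        self.tags: List[Optional[int]] = [None] * n_sets
        self.faulty = set(faulty_sets)
        self.buffer = OrderedDict()  # block -> None, least recently used first
        self.capacity = buffer_entries
        self.hits = 0
        self.misses = 0

    def _buffer_insert(self, block: int) -> None:
        if self.capacity == 0:
            return
        self.buffer[block] = None
        self.buffer.move_to_end(block)
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)  # drop the least recently used block

    def _fill_main(self, s: int, block: int) -> None:
        # Place the block in its healthy home set; the old occupant moves to the buffer.
        victim, self.tags[s] = self.tags[s], block
        if victim is not None:
            self._buffer_insert(victim)

    def access(self, block: int) -> bool:
        s = block % self.n_sets
        healthy = s not in self.faulty
        if healthy and self.tags[s] == block:
            self.hits += 1
            return True
        if block in self.buffer:
            self.hits += 1
            if healthy:
                # Swap: promote the block back into the main array.
                del self.buffer[block]
                self._fill_main(s, block)
            else:
                self.buffer.move_to_end(block)
            return True
        self.misses += 1
        if healthy:
            self._fill_main(s, block)
        else:
            self._buffer_insert(block)
        return False
```

With few faulty sets, the buffer is almost entirely available for conflict misses. As faults grow, it is increasingly occupied by blocks from faulty sets. This is the kind of workload- and fault-dependent sharing the abstract describes.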
L. Cavigelli, David Gschwend, Christoph Mayer, Samuel Willi, Beat Muheim, L. Benini
Today, advanced computer vision (CV) systems of ever-increasing complexity are being deployed in a growing number of application scenarios with strong real-time and power constraints. Current trends in CV clearly show a rise of neural-network-based algorithms, which have recently broken many object detection and localization records. These approaches are very flexible and can tackle many different challenges simply by changing their parameters. In this paper, we present the first convolutional network accelerator that scales to network sizes currently handled only by workstation GPUs while remaining within the power envelope of embedded systems. The architecture has been implemented in a 3.09 mm² core area in UMC 65nm technology and achieves a throughput of 274 GOp/s at 369 GOp/s/W with an external memory bandwidth of just 525 MB/s full-duplex, a decrease of more than 90% from previous work.
Title: Origami: A Convolutional Network Accelerator; Authors: L. Cavigelli, David Gschwend, Christoph Mayer, Samuel Willi, Beat Muheim, L. Benini; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2743766
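The headline figures in the abstract imply two further quantities that help put them in context. Dividing throughput by energy efficiency gives the implied core power (roughly 0.74 W). Dividing throughput by external bandwidth gives the number of operations per byte moved (about 520), which indicates how much data reuse the design must achieve on chip. The sketch treats 525 MB/s as the relevant per-direction rate and counts a multiply-accumulate as two operations. Both are assumptions about how the figures are defined.

```python
THROUGHPUT_GOPS = 274.0        # GOp/s, from the abstract
EFFICIENCY_GOPS_PER_W = 369.0  # GOp/s/W, from the abstract
BANDWIDTH_MB_S = 525.0         # MB/s external memory bandwidth, from the abstract

# Implied core power: throughput divided by energy efficiency.
power_w = THROUGHPUT_GOPS / EFFICIENCY_GOPS_PER_W

# Operations performed per byte crossing the external interface
# (assumes 525 MB/s is the per-direction rate).
ops_per_byte = (THROUGHPUT_GOPS * 1e9) / (BANDWIDTH_MB_S * 1e6)


def conv_layer_ops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Ops for one k x k conv layer, counting a multiply-accumulate as 2 ops (assumption)."""
    return 2 * h_out * w_out * c_in * c_out * k * k
```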
Devin P. Sullivan, Rohan Arepally, R. Murphy, J. Tapia, J. Faeder, M. Dittrich, J. Czech
Understanding the dynamics of biochemical networks is a major goal of systems biology. Due to the heterogeneity of cells and the low copy numbers of key molecules, spatially resolved approaches are required to fully understand and model these systems. Until recently, most spatial modeling used geometries obtained through either manual segmentation or manual fabrication, both of which are time-consuming and tedious. Similarly, the reaction system associated with the model had to be defined manually, a process that is both tedious and error-prone for large networks. As a result, spatially resolved simulations have typically been performed only in a limited number of geometries, which are often highly simplified, and with small reaction networks.
Title: Design Automation for Biological Models: A Pipeline that Incorporates Spatial and Molecular Complexity; Authors: Devin P. Sullivan, Rohan Arepally, R. Murphy, J. Tapia, J. Faeder, M. Dittrich, J. Czech; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2743763
In chip multiprocessors, traditional metallic interconnects will soon reach their bandwidth and energy dissipation limits. Photonic NoCs (PNoCs) are a promising alternative for sustaining higher performance as the number of on-chip cores rises. Efficient PNoC architectures are needed to reduce laser-related energy consumption while maintaining high performance. In this work we propose a novel sandwich-layered approach to designing a 3D PNoC architecture that uses multiplexing techniques to reduce the number of hops, crossover points, and laser sources. The 3D hybrid PNoC uses high-performance 5x5 photonic routers that incorporate mode division multiplexing (MDM) along with wavelength division multiplexing (WDM) and time division multiplexing (TDM). Experimental results demonstrate an increase of up to 4x in aggregate bandwidth while reducing average energy consumption per router by 83% compared with recently reported results.
Title: A Multilayered Design Approach for Efficient Hybrid 3D Photonics Network-on-chip; Authors: Dharanidhar Dang, B. Patra, R. Mahapatra; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742083
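The three multiplexing schemes compose as follows in an idealized model:

- MDM and WDM multiply: every (mode, wavelength) pair is an independent channel on the same waveguide.
- TDM then divides the resulting link among senders by time slot.

The sketch below captures only that arithmetic. The mode count, wavelength count, per-channel rate, and round-robin slot policy are hypothetical. It ignores crosstalk, insertion loss, and the router details that a real design such as the paper's must handle.

```python
from typing import List


def link_bandwidth_gbps(n_modes: int, n_wavelengths: int, gbps_per_channel: float) -> float:
    """Aggregate bandwidth of one waveguide with MDM and WDM combined:
    every (mode, wavelength) pair is treated as an independent channel."""
    return n_modes * n_wavelengths * gbps_per_channel


def tdm_owner(slot: int, senders: List[str]) -> str:
    """Round-robin TDM: each time slot belongs to exactly one sender."""
    return senders[slot % len(senders)]


def per_sender_gbps(n_modes: int, n_wavelengths: int, gbps_per_channel: float,
                    n_senders: int) -> float:
    """Average bandwidth each sender sees when the link is time-shared."""
    return link_bandwidth_gbps(n_modes, n_wavelengths, gbps_per_channel) / n_senders
```

For instance, 2 modes x 16 wavelengths x 10 Gb/s gives 320 Gb/s per waveguide in this idealized model, or 80 Gb/s per sender when four senders share it by TDM.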
Photonic interconnects are a disruptive technology that can overcome the power and bandwidth limitations of traditional electrical Networks-on-Chip (NoCs). However, the static power dissipated in the external laser may limit the performance of future optical NoCs by dominating the stringent network power budget. Analysis of real multicore benchmarks shows that the external laser consumes high static power even at low channel utilization. In this paper, we propose runtime power management techniques that reduce laser power consumption by tuning the network to actual application characteristics. We scale the number of channels available for communication based on link and buffer utilization. Results on synthetic and real traffic (PARSEC, Splash-2) for 64 cores indicate that the proposed power scaling technique can reduce optical power by about 70% with less than 1% throughput penalty for real traffic.
Title: Dynamic Power Reduction Techniques in On-Chip Photonic Interconnects; Authors: B. Neel, M. Kennedy, Avinash Karanth Kodi; Venue: Proceedings of the 25th edition on Great Lakes Symposium on VLSI (2015-05-20); DOI: https://doi.org/10.1145/2742060.2742118
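A utilization-driven channel-scaling controller can be sketched as a simple threshold policy evaluated once per control epoch:

- Double the number of lit channels when either link or buffer utilization is high.
- Halve it when both are low.
- Otherwise leave it unchanged.

Laser power is assumed to scale linearly with the number of lit channels. The thresholds, the doubling/halving rule, and the per-channel power are hypothetical choices, not the policy evaluated in the paper.

```python
from typing import Iterable, List, Tuple


def next_channel_count(active: int, link_util: float, buffer_util: float,
                       max_channels: int, low: float = 0.3, high: float = 0.7,
                       min_channels: int = 1) -> int:
    """Hypothetical threshold policy: double the lit channels under pressure,
    halve them when both link and buffer utilization are low."""
    if link_util > high or buffer_util > high:
        return min(active * 2, max_channels)
    if link_util < low and buffer_util < low:
        return max(active // 2, min_channels)
    return active


def laser_power_trace(trace: Iterable[Tuple[float, float]], max_channels: int = 64,
                      mw_per_channel: float = 1.0) -> List[float]:
    """Laser power per control epoch, assuming power scales with lit channels."""
    active = max_channels
    powers = []
    for link_util, buffer_util in trace:
        powers.append(active * mw_per_channel)
        active = next_channel_count(active, link_util, buffer_util, max_channels)
    return powers
```

Under a low-utilization trace, laser power falls geometrically from the always-on level. Reacting to buffer occupancy as well as link occupancy is one way to limit the throughput penalty when traffic ramps back up.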