Title: Wednesday keynote I: FDSOI and FINFET for SoC developments
Authors: G. Teepe
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8225991

FDSOI and FinFET rely on the same electrostatic principle for their transistor architectures: the conduction properties of a thin layer of undoped semiconductor material are controlled by an insulated gate. For the same layer thickness, FinFET offers higher drive current and packing density, while FDSOI, thanks to a buried back-gate, offers greater design flexibility, can operate at extremely low supply voltages, and is more cost-effective due to its planar structure. While FinFET enables a continuation of Moore's Law for performance applications such as computing and network switching, FDSOI shows excellent results for applications in the Internet-of-Things domain. GLOBALFOUNDRIES has presented a dual roadmap based on FinFET and FDSOI: on the FinFET side it has a 14nm technology in production and a 7nm technology in development, while on the FDSOI side it has the 22FDX™ technology in production and 12FDX™ in development. The talk outlines the application areas for FinFET and FDSOI and gives examples of how to use back-gate bias for maximum design flexibility.
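The back-gate flexibility mentioned above can be illustrated with the textbook first-order model of FD-SOI body biasing (a generic sketch, not material from the talk; the sensitivity coefficient is an assumed symbol):

\[
V_{th}(V_{BB}) \;\approx\; V_{th0} \;-\; \gamma_{BB}\, V_{BB},
\]

where $V_{th0}$ is the nominal threshold voltage and $\gamma_{BB}$ (typically tens of mV per volt of applied bias) is the back-gate sensitivity set by the buried-oxide and channel capacitances. Forward back-bias lowers $V_{th}$ for speed; reverse back-bias raises it to cut leakage, which is the lever behind the "maximum design flexibility" claim.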
Title: Fairness-oriented switch allocation for networks-on-chip
Authors: Zicong Wang, Xiaowen Chen, Chen Li, Yang Guo
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226066

Networks-on-chip (NoCs) are becoming the backbone of modern chip multiprocessor (CMP) systems. However, as the number of integrated cores increases and the network size scales up, network-latency imbalance is becoming an important problem that seriously affects the performance of both the network and the system. In this paper, we aim to alleviate this problem by optimizing the design of switch allocation. We propose fairness-oriented switch allocation (FOSA), a novel switch allocation strategy that achieves uniform network latencies and thereby improves system performance through a marked improvement in latency balance. We evaluate the network and system performance of FOSA with synthetic traffic and SPEC CPU2006 benchmarks in a full-system simulator. Compared with the canonical separable switch allocator (round-robin) and the recently proposed TS-Router, the benchmark experiments show that our approach decreases maximum latency (ML) by 45.6% and 15.1%, respectively, and latency standard deviation (LSD) by 13.8% and 3.9%, respectively. In addition, FOSA improves system throughput by 0.8% over TS-Router. Finally, we synthesize FOSA and evaluate its additional area and power consumption.
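The two fairness metrics reported above (ML and LSD) can be computed directly from a set of per-packet latencies. A minimal sketch, illustrative only and not the authors' tooling:

```python
import statistics

def fairness_metrics(latencies):
    """Return (maximum latency, latency standard deviation) for a list
    of per-packet network latencies -- the ML and LSD metrics used to
    quantify network-latency imbalance."""
    ml = max(latencies)                 # maximum latency (ML)
    lsd = statistics.pstdev(latencies)  # latency standard deviation (LSD)
    return ml, lsd

# A fairness-oriented allocator narrows the latency spread: the balanced
# sample below has both a lower ML and a lower LSD.
imbalanced = [10, 12, 11, 48, 13, 52]
balanced = [18, 20, 19, 22, 21, 23]
print(fairness_metrics(imbalanced))
print(fairness_metrics(balanced))
```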
Title: Digital spiking neuron cells for real-time reconfigurable learning networks
Authors: Haipeng Lin, A. Zjajo, R. V. Leuken
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226029

The high level of realism of spiking neuron networks and their complexity require substantial computational resources, limiting the size of the realized networks. Consequently, the main challenge in building complex and biologically accurate spiking neuron networks is largely set by the high computational and data-transfer demands. In this paper, we implement several efficient models of spiking neurons with characteristics such as axon conduction delays and spike-timing-dependent plasticity. Experimental results indicate that the proposed real-time data-flow learning network architecture supports over 2800 biophysically accurate neurons (depending on model complexity) in a single FPGA device.
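Spike-timing-dependent plasticity, one of the neuron characteristics implemented above, can be illustrated with the standard pair-based exponential rule — a generic textbook sketch, not the paper's hardware implementation; the learning rates and time constants are assumed values:

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.1, a_minus=0.12,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight update (spike times in ms).
    Pre-before-post (causal) pairing potentiates the synapse;
    post-before-pre pairing depresses it."""
    dt = t_post - t_pre
    if dt > 0:    # pre fired before post -> potentiation
        return a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:  # post fired before pre -> depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

# Causal pairing strengthens the weight; anti-causal weakens it.
print(stdp_delta_w(t_pre=10.0, t_post=15.0))  # positive
print(stdp_delta_w(t_pre=15.0, t_post=10.0))  # negative
```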
Title: Selectable grained reconfigurable architecture (SGRA) and its design automation
Authors: Ryosuke Koike, Takashi Imagawa, R. Y. Omaki, H. Ochi
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226035

In this paper, we describe a Selectable Grained Reconfigurable Architecture (SGRA) in which each Configurable Logic Block (CLB) can be configured to operate in either fine-grained or coarse-grained mode. Compared with the Mixed Grained Reconfigurable Architecture (MGRA), which has a fixed ratio of fine- and coarse-grained operation blocks and a heterogeneous floorplan, SGRA offers greater flexibility in the mapping and placement of functional units, thereby reducing wasted wiring and improving the critical-path delay. We also present an automated design flow for SGRA, developed by customizing the Verilog-to-Routing (VTR) platform. Experimental results demonstrate that SGRA achieves, on average, a 13% reduction in circuit area over MGRA.
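The selectable-grained idea can be sketched behaviorally: one configuration bit chooses whether a block evaluates a bit-level LUT (fine-grained) or a word-level ALU operation (coarse-grained). This is an illustrative software model with assumed interfaces, not the actual CLB circuit:

```python
FINE, COARSE = 0, 1

class SelectableCLB:
    """Toy behavioral model of a CLB that operates in either
    fine-grained (bit-level LUT) or coarse-grained (word-level ALU)
    mode, selected by a single configuration setting."""
    def __init__(self, mode, lut=None, alu_op=None):
        self.mode = mode
        self.lut = lut or {}   # maps input bit-tuples -> output bit
        self.alu_op = alu_op   # word-level function, e.g. modular add

    def evaluate(self, *inputs):
        if self.mode == FINE:
            return self.lut[inputs]   # LUT lookup on input bits
        return self.alu_op(*inputs)   # word-level ALU operation

# Fine-grained: a 2-input XOR LUT.  Coarse-grained: an 8-bit adder.
xor_clb = SelectableCLB(FINE, lut={(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
add_clb = SelectableCLB(COARSE, alu_op=lambda a, b: (a + b) & 0xFF)
print(xor_clb.evaluate(1, 0))      # -> 1
print(add_clb.evaluate(200, 100))  # -> 44 (8-bit wraparound)
```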
Title: Application of machine learning methods in post-silicon yield improvement
Authors: B. Yigit, Grace Li Zhang, Bing Li, Yiyu Shi, Ulf Schlichtmann
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226049

In nanometer-scale manufacturing, process variations have a significant impact on circuit performance. To handle them, post-silicon clock tuning buffers can be included in the circuit to balance the timing budgets of neighboring critical paths. The state of the art is a sampling-based approach in which an integer linear programming (ILP) problem must be solved for every sample, so its runtime is the number of samples multiplied by the time required for one ILP solution. Existing work tries to reduce the number of samples but still leaves the problem of long runtime unsolved. In this paper, we propose a machine learning approach that reduces the runtime by learning the positions and sizes of post-silicon tuning buffers. Experimental results demonstrate that we can predict buffer locations and sizes with very good accuracy (90% and higher) and achieve a significant yield improvement (up to 18.8%) with a significant speed-up (up to almost 20 times) compared to existing work.
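The appeal of learning buffer assignments is that inference replaces a per-sample ILP solve. A deliberately simple stand-in for such a learned predictor — a 1-nearest-neighbor lookup over previously solved samples, with hypothetical features; the paper does not prescribe this model:

```python
def predict_buffer_size(features, training_set):
    """Illustrative 1-nearest-neighbor prediction of a tuning-buffer
    size from path-timing features.  `training_set` holds
    (feature_vector, buffer_size) pairs from already-solved ILP samples.
    The features here (slack in ps, path depth) are hypothetical."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, size = min(training_set, key=lambda pair: sq_dist(pair[0], features))
    return size

# (slack_ps, path_depth) -> buffer size chosen by the ILP for that sample.
training = [((50, 10), 0), ((-30, 12), 2), ((-80, 15), 4)]
print(predict_buffer_size((-70, 14), training))  # -> 4, nearest to (-80, 15)
```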
Title: Propelling breakthrough embedded microprocessors by means of integrated photonics
Authors: D. Bertozzi, S. Rumley
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8225981

This tutorial aims to address the limitations of electrical communication links by developing chip-scale integrated photonic technology that enables seamless intra-chip and off-chip photonic communication, providing the required bandwidth at low energy per bit. The emerging technology will exploit wavelength-division multiplexing (WDM), allowing much higher bandwidth capacity per link, which is imperative for meeting the communication needs of future microprocessors. Such a capability would propel the microprocessor onto a new performance trajectory and impact the actual runtime performance of relevant computing tasks for power-starved embedded applications and supercomputing. The challenges in realizing optical interconnect technology lie in developing CMOS- and DRAM-compatible photonic links that are spectrally broad, operate at high bit rates with very low power dissipation, and are tightly integrated with electronic drivers. Ultimately, the goal of this tutorial is to demonstrate photonic technologies that can be integrated within embedded microprocessors and enable seamless, energy-efficient, high-capacity communication within and between the microprocessor and DRAM. Optical interconnect technology is envisioned to be especially useful for platforms where extreme performance coupled with low size, weight, and power is a necessity (e.g., UAVs and satellites).
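WDM's bandwidth advantage comes from aggregating many wavelength channels on one waveguide; aggregate capacity and link power then follow from simple arithmetic. A sketch with illustrative, assumed parameters (none of these numbers come from the tutorial):

```python
def wdm_link(n_wavelengths, gbps_per_lambda, pj_per_bit):
    """Aggregate bandwidth (Gb/s) and link power (mW) of a WDM photonic
    link: capacity scales with the channel count, power with the
    energy-per-bit figure of merit."""
    bw_gbps = n_wavelengths * gbps_per_lambda
    power_mw = bw_gbps * 1e9 * pj_per_bit * 1e-12 * 1e3  # bits/s * J/bit -> mW
    return bw_gbps, power_mw

# e.g. 16 wavelengths x 10 Gb/s each at 1 pJ/bit
bw, p = wdm_link(16, 10, 1.0)
print(bw, p)  # -> 160 Gb/s at 160 mW
```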
Title: Auto-SI: An adaptive reconfigurable processor with run-time loop detection and acceleration
Authors: T. Harbaum, C. Schade, Marvin Damschen, Carsten Tradowsky, L. Bauer, J. Henkel, J. Becker
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226027

Modern computer architectures face an ever-increasing demand for performance but are constrained in power dissipation and chip area. To meet these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is required. In this paper, we propose an automatic loop-detection and hardware-acceleration approach for an adaptive reconfigurable processor. Our contribution is Auto-SI, an automated process that transparently and dynamically provides hardware acceleration alongside a general-purpose processor by employing reconfigurable hardware. We detail the benefits of Auto-SI, i.e., transparent and flexible acceleration of unmodified binaries, analyze the overheads incurred, and evaluate our prototype implementation.
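Run-time loop detection of the kind described above is commonly based on spotting frequently taken backward branches in the instruction stream. A simplified sketch over a branch trace — illustrative only, not the Auto-SI implementation:

```python
from collections import Counter

def hot_loops(branch_trace, threshold=3):
    """Identify candidate loop headers from a trace of taken branches.
    A backward branch (target address <= branch address) taken at least
    `threshold` times marks a likely loop worth offloading to
    reconfigurable hardware."""
    counts = Counter((src, dst) for src, dst in branch_trace if dst <= src)
    return [dst for (src, dst), n in counts.items() if n >= threshold]

# Trace of (branch_pc, target_pc) pairs: 0x40 -> 0x10 is a hot backward
# branch (a loop), 0x20 -> 0x80 is a forward branch and is ignored.
trace = [(0x40, 0x10)] * 5 + [(0x20, 0x80), (0x40, 0x10)]
print(hot_loops(trace))  # -> [16], i.e. loop header at 0x10
```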
Title: A 0.36pJ/bit, 17Gbps OOK receiver in 45-nm CMOS for inter and intra-chip wireless interconnects
Authors: Suryanarayanan Subramaniam, Tanmay Shinde, Padmanabh Deshmukh, Md Shahriar Shamim, Mark A. Indovina, A. Ganguly
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226023

Wireless interconnects can establish energy-efficient intra- and inter-chip data communication. This paper introduces the circuit-level design of an energy-efficient millimeter-wave (mm-wave) non-coherent on-off keying (OOK) receiver for such wireless interconnects in a 45-nm CMOS process. The receiver consists of a simple two-stage common-source Low Noise Amplifier (LNA) and a source-degenerated differential Envelope Detector (ED) followed by a Base-Band (BB) amplifier stage. Operating at 60GHz, the proposed OOK receiver consumes only 6.1mW of DC power from a 1V supply while providing a data rate of 17Gbps and a bit-energy efficiency of 0.36pJ/bit.
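The reported 0.36 pJ/bit figure follows directly from the power and data rate above, since bit-energy efficiency is DC power divided by data rate:

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Bit-energy efficiency in pJ/bit from DC power (mW) and
    data rate (Gb/s): E/bit = P / R."""
    return (power_mw * 1e-3) / (rate_gbps * 1e9) * 1e12  # J/bit -> pJ/bit

# 6.1 mW at 17 Gb/s reproduces the paper's headline figure.
print(round(energy_per_bit_pj(6.1, 17.0), 2))  # -> 0.36
```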
Title: System-level simulator for process variation influenced synchronous and asynchronous NoCs
Authors: S. Muhammad, A. El-Moursy, M. El-Moursy, H. Hamed
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226065

A system-level simulator is proposed to determine the ability of synchronous and asynchronous NoCs to alleviate the effects of process variation. The newly developed framework reports throughput variation and the variation of the different delay components. System-level simulation shows behavior and performance-variation trends similar to circuit-level simulation when moving from one technology node to another. Clock skew significantly degrades the performance of synchronous NoCs, and its effect is more pronounced under process variation. Despite the handshaking overhead, asynchronous NoCs may be more immune to process variation than synchronous networks. A PV-aware routing algorithm reduces the performance degradation to 8.3% and 11.4% for 45nm and 32nm asynchronous NoCs, respectively. Using different traffic workloads and a PV-unaware routing algorithm, synchronous networks lose on average 17.7% and 27.8% of nominal throughput for 45nm and 32nm technologies, respectively, due to process variation, whereas asynchronous NoC throughput degradation is only about 7.4% and 11.5% for 45nm and 32nm, respectively. In addition to technology scaling, NoC scaling also affects throughput degradation: a 256-core NoC shows the highest asynchronous-NoC throughput degradation, at 16% and 22% for 45nm and 32nm technologies, respectively.
Title: On the security evaluation of the ARM TrustZone extension in a heterogeneous SoC
Authors: E. M. Benhani, Cédric Marchand, A. Aubert, L. Bossuet
Pub Date: 2017-09-01 | DOI: 10.1109/SOCC.2017.8226018

As the complexity of Systems-on-Chip (SoCs) and the reuse of third-party IP continue to grow, the security of heterogeneous SoCs has become a critical issue. To increase the software security of such SoCs, ARM has proposed the TrustZone technology. Nevertheless, many SoCs embed untrusted third-party Intellectual Property (IP) blocks that may attempt to exploit this technology. In such cases, is the security guaranteed by the ARM TrustZone technology weakened by the heterogeneity of the SoC? To answer this question, this paper presents relevant attack scenarios in which third-party IP exploits security failures of the TrustZone extension across the whole SoC. Finally, the article proposes design solutions that SoC designers can adopt to limit the impact of a malicious IP.