
Latest publications from the 2017 30th IEEE International System-on-Chip Conference (SOCC)

Wednesday keynote I: FDSOI and FINFET for SoC developments
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8225991
G. Teepe
FDSOI and FINFET use the same electrostatic principle in their transistor architectures: the conduction properties of a thin layer of undoped semiconductor material are controlled by an isolated gate. For the same layer thickness, FINFET offers more drive current and higher packing density, while FDSOI, thanks to a buried back-gate, offers more design flexibility, can handle extremely low supply voltages, and is more cost-effective due to its planar structure. While FINFET enables a continuation of Moore's Law for performance applications such as computing and network switching, FDSOI shows excellent results for applications in the Internet-of-Things domain. GLOBALFOUNDRIES has presented a dual roadmap based on FINFET and FDSOI. On the FINFET side, it has a 14nm technology in production and a 7nm technology in development. On the FDSOI side, it has the 22FDX™ technology in production and 12FDX™ in development. The talk will outline the application areas for FINFET and FDSOI and give examples of how to use the back-gate bias for maximum design flexibility.
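The talk's back-gate examples are not reproduced here, but the lever it refers to can be sketched with a first-order relation: in FDSOI, the threshold voltage shifts roughly linearly with the applied back-gate (body) bias. The coupling-factor value below is an assumed, generic figure for illustration, not a 22FDX™ or 12FDX™ specification.

```latex
% First-order FDSOI back-gate model (illustrative; \gamma_{bb} is an assumed coupling factor)
\Delta V_{th} \approx -\,\gamma_{bb}\, V_{BB}, \qquad \gamma_{bb} \approx 80\ \mathrm{mV/V}
```

Under this assumption, a forward back-bias of +1 V lowers the threshold by about 80 mV (faster, leakier), while a reverse bias raises it (slower, lower leakage), which is the design flexibility the talk highlights.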
Citations: 0
Fairness-oriented switch allocation for networks-on-chip
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226066
Zicong Wang, Xiaowen Chen, Chen Li, Yang Guo
Networks-on-Chip (NoCs) are becoming the backbone of modern chip multiprocessor (CMP) systems. However, as the number of integrated cores increases and the network size scales up, network-latency imbalance is becoming an important problem that seriously degrades the performance of both the network and the system. In this paper, we aim to alleviate this problem by optimizing the design of switch allocation. We propose fairness-oriented switch allocation (FOSA), a novel switch allocation strategy that achieves uniform network latencies. FOSA improves system performance by markedly balancing network latencies. We evaluate the network and system performance of FOSA with synthetic traffic and SPEC CPU2006 benchmarks in a full-system simulator. Compared with the canonical separable switch allocator (Round-Robin) and the recently proposed TS-Router switch allocator, the benchmark experiments show that our approach decreases maximum latency (ML) by 45.6% and 15.1%, respectively, and latency standard deviation (LSD) by 13.8% and 3.9%, respectively. Besides this, FOSA improves system throughput by 0.8% over TS-Router. Finally, we synthesize FOSA and evaluate the additional area and power consumption.
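The abstract does not spell out FOSA's allocation policy, so the sketch below only illustrates the general idea of fairness-oriented switch allocation as opposed to plain round-robin: granting an output port to the longest-waiting request instead of rotating a pointer. The function names and the age-based criterion are assumptions for illustration, not the paper's algorithm.

```python
# Toy single-output-port arbiter comparison: round-robin vs. age-based (fairness-oriented).
# Illustrative sketch only; not the FOSA algorithm from the paper.

def round_robin_grant(requests, last_grant):
    """Grant the next requesting input after the previously granted one."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_grant + offset) % n
        if requests[candidate]:
            return candidate
    return None

def age_based_grant(requests, wait_cycles):
    """Grant the requesting input whose head flit has waited the longest."""
    oldest, grant = -1, None
    for i, req in enumerate(requests):
        if req and wait_cycles[i] > oldest:
            oldest, grant = wait_cycles[i], i
    return grant

if __name__ == "__main__":
    requests = [True, True, False, True]   # which input VCs request this output port
    wait_cycles = [12, 3, 0, 5]            # how long each head flit has been waiting
    print(round_robin_grant(requests, last_grant=0))  # -> 1 (next in rotation)
    print(age_based_grant(requests, wait_cycles))     # -> 0 (longest-waiting request)
```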
Citations: 0
Digital spiking neuron cells for real-time reconfigurable learning networks
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226029
Haipeng Lin, A. Zjajo, R. V. Leuken
The high level of realism of spiking neuron networks and their complexity require substantial computational resources, which limits the size of the networks that can be realized. Consequently, the main challenge in building a complex and biologically accurate spiking neuron network is largely set by the high computational and data-transfer demands. In this paper, we implement several efficient models of spiking neurons with characteristics such as axon conduction delays and spike-timing-dependent plasticity. Experimental results indicate that the proposed real-time data-flow learning network architecture supports over 2800 (depending on the model complexity) biophysically accurate neurons in a single FPGA device.
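The abstract does not name the neuron models that were implemented; as a hedged illustration of the kind of "efficient" spiking model such FPGA designs typically target, the Izhikevich update below is simple enough to map to fixed-point hardware. The parameters and the floating-point form are for readability only and are not taken from the paper.

```python
# Minimal discrete-time Izhikevich neuron update (illustrative only; the paper's exact
# models, fixed-point formats and STDP rule are not specified in this abstract).

def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Advance membrane potential v and recovery variable u by one time step (ms)."""
    v_new = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u_new = u + dt * a * (b * v - u)
    spiked = v_new >= 30.0          # spike threshold in mV
    if spiked:                      # reset after a spike
        v_new, u_new = c, u_new + d
    return v_new, u_new, spiked

if __name__ == "__main__":
    v, u = -65.0, -13.0             # typical resting state (u = b * v)
    for t in range(1000):           # 1 s of simulated time with constant input current
        v, u, spiked = izhikevich_step(v, u, I=10.0)
        if spiked:
            print(f"spike at t = {t} ms")
```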
Citations: 0
Selectable grained reconfigurable architecture (SGRA) and its design automation
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226035
Ryosuke Koike, Takashi Imagawa, R. Y. Omaki, H. Ochi
In this paper, we describe a Selectable Grained Reconfigurable Architecture (SGRA) in which each Configurable Logic Block can be configured to operate in either fine-grained or coarse-grained mode. Compared with the Mixed Grained Reconfigurable Architecture (MGRA), which has a fixed ratio of fine- and coarse-grained operation blocks and a heterogeneous floorplan, SGRA offers greater flexibility in the mapping and placement of functional units, thus reducing wasted wiring and improving the critical path delay. We also present an automated design flow for SGRA that is developed by customizing the Verilog-to-Routing (VTR) platform. Experimental results demonstrate that SGRA achieves, on average, a 13% reduction in circuit area over MGRA.
Citations: 0
Application of machine learning methods in post-silicon yield improvement
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226049
B. Yigit, Grace Li Zhang, Bing Li, Yiyu Shi, Ulf Schlichtmann
In nanometer-scale manufacturing, process variations have a significant impact on circuit performance. To handle them, post-silicon clock tuning buffers can be included in the circuit to balance the timing budgets of neighboring critical paths. The state of the art is a sampling-based approach, in which an integer linear programming (ILP) problem must be solved for every sample. The runtime complexity of this approach is the number of samples multiplied by the time required for an ILP solution. Existing work tries to reduce the number of samples but still leaves the long-runtime problem unsolved. In this paper, we propose a machine learning approach that reduces the runtime by learning the positions and sizes of post-silicon tuning buffers. Experimental results demonstrate that we can predict buffer locations and sizes with very good accuracy (90% and higher) and achieve a significant yield improvement (up to 18.8%) with a significant speed-up (up to almost 20 times) compared to existing work.
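The paper's features, labels, and learning model are not given in this abstract; the sketch below only illustrates the overall idea of replacing the per-sample ILP solve with a learned predictor. The synthetic features (sampled path delays) and labels (buffer sizes an ILP solver would have chosen) are stand-in assumptions.

```python
# Hedged sketch: learn post-silicon tuning-buffer sizes from path-timing features,
# so that new variation samples no longer require an ILP solve each.
# Feature/label definitions and the model choice are assumptions, not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_paths, n_buffers = 2000, 64, 16

# Synthetic stand-ins: X = sampled path delays under process variation,
# y = buffer sizes an ILP solver would have chosen for each sample (toy rule).
X = rng.normal(1.0, 0.08, size=(n_samples, n_paths))
y = np.clip(X[:, :n_buffers] - X[:, n_buffers:2 * n_buffers], 0, None)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)          # buffer-size prediction for unseen variation samples
print("mean abs. error:", np.abs(pred - y_te).mean())
```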
Citations: 4
Propelling breakthrough embedded microprocessors by means of integrated photonics
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8225981
D. Bertozzi, S. Rumley
The tutorial aims to address the limitations of electrical communication links by developing chip-scale, integrated photonic technology that enables seamless intra-chip and off-chip photonic communications providing the required bandwidth at low energy per bit. The emerging technology will exploit wavelength division multiplexing (WDM), allowing much higher bandwidth capacity per link, which is imperative to meeting the communication needs of future microprocessors. Such a capability would propel the microprocessor onto a new performance trajectory and impact the actual runtime performance of relevant computing tasks for power-starved embedded applications and supercomputing. The challenges in realizing optical interconnect technology lie in developing CMOS- and DRAM-compatible photonic links that are spectrally broad, operate at high bit rates with very low power dissipation, and are tightly integrated with electronic drivers. Ultimately, the goal of this tutorial is to demonstrate photonic technologies that can be integrated within embedded microprocessors and enable seamless, energy-efficient, high-capacity communications within and between the microprocessor and DRAM. It is envisioned that optical interconnect technology will be especially useful for platforms where extreme performance coupled with low size, weight, and power is a necessity (e.g., UAVs and satellites).
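To make the WDM bandwidth argument concrete (the wavelength count and per-channel rate below are illustrative assumptions, not figures from the tutorial), the aggregate capacity of a single waveguide scales with the number of wavelengths carried:

```latex
% Aggregate capacity of one WDM link (illustrative numbers)
B_{\mathrm{link}} = N_{\lambda} \cdot R = 16 \times 25\ \mathrm{Gb/s} = 400\ \mathrm{Gb/s}
```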
Citations: 0
Auto-SI: An adaptive reconfigurable processor with run-time loop detection and acceleration
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226027
T. Harbaum, C. Schade, Marvin Damschen, Carsten Tradowsky, L. Bauer, J. Henkel, J. Becker
Modern computer architectures have an ever-increasing demand for performance, but are constrained in power dissipation and chip area. To tackle these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is required. In this paper, we propose an automatic loop detection and hardware acceleration approach for an adaptive reconfigurable processor. Our contribution is Auto-SI, an automated process that transparently and dynamically provides hardware acceleration alongside a general-purpose processor by employing reconfigurable hardware. We detail the benefits of Auto-SI, i.e., transparent and flexible acceleration of unmodified binaries, provide an analysis of the overheads incurred and present an evaluation of our implementation prototype.
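Auto-SI's detection hardware is not described in this abstract; a common way to spot acceleration candidates at run time is to count taken backward branches per target address, which the toy monitor below illustrates. The threshold, class name, and software form are assumptions for illustration, not the paper's mechanism.

```python
# Toy run-time hot-loop detector: count taken backward branches per branch target.
# A real implementation would sit in hardware next to the fetch stage; this sketch
# only illustrates the detection idea, not Auto-SI's mechanism.
from collections import defaultdict

HOT_THRESHOLD = 64                 # assumed trip-count threshold for "hot"

class LoopDetector:
    def __init__(self):
        self.counters = defaultdict(int)   # backward-branch target -> taken count

    def observe_branch(self, pc, target, taken):
        """Call on every executed branch; returns a target PC once it becomes hot."""
        if taken and target < pc:          # backward branch => likely loop back-edge
            self.counters[target] += 1
            if self.counters[target] == HOT_THRESHOLD:
                return target              # candidate loop header for acceleration
        return None

if __name__ == "__main__":
    det = LoopDetector()
    for _ in range(100):                   # a loop at 0x400 branching back from 0x420
        hot = det.observe_branch(pc=0x420, target=0x400, taken=True)
        if hot is not None:
            print(f"hot loop detected at {hot:#x}; hand off to reconfigurable fabric")
            break
```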
Citations: 3
A 0.36pJ/bit, 17Gbps OOK receiver in 45-nm CMOS for inter and intra-chip wireless interconnects
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226023
Suryanarayanan Subramaniam, Tanmay Shinde, Padmanabh Deshmukh, Md Shahriar Shamim, Mark A. Indovina, A. Ganguly
Wireless interconnects are capable of establishing energy-efficient intra- and inter-chip data communications. This paper introduces a circuit-level design of an energy-efficient millimeter-wave (mm-wave) non-coherent on-off keying (OOK) receiver suitable for such wireless interconnects in a 45-nm CMOS process. The receiver consists of a simple two-stage common-source Low Noise Amplifier (LNA) and a source-degenerated differential Envelope Detector (ED) followed by a baseband (BB) amplifier stage. Operating at 60 GHz, the proposed OOK receiver consumes only 6.1 mW of DC power from a 1 V supply while providing a data rate of 17 Gbps and a bit-energy efficiency of 0.36 pJ/bit.
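The headline energy efficiency follows directly from the reported power and data rate:

```latex
E_{\mathrm{bit}} = \frac{P_{\mathrm{DC}}}{R}
                 = \frac{6.1\ \mathrm{mW}}{17\ \mathrm{Gb/s}}
                 \approx 0.36\ \mathrm{pJ/bit}
```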
Citations: 12
System-level simulator for process variation influenced synchronous and asynchronous NoCs
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226065
S. Muhammad, A. El-Moursy, M. El-Moursy, H. Hamed
A system-level simulator is proposed to determine the ability of synchronous and asynchronous NoCs to alleviate the effect of process variation. The newly developed framework reports throughput variation and the variation of the different delay components. System-level simulation shows behavior and performance-variation trends similar to circuit-level simulation when moving from one technology node to another. Clock skew significantly degrades synchronous NoC performance and becomes more pronounced with process variation. Despite the handshaking overhead, asynchronous NoCs may be more immune to process variation than synchronous networks. A PV-aware routing algorithm reduces the performance degradation to 8.3% and 11.4% for 45nm and 32nm asynchronous NoCs, respectively. Using different traffic workloads and a PV-unaware routing algorithm, synchronous networks lose on average 17.7% and 27.8% of nominal throughput for 45nm and 32nm technologies, respectively, due to process variation, whereas asynchronous NoC throughput degradation is about 7.4% and 11.5% for 45nm and 32nm, respectively. In addition to technology scaling, NoC scaling also affects the throughput degradation: a 256-core asynchronous NoC shows the highest throughput degradation, 16% and 22% for 45nm and 32nm technologies, respectively.
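As a toy model of why the synchronous network suffers more (the variation sigma, skew, and router count below are assumptions for illustration, not the simulator's settings): a global clock must accommodate the slowest router stage plus skew, whereas asynchronous handshakes let each hop run at its local speed, so throughput tracks the mean rather than the worst case.

```python
# Toy Monte Carlo comparison of sync vs. async NoC throughput under process variation.
# Sigma, skew and router count are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(1)
routers, trials = 64, 1000
nominal_delay, sigma, clock_skew = 1.0, 0.08, 0.05   # normalized units

sync_tp, async_tp = [], []
for _ in range(trials):
    stage_delay = rng.normal(nominal_delay, sigma * nominal_delay, routers)
    sync_tp.append(1.0 / (stage_delay.max() + clock_skew))   # clock set by worst stage + skew
    async_tp.append(1.0 / stage_delay.mean())                # handshake adapts per hop

nominal_tp = 1.0 / nominal_delay
print(f"sync  throughput loss: {100 * (1 - np.mean(sync_tp) / nominal_tp):.1f}%")
print(f"async throughput loss: {100 * (1 - np.mean(async_tp) / nominal_tp):.1f}%")
```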
Citations: 1
On the security evaluation of the ARM TrustZone extension in a heterogeneous SoC
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226018
E. M. Benhani, Cédric Marchand, A. Aubert, L. Bossuet
As the complexity of Systems-on-Chip (SoCs) and the reuse of third-party IP continue to grow, the security of heterogeneous SoCs has become a critical issue. To increase the software security of such SoCs, ARM has proposed the TrustZone technology to enforce software security. Nevertheless, many SoCs embed untrusted third-party Intellectual Property (IP) while trying to benefit from this technology. In such a case, is the security guaranteed by the ARM TrustZone technology reduced by the heterogeneity of the SoC? To answer this question, this paper presents relevant attack scenarios in which third-party IP exploits security weaknesses of the TrustZone extension across the whole SoC. Finally, this article proposes design solutions that SoC designers can consider to limit the impact of a malicious IP.
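To illustrate one class of hazard the paper alludes to (a generic toy model, not one of the paper's attack scenarios): on AXI, bit 1 of AxPROT marks a transaction as non-secure, and TrustZone-aware slaves or bus firewalls filter accesses on that bit, so a third-party bus master that is free to drive AxPROT[1] itself can present its transactions as secure.

```python
# Toy model of a TrustZone-style bus filter keyed on the AXI AxPROT[1] (non-secure) bit.
# Generic illustration only; not an attack scenario from the paper.

SECURE_REGIONS = [(0x1000_0000, 0x1000_FFFF)]   # hypothetical secure address window

def filter_access(addr, axprot_ns):
    """Return True if the transaction is allowed to proceed to the slave."""
    in_secure = any(lo <= addr <= hi for lo, hi in SECURE_REGIONS)
    return (not in_secure) or (axprot_ns == 0)   # secure regions need AxPROT[1] == 0

# A well-behaved non-secure master is blocked from the secure window...
print(filter_access(0x1000_0040, axprot_ns=1))   # False
# ...but an untrusted IP that drives AxPROT[1] = 0 itself passes the same check,
# which is why the security attribute must be enforced by the interconnect,
# not taken on trust from the IP.
print(filter_access(0x1000_0040, axprot_ns=0))   # True
```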
Citations: 22