2017 30th IEEE International System-on-Chip Conference (SOCC): Latest Publications

Placement algorithm for mixed-grained reconfigurable architecture with dedicated carry chain
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226012
Takashi Imagawa, Koki Honda, H. Ochi
This paper proposes a placement algorithm using analytical placement (AP) and low-temperature simulated annealing (SA) for mixed-grained reconfigurable architecture (MGRA) with dedicated carry chains. The target MGRAs are assumed to have fine-grained blocks with dedicated carry chains to implement high-speed adders/subtracters and coarse-grained blocks to implement complicated arithmetic operations. Although this mixed-grained architecture is expected to enhance the performance of the implemented application circuit, placement becomes problematic because of the inherent placement constraints. For example, some logic blocks have many connections to others and constraints such as these cause simple pair-swap-based SA to converge to a local optimum. The proposed algorithm uses AP to determine an initial placement for SA. The AP explores an appropriate placement of coarse-grained blocks and adders/subtracters consisting of fine-grained blocks and dedicated carry chains. On the other hand, SA is mainly used to determine optimal placement of the remaining fine-grained blocks. The evaluations show that the proposed algorithm reduces the placement cost, critical path delay, and runtime by 18.4%, 6.0%, and 67.6% on average, respectively, over the versatile place and route (VPR) approach. The benchmark includes circuits consisting of only fine-grained logic. Hence, the proposed algorithm is expected to improve the placement quality for a wide range of application circuits.
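The second stage described in this abstract, a low-temperature simulated annealing pass that starts from an existing placement, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cost function is plain half-perimeter wirelength, the initial placement is passed in as if it came from an analytical placer, and the names `hpwl`, `low_temp_sa`, and all parameter defaults are hypothetical.

```python
import math
import random


def hpwl(nets, pos):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters of all nets."""
    total = 0
    for net in nets:
        xs = [pos[block][0] for block in net]
        ys = [pos[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def low_temp_sa(nets, init_pos, movable, t0=0.5, alpha=0.9,
                moves_per_t=200, t_min=0.01, seed=0):
    """Refine init_pos by pair swaps among `movable` blocks.

    Blocks not listed in `movable` (for example, blocks already fixed by an
    earlier analytical step) keep their positions. The low starting
    temperature t0 is an assumed value: it makes the search mostly greedy.
    """
    rng = random.Random(seed)
    pos = dict(init_pos)
    cost = hpwl(nets, pos)
    best_pos, best_cost = dict(pos), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(movable, 2)
            pos[a], pos[b] = pos[b], pos[a]          # propose a pair swap
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                       # accept the move
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= alpha                                    # geometric cooling
    return best_pos, best_cost


# Toy usage: block "m" stands in for a pre-placed coarse-grained block.
nets = [["a", "b"], ["b", "c"], ["c", "d"], ["a", "m"]]
init = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0), "m": (1, 1)}
best, cost = low_temp_sa(nets, init, ["a", "b", "c", "d"])
```

Because the best placement seen so far is tracked separately, the returned cost can never exceed the cost of the initial placement, which mirrors the idea of using annealing only to polish a good starting point.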
Citations: 3
Multifractal on-chip traffic generation under TLM
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226009
J. B. Filho, J. Wang
In the design flow of Multiprocessor Systems-on-Chip (MPSoCs), the evaluation of communication structures plays a very important role, since it may reveal relevant information on performance, energy consumption and cost. Simulation under a number of stimuli given by a traffic generator is a relevant solution for MPSoC performance analysis. Traditional traffic generators based on Poisson and classic Markovian models are not able to reproduce certain characteristics of original application traces, such as bursts and self-similarity. After the detection of Long Range Dependence (LRD) in on-chip traffic, monofractal models started being used for traffic generation. These models, however, were mainly used under RTL/CA simulations and present statistical limitations, opening opportunities for tests with multifractal models and higher abstraction level descriptions. In this work, it is shown that the Multifractal Wavelet Model (MWM) models on-chip traffic more accurately than auto-regressive (monofractal) models and that traffic generators modeled under TLM can achieve simulation speed-ups on the order of 12x.
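To give a feel for how a multifractal generator produces bursty traffic, the sketch below builds a load trace with a random multiplicative cascade. It is a simplified illustration in the spirit of wavelet-domain multifractal models, not the generator evaluated in the paper: the function name `cascade_traffic`, the single `shape` parameter shared by all scales, and the byte-count interpretation are assumptions made for the example.

```python
import random


def cascade_traffic(total_bytes, levels, shape=2.0, seed=0):
    """Spread total_bytes over 2**levels time slots with a random cascade.

    At each level every interval is split in two, and a symmetric
    Beta(shape, shape) multiplier decides which fraction of the parent's
    load goes to the left half. Smaller `shape` values give burstier traces.
    """
    rng = random.Random(seed)
    loads = [float(total_bytes)]
    for _ in range(levels):
        next_loads = []
        for load in loads:
            frac = rng.betavariate(shape, shape)   # fraction in (0, 1)
            next_loads.append(load * frac)         # left child
            next_loads.append(load * (1.0 - frac)) # right child
        loads = next_loads
    return loads


# Toy usage: 1 MiB of traffic spread over 2**8 = 256 time slots.
trace = cascade_traffic(total_bytes=1 << 20, levels=8)
```

Each split conserves the parent's load, so the trace always sums to the requested total while the per-slot values vary strongly across scales, which is the kind of burstiness a Poisson source does not reproduce.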
Citations: 1
The memory challenge in computing systems: A survey
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226062
N. Wehn
It is well known that DRAM memory performance cannot keep pace with the performance of today's multicore compute systems. In addition to the memory bandwidth problem, there is another major challenge, namely the power/energy challenge: DRAMs contribute substantially to overall power consumption. Thus, there is a need for power and bandwidth optimization of DRAM memory subsystems. Moreover, new memory architectures such as HBM, HMC and Wide I/O DRAMs are emerging to cope with increasing bandwidth requirements. In this talk, we will give an overview of these new architectures and present various techniques to optimize bandwidth and energy consumption in DRAM-based memory systems.
Citations: 0
A study on the energy-precision tradeoffs on commercially available processors and SoCs with an EPI based energy model
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226072
Jeremy Schlachter, M. Fagan, K. Palem, C. Enz
Energy efficiency is a critical concern for many computing systems. With Moore's law showing its limits, new hardware design and programming techniques are emerging to pursue energy scaling. Among these, approximate computing is certainly the most popular in current works. It has been shown that reducing precision using software techniques can yield significant energy savings on commercially available processors. In this paper, an energy model based on Energy Per Instruction (EPI) has been built in order to understand which mechanisms enable those savings. EPIs of various instructions have been measured, and data movement has been identified as the major consumer. The energy model has been evaluated on a computationally intensive Newton method for solving nonlinear equations using double-precision and single-precision floating-point data types. For all the cases, the model predicts the consumption with less than 10% error. The energy breakdown reveals that arithmetic operations consume less than 6% of the total energy and that savings are mainly achieved by reducing the amount of data transferred between registers, cache and main memory. Given these conclusions, implementing approximate arithmetic circuits in this type of architecture would have a very limited impact on the consumption. It is however shown that specialized hardware implemented on an FPGA interconnected with a processing system can lead to an additional 20% energy savings on the Newton method using the same single-precision data type.
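An EPI-based model of the kind described here can be read as a weighted sum: total energy is the count of each instruction class multiplied by its measured energy per instruction. The sketch below shows that bookkeeping only. The instruction classes, counts, and EPI values are made-up placeholders, not measurements from the paper, and `estimate_energy` is a hypothetical helper name.

```python
def estimate_energy(instr_counts, epi_nj):
    """Return (total energy in nJ, per-class share of the total).

    instr_counts: mapping from instruction class to executed count.
    epi_nj: mapping from instruction class to energy per instruction in nJ.
    """
    per_class = {name: count * epi_nj[name]
                 for name, count in instr_counts.items()}
    total = sum(per_class.values())
    if total == 0:
        return 0.0, {name: 0.0 for name in per_class}
    shares = {name: energy / total for name, energy in per_class.items()}
    return total, shares


# Placeholder numbers chosen only to illustrate the arithmetic.
counts = {"fp_arith": 1000, "load_store": 2000}
epi = {"fp_arith": 0.5, "load_store": 4.0}   # nJ per instruction (assumed)
total_nj, shares = estimate_energy(counts, epi)
```

With these placeholder values the arithmetic class accounts for 500 nJ of 8500 nJ, so the breakdown makes it easy to see when data movement rather than computation dominates the budget.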
Citations: 0
Providing throughput guarantees in mixed-criticality networks-on-chip
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226064
Sebastian Tobuschat, R. Ernst
Future mixed-criticality systems must handle a growing variety of traffic requirements, ranging from safety-critical real-time traffic to bursty latency-sensitive best-effort traffic. Additionally, safety standards (e.g. ISO 26262) require sufficient independence among different criticality levels for mixed-criticality systems. Networks-on-Chip (NoCs), as a scalable and modular interconnect, are a promising solution for such systems. Hence, a NoC must provide performance isolation for safety-critical traffic and low latency for best-effort traffic at the same time. This paper presents a run-time configurable NoC design enabling throughput guarantees for selected traffic streams with reduced adverse impact on the performance of best-effort traffic. In contrast to existing approaches, we prioritize best-effort over guaranteed-throughput traffic and only switch priorities when required, providing sufficient performance isolation among different criticality levels. We show that the overhead of implementing our approach is affordable. Through an experimental evaluation, we show that the approach reduces the adverse effects that strict prioritization has on best-effort applications.
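The idea of favoring best-effort traffic and switching priority only when a guarantee is at risk can be illustrated with a toy single-link arbiter. The abstract does not describe the actual mechanism, so everything below is an assumption made for illustration: the windowed budget (`gt_min` flits per `window` cycles), the urgency test, and the name `arbitrate` are hypothetical.

```python
def arbitrate(cycles, window, gt_min, be_req, gt_req):
    """Grant one flit per cycle to 'BE' or 'GT' (or None if nobody requests).

    Best-effort (BE) wins by default. Guaranteed-throughput (GT) traffic is
    promoted only when the flits it is still owed in the current window
    could no longer be sent if it waited another cycle.
    """
    grants = []
    gt_sent = 0
    for cycle in range(cycles):
        if cycle % window == 0:
            gt_sent = 0                              # new accounting window
        cycles_left = window - (cycle % window)      # includes this cycle
        gt_owed = max(0, gt_min - gt_sent)
        gt_urgent = gt_req[cycle] and gt_owed >= cycles_left
        if gt_urgent or (gt_req[cycle] and not be_req[cycle]):
            grants.append("GT")
            gt_sent += 1
        elif be_req[cycle]:
            grants.append("BE")
        else:
            grants.append(None)
    return grants


# Toy usage: both classes request in every cycle, GT needs 2 flits per 4 cycles.
always = [True] * 8
schedule = arbitrate(cycles=8, window=4, gt_min=2, be_req=always, gt_req=always)
```

In this toy run best-effort traffic gets the first two slots of each window and the guaranteed stream receives exactly its two flits at the end, which shows the trade: low best-effort latency without dropping below the assumed throughput bound.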
Citations: 3
Power and area evaluation of a fault-tolerant network-on-chip
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226034
Thawra Kadeed, Eberle A. Rambo, R. Ernst
As fault-tolerant Networks-on-Chip (NoCs) become prevalent in reliable systems, their overhead must be accurately evaluated. In this paper, we evaluate the overhead of a soft-error-resilient real-time NoC router for ASICs in terms of area and power. We employ a power analysis framework and load profiles that provide accurate power figures. Furthermore, we analyze the power behavior in normal operation as well as under errors. Experiments show that the error detection and retransmission schemes employed in our NoC contribute low power overhead when compared to previously proposed schemes.
Citations: 5
Lithography hotspot detection: From shallow to deep learning
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226047
Haoyu Yang, Yajun Lin, Bei Yu, Evangeline F. Y. Young
As VLSI technology nodes continue to scale, the gap between lithography system manufacturing ability and transistor feature size induces serious problems, so lithography hotspot detection is important in the physical verification flow. Existing hotspot detection approaches can be categorized as pattern matching-based or machine learning-based. With extreme scaling of transistor feature size and the growing complexity of layout patterns, these traditional methods may suffer from performance degradation. For example, pattern matching-based methods have lower hotspot detection rates for unseen patterns, while machine learning-based methods may lose information in manual feature extraction for ultra-large-scale integrated circuit masks. To overcome the drawbacks of existing methods, in this paper we survey very recent deep learning techniques and argue that the pooling layers in ordinary deep learning architectures are not necessary. We further propose a novel pooling-free neural network architecture, whose effectiveness is verified on industrial benchmark suites.
Citations: 35
Efficient virtual channel allocator for NoC router micro-architecture
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226030
Y. Lan, V. Muthukumar
Low-level design parameters such as router micro-architecture (RMA), flow control (resource allocation), routing techniques and traffic patterns significantly affect the cost and performance of Network-on-Chip (NoC) design. This work proposes an efficient virtual channel (VC) buffer management structure and a dynamic VC allocation mechanism for the router to minimize latency and area (buffer allocation) overhead. The proposed VC architecture and allocation algorithm can be adapted to various switching techniques used in NoC implementations with buffers and is independent of the topology. The architecture was developed and simulated for various traffic patterns. The performance was evaluated for different load scenarios, and comparisons with existing VC allocation algorithms are discussed in this paper. Our implementation achieves better performance (throughput and area overhead) than the baseline and Adaptive Backpressure (ABP) VC allocation algorithms.
Citations: 3
Reliability for IoT and automotive markets
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8225984
Subhadeep Ghosh, Scott Martin, Shane Stelmach
The semiconductor industry is strategically focusing on automotive and industrial markets, and significant investment is targeted at addressing them. The automotive industry in particular has been in focus for the last several years. At the same time, with their seemingly endless possibilities in the "internet of things" (IoT) world, industrial markets are gaining attention as building automation, factory automation, and grid infrastructure rapidly advance.
Citations: 1
A random access analog memory with master-slave structure for implementing hexadecimal logic
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8225995
Renyuan Zhang, M. Kaneko
A random access analog memory without static power consumption is designed in this work. Analog memory offers the benefit of greatly reducing interconnections but suffers from static power consumption and inaccuracy. As a hybrid, hexadecimal signal processing is targeted in this paper. For storing hexadecimal values and even implementing hexadecimal sequential logic, a master-slave structure with eighteen transistors is proposed, which is 28% of the transistor count of four binary master-slave flip-flops. The hexadecimal voltage values are stored on the floating gate, and the read-out operations are executed by a comparator to protect the stored voltage. This comparator is powered only during the read-out operation. In this manner, the static power is eliminated. By using the proposed timing control mode, the master and slave stages are organized for hexadecimal sequential logic. As a demonstration, a sixteen-counter is designed on the basis of the proposed analog memory without combinational logic circuits, in which the number of devices is reduced compared with binary approaches. Circuit simulation results show that the designed circuits maintain the hexadecimal values and execute hexadecimal functions correctly.
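As a purely behavioral illustration of hexadecimal storage, the sketch below maps a hex digit to one of sixteen voltage levels and recovers it by thresholding, then steps a modulo-16 counter. It models none of the circuit details in the paper: the supply voltage `VDD`, the evenly spaced levels, and the function names are assumptions chosen for the example. As a side note, 18 transistors being 28% of four binary flip-flops is consistent with roughly 16 transistors per flip-flop (18 / 64 ≈ 0.28), which is an inference rather than a figure stated in the abstract.

```python
VDD = 1.6            # hypothetical supply voltage in volts
LEVELS = 16          # one stored value encodes a full hexadecimal digit
STEP = VDD / LEVELS  # voltage spacing between adjacent levels


def write_level(digit):
    """Return the voltage stored for a hex digit (center of its band)."""
    if not 0 <= digit < LEVELS:
        raise ValueError("digit must be in 0..15")
    return (digit + 0.5) * STEP


def read_level(voltage):
    """Recover the hex digit by locating the band the voltage falls in."""
    digit = int(voltage / STEP)
    return max(0, min(LEVELS - 1, digit))


def count_up(digit):
    """Next state of a modulo-16 counter holding a single hex digit."""
    return (digit + 1) % LEVELS


# Toy usage: store 0xA, disturb it slightly, and read it back.
stored = write_level(0xA)
recovered = read_level(stored + 0.4 * STEP)
```

Placing each level at the center of its band leaves a margin of half a step in either direction, so small disturbances of the stored voltage still read back as the same digit in this model.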
本文设计了一种无静电随机存取模拟存储器。模拟存储器具有大大减少互连的优点,但存在静态功耗和不准确性的问题。十六进制信号处理是一种混合信号处理方法。为了存储十六进制值,甚至实现十六进制顺序逻辑,提出了一种由18个晶体管组成的主从结构,这是4个二进制主从触发器的28%。所述十六进制电压值存储在所述浮栅上;所述读出操作由比较器执行以保护所述存储电压。该比较器仅在读出操作期间通电。这样就消除了静电。采用所提出的定时控制方式,将主从级按十六进制顺序逻辑组织。作为演示,在不使用组合逻辑电路的模拟存储器的基础上设计了一个16计数器,与二进制方法相比,该方法减少了器件的数量。从电路仿真结果来看,所设计的电路能够正确地保持十六进制值并执行十六进制功能。
Citations: 0