Power line communication for hybrid power/signal pin SOC design
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171711
Xiang Zhang, Yang Liu, R. Coutts, Chung-Kuan Cheng
The number of pins available in the ball grid array (BGA) of modern system-on-chips (SOCs) has been identified as one of the major bottlenecks to processor performance, for example in many-core portable devices where the package size and PCB floorplan are tightly constrained. A typical SOC package allocates more than half of its pins to power delivery, greatly reducing the number of IO pins left for off-chip communication. We observe that the required number of power and ground (P/G) pins is driven by the highest performance state and the worst design corners, whereas SOCs spend most of their time in lower performance states to extend battery life. Based on this observation, we propose to reuse some of the power pins as dynamic power/signal pins for off-chip data transmission, increasing off-chip bandwidth during low-performance SOC states. Our proposed method provides 20 Gbps of bandwidth per hybrid pin pair while having minimal impact on the original power delivery network (PDN) design.
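As a rough illustration of the pin-reuse idea, the sketch below (not from the paper; the package parameters and per-pin current limit are assumed placeholders, and only the 20 Gbps-per-pair figure comes from the abstract) estimates how much extra signaling bandwidth becomes available in a given performance state once pins not needed for that state's worst-case supply current are released as hybrid pairs.

```python
"""Hedged sketch, not from the paper: how many power pins could be released as
hybrid signal pins in a given SOC performance state."""

import math
from dataclasses import dataclass

# Assumed package parameters (placeholders).
TOTAL_POWER_PINS = 400        # P/G pins dedicated to one supply rail
MAX_CURRENT_PER_PIN_A = 0.25  # assumed per-pin current limit
GBPS_PER_HYBRID_PAIR = 20.0   # bandwidth per hybrid pin pair (from the abstract)

@dataclass
class PerfState:
    name: str
    max_current_a: float      # worst-case supply current in this state (A)

def extra_bandwidth_gbps(state: PerfState) -> float:
    """Extra off-chip bandwidth available when unused power pins become signal pins."""
    pins_needed = math.ceil(state.max_current_a / MAX_CURRENT_PER_PIN_A)
    pins_free = max(0, TOTAL_POWER_PINS - pins_needed)
    hybrid_pairs = pins_free // 2          # two pins form one signaling pair
    return hybrid_pairs * GBPS_PER_HYBRID_PAIR

if __name__ == "__main__":
    for s in (PerfState("turbo", 95.0), PerfState("nominal", 60.0), PerfState("idle", 12.0)):
        print(f"{s.name:8s}: +{extra_bandwidth_gbps(s):7.1f} Gbps of hybrid-pin bandwidth")
```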
{"title":"Power line communication for hybrid power/signal pin SOC design","authors":"Xiang Zhang, Yang Liu, R. Coutts, Chung-Kuan Cheng","doi":"10.1109/SLIP.2015.7171711","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171711","url":null,"abstract":"The number of available pins in ball grid array (BGA) of modern system-on-chips (SOCs) has been discussed as one of the major bottlenecks to the performance of the processors, for example many-core enabled portable devices, where the package size and PCB floorplan are tightly constrained. A typical SOC package allocates more than half of the pins for power delivery, resulting in the number of IO pins for off-chip communications is greatly reduced. We observe that the requirement for the number of power and ground (P/G) pins is driven by the highest performance state and the worst design corners, while SOCs are in lower performance state for most of the time for longer battery life. Under this observation, we propose to reuse some of the power pins as dynamic power/signal pins for off-chip data transmissions to increase the off-chip bandwidth during SOC low performance state. Our proposed method provides 20Gbps bandwidth per hybrid pin pair, while providing minimum impact to the original power delivery network (PDN) design.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"75 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134426906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On fast timing closure: speeding up incremental path-based timing analysis with MapReduce
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171710
Tsung-Wei Huang, Martin D. F. Wong
Incremental path-based timing analysis (PBA) is a pivotal step in the timing optimization flow. A core building block analyzes the timing path by path subject to a critical amount of incremental changes on the design. However, this process is inherently computationally expensive and has been a major bottleneck in accelerating timing closure. Therefore, we introduce in this paper a fast and scalable algorithm for incremental PBA with MapReduce, a programming paradigm that has become popular in the big-data era. Inspired by the spirit of MapReduce, we formulate our problem as tasks associated with keys and values and perform massively parallel map and reduce operations on a distributed system. Experimental results demonstrate that our approach can not only analyze huge designs in a few minutes but also quickly revalidate timing after incremental changes. Our results are beneficial for speeding up the lengthy design cycle of timing closure.
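The abstract does not spell out the exact key/value formulation, so the following is a hedged, minimal sketch of how a path-based timing query could be cast as map and reduce steps: the (assumed) keys are path endpoints, the values are candidate path slacks, and the reducer keeps the most critical path per endpoint.

```python
"""Hedged sketch (illustrative, not the paper's exact formulation): a path query
expressed as map, shuffle, and reduce over key/value pairs."""

from collections import defaultdict

# Hypothetical input after an incremental change:
# (endpoint, path_id, arrival_time_ps, required_time_ps)
affected_paths = [
    ("FF1/D", "p0", 812.0, 900.0),
    ("FF1/D", "p1", 871.5, 900.0),
    ("FF2/D", "p2", 640.2, 700.0),
]

def map_phase(records):
    """Map: emit (key=endpoint, value=(path_id, slack)) pairs."""
    for endpoint, path_id, at, rat in records:
        yield endpoint, (path_id, rat - at)

def shuffle(pairs):
    """Group values by key, as a MapReduce runtime would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: per endpoint, keep the path with the smallest (most critical) slack."""
    return {ep: min(vals, key=lambda v: v[1]) for ep, vals in grouped.items()}

if __name__ == "__main__":
    critical = reduce_phase(shuffle(map_phase(affected_paths)))
    for endpoint, (path_id, slack) in sorted(critical.items()):
        print(f"{endpoint}: critical path {path_id}, slack {slack:.1f} ps")
```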
{"title":"On fast timing closure: speeding up incremental path-based timing analysis with mapreduce","authors":"Tsung-Wei Huang, Martin D. F. Wong","doi":"10.1109/SLIP.2015.7171710","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171710","url":null,"abstract":"Incremental path-based timing analysis (PBA) is a pivotal step in the timing optimization flow. A core building block analyzes the timing path-by-path subject to a critical amount of incremental changes on the design. However, this process in nature demands an extremely high computational complexity and has been a major bottleneck in accelerating timing closure. Therefore, we introduce in this paper a fast and scalable algorithm of incremental PBA with MapReduce - a recently popular programming paradigm in big-data era. Inspired by the spirit of MapReduce, we formulate our problem into tasks that are associated with keys and values and perform massively-parallel map and reduce operations on a distributed system. Experimental results demonstrated that our approach can not only easily analyze huge deisgns in a few minutes, but also quickly revalidate the timing after the incremental changes. Our results are beneficial for speeding up the lengthy design cycle of timing closure.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125920750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lynx: a self-organizing wireless sensor network with commodity palmtop computers
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171712
Haifeng Xu, M. Bilec, William O. Collinge, L. Schaefer, A. Landis, A. Jones
While the embedded-class processors found in commodity palmtop computers continue to become increasingly capable, their various wireless connectivity functions provide new opportunities for designing more flexible yet smarter wireless sensor networks (WSNs) and for utilizing their computational power in ways not previously possible. Lynx, our self-organizing wireless sensor network (SOWSN), is a further step toward exploiting the potential of palmtop computers. Fundamental functionalities such as automatic neighbor relation detection, link-state maintenance, sensor integration, and multihop routing together make a real-world, distributively managed WSN implementation work well. By combining Lynx with Ocelot, our mobile distributed computing engine, sensor nodes can collect, record, process, and send data without any central server support. The combined Lynx and Ocelot system achieves significant energy savings compared to traditional power-hungry computing platforms such as BOINC when performing the same tasks.
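As a generic illustration of one of the listed building blocks (not Lynx's actual protocol; the beacon handling and timeout are assumptions), a beacon-driven neighbor table with link-state expiry might look like this:

```python
"""Hedged sketch: beacon-based neighbor relation detection with link-state expiry.
The message handling and NEIGHBOR_TIMEOUT_S value are illustrative assumptions."""

import time

NEIGHBOR_TIMEOUT_S = 30.0      # assumed link-state expiry window

class NeighborTable:
    def __init__(self):
        self._last_seen = {}   # node_id -> timestamp of last beacon heard

    def on_beacon(self, node_id):
        """Record a beacon heard from a neighboring node."""
        self._last_seen[node_id] = time.monotonic()

    def live_neighbors(self):
        """Return neighbors whose link state has not expired."""
        now = time.monotonic()
        return [n for n, t in self._last_seen.items() if now - t < NEIGHBOR_TIMEOUT_S]

if __name__ == "__main__":
    table = NeighborTable()
    table.on_beacon("node-7")
    table.on_beacon("node-12")
    print("live neighbors:", table.live_neighbors())
```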
{"title":"Lynx: a self-organizing wireless sensor network with commodity palmtop computers","authors":"Haifeng Xu, M. Bilec, William O. Collinge, L. Schaefer, A. Landis, A. Jones","doi":"10.1109/SLIP.2015.7171712","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171712","url":null,"abstract":"While the embedded class processors found in commodity palmtop computers continue to become increasingly capable, various wireless connectivity functions on them provide new opportunities in designing more flexible yet smarter wireless sensor networks (WSNs), and utilizing the computation power in a way we could never imagine before. Designing Lynx, a selforganizing wireless sensor network (SOWSN), is our further step taken in exploiting the potential of palmtop computers. Fundamental functionalities such as automatic neighbor relation detection, link state maintenance, sensor integration, and multihop routing, together make a real world distributively managed WSN system implementation work quite well. And by combining with Ocelot, our mobile distributed computing engine, sensor nodes are now capable of collecting, recording, processing and sending data without any central server support. Significant energy saving is achieved by the Lynx and Ocelot combined system, compare to traditional power-hungry computer platforms such as BOINC when doing same tasks.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130350229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart I/Os: a data-pattern aware 2.5D interconnect with space-time multiplexing
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171707
Sai Manoj Pudukotai Dinakarrao, Kanwen Wang, Hantao Huang, Hao Yu
A data-pattern aware smart I/O is introduced in this paper for 2.5D through-silicon interposer (TSI) interconnect-based memory-logic integration. To match the huge many-core bandwidth demand with the limited supply of 2.5D I/O channels when accessing one shared memory, a space-time multiplexing based channel utilisation scheme is developed inside the memory controller to reuse 2.5D I/O channels. Cores are adaptively classified into clusters based on their bandwidth demand by space multiplexing to access the shared memory. Time multiplexing is then performed to schedule the cores in one cluster to occupy the supplied 2.5D I/O channels in different time slots according to priority. The proposed smart 2.5D TSI I/O is verified with a system-level simulator on benchmark workloads and shows up to 58.85% bandwidth balancing and 11.90% QoS improvement.
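A minimal sketch of the space-time multiplexing idea is given below; the clustering rule, slot counts, and priority order are illustrative assumptions rather than the paper's scheduler.

```python
"""Hedged sketch: cluster cores by bandwidth demand (space step), then rotate each
cluster's cores over the available 2.5D I/O channels by time slot (time step)."""

def space_multiplex(core_demands_gbps, num_clusters=3):
    """Space step: sort cores by demand and split them into demand-ordered clusters."""
    ranked = sorted(core_demands_gbps.items(), key=lambda kv: kv[1], reverse=True)
    clusters = [[] for _ in range(num_clusters)]
    for i, (core, _) in enumerate(ranked):
        clusters[i * num_clusters // len(ranked)].append(core)
    return clusters

def time_multiplex(cluster, num_channels, num_slots):
    """Time step: round-robin the cluster's cores over channels across time slots,
    serving earlier (higher-priority) cores first."""
    schedule = []  # one entry per slot: list of (channel, core)
    for slot in range(num_slots):
        assignment = []
        for ch in range(num_channels):
            core = cluster[(slot * num_channels + ch) % len(cluster)]
            assignment.append((ch, core))
        schedule.append(assignment)
    return schedule

if __name__ == "__main__":
    demands = {f"core{i}": bw for i, bw in enumerate([9.5, 7.2, 6.8, 3.1, 2.4, 1.0])}
    clusters = space_multiplex(demands)
    print("clusters:", clusters)
    print("slots for cluster 0:", time_multiplex(clusters[0], num_channels=2, num_slots=2))
```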
Multi-product floorplan and uncore design framework for chip multiprocessors
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171713
M. Escalante, A. Kahng, M. Kishinevsky, Ümit Y. Ogras, K. Samadi
Chip multiprocessors (CMPs) for the server and high-performance computing markets are offered in multiple classes to satisfy various power, performance, and cost requirements. As the number of processor cores on a single die grows, resources outside the “core”, such as the distributed last-level cache, on-chip memory controllers, and the network-on-chip (NoC) interconnecting these resources, which together constitute the “uncore”, play an increasingly important role. While it is crucial to optimize the floorplan and uncore of each product class to achieve the best power-performance tradeoff, independent optimization may greatly increase the design effort and undermine the savings ultimately achieved with a given total amount of optimization effort. This paper presents a novel multi-product optimization framework for next-generation CMPs. Unlike traditional chip optimization techniques, we optimize the floorplans of multiple product classes at once and ensure that the smaller floorplans can be obtained from larger ones by optimally removing, i.e., chopping, the unused parts.
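The choppability constraint can be pictured with a toy tile-grid model (an assumption for illustration, not the paper's formulation): a smaller product's floorplan must be reachable from the larger one by deleting whole rows or columns of tiles the smaller product does not use.

```python
"""Hedged sketch: checking that a smaller product's floorplan is a "chop" of the
larger one. The tile labels and grids below are hypothetical."""

def chop(grid, keep_rows, keep_cols):
    """Remove the rows/columns not listed in keep_rows/keep_cols from a tile grid."""
    return [[grid[r][c] for c in keep_cols] for r in keep_rows]

# Hypothetical 3x4 tile grid of a large CMP: cores (C), LLC slices (L),
# memory controllers (M), and uncore/NoC tiles (N).
big = [
    ["C", "L", "C", "M"],
    ["N", "N", "N", "N"],
    ["C", "L", "C", "M"],
]

# A smaller derivative that drops one column of cores.
small_target = [
    ["C", "L", "M"],
    ["N", "N", "N"],
    ["C", "L", "M"],
]

derived = chop(big, keep_rows=[0, 1, 2], keep_cols=[0, 1, 3])
print("small product is a chop of the large one:", derived == small_target)
```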
{"title":"Multi-product floorplan and uncore design framework for chip multiprocessors","authors":"M. Escalante, A. Kahng, M. Kishinevsky, Ümit Y. Ogras, K. Samadi","doi":"10.1109/SLIP.2015.7171713","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171713","url":null,"abstract":"Chip multiprocessors (CMPs) for server and high-performance computing markets are offered in multiple classes to satisfy various power, performance and cost requirements. As the number of processor cores on a single die grows, resources outside the “core”, such as the distributed last-level cache, on-chip memory controllers and network-on-chip (NoC) interconnecting these resources, which constitute the “uncore”, play an increasingly important role. While it is crucial to optimize the floorplan and uncore of each product class to achieve the best power-performance tradeoff, independent optimization may greatly increase the design effort, and undermine the savings ultimately achieved with a given total amount of optimization effort. This paper presents a novel multi-product optimization framework for next generation CMPs. Unlike traditional chip optimization techniques, we optimize the floorplan of multiple product classes at once, and ensure that the smaller floorplans can be obtained from larger ones by optimally removing, i.e., chopping, the unused parts.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127668244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compact modeling and system implications of microring modulators in nanophotonic interconnects
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171708
Rui Wu, Chin-Hui Chen, J. Fédéli, M. Fournier, R. Beausoleil, K. Cheng
Silicon microring modulators are critical components in on-chip optical communication. In this paper, we develop theoretical compact models for the optical transmission, power consumption, bit-error rate (BER), and electrical tuning of microring modulators. The proposed models have been extensively validated against fabricated devices from a number of designs and fabrication batches. Since the quality factor (Q) and the extinction ratio (ER) of the microring modulator are important in determining the BER and link power budget, we include accurate equations for Q and ER in our models. Based on the proposed models, we identify an extra power penalty for electrical tuning and an energy-efficient swing voltage at which the microring modulator achieves minimum total energy consumption.
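To illustrate the kind of trade-off behind the energy-efficient swing voltage, the sketch below uses a deliberately simplified stand-in for the compact models: dynamic modulation energy grows with the swing, while the laser power penalty of a low extinction ratio shrinks with it. All device constants are placeholders, not values from the paper.

```python
"""Hedged sketch: sweep the modulator swing voltage and find the minimum of a toy
energy-per-bit model (CV^2 driver energy plus a finite-ER laser power penalty)."""

C_MOD_F = 20e-15        # assumed modulator junction capacitance (F)
BIT_RATE = 10e9         # assumed bit rate (b/s)
LASER_BASE_W = 1e-3     # assumed laser power at very high extinction ratio (W)
ER_DB_PER_VOLT = 6.0    # assumed extinction ratio gained per volt of swing (dB/V)

def energy_per_bit(v_swing):
    # Dynamic modulation energy, roughly (1/4) C V^2 per bit for an NRZ driver.
    e_mod = 0.25 * C_MOD_F * v_swing ** 2
    # Laser power penalty for a finite extinction ratio ER: P = P0 * (ER + 1) / (ER - 1).
    er = 10 ** (ER_DB_PER_VOLT * v_swing / 10)
    e_laser = LASER_BASE_W * (er + 1) / (er - 1) / BIT_RATE
    return e_mod + e_laser

if __name__ == "__main__":
    voltages = [0.5 + 0.1 * i for i in range(30)]          # 0.5 V .. 3.4 V
    best = min(voltages, key=energy_per_bit)
    print(f"energy-efficient swing (toy model) ~ {best:.1f} V, "
          f"{energy_per_bit(best) * 1e15:.1f} fJ/bit")
```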
{"title":"Compact modeling and system implications of microring modulators in nanophotonic interconnects","authors":"Rui Wu, Chin-Hui Chen, J. Fédéli, M. Fournier, R. Beausoleil, K. Cheng","doi":"10.1109/SLIP.2015.7171708","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171708","url":null,"abstract":"Silicon microring modulators are critical components in optical on-chip communications. In this paper, we develop theoretical compact models for optical transmission, power consumption, bit-error-rate (BER), and electrical tuning of microring modulators. The proposed theoretical models have been extensively validated by fabricated devices from a number of designs and fabrication batches. Since the quality factor (Q) and the extinction ratio (ER) of the microring modulator are important to determine the BER and link power budget, we include accurate equations for the Q and the ER in our models. Based on the proposed models, we identify an extra power penalty for the electrical tuning, and an energy-efficient swing voltage for the microring modulator to achieve to minimum total energy consumption.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125239011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SI for free: machine learning of interconnect coupling delay and transition effects
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171706
A. Kahng, Mulong Luo, S. Nath
In advanced technology nodes, incremental delay due to coupling is a serious concern. Design companies spend significant resources on static timing analysis (STA) tool licenses with signal integrity (SI) analysis enabled. The runtime of STA tools in SI mode is typically large due to complex algorithms and the iterative calculation of timing windows needed to accurately determine aggressor and victim alignments, as well as delay and slew estimations. In this work, we develop machine learning-based predictors of SI-mode timing based on timing reports from non-SI mode. Timing analysis in non-SI mode is faster, and the license costs can be several times less than those of SI mode. We determine electrical and logic-structure parameters that affect the incremental arc delay/slew and path delay (i.e., the difference in arrival times at the clock pin of the launch flip-flop and the D pin of the capture flip-flop) in SI mode, and develop models that can predict these SI-aware delays. In 28nm FDSOI technology, our models predict incremental transition time with a worst-case error of 7.0ps and an average error of 0.7ps, incremental delay with a worst-case error of 5.2ps and an average error of 1.2ps, and path delay with a worst-case error of 8.2ps and an average error of 1.7ps. We also demonstrate that our models are robust across designs and signoff constraints at a particular technology node.
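As a hedged stand-in for the modeling flow (the paper's actual feature set and learner are not reproduced here), the sketch below fits a plain least-squares model that maps hypothetical non-SI report features to SI-mode incremental delay.

```python
"""Hedged sketch: learn SI-mode incremental delay from non-SI timing-report features.
The feature names, synthetic data, and linear model are illustrative assumptions."""

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data, one row per timing arc:
# [non-SI incremental delay (ps), coupling-cap/total-cap ratio, victim slew (ps), #aggressors]
X = rng.uniform([0.0, 0.0, 10.0, 0.0], [50.0, 0.8, 200.0, 8.0], size=(500, 4))
# Synthetic "SI-mode" target, only so the example runs end to end.
y = 1.1 * X[:, 0] + 12.0 * X[:, 1] + 0.02 * X[:, 2] + 0.8 * X[:, 3] + rng.normal(0, 0.5, 500)

# Fit a linear model with an intercept via least squares.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_si_delay(features):
    """Predict SI-mode incremental delay (ps) from non-SI report features."""
    return float(np.dot(np.append(features, 1.0), coef))

if __name__ == "__main__":
    print(f"predicted SI incremental delay: {predict_si_delay([20.0, 0.5, 80.0, 3]):.2f} ps")
```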
{"title":"SI for free: machine learning of interconnect coupling delay and transition effects","authors":"A. Kahng, Mulong Luo, S. Nath","doi":"10.1109/SLIP.2015.7171706","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171706","url":null,"abstract":"In advanced technology nodes, incremental delay due to coupling is a serious concern. Design companies spend significant resources on static timing analysis (STA) tool licenses with signal integrity (SI) enabled. The runtime of the STA tools in SI mode is typically large due to complex algorithms and iterative calculation of timing windows to accurately determine aggressor and victim alignments, as well as delay and slew estimations. In this work, we develop machine learning-based predictors of timing in SI mode based on timing reports from non-SI mode. Timing analysis in non-SI mode is faster and the license costs can be several times less than those of SI mode. We determine electrical and logic structure parameters that affect the incremental arc delay/slew and path delay (i.e., the difference in arrival times at the clock pin of the launch flip-flop and the D pin of the capture flip-flop) in SI mode, and develop models that can predict these SI-aware delays. We report worst-case error of 7.0ps and average error of 0.7ps for our models to predict incremental transition time, worst-case error of 5.2ps and average error of 1.2ps for our models to predict incremental delay, and worst-case error of 8.2ps and average error of 1.7ps for our models to predict path delay, in 28nm FDSOI technology. We also demonstrate that our models are robust across designs and signoff constraints at a particular technology node.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128071679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clock clustering and IO optimization for 3D integration
Pub Date: 2015-06-06 | DOI: 10.1109/SLIP.2015.7171709
Samyoung Bang, Kwangsoo Han, A. Kahng, V. Srinivas
3D interconnect between two dies can span a wide range of bandwidths and region areas, depending on the application, partitioning of the dies, die size, and floorplan. We explore the concept of dividing such an interconnect into local clusters, each with a cluster clock. We combine such clustering with a choice of three clock synchronization schemes (synchronous, source-synchronous, asynchronous) and study impacts on power, area and timing of the clock tree, data path and 3DIO. We build a model for the power, area and timing as a function of key system requirements and constraints: total bandwidth, region area, number of clusters, clock synchronization scheme, and 3DIO frequency. Such a model enables architects to perform pathfinding exploration of clocking and IO power, area and bandwidth optimization for 3D integration.
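A toy parameterization of such a model is sketched below; every coefficient is a placeholder chosen only to show how sweeping the cluster count against the other inputs (bandwidth, region area, synchronization scheme, 3DIO frequency) could expose a power-minimal configuration.

```python
"""Hedged sketch: a placeholder power model for the 3D interface clock + IO as a
function of the system parameters named in the abstract. Not the paper's fitted model."""

# Assumed per-scheme clock-tree overhead factors (placeholders).
SCHEME_CLOCK_FACTOR = {"synchronous": 1.3, "source-synchronous": 1.0, "asynchronous": 0.8}

def interface_power_mw(total_bw_gbps, region_area_mm2, num_clusters, scheme, io_freq_ghz):
    num_io = total_bw_gbps / io_freq_ghz                # one bit per IO per cycle (assumed)
    io_power = 0.05 * num_io * io_freq_ghz              # ~0.05 mW per IO per GHz (placeholder)
    # Per-cluster tree power assumed to grow superlinearly with cluster span, so the
    # total tree power falls as clusters shrink (placeholder exponent and coefficient).
    tree_power = (SCHEME_CLOCK_FACTOR[scheme] * num_clusters
                  * 0.5 * (region_area_mm2 / num_clusters) ** 1.5 * io_freq_ghz)
    cluster_overhead = 0.3 * num_clusters               # per-cluster clock source cost (placeholder)
    return io_power + tree_power + cluster_overhead

if __name__ == "__main__":
    best = min(range(1, 17),
               key=lambda k: interface_power_mw(256, 4.0, k, "source-synchronous", 2.0))
    print("power-minimal cluster count (toy model):", best,
          f"-> {interface_power_mw(256, 4.0, best, 'source-synchronous', 2.0):.1f} mW")
```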
{"title":"Clock clustering and IO optimization for 3D integration","authors":"Samyoung Bang, Kwangsoo Han, A. Kahng, V. Srinivas","doi":"10.1109/SLIP.2015.7171709","DOIUrl":"https://doi.org/10.1109/SLIP.2015.7171709","url":null,"abstract":"3D interconnect between two dies can span a wide range of bandwidths and region areas, depending on the application, partitioning of the dies, die size, and floorplan. We explore the concept of dividing such an interconnect into local clusters, each with a cluster clock. We combine such clustering with a choice of three clock synchronization schemes (synchronous, source-synchronous, asynchronous) and study impacts on power, area and timing of the clock tree, data path and 3DIO. We build a model for the power, area and timing as a function of key system requirements and constraints: total bandwidth, region area, number of clusters, clock synchronization scheme, and 3DIO frequency. Such a model enables architects to perform pathfinding exploration of clocking and IO power, area and bandwidth optimization for 3D integration.","PeriodicalId":431489,"journal":{"name":"2015 ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129340008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}