AISTECS '16: Latest Papers


Bringing OptoBoards to HPC-scale environments: An OptoHPC simulation engine

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857062

N. Terzenidis, P. Maniotis, N. Pleros

The growing communication bandwidth demands of HPC systems, together with the need for lower latency and higher power efficiency, have made optical interconnects the key technology for reaching exascale performance. In this realm, technology advances have to be accompanied by simulation tools that support end-to-end system modeling, so that the performance benefits of optical components can be evaluated at the system level. We present OptoHPC-Sim, which supports optical interconnect and electro-optical routing technologies at system scale, offers complete end-to-end simulation of HPC systems, and allows reliable comparison with existing HPC platforms. OptoHPC-Sim has been developed on the OMNeT++ platform and is designed to offer the optimum balance between model detail and simulation execution time. We describe the design of the simulation engine and demonstrate the capabilities of OptoHPC-Sim by comparing an HPC system employing state-of-the-art optoelectronic routers and optical interconnects with the Cray XK7 system platform.


Citations: 2

Designing an Efficient MPLS-Based Switch for FAT Tree Network-on-Chip Systems

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857059

Najwa Salama, A. M. Sllame

This paper proposes a FAT tree Network-on-Chip system built on the MPLS forwarding mechanism. The FAT tree comprises processing nodes and communication switches. Each IP node (processing node) has a message generator unit that randomly generates messages to different destinations with different packet lengths and buffering. The switch is based on the MPLS technique and consists of the following units: a crossbar switch, input/output link controllers, and routing and arbitration units. A simulator has been developed in C++ to analyze the proposed architecture. A comparison with a wormhole switch is provided to show the efficiency of the MPLS-based switch.
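
For readers unfamiliar with MPLS-style forwarding, the core idea is that a switch does an exact-match lookup on an incoming label and swaps it for an outgoing one, rather than computing a route at every hop. The following is a minimal illustrative sketch of that general technique only; it is not the switch design from the paper, and the class name `LabelSwitch`, its methods, and the port and label values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSwitch:
    """Hypothetical label-swapping switch (illustration only).

    Maps (input port, incoming label) to (output port, outgoing label).
    """
    table: dict = field(default_factory=dict)

    def add_entry(self, in_port, in_label, out_port, out_label):
        # Install one forwarding entry, as a label-setup phase would.
        self.table[(in_port, in_label)] = (out_port, out_label)

    def forward(self, in_port, label):
        # One exact-match lookup decides both the output port and the
        # label the packet carries to the next switch.
        return self.table[(in_port, label)]


def build_demo_switch():
    # Assumed example values; real labels would come from path setup.
    sw = LabelSwitch()
    sw.add_entry(in_port=0, in_label=17, out_port=2, out_label=42)
    sw.add_entry(in_port=1, in_label=17, out_port=3, out_label=5)
    return sw
```

The same label can arrive on different input ports and be forwarded differently, which is why the lookup key in this sketch includes the port.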


Citations: 4

JADE: a Heterogeneous Multiprocessor System Simulation Platform Using Recorded and Statistical Application Models

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857066

R. K. V. Maeda, Peng Yang, Xiaowen Wu, Zhe Wang, Jiang Xu, Zhehui Wang, Haoran Li, Luan H. K. Duong, Zhifei Wang

Recent advances in the computing industry towards multiprocessor technologies have shifted the dominant method of performance increase from frequency scaling to parallelism. Because the design space is huge, evaluating candidate multicore architectures in early design stages, when the number of variables is at its maximum, is challenging. Simulation plays an important role in estimating architecture performance, but evaluating how the system would perform on average, as well as in boundary cases, requires many iterations to cover the various cases in the application input domain. Since simulation of heterogeneous systems with enough detail is naturally slow, exhaustively evaluating the system for all possible inputs requires a tremendous amount of time and resources. While quite a few multiprocessor simulators are available, they often rely on individual input specification, demanding extensive input enumeration and simulation runs, which diminishes their effectiveness for evaluating complex systems. Aiming to fill this gap, we publicly release a heterogeneous multiprocessor system simulation platform called JADE, targeting fast initial architecture exploration. In contrast to most simulators, JADE uses statistical models that follow distributions extracted from internal structures of the application, providing a more convenient and systematic exploration approach to evaluating system performance. JADE simulation features include detailed electrical and optical interconnections, a detailed memory hierarchy infrastructure, and built-in energy analysis, allowing studies of a broad spectrum of systems.
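
The contrast between a recorded and a statistical application model can be illustrated with a small sketch: fit an empirical distribution to values observed in a recorded run, then draw synthetic inputs from it instead of replaying the trace. This shows the general idea only and is not JADE's API or model format; the function names `fit_empirical` and `synthesize`, and the choice of message sizes as the modeled quantity, are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_empirical(samples):
    """Return (values, weights) describing how often each value occurred."""
    counts = Counter(samples)
    values = sorted(counts)
    weights = [counts[v] for v in values]
    return values, weights


def synthesize(values, weights, n, seed=0):
    """Draw n synthetic values following the fitted empirical distribution."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    return rng.choices(values, weights=weights, k=n)


# Hypothetical recorded message sizes in bytes.
recorded = [64, 64, 128, 512]
values, weights = fit_empirical(recorded)
synthetic = synthesize(values, weights, n=100)
```

A synthetic stream of any length can be generated from a short recording this way, at the cost of losing ordering and correlation information that the raw trace contains.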



Citations: 40

PhoenixSim: Crosslayer Design and Modeling of Silicon Photonic Interconnects

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857061

S. Rumley, M. Bahadori, K. Wen, D. Nikolova, K. Bergman

Silicon photonics is emerging as a key technology for high-performance computing interconnects. Yet few tools are available to investigate how best to leverage this technology in current or future computer architectures and, furthermore, how it will impact real application workloads. In this paper, we present PhoenixSim, a multi-layer simulation and modeling software solution. PhoenixSim enables integrated and interactive design space exploration over the physical, networking and application layers. We report its general organization and constituent models, and show how the different layers of the tool can be used to design and analyze an optical interconnect network supporting the HPCG (High Performance Conjugate Gradient) benchmark.


Citations: 14

Energy Efficient And Low Latency Interconnection Network For Multicast Invalidates In Shared Memory Systems

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857065

Muhammad Ridwan Madarbux, A. Laer, P. Watts, Timothy M. Jones

Optical networks-on-chip (NoCs) are being investigated to reduce the latency and power consumption of networks for multicore processors. Our previous work has shown that, in shared memory processors, switched optical networks can achieve lower latency for a given power consumption and component count than arbitration-free networks such as single writer multiple reader. We have also shown the advantage of leaving optical circuits open after they are established, so that they capture multiple memory transactions. However, invalidation processes, where numerous cores share a memory block, need to establish a large number of very short-lived circuits, which increases the average message latency and overall on-chip contention. In this paper, a low-power broadcast architecture is proposed that deals specifically with multicast messages. Separating multicast messages from unicast ones improves average arbitration latency by up to 88.2% for the Vips benchmark, while the Swaptions benchmark shows the highest improvement in average memory access time (up to 21.1%). Vips also sees an increase of 147% in the average number of messages passing through an open optical circuit. Obtaining these advantages requires an additional broadcast network that consumes only 66.1 mW of power.



Citations: 0

Evolutionary vs. Revolutionary Interconnect Technologies for Future Low-Power Multi-Core Systems

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857063

Gabriele Miorandi, Mahdi Tala, Marco Balboni, L. Ramini, D. Bertozzi

Networks-on-chip (NoCs) are today at the core of multi- and many-core systems, acting as the system-level integration framework. In order to support scaling to future device generations, NoCs will struggle to deliver the required communication performance within tight power budgets. In this respect, evolutionary as well as revolutionary interconnect technologies are currently being considered. On one hand, clockless handshaking materializes GALS systems that completely remove the system clock while reducing idle power to only the leakage power. On the other hand, the technology platform could be changed, by replacing electrical wires with optical links and networks. This paper provides a comprehensive power analysis of the two technologies under test on a path-by-path basis, by comparing them with each other and with a baseline synchronous NoC. The outcome of this paper can support the selection of interconnect solutions for future manycore systems where power is the primary concern, as well as the runtime selection policy of routing paths in the context of hybrid interconnect fabrics.


Citations: 1

Consideration of the Flit Size for Deflection Routing based Network-on-Chips

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857060

Armin Runge, Reiner Kolla

Bufferless deflection routing enables energy- and hardware-efficient Networks-on-Chip (NoCs). However, due to the lack of buffers, packet switching cannot be deployed in such NoCs. It is therefore crucial to determine an appropriate flit size and link width, which can be considerably larger than in packet-switched NoCs. In this work, we investigate the effect of the flit size on hardware costs and on performance for NoCs based on a permutation network and, additionally, on deflection routing. We show that hardware requirements for a permutation-network-based router increase linearly. Performance decreases exponentially with smaller link widths; however, a moderate reduction of the link width can be an option.
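
One way to see why flit size matters when flits are routed independently is a back-of-the-envelope count of how many flits a message needs once each flit carries its own routing header. The sketch below is an illustration under assumed numbers and is not taken from the paper; the function name `flits_needed` and the 512-bit message, 16-bit header, and 128-bit and 64-bit flit sizes are all hypothetical.

```python
import math


def flits_needed(payload_bits, flit_bits, header_bits):
    """Number of flits required when every flit carries its own header."""
    usable = flit_bits - header_bits  # payload capacity of one flit
    if usable <= 0:
        raise ValueError("flit must be larger than its header")
    return math.ceil(payload_bits / usable)


# Assumed example: a 512-bit message with a 16-bit per-flit header.
wide = flits_needed(512, flit_bits=128, header_bits=16)   # 5 flits
narrow = flits_needed(512, flit_bits=64, header_bits=16)  # 11 flits
```

Under these assumed values, halving the flit size more than doubles the flit count, because the fixed header takes a larger share of each smaller flit.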


Citations: 3

Hierarchical Clustering for On-Chip Networks

AISTECS '16

Pub Date : 2016-01-18 DOI: 10.1145/2857058.2857064

R. Hesse, Natalie D. Enright Jerger

Hierarchy and communication locality are a must for many-core systems. As systems scale to dozens or hundreds of cores, we simply cannot afford the power consumption and latency of random communication that spans the entire chip. Existing hierarchical Networks-on-Chip (NoCs) support communication locality only for a fixed cluster of nodes; providing a fixed hierarchy is too restrictive in terms of parallelism and data placement. Therefore, we propose a new, more flexible class of hierarchical NoCs: Elastic Hierarchical NoCs. Elastic Hierarchical NoCs dynamically adjust the number and size of clusters during runtime according to the system's communication demands. The interconnect can adapt to changes in communication locality across different application phases, between applications and in the presence of server consolidation. Our design improves overall system performance by up to 46% and 13% on average over a conventional 2D mesh and by up to 16% and 6% on average over an existing hierarchical NoC implementation. Power consumption is reduced by 45% and 7% respectively on average.


Citations: 0
