Overcoming far-end congestion in large-scale networks

Jongmin Won, Gwangsun Kim, John Kim, Ted Jiang, Mike Parker, Steve Scott
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 415-427, February 2015
DOI: 10.1109/HPCA.2015.7056051 (https://doi.org/10.1109/HPCA.2015.7056051)
Citations: 53

Abstract

Accurately estimating congestion for global adaptive routing decisions (i.e., determining whether a packet should be routed minimally or non-minimally) has a significant impact on overall performance in high-radix topologies such as the Dragonfly topology. Prior work has focused on understanding near-end congestion (i.e., congestion that occurs at the current router) or downstream congestion (i.e., congestion that occurs in downstream routers). However, most prior work does not evaluate the impact of far-end congestion, or the congestion that arises from the high channel latency between routers. In this work, we refer to far-end congestion as phantom congestion because it is not "real" congestion: because of the long inter-router latency, in-flight packets (and credits) produce inaccurate congestion information, which can lead to inaccurate adaptive routing decisions. In addition, we show how transient congestion occurs as the occupancy of network queues fluctuates due to random traffic variation, even in steady-state conditions. This also results in inaccurate adaptive routing decisions that degrade network performance, lowering throughput and raising latency. To overcome these limitations, we propose a history-window based approach to remove the impact of phantom congestion. We also show that using the average of local queue occupancies and adding an offset significantly reduces the impact of transient congestion. Our evaluations of adaptive routing in a large-scale Dragonfly network show that the combination of these techniques yields an adaptive routing algorithm that nearly matches the performance of an ideal adaptive routing algorithm.
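The abstract names two techniques without giving their details, so the following is only an illustrative sketch of how such mechanisms might look, not the authors' algorithm. It assumes that a history window can be modeled as a count of flits sent during the last few cycles, that those flits are discounted from a credit-based occupancy estimate, and that a minimal port counts as congested only when its occupancy exceeds the local average plus an offset. All names (`HistoryWindow`, `adjusted_occupancy`, `prefer_minimal`) and parameter values are hypothetical.

```python
from collections import deque


class HistoryWindow:
    """Tracks how many flits were sent on a channel in the last `window` cycles.

    Assumption: `window` would be chosen near the channel round-trip latency.
    """

    def __init__(self, window):
        # One slot per cycle; the oldest cycle drops out on each append.
        self.sent = deque([0] * window, maxlen=window)

    def tick(self, flits_sent):
        """Record the number of flits sent in the current cycle."""
        self.sent.append(flits_sent)

    def in_flight(self):
        """Flits sent within the window, which may still be on the wire."""
        return sum(self.sent)


def adjusted_occupancy(credit_occupancy, history):
    # Assumption: recently sent flits inflate the credit-based estimate
    # (phantom congestion), so they are subtracted, clamped at zero.
    return max(0, credit_occupancy - history.in_flight())


def prefer_minimal(min_occupancy, local_occupancies, offset):
    # Assumption: the minimal port is treated as congested only when it
    # exceeds the average local queue occupancy by more than `offset`,
    # which filters out small transient fluctuations.
    average = sum(local_occupancies) / len(local_occupancies)
    return min_occupancy <= average + offset
```

In this sketch a router would call `tick` once per cycle for each global channel, pass the credit-based occupancy through `adjusted_occupancy`, and then use `prefer_minimal` to choose between the minimal and non-minimal path. How the window length and offset should be set is not stated in the abstract and is left open here.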