Saksham Agarwal, R. Agarwal, Behnam Montazeri, M. Moshref, Khaled Elmeleegy, L. Rizzo, M. Kruijf, G. Kumar, S. Ratnasamy, D. Culler, A. Vahdat
{"title":"Understanding host interconnect congestion","authors":"Saksham Agarwal, R. Agarwal, Behnam Montazeri, M. Moshref, Khaled Elmeleegy, L. Rizzo, M. Kruijf, G. Kumar, S. Ratnasamy, D. Culler, A. Vahdat","doi":"10.1145/3563766.3564110","DOIUrl":null,"url":null,"abstract":"We present evidence and characterization of host congestion in production clusters: adoption of high-bandwidth access links leading to emergence of bottlenecks within the host interconnect (NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delays and eventual packet drops at hosts (even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion). We also discuss implications of host interconnect congestion to design of future host architecture, network stacks and network protocols.","PeriodicalId":339381,"journal":{"name":"Proceedings of the 21st ACM Workshop on Hot Topics in Networks","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM Workshop on Hot Topics in Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3563766.3564110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
We present evidence and characterization of host congestion in production clusters: adoption of high-bandwidth access links leading to emergence of bottlenecks within the host interconnect (NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delays and eventual packet drops at hosts (even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion). We also discuss implications of host interconnect congestion to design of future host architecture, network stacks and network protocols.