Wave: A Split OS Architecture for Application Engines
Jack Tigar Humphries, Neel Natu, Kostis Kaffes, Stanko Novaković, Paul Turner, Hank Levy, David Culler, Christos Kozyrakis
arXiv - CS - Operating Systems, arXiv:2408.17351, published 2024-08-30
Citations: 0
Abstract
The end of Moore's Law and the tightening performance requirements in today's clouds make re-architecting the software stack a necessity. To address this, cloud providers and vendors offload the virtualization control plane and data plane, along with the host OS data plane, to IPUs (SmartNICs), recovering scarce host resources that are then used by applications. However, the host OS control plane (encompassing kernel thread scheduling, memory management, the network stack, file systems, and more) is left on the host CPU and degrades workload performance.

This paper presents Wave, a split OS architecture that moves OS subsystem policies to the IPU while keeping OS mechanisms on the host CPU. Wave not only frees host CPU resources, but it also reduces host workload interference and leverages network insights on the IPU to improve policy decisions. Wave makes OS control plane offloading practical despite high host-IPU communication latency, the lack of a coherent interconnect, and operation across two system images.

We present Wave's design and implementation, and implement several OS subsystems in Wave, including kernel thread scheduling, the control plane for a network stack, and memory management. We then evaluate the Wave subsystems on Stubby (scheduling and network), our GCE VM service (scheduling), and RocksDB (memory management and scheduling). We demonstrate that Wave subsystems are competitive with and often superior to on-host subsystems, saving 8 host CPUs for Stubby, 16 host CPUs for database memory management, and improving VM performance by up to 11.2%.
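The core architectural idea in the abstract is the split between policy (decision-making, moved to the IPU) and mechanism (decision-enforcing, kept on the host). The sketch below illustrates that separation for a thread scheduler; it is a minimal illustration under assumed names, not Wave's actual API, and the round-robin policy and in-process "message" stand in for Wave's real policies and its host-IPU communication channel.

```python
from dataclasses import dataclass, field

@dataclass
class HostMechanism:
    """Host side: only executes placement decisions, holds no policy logic."""
    placements: dict = field(default_factory=dict)  # thread_id -> cpu

    def run_thread(self, thread_id: int, cpu: int) -> None:
        # In a real split OS, this is the low-level mechanism (e.g., a
        # context switch onto the chosen CPU); here we just record it.
        self.placements[thread_id] = cpu

class IPUSchedulerPolicy:
    """IPU side: decides where each runnable thread goes.

    A trivial round-robin policy stands in for a real scheduling policy;
    the call to run_thread models a message sent over the host-IPU link.
    """
    def __init__(self, host: HostMechanism, num_cpus: int):
        self.host = host
        self.num_cpus = num_cpus
        self.next_cpu = 0

    def on_thread_runnable(self, thread_id: int) -> int:
        cpu = self.next_cpu
        self.next_cpu = (self.next_cpu + 1) % self.num_cpus
        self.host.run_thread(thread_id, cpu)  # policy -> mechanism boundary
        return cpu
```

Because all decision-making lives in `IPUSchedulerPolicy`, the host-side code stays small and latency-insensitive work (the policy computation) is what crosses the slow host-IPU boundary, which is the property the paper argues makes control-plane offload practical.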