Proceedings of the 21st ACM Workshop on Hot Topics in Networks: Latest Papers

Automating network heuristic design and analysis
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564085
Anup Agarwal, V. Arun, Devdeep Ray, R. Martins, S. Seshan
Heuristics are ubiquitous in computer systems. Examples include congestion control, adaptive bit rate streaming, scheduling, load balancing, and caching. In some domains, theoretical proofs have provided clarity on the conditions where a heuristic is guaranteed to work well. This has not been possible in all domains because proving such guarantees can involve combinatorial reasoning making it hard, cumbersome and error-prone. In this paper we argue that computers should help humans with the combinatorial part of reasoning. We model reasoning questions as ∃∀ formulas [1] and solve them using the counterexample guided inductive synthesis (CEGIS) framework. As preliminary evidence, we prototype CCmatic, a tool that semi-automatically synthesizes congestion control algorithms that are provably robust. It rediscovered a recent congestion control algorithm that provably achieves high utilization and bounded delay under a challenging network model. It also found previously unknown variants of the algorithm that achieve different throughput-delay trade-offs.
Citations: 3
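The counterexample guided inductive synthesis (CEGIS) loop named in the abstract above can be illustrated with a toy sketch. This is not CCmatic and does not use an SMT solver: it assumes small finite candidate and input domains and replaces the solver with brute-force enumeration. The names `cegis`, `spec`, `candidates`, and `inputs`, and the toy specification itself, are hypothetical and chosen only for illustration.

```python
from itertools import product


def cegis(candidates, inputs, spec):
    """Solve "exists c, for all x: spec(c, x)" over finite domains.

    Returns (candidate, counterexamples); candidate is None if no
    candidate satisfies the specification on every input.
    """
    counterexamples = []
    for cand in candidates:
        # Synthesis step: discard candidates already refuted by a
        # counterexample collected in an earlier round.
        if not all(spec(cand, x) for x in counterexamples):
            continue
        # Verification step: look for an input that breaks the candidate.
        cex = next((x for x in inputs if not spec(cand, x)), None)
        if cex is None:
            return cand, counterexamples
        counterexamples.append(cex)
    return None, counterexamples


def spec(cand, x):
    # Toy property (an assumption): the linear rule a*x + b must stay
    # within [x, x + 2] for every input x.
    a, b = cand
    return x <= a * x + b <= x + 2


inputs = range(50)
candidates = product(range(4), range(4))  # all (a, b) pairs with 0 <= a, b < 4
result, seen = cegis(candidates, inputs, spec)
print(result, seen)
```

The point of the loop is that each failed verification contributes one counterexample, and later candidates are first screened against that small set before the more expensive check over all inputs is attempted.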
Full-stack SDN
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564101
Debnil Sur, Ben Pfaff, L. Ryzhyk, M. Budiu
The conventional approach for building software-defined network systems requires separately developing the management, control, and data planes. Manually written code connects the management plane's configuration to the control plane, and the control plane generates the data planes' configurations as small program fragments that scatter across the codebase. Scalability and correctness become increasingly challenging as such a system develops and grows. In contrast, in our approach, called Nerpa, all three planes are programmed in a unified way. In Nerpa a transactional database stores management plane state. The control plane is implemented in a specialized query language which automatically executes in an incremental fashion, improving scalability. Finally, the data plane is programmed in P4. To aid correctness, all three parts are type-checked together, and tools generate code for data movement between planes. We have published a prototype implementation using an open-source license. We believe that full-stack SDN can build more robust and maintainable networked systems.
Citations: 2
Load balancers need in-band feedback control
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564094
Bhavana Vannarth Shobhana, S. Narayana, B. Nath
Server load balancers (LBs) are critical components of interactive services, routing client requests to servers in a pool. LBs improve service performance and increase availability by spreading the request load evenly across servers. It is time to rethink what LBs can do for applications. As application compute becomes increasingly granular (e.g., microservices), request-processing latencies at servers will be ever more impacted by software and system variability at small time scales (e.g., 100μs-1ms). Beyond balancing load, we argue that LBs must actively optimize application response time, by adapting request-routing to quickly-varying server performance. Specifically, we advocate for in-band feedback control: LBs should adapt the request-routing policy using purely local observations of server performance, derived from requests traversing the LB. A key challenge to designing such feedback controllers is that high-speed LBs only see the requests, not the responses. We present the design of an LB that adapts to a server latency inflation of 1 ms and reduces tail latencies in milliseconds, while observing only client-to-server traffic.
Citations: 2
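As a rough illustration of the in-band feedback idea in the abstract above, the following sketch adapts routing using only client-to-server traffic. The latency signal used here is an assumption, not something the abstract specifies: it treats the gap between consecutive requests on the same connection as a proxy for server response time, which is only plausible for clients that wait for a response before sending the next request. The class name `InBandBalancer`, the smoothing factor `alpha`, and the routing rule are all hypothetical.

```python
class InBandBalancer:
    """Toy load balancer that sees requests only, never responses."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # EWMA smoothing factor (assumed value)
        # Smoothed latency proxy per server, starting at zero.
        self.estimate = {s: 0.0 for s in servers}
        # Connection id -> (assigned server, time of its last request).
        self.last_seen = {}

    def on_request(self, conn_id, now):
        """Record a request crossing the LB and return its server."""
        if conn_id in self.last_seen:
            # Existing connection: keep its server and update that
            # server's estimate from the inter-request gap.
            server, prev = self.last_seen[conn_id]
            gap = now - prev
            self.estimate[server] += self.alpha * (gap - self.estimate[server])
        else:
            # New connection: pick the server with the lowest estimate.
            server = min(self.estimate, key=self.estimate.get)
        self.last_seen[conn_id] = (server, now)
        return server


lb = InBandBalancer(["a", "b"])
first = lb.on_request("conn1", now=0.0)    # new connection
again = lb.on_request("conn1", now=10.0)   # same connection, 10 time units later
second = lb.on_request("conn2", now=11.0)  # new connection avoids the slow server
print(first, again, second, lb.estimate)
```

The controller is purely local: it keeps state only about traffic that traverses the LB and needs no reports from servers.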
Network can check itself: scaling data plane checking via distributed, on-device verification
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564095
Qiao Xiang, Ridi Wen, Che-Ling Huang, Yuxin Wang, Franck Le
Current data plane verification (DPV) tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. This architecture is inherently unscalable (i.e., requiring a reliable management network, incurring a long control path and making the server a single point of failure). In this paper, we tackle this scalability challenge of DPV from an architectural perspective. In particular, we circumvent the scalability bottleneck of centralized design and advocate for a distributed, on-device DPV framework. Our key insight is that DPV can be transformed into a counting problem on DAG, which can be naturally decomposed into lightweight tasks executed at network devices, enabling scalability. Evaluation shows that a prototype of this framework achieves scalable DPV under various settings, with little overhead on commodity network devices.
Citations: 2
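The abstract above states that DPV can be cast as a counting problem on a DAG that decomposes into per-device tasks. A minimal sketch of that flavor of computation, under simplifying assumptions, is shown below: it counts the forwarding paths from every device to a destination, where each device's count is the sum of the counts of its next hops. The function name `count_paths`, the dictionary-based topology, and the choice of "number of paths" as the quantity being counted are illustrative assumptions, not the paper's actual formulation.

```python
from functools import lru_cache


def count_paths(next_hops, dst):
    """Count forwarding paths from every device to dst in a DAG.

    next_hops maps each device to the devices it forwards to.
    """

    @lru_cache(maxsize=None)
    def count(node):
        if node == dst:
            return 1
        # Local task: a device only needs the counts of its own
        # downstream neighbors, not a global view of the network.
        return sum(count(n) for n in next_hops.get(node, ()))

    return {node: count(node) for node in next_hops}


# Hypothetical four-device topology: S forwards to A and B, both reach D.
topology = {"S": ["A", "B"], "A": ["D"], "B": ["D"], "D": []}
counts = count_paths(topology, "D")
print(counts)
```

In a distributed setting each device would compute its own entry and pass it upstream; the memoized recursion here only simulates that exchange in one process.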
DIP
Pub Date : 2022-11-14 DOI: 10.1093/gmo/9781561592630.article.j122000
Ziqiang Wang, Zhuotao Liu, Xiaoliang Wang, Songtao Fu, Ke Xu
Citations: 0
The decoupling principle: a practical privacy framework
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564112
Paul Schmitt, J. Iyengar, Christopher A. Wood, B. Raghavan
The three decade struggle to ensure Internet data confidentiality---a key aspect of communications privacy---is finally behind us. Encryption is fast, secure, and standard in all browsers, modern transports, and major protocols. Yet it has long seemed that network privacy is not unified by core principles but a grab bag of techniques and ideas applied to an equally wide range of applications, contexts, layers of infrastructure, and software stacks. Here we attempt to distill a principle---one that is old but seldom discussed as such---for building privacy into Internet services. We explore what privacy properties are desirable and achievable when we apply this principle. We evaluate several classic systems and ones that have been recently deployed with this principle applied, and discuss future directions for network privacy building upon these efforts.
Citations: 5
Understanding host interconnect congestion
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564110
Saksham Agarwal, R. Agarwal, Behnam Montazeri, M. Moshref, Khaled Elmeleegy, L. Rizzo, M. Kruijf, G. Kumar, S. Ratnasamy, D. Culler, A. Vahdat
We present evidence and characterization of host congestion in production clusters: adoption of high-bandwidth access links leading to emergence of bottlenecks within the host interconnect (NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delays and eventual packet drops at hosts (even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion). We also discuss implications of host interconnect congestion to design of future host architecture, network stacks and network protocols.
Citations: 16
Efficient flow scheduling in distributed deep learning training with echelon formation
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564096
Rui Pan, Yiming Lei, Jialong Li, Zhiqiang Xie, Binhang Yuan, Yiting Xia
This paper discusses why flow scheduling does not apply to distributed deep learning training and presents EchelonFlow, the first network abstraction to bridge the gap. EchelonFlow deviates from the common belief that semantically related flows should finish at the same time. We reached the key observation, after extensive workflow analysis of diverse training paradigms, that distributed training jobs observe strict computation patterns, which may consume data at different times. We devise a generic method to model the drastically different computation patterns across training paradigms, and formulate EchelonFlow to regulate flow finish times accordingly. Case studies of mainstream training paradigms under EchelonFlow demonstrate the expressiveness of the abstraction, and our system sketch suggests the feasibility of an EchelonFlow scheduling system.
Citations: 0
Congestion control in machine learning clusters
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564115
S. Rajasekaran, M. Ghobadi, Gautam Kumar, Aditya Akella
This paper argues that fair-sharing, the holy grail of congestion control algorithms for decades, is not necessarily a desirable property in Machine Learning (ML) training clusters. We demonstrate that for a specific combination of jobs, introducing unfairness improves the training time for all competing jobs. We call this specific combination of jobs compatible and define the compatibility criterion using a novel geometric abstraction. Our abstraction rolls time around a circle and rotates the communication phases of jobs to identify fully compatible jobs. Using this abstraction, we demonstrate up to 1.3× improvement in the average training iteration time of popular ML models. We advocate that resource management algorithms should take job compatibility on network links into account. We then propose three directions to ameliorate the impact of network congestion in ML training clusters: (i) an adaptively unfair congestion control scheme, (ii) priority queues on switches, and (iii) precise flow scheduling.
Citations: 7
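The geometric abstraction described in the abstract above (rolling time around a circle and rotating each job's communication phase) can be sketched in a heavily simplified form. The sketch assumes that all jobs share the same iteration period, that time is divided into integer slots, and that each job has a single contiguous communication phase per iteration; the abstract does not state these restrictions. The function names `arc` and `find_rotation` are hypothetical.

```python
from itertools import product


def arc(start, length, period):
    """Time slots on the circle covered by one communication phase."""
    return {(start + i) % period for i in range(length)}


def find_rotation(comm_lengths, period):
    """Find start offsets so that no two communication arcs overlap.

    The first job is pinned at offset 0 and the others are rotated.
    Returns a tuple of offsets, or None if the jobs cannot be
    interleaved on a circle of the given period.
    """
    first, rest = comm_lengths[0], comm_lengths[1:]
    for offsets in product(range(period), repeat=len(rest)):
        used = arc(0, first, period)
        ok = True
        for off, length in zip(offsets, rest):
            slots = arc(off, length, period)
            if used & slots:
                ok = False
                break
            used |= slots
        if ok:
            return (0,) + offsets
    return None


# Two jobs with a 10-slot iteration: communication phases of 3 and 4 slots.
fits = find_rotation([3, 4], period=10)
# Phases of 6 and 5 slots cannot both fit on a 10-slot circle.
clash = find_rotation([6, 5], period=10)
print(fits, clash)
```

When a rotation exists, the jobs can in principle take turns on the shared link instead of splitting its bandwidth, which is the intuition behind preferring unfairness for such job combinations.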
Bringing wifi localization to any wifi devices
Pub Date : 2022-11-14 DOI: 10.1145/3563766.3564090
Tianxiang Li, Haofan Lu, Reza Rezvani, A. Abedi, Omid Salehi-Abari
Recent years have seen significant advances in WiFi Localization. However, existing systems require either multiple access points to cooperate with each other or a single access point to have multiple antennas and transceiver chains. Therefore, they cannot be integrated into most IoT WiFi chipsets which have only a single transceiver chain. This paper presents WiSight, a novel approach to bringing WiFi localization to any WiFi devices, especially those with a single RF chain. We propose a WiFi antenna design and use the inherent properties of the 802.11 protocol to measure Angle-of-Arrival (AoA) and Time-of-Flight (ToF) using a single transceiver chain. Our proof-of-concept simulation and real world experiments promise the feasibility of this approach.
Citations: 2