In RDMA (Remote Direct Memory Access) networks, end-host networks, including intra-host networks and RNICs (RDMA NICs), were long considered robust and have received little attention. However, as the RNIC line rate rapidly increases to multiple hundreds of gigabits, the intra-host network becomes a potential performance bottleneck for network applications. Intra-host network bottlenecks can result in degraded intra-host bandwidth and increased intra-host latency, while RNIC problems can result in connection failures and packet drops. Host network problems can therefore severely degrade network performance, yet when they occur they are hard to notice due to the lack of a dedicated monitoring system, and existing diagnostic mechanisms cannot diagnose them efficiently. In this paper, we analyze the symptoms of host network problems based on our long-term troubleshooting experience and propose Hostping, the first monitoring and diagnostic system dedicated to host networks. The core idea of Hostping is to conduct 1) loopback tests between RNICs and endpoints within the host to measure intra-host latency and bandwidth, and 2) mutual probing between RNICs on a host to measure RNIC connectivity. We have deployed Hostping on thousands of servers in our distributed machine learning system. Not only can Hostping detect and diagnose the host network problems we already knew of within minutes, but it also reveals eight problems we had not noticed before.
{"title":"Diagnosing End-Host Network Bottlenecks in RDMA Servers","authors":"Kefei Liu;Jiao Zhang;Zhuo Jiang;Haoran Wei;Xiaolong Zhong;Lizhuang Tan;Tian Pan;Tao Huang","doi":"10.1109/TNET.2024.3416419","DOIUrl":"10.1109/TNET.2024.3416419","url":null,"abstract":"In RDMA (Remote Direct Memory Access) networks, end-host networks, including intra-host networks and RNICs (RDMA NIC), were considered robust and have received little attention. However, as the RNIC line rate rapidly increases to multi-hundred gigabits, the intra-host network becomes a potential performance bottleneck for network applications. Intra-host network bottlenecks can result in degraded intra-host bandwidth and increased intra-host latency. In addition, RNIC network problems can result in connection failures and packet drops. Host network problems can severely degrade network performance. However, when host network problems occur, they can hardly be noticed due to the lack of a monitoring system. Furthermore, existing diagnostic mechanisms cannot efficiently diagnose host network problems. In this paper, we analyze the symptom of host network problems based on our long-term troubleshooting experience and propose Hostping, the first monitoring and diagnostic system dedicated to host networks. The core idea of Hostping is to conduct 1) loopback tests between RNICs and endpoints within the host to measure intra-host latency and bandwidth, and 2) mutual probing between RNICs on a host to measure RNIC connectivity. We have deployed Hostping on thousands of servers in our distributed machine learning system. Not only can Hostping detect and diagnose host network problems we already knew in minutes, but it also reveals eight problems we did not notice before.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4302-4316"},"PeriodicalIF":3.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-15 DOI: 10.1109/TNET.2024.3425652
Jianmin Liu;Dan Li;Yongjun Xu
Real-time applications that require timely data delivery over wireless multi-hop networks within specified deadlines are increasingly common. Effective routing protocols that can guarantee real-time QoS are crucial yet challenging to design, due to the unpredictable variations in end-to-end delay caused by unreliable wireless channels. In such conditions, the upper bound on the end-to-end delay, i.e., the worst-case end-to-end delay, should be guaranteed to fall within the deadline. However, existing routing protocols with guaranteed delay bounds cannot strictly guarantee real-time QoS because they assume that the worst-case end-to-end delay is known and ignore the impact of routing policies on its determination. In this paper, we relax this assumption and propose DDRL-ARGB, an Adaptive Routing with Guaranteed delay Bounds scheme based on Deep Distributional Reinforcement Learning (DDRL). DDRL-ARGB adopts DDRL to jointly determine the worst-case end-to-end delay and learn routing policies. To accurately determine the worst-case end-to-end delay, DDRL-ARGB employs a quantile regression deep Q-network to learn the cumulative distribution of the end-to-end delay. To guarantee real-time QoS, DDRL-ARGB optimizes routing decisions under the constraint that the worst-case end-to-end delay stays within the deadline. To alleviate traffic congestion, DDRL-ARGB considers the network congestion status when making routing decisions. Extensive results show that DDRL-ARGB can accurately calculate the worst-case end-to-end delay and can strictly guarantee real-time QoS under a small tolerated violation probability, compared with two state-of-the-art routing protocols.
{"title":"Deep Distributional Reinforcement Learning-Based Adaptive Routing With Guaranteed Delay Bounds","authors":"Jianmin Liu;Dan Li;Yongjun Xu","doi":"10.1109/TNET.2024.3425652","DOIUrl":"10.1109/TNET.2024.3425652","url":null,"abstract":"Real-time applications that require timely data delivery over wireless multi-hop networks within specified deadlines are growing increasingly. Effective routing protocols that can guarantee real-time QoS are crucial, yet challenging, due to the unpredictable variations in end-to-end delay caused by unreliable wireless channels. In such conditions, the upper bound on the end-to-end delay, i.e., worst-case end-to-end delay, should be guaranteed within the deadline. However, existing routing protocols with guaranteed delay bounds cannot strictly guarantee real-time QoS because they assume that the worst-case end-to-end delay is known and ignore the impact of routing policies on the worst-case end-to-end delay determination. In this paper, we relax this assumption and propose DDRL-ARGB, an Adaptive Routing with Guaranteed delay Bounds using Deep Distributional Reinforcement Learning (DDRL). DDRL-ARGB adopts DDRL to jointly determine the worst-case end-to-end delay and learn routing policies. To accurately determine worst-case end-to-end delay, DDRL-ARGB employs a quantile regression deep Q-network to learn the end-to-end delay cumulative distribution. To guarantee real-time QoS, DDRL-ARGB optimizes routing decisions under the constraint of worst-case end-to-end delay within the deadline. To improve traffic congestion, DDRL-ARGB considers the network congestion status when making routing decisions. Extensive results show that DDRL-ARGB can accurately calculate worst-case end-to-end delay, and can strictly guarantee real-time QoS under a small tolerant violation probability against two state-of-the-art routing protocols.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 6","pages":"4692-4706"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-15 DOI: 10.1109/TNET.2024.3422264
Yichen Ruan;Xiaoxi Zhang;Carlee Joe-Wong
Federated learning allows distributed clients to train a shared machine learning model while preserving user privacy. In this framework, user devices (i.e., clients) perform local iterations of the learning algorithm on their data, and these updates are periodically aggregated to form a shared model. A client thus represents the bundle of the user data, the device, and the user's willingness to participate: since participating in federated learning requires clients to expend resources and reveal some information about their data, users may require some form of compensation to contribute to the training process. Recruiting more users generally results in higher accuracy, but slower completion time and higher cost. We present the first work that theoretically analyzes the resulting performance tradeoffs in deciding which clients to recruit for the federated learning algorithm. Our framework accounts for both accuracy (training and testing) and efficiency (completion time and cost) metrics. We provide solutions to this NP-hard optimization problem and verify the value of client recruitment in experiments on synthetic and real-world data. The results of this work can serve as a guideline for the real-world deployment of federated learning and an initial investigation of the client recruitment problem.
{"title":"How Valuable is Your Data? Optimizing Client Recruitment in Federated Learning","authors":"Yichen Ruan;Xiaoxi Zhang;Carlee Joe-Wong","doi":"10.1109/TNET.2024.3422264","DOIUrl":"10.1109/TNET.2024.3422264","url":null,"abstract":"Federated learning allows distributed clients to train a shared machine learning model while preserving user privacy. In this framework, user devices (i.e., clients) perform local iterations of the learning algorithm on their data. These updates are periodically aggregated to form a shared model. Thus, a client represents the bundle of the user data, the device, and the user’s willingness to participate: since participating in federated learning requires clients to expend resources and reveal some information about their data, users may require some form of compensation to contribute to the training process. Recruiting more users generally results in higher accuracy, but slower completion time and higher cost. We propose the first work to theoretically analyze the resulting performance tradeoffs in deciding which clients to recruit for the federated learning algorithm. Our framework accounts for both accuracy (training and testing) and efficiency (completion time and cost) metrics. We provide solutions to this NP-Hard optimization problem and verify the value of client recruitment in experiments on synthetic and real-world data. The results of this work can serve as a guideline for the real-world deployment of federated learning and an initial investigation of the client recruitment problem.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4207-4221"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data generated at the network edge can be processed locally by leveraging the emerging technology of Federated Learning (FL). However, non-IID local data degrades model accuracy, and the heterogeneity of edge nodes inevitably slows down model training. Moreover, to avoid the potential communication bottleneck of parameter-server-based FL, we concentrate on Decentralized Federated Learning (DFL), which performs distributed model training in a Peer-to-Peer (P2P) manner. To address these challenges, we propose an asynchronous DFL system that incorporates neighbor selection and gradient push, termed AsyDFL. Specifically, we require each edge node to push gradients only to a subset of neighbors for resource efficiency. We first give a theoretical convergence analysis of AsyDFL under the complicated non-IID and heterogeneous scenario, and further design a priority-based algorithm to dynamically select neighbors for each edge node so as to achieve a trade-off between communication cost and model performance. We evaluate the performance of AsyDFL through extensive experiments on a physical platform with 30 NVIDIA Jetson edge devices. Evaluation results show that, compared to the baselines, AsyDFL can reduce the communication cost by 57% and the completion time by about 35% for achieving the same test accuracy, and improve model accuracy by at least 6% under the non-IID scenario.
{"title":"Asynchronous Decentralized Federated Learning for Heterogeneous Devices","authors":"Yunming Liao;Yang Xu;Hongli Xu;Min Chen;Lun Wang;Chunming Qiao","doi":"10.1109/TNET.2024.3424444","DOIUrl":"10.1109/TNET.2024.3424444","url":null,"abstract":"Data generated at the network edge can be processed locally by leveraging the emerging technology of Federated Learning (FL). However, non-IID local data will lead to degradation of model accuracy and the heterogeneity of edge nodes inevitably slows down model training efficiency. Moreover, to avoid the potential communication bottleneck in the parameter-server-based FL, we concentrate on the Decentralized Federated Learning (DFL) that performs distributed model training in Peer-to-Peer (P2P) manner. To address these challenges, we propose an asynchronous DFL system by incorporating neighbor selection and gradient push, termed AsyDFL. Specifically, we require each edge node to push gradients only to a subset of neighbors for resource efficiency. Herein, we first give a theoretical convergence analysis of AsyDFL under the complicated non-IID and heterogeneous scenario, and further design a priority-based algorithm to dynamically select neighbors for each edge node so as to achieve the trade-off between communication cost and model performance. We evaluate the performance of AsyDFL through extensive experiments on a physical platform with 30 NVIDIA Jetson edge devices. Evaluation results show that AsyDFL can reduce the communication cost by 57% and the completion time by about 35% for achieving the same test accuracy, and improve model accuracy by at least 6% under the non-IID scenario, compared to the baselines.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4535-4550"},"PeriodicalIF":3.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-11 DOI: 10.1109/TNET.2024.3423780
Tung-Anh Nguyen;Long Tan Le;Tuan Dung Nguyen;Wei Bao;Suranga Seneviratne;Choong Seon Hong;Nguyen H. Tran
With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they have limitations such as the requirement for labeled data and difficulty with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GAN) offer alternative solutions but pose challenges in deployment onto resource-constrained IoT devices and in interpretability. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework, FedPCA, which leverages Principal Component Analysis (PCA) and the Alternating Direction Method of Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FedPE in Euclidean space and FedPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a sub-sampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer anomaly detection performance comparable to non-linear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.
{"title":"Federated PCA on Grassmann Manifold for IoT Anomaly Detection","authors":"Tung-Anh Nguyen;Long Tan Le;Tuan Dung Nguyen;Wei Bao;Suranga Seneviratne;Choong Seon Hong;Nguyen H. Tran","doi":"10.1109/TNET.2024.3423780","DOIUrl":"10.1109/TNET.2024.3423780","url":null,"abstract":"With the proliferation of the Internet of Things (IoT) and the rising interconnectedness of devices, network security faces significant challenges, especially from anomalous activities. While traditional machine learning-based intrusion detection systems (ML-IDS) effectively employ supervised learning methods, they possess limitations such as the requirement for labeled data and challenges with high dimensionality. Recent unsupervised ML-IDS approaches such as AutoEncoders and Generative Adversarial Networks (GAN) offer alternative solutions but pose challenges in deployment onto resource-constrained IoT devices and in interpretability. To address these concerns, this paper proposes a novel federated unsupervised anomaly detection framework – FedPCA – that leverages Principal Component Analysis (PCA) and the Alternating Directions Method Multipliers (ADMM) to learn common representations of distributed non-i.i.d. datasets. Building on the FedPCA framework, we propose two algorithms, FedPE in Euclidean space and FedPG on Grassmann manifolds. Our approach enables real-time threat detection and mitigation at the device level, enhancing network resilience while ensuring privacy. Moreover, the proposed algorithms are accompanied by theoretical convergence rates even under a sub-sampling scheme, a novel result. Experimental results on the UNSW-NB15 and TON-IoT datasets show that our proposed methods offer performance in anomaly detection comparable to non-linear baselines, while providing significant improvements in communication and memory efficiency, underscoring their potential for securing IoT networks.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4456-4471"},"PeriodicalIF":3.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the key techniques for future wireless networks is the full-duplex-enabled millimeter-wave integrated access and backhaul network underlaying device-to-device communication, a 3GPP-inspired comprehensive paradigm for higher spectral efficiency and lower latency. However, multi-user interference (MUI) and residual self-interference (RSI) remain the major bottlenecks for commercial application of the system. To this end, we investigate the sub-channel allocation problem for this networking paradigm. To maximize the overall achievable rate under the considerations of MUI and RSI, the sub-channel allocation problem is firstly formulated as an integer nonlinear programming problem, for which an optimal solution cannot be found in polynomial time. Secondly, a coalition formation based sub-channel allocation (CFSA) algorithm is proposed, where the final partition of the sub-channel coalitions is iteratively formed by the concurrent link players according to two defined switching criteria. Thirdly, the properties of the proposed CFSA algorithm are analyzed from the perspectives of Nash stability and uniform convergence. Fourthly, the proposed CFSA algorithm is compared with reference algorithms through extensive simulations, and its advantages in effectiveness, convergence, and near-optimality are demonstrated on the key indicators.
{"title":"Coalition Formation-Based Sub-Channel Allocation in Full-Duplex-Enabled mmWave IABN With D2D","authors":"Zhongyu Ma;Yajing Wang;Zijun Wang;Guangjie Han;Zhanjun Hao;Qun Guo","doi":"10.1109/TNET.2024.3423775","DOIUrl":"10.1109/TNET.2024.3423775","url":null,"abstract":"One of the key techniques for future wireless network is full-duplex-enabled millimeter wave integrated access and backhaul network underlaying device-to-device communication, which is a 3GPP-inspired comprehensive paradigm for higher spectral efficiency and lower latency. However, the multi-user interference (MUI) and residual self-interference (RSI) become the major bottleneck before the commercial application of the system. To this end, we investigate the sub-channel allocation problem for this networking paradigm. To maximize the overall achievable rate under the considerations of MUI and RSI, the sub-channel allocation problem is firstly formulated as an integer nonlinear programming problem, which is intractable to search an optimal solution in polynomial time. Secondly, a coalition formation based sub-channel allocation (CFSA) algorithm is proposed, where the final partition of the sub-channel coalition is iteratively formed by the concurrent link players according to the two defined switching criterions. Thirdly, the properties of the proposed CFSA algorithm are analyzed from the perspectives of Nash stability and uniform convergence. Fourthly, the proposed CFSA algorithm is compared with other reference algorithms through abundant simulations, and superiorities including effectiveness, convergence and sub-optimality of the proposed CFSA algorithm are demonstrated through the kernel indicators.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4503-4518"},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09 DOI: 10.1109/TNET.2024.3423000
Xiaoxue Zhang;Chen Qian
Payment channel networks (PCNs) have been designed and utilized to address the scalability challenge and throughput limitation of blockchains, providing a high-throughput solution for blockchain-based payment systems. However, such “layer-2” blockchain solutions have their own problems: payment channels require a separate deposit for each channel between two users, which locks users' funds into particular channels without the flexibility to move them across channels. In this paper, we propose the Aggregated Payment Channel Network (APCN), in which funds are managed on a per-user basis instead of a per-channel basis. To prevent misbehavior such as double-spending, APCN includes mechanisms that make use of hardware trusted execution environments (TEEs) to control funds, balances, and payments. The distributed routing protocol in APCN also addresses the congestion problem to further improve resource utilization. Our prototype implementation and simulation results show that APCN achieves significant improvements in transaction success ratio with low routing latency, compared to even the most advanced PCN routing schemes.
{"title":"Toward Aggregated Payment Channel Networks","authors":"Xiaoxue Zhang;Chen Qian","doi":"10.1109/TNET.2024.3423000","DOIUrl":"10.1109/TNET.2024.3423000","url":null,"abstract":"Payment channel networks (PCNs) have been designed and utilized to address the scalability challenge and throughput limitation of blockchains. It provides a high-throughput solution for blockchain-based payment systems. However, such “layer-2” blockchain solutions have their own problems: payment channels require a separate deposit for each channel of two users. Thus it significantly locks funds from users into particular channels without the flexibility of moving these funds across channels. In this paper, we proposed Aggregated Payment Channel Network (APCN), in which flexible funds are used as a per-user basis instead of a per-channel basis. To prevent users from misbehaving such as double-spending, APCN includes mechanisms that make use of hardware trusted execution environments (TEEs) to control funds, balances, and payments. The distributed routing protocol in APCN also addresses the congestion problem to further improve resource utilization. Our prototype implementation and simulation results show that APCN achieves significant improvements on transaction success ratio with low routing latency, compared to even the most advanced PCN routing.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4333-4348"},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09 DOI: 10.1109/TNET.2024.3424337
Balázs Vass;Erika R. Bérczi-Kovács;Ádám Fraknói;Costin Raiciu;Gábor Rétvári
P4 is a widely used domain-specific language for programmable data planes. A critical step in P4 compilation is finding a feasible and efficient mapping of the high-level P4 source code constructs to the physical resources exposed by the underlying hardware, while meeting data and control flow dependencies in the program. In this paper, we take a new look at the algorithmic aspects of this problem, with the motivation to understand the fundamental theoretical limits, obtain better P4 pipeline embeddings, and speed up practical P4 compilation times for RMT and dRMT target architectures. We report mixed results: we find that P4 compilation is computationally hard even in a severely relaxed formulation, and there is no polynomial-time approximation of arbitrary precision (unless $\mathcal{P} = \mathcal{NP}$).
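The mapping problem can be made concrete with a toy greedy embedder: match-action tables with dependencies are placed into pipeline stages so that a table never lands in or before a stage holding a table it depends on, subject to a per-stage table capacity. This is only a simplified illustration of the RMT-style embedding constraints, not the compilation algorithms studied in the paper, and the table names and capacity are invented.

```python
def greedy_stage_assignment(tables, deps, stage_capacity):
    """tables: list of table names in a valid topological order.
    deps: dict table -> set of tables it depends on (must sit in earlier stages).
    Returns dict table -> stage index, packing greedily under capacity."""
    stage_of, load = {}, {}
    for t in tables:
        earliest = max((stage_of[d] + 1 for d in deps.get(t, ())), default=0)
        s = earliest
        while load.get(s, 0) >= stage_capacity:   # spill to a later stage if full
            s += 1
        stage_of[t] = s
        load[s] = load.get(s, 0) + 1
    return stage_of

tables = ["ipv4_lpm", "acl", "nat", "egress_meta"]
deps = {"acl": {"ipv4_lpm"}, "nat": {"ipv4_lpm"}, "egress_meta": {"acl", "nat"}}
print(greedy_stage_assignment(tables, deps, stage_capacity=1))
```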