Throughput and latency critical applications could often benefit of performing computations close to the client. To enable this, distributed computing paradigms such as edge computing have recently emerged. However, with the advent of programmable data planes, computations cannot only be performed by servers but they can be offloaded to network switches. Languages like P4 enable to flexibly reprogram the entire packet processing pipeline. Though these devices promise high throughput and ultra-low response times, implementing application-layer tasks in the data plane programming language P4 is still challenging for an application developer who is not familiar with networking domain. In this paper, we first identify and examine obstacles and pain points one can experience when offloading server-based computations to the network. Then we present P4rrot, a code generator (in form of a library) which allows to overcome these limitations by providing a user-friendly API to describe computations to be offloaded. After discussing the design choices behind P4rrot, we introduce our proof-of-concept implementation for two P4 targets: Netronome SmartNIC and BMv2. To demonstrate the applicability of P4rrot, we investigate case studies in the context of publish-subscribe sensor data processing and real-time data streaming, supporting, in particular, MQTT-SN and MoldUDP packets.
To mitigate IPv4 exhaustion, IPv6 provides expanded address space, and NAT allows a single public IPv4 address to suffice for many devices assigned private IPv4 address space. Even though NAT has greatly extended the shelf-life of IPv4, some networks need more private IPv4 space than what is officially allocated by IANA due to their size and/or network management practices. Some of these networks resort to using squat space, a term the network operations community uses for large public IPv4 address blocks allocated to organizations but historically never announced to the Internet. While squatting of IP addresses is an open secret, it introduces ethical, legal, and technical problems. In this work we examine billions of traceroutes to identify thousands of organizations squatting. We examine how they are using it and what happened when the US Department of Defense suddenly started announcing what had traditionally been squat space. In addition to shining light on a dirty secret of operational practices, our paper shows that squatting distorts common Internet measurement methodologies, which we argue have to be re-examined to account for squat space.
Packet-processing data planes have been continuously enhanced in performance over the last few years to the point that, nowadays, they are increasingly implemented in hardware (i.e., in SmartNICs and programmable switches). However, little attention is given to the slow path residing between the data plane and the control plane, as it is not typically considered performance-critical.
In this paper, we show that the slow path is set to become a new key bottleneck in Software-Defined Networks (SDNs). This is due to the growth in physical network bandwidth (200 Gbps is becoming common in data centers) and topological complexity (e.g., virtual switches now span hundreds of physical machines). We present our vision of a new Domain Specific Accelerator (DSA) for the slow path at the end host that sits between the hardware-offloaded data plane and the logically-centralized control plane. We discuss open problems in this domain and call on the networking community to creatively address this emerging issue.
The extended Berkeley Packet Filter (eBPF) is an infrastructure that allows to dynamically load and run micro-programs directly in the Linux kernel without recompiling it.
In this work, we study how to develop high-performance network measurements in eBPF. We take sketches as case-study, given their ability to support a wide-range of tasks while providing low-memory footprint and accuracy guarantees. We implemented NitroSketch, the state-of-the-art sketch for user-space networking and show that best practices in user-space networking cannot be directly applied to eBPF, because of its different performance characteristics. By applying our lesson learned we improve its performance by 40% compared to a naive implementation.
Telecommunication operators are massively moving their network functions in small data centers at the edge of the network, which are becoming increasingly common. However, the high performance provided by commonly used technologies for data plane processing such as DPDK, based on kernel-bypass primitives, comes at the cost of rigid resource partitioning. This is unsuitable for edge data centers, in which efficiency demands both general-purpose applications and data-plane telco workloads to be executed on the same (shared) physical machines. In this respect, eBPF/XDP looks a more appealing solution, thanks to its capability to process packets in the kernel, achieving a higher level of integration with non-data plane applications albeit with lower performance than DPDK. In this paper we leverage the recent introduction of AF_XDP, an XDP-based technology that allows to efficiently steer packets in user space, to provide a thorough comparison of user space vs in-kernel packet processing in typical scenarios of a data center at the edge of the network. Our results provide useful insights on how to select and combine these technologies in order to improve overall throughput and optimize resource usage.