NIC-Level Co-Processors for Resilient Coded Networking and Computation
|Scientists:||Sebastian Gallenmüller, Manuel Simon, M.Sc., Henning Stubbe, Kilian Holzinger, Stefan Lachnit, M.Sc., Prof. Dr.-Ing. Georg Carle|
|Duration:||17.02.2023 – 16.02.2026|
|Funding:||DFG SPP 2378|
Providing resilient time-sensitive services requires resilience mechanisms and high packet processing performance at the right location, so that network and host components are not overloaded. Hardware offloading to NIC-level (network interface card) co-processors enables resilient, low-latency computation and can help free scarce CPU resources.
In the HYPERNIC project, the Chair of Integrated Systems (LIS) and Chair of Network Architectures and Services (NET) of the Technical University of Munich (TUM) propose a novel communication and computation approach that is resilient against potential attacks and partial network failures. We plan to investigate mechanisms and processing platforms that provide resilience efficiently and flexibly. We aim to design a novel class of NICs with processing capabilities that employ techniques such as network coding, low-latency packet retransmissions, and fault-tolerant algorithms.
The joint project work builds on a hybrid software/hardware stack. The software layers will provide the necessary flexibility (through programmability) where the latency of the HYPERNIC functions is not the primary concern. The hardware layer will provide co-processors and wire-rate data processing pipelines where low, deterministic latencies are required. The two project partners thus exploit their complementary and proven competencies in the fields of network architectures, protocols, and network processing engines. This combination of skills and expertise will enable theoretical, methodological, architectural, and practical investigations.
In the project, the two partners will combine the complementary research foci of their respective research groups. Our chair has a strong background in measuring and modeling programmable packet processing systems, a long-term dedication to measurement-driven research, a sophisticated testbed infrastructure, and extensive measurement facilities. LIS contributes various FPGA boards that allow low-level integration of the investigated resilience mechanisms. Besides hardware acceleration, LIS has a research focus on highly reliable, high-bandwidth networks. Both the research areas and the testbed facilities are complementary. This allows the theoretical and practical investigation of the entire networking stack, starting at the NIC and its offloading capabilities, continuing with management of the NIC through network drivers, and extending through processing in software up to the application.
Structure of Work
The project is structured in three work areas, as follows.
Work Area A
The first work package (WP) investigates fundamental redundancy mechanisms that allow using multiple independent paths through measures on the protocol level, e.g., packet-level duplication of traffic. More efficient ways to introduce redundancy into network communication, such as network coding (NC), are also investigated. Finally, we investigate methods for resilient computation. Here, we want to investigate frameworks that allow low-latency replication of state across network nodes reliably and securely.
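To illustrate why coded redundancy can be more efficient than packet-level duplication, the following minimal sketch uses a single XOR parity packet over a block of k equal-length packets. This is a basic erasure code chosen as a stand-in for illustration only; it is not the coding scheme used in the project, and the function names `encode` and `decode` are hypothetical. Duplication sends 2k packets to survive one loss per packet, whereas this scheme sends k + 1 packets and can repair any single loss within the block.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings (assumed to have equal length)."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets: List[bytes]) -> List[bytes]:
    """Return the k source packets followed by one XOR parity packet."""
    parity = reduce(xor_bytes, packets)
    return packets + [parity]


def decode(received: List[Optional[bytes]]) -> List[bytes]:
    """Recover the k source packets from k + 1 coded packets.

    A lost packet is represented by None. A single parity packet
    can repair at most one loss per block.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of all surviving packets equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


# Usage: lose one of four coded packets and recover the original three.
source = [b"aaaa", b"bbbb", b"cccc"]
coded = encode(source)
damaged = list(coded)
damaged[1] = None  # simulate a loss on one path
recovered = decode(damaged)
```

In this example the redundancy overhead is one extra packet for three source packets, compared with three extra packets under full duplication. The trade-off is that the parity scheme tolerates only one loss per block, and a repaired packet becomes available only after the other packets of the block have arrived.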
Work Area B
The first part investigates software implementations of different redundancy schemes. This work package is concerned with analysis and design, the actual implementation, and optimizations. The second part investigates the applicability of hardware offloading for the mechanisms investigated in Work Area A. Offloading specific functions to hardware helps to improve bandwidth, lower latency, and avoid jitter. We plan to assess these potential benefits by measurements. Third, we implement resilient computation. Measurements that help to identify possible bottlenecks are used to plan improvements through targeted hardware acceleration.
Work Area C
Work Area C is dedicated to establishing the testbed facilities for performing experiments, with a special focus on supporting cooperation with the research community. We plan to establish a federated testbed between our two research groups to enable distributed experiments under a common experiment framework. The results obtained in the federated testbed will be used for modeling, so that properties such as maximum throughput or worst-case latencies can be predicted for a larger range of systems, scenarios, and configurations. The testbed facilities and the investigated techniques will also be made available to other members of the research community, especially to members of the priority programme. A federated testbed with further research groups can be established to extend the capabilities of the original testbeds.
|2023-06-01||Henning Stubbe, Sebastian Gallenmüller, Manuel Simon, Eric Hauser, Dominik Scholz, Georg Carle, “Keeping Up to Date With P4Runtime: An Analysis of Data Plane Updates on P4 Switches,” in International Federation for Information Processing (IFIP) Networking 2023 Conference (IFIP Networking 2023), Barcelona, Spain, Jun. 2023, p. 9. [Pdf] [Bib]|
|2023-06-01||Manuel Simon, Sebastian Gallenmüller, Georg Carle, “Never Miss Twice - Add-On-Miss Table Updates in Software Data Planes,” in KuVS Fachgespräch - Würzburg Workshop on Modeling, Analysis and Simulation of Next-Generation Communication Networks 2023 (WueWoWAS’23), Würzburg, Germany, Jun. 2023, p. 5. Best Workshop Contribution [Pdf] [Slides] [DOI] [Bib]|
Finished student theses
|Leon Krix||On-The-Fly Network Erasure Coding Protocol for Delay and Loss-Sensitive Data||BA||Henning Stubbe, Kilian Holzinger||2023|
|Luca Otting||Improving QUIC with User Space Networking||BA||Kilian Holzinger, Benedikt Jaeger, Johannes Zirngibl||2023|
|Paul Stephan||Improvements to Reliable Multipath Forward Error Correction||BA||Kilian Holzinger, Henning Stubbe||2023|
|Krishna Mavani||Simulation of a Network Redundancy Protocol||BA||Kilian Holzinger, Henning Stubbe||2022|
|Tristan Döring||Packet Selection using Concepts from IPFIX and PSAMP||BA||Kilian Holzinger, Henning Stubbe||2022|
|Jonas Kaps||High-Performance Low-Latency Forward Error Correction Coding for Reliable Ethernet Communication||MA||Kilian Holzinger, Filip Rezabek||2022|
|Timon Tsiolis||Analyzing the Extensibility of Programmable Data Planes||BA||Manuel Simon, Henning Stubbe, Sebastian Gallenmüller||2022|
Open and running student theses
|Martin Fritz||State of the Art Assessment of Multipath QUIC||MA||Kilian Holzinger, Lion Steger, Marcel Kempf||2023|
|Ruben Bachmann||Comparison of DPDK-Enabled P4 Software Targets||MA||Manuel Simon, Sebastian Gallenmüller, Stefan Lachnit||2023|
|Nico Greger||Improvements to Forward Erasure Correction Coding||IDP||Kilian Holzinger, Henning Stubbe, Stefan Lachnit||2023|
|Michael Hackl||Improvements to Convolutional Forward Erasure Correction Coding||IDP||Kilian Holzinger, Henning Stubbe, Stefan Lachnit||2023|
|Felix Christ||MASQUE-Proxying in User-Space||MA||Kilian Holzinger, Lion Steger||2023|
|Felix Hahn||Failure Detection through Active and Passive Techniques in P4||BA||Manuel Simon, Eric Hauser||2023|
|Thomas Senftl||Flexible Precise Path Property Emulation||BA||Kilian Holzinger, Sebastian Gallenmüller, Stefan Lachnit||2023|
|Alexander Anton Keil||Comparison of One-Way Delay Measurement Approaches||BA||Kilian Holzinger, Florian Wiedner, Henning Stubbe||2023|
|open||Forward Erasure Correction Coding in QUIC||IDP, MA||Kilian Holzinger, Henning Stubbe, Stefan Lachnit||2023|
|open||MoonGen - Implementation of a Testing Toolchain||BA, IDP||Stefan Lachnit, Eric Hauser, Sebastian Gallenmüller||2023|