HYPERNIC
NIC-Level Co-Processors for Resilient Coded Networking and Computation
Scientists: Sebastian Gallenmüller, Manuel Simon, M.Sc., Henning Stubbe, Kilian Holzinger, Prof. Dr.-Ing. Georg Carle
Duration: 17.02.2023 – 16.02.2026
Funding: DFG SPP 2378
Homepage: https://www.resilient-worlds.org/spp2378/
Introduction
Resilient, time-sensitive services need resilience mechanisms and high packet processing performance at the right location, so that network and host components are not overloaded. Hardware offloading to NIC-level (Network Interface Card) co-processors enables resilient, low-latency computation and can help to free scarce CPU resources.
In the HYPERNIC project, the Chair of Integrated Systems (LIS) and Chair of Network Architectures and Services (NET) of the Technical University of Munich (TUM) propose a novel communication and computation approach that is resilient against potential attacks and partial network failures. We plan to investigate mechanisms and processing platforms that provide resilience efficiently and flexibly. We aim to design a novel class of NICs with processing capabilities that employ techniques such as network coding, low-latency packet retransmissions, and fault-tolerant algorithms.
Research Objectives
The joint project work targets a hybrid software/hardware stack. The software layers will provide the necessary flexibility through programmability where the latency of the HYPERNIC functions is not the primary concern. The hardware layer will comprise co-processors and wire-rate data processing pipelines for cases where low, deterministic latencies are required. The two project partners thus exploit their complementary and proven competencies in the fields of network architectures, protocols, and network processing engines. This combination of skills and expertise will enable theoretical, methodological, architectural, and practical investigations.
In the project, the two partners will combine the complementary research foci of their respective groups. Our chair has a strong background in measuring and modeling programmable packet processing systems, a long-term dedication to measurement-driven research, a sophisticated testbed infrastructure, and extensive measurement facilities. LIS contributes various FPGA boards that allow low-level integration of the investigated resilience mechanisms. Besides hardware acceleration, LIS has a research focus on highly reliable, high-bandwidth networks. Both the research areas and the testbed facilities are complementary. This allows the theoretical and practical investigation of the entire networking stack, from the NIC and its offloading capabilities, through management of the NIC by network drivers and processing in software, up to the application.
Structure of Work
The project is structured in three work areas, as follows.
Work Area A
The first work package (WP) investigates fundamental redundancy mechanisms that allow the use of multiple independent paths through protocol-level measures, e.g., packet-level duplication of traffic. We also investigate more efficient ways to introduce redundancy into network communication, such as network coding (NC). Finally, we investigate methods for resilient computation, in particular frameworks that replicate state across network nodes reliably, securely, and with low latency.
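The trade-off between packet-level duplication and network coding can be illustrated with a minimal sketch. It is not the project's design: it assumes three independent paths and a simple XOR code, and the names `xor_bytes`, `encode`, `decode`, and the path labels are hypothetical. Two payloads are sent unchanged on two paths and their XOR on a third, so any two surviving paths suffice to recover both payloads.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(p1: bytes, p2: bytes) -> dict[str, bytes]:
    """Map two payloads onto three paths (hypothetical labels):
    two uncoded packets and one XOR-coded packet."""
    return {"path1": p1, "path2": p2, "path3": xor_bytes(p1, p2)}


def decode(received: dict[str, bytes]) -> tuple[bytes, bytes]:
    """Recover both payloads from the packets of any two of the three paths."""
    if len(received) < 2:
        raise ValueError("at least two paths are needed")
    if "path1" in received and "path2" in received:
        return received["path1"], received["path2"]
    coded = received["path3"]
    if "path1" in received:
        # p2 = p1 XOR (p1 XOR p2)
        return received["path1"], xor_bytes(received["path1"], coded)
    # p1 = p2 XOR (p1 XOR p2)
    return xor_bytes(received["path2"], coded), received["path2"]


# Usage: path 1 fails, both payloads are still recovered.
packets = encode(b"AAAA", b"BBBB")
survivors = {k: v for k, v in packets.items() if k != "path1"}
recovered = decode(survivors)
```

In this toy setting, duplicating both payloads costs four transmissions, while the XOR scheme costs three and still tolerates the loss of any single path. Practical network coding schemes typically use more general linear codes over larger fields.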
Work Area B
The first part investigates software implementations of different redundancy schemes. This WP covers analysis and design as well as the actual implementation and its optimization. The second part investigates the applicability of hardware offloading for the mechanisms investigated in Work Area A. Offloading specific functions to hardware helps to improve bandwidth, lower latency, and reduce jitter. We plan to assess these potential benefits through measurements. Third, we implement resilient computation. Measurements that help to identify possible bottlenecks are used to plan improvements through targeted hardware acceleration.
Work Area C
Work Area C is dedicated to establishing the testbed facilities for experiments, with a special focus on supporting cooperation with the research community. We plan to establish a federated testbed between our two research groups to enable distributed experiments under a common experiment framework. The results obtained in the federated testbed will be used to build models that predict properties such as maximum throughput or worst-case latency for a larger range of systems, scenarios, and configurations. The testbed facilities and the investigated techniques will also be made available to other members of the research community, especially to members of the priority programme. A federated testbed with further research groups can be established to extend the capabilities of the original testbeds.