This repository contains the code needed to run the RDMA server agent in the Ribosome system.
This implementation is tested with MLNX_OFED >= 5.4.0.
The agent is designed to run on a server equipped with a Mellanox ConnectX-5 NIC.
Before compiling the agent you need to install the following libraries:
Before running the server agent, you need to disable the iCRC check on the ConnectX-5 port in use. The commands to do so are not included in this repository, as they are under NDA.
In order to compile the agent, run the following commands from the root project folder:
mkdir build && cd build
cmake -S .. -B . && make -j
The compiled agent takes the following arguments:
server <device> <idx> <numa> <qps> <min-timer> <max-timer>
<device>: name of the interface to use for the connection with the Ribosome switch (example: mlx5_1)
<idx>: unique index associated with this server, starting from 0. Each RDMA agent process must be identified by a different index, assigned sequentially
<numa>: index of the NUMA node to use when allocating buffers
<qps>: number of Queue-Pairs to create
<min-timer>: minimum QP reset timer value, in number of packets (example: 200). Used together with <max-timer> to compute a random value that the Ribosome Tofino uses to pause sending packets towards a freshly restored QP
<max-timer>: maximum QP reset timer value, in number of packets (example: 2000). Used together with <min-timer> to compute the same random value
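To make the role of the two timer bounds concrete, the following sketch draws a value between them. It is only an illustration: draw_reset_timer is a hypothetical helper, and this README does not say how the agent actually computes the value or which distribution it uses. The sketch assumes both bounds are non-negative integers and that the minimum does not exceed the maximum.

```shell
# Illustrative sketch: pick a reset timer (in packets) between two bounds.
# draw_reset_timer is a hypothetical helper, not part of the agent.
draw_reset_timer() (
    min_timer=$1
    max_timer=$2
    # Reject anything that is not a plain non-negative integer.
    for value in "$min_timer" "$max_timer"; do
        case $value in
            ''|*[!0-9]*)
                echo "timer bounds must be non-negative integers" >&2
                exit 1 ;;
        esac
    done
    if [ "$min_timer" -gt "$max_timer" ]; then
        echo "min-timer must not exceed max-timer" >&2
        exit 1
    fi
    # awk's rand() returns a value in [0, 1); scale it to the inclusive range.
    awk -v lo="$min_timer" -v hi="$max_timer" \
        'BEGIN { srand(); print lo + int(rand() * (hi - lo + 1)) }'
)

# Example with the bounds suggested above.
draw_reset_timer 200 2000
```

Because srand() seeds from the current time, two calls within the same second return the same value; that is acceptable for an illustration but not for anything that needs real randomness.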
Before starting the process, you have to ensure that the Linux network interface corresponding to the mlx5_N that you want to use towards the Ribosome switch has the correct MTU and an IPv4 address assigned.
First, check which Linux interface corresponds to the mlx5_N device that you want to use (the ibdev2netdev command shows this mapping). The examples below assume that the interface is named cx5_0if0.
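The lookup can be scripted. The sketch below assumes that ibdev2netdev prints one line per device in the form "mlx5_0 port 1 ==> cx5_0if0 (Up)"; netdev_for_ibdev is a hypothetical helper name, and the sample lines are made up for illustration.

```shell
# netdev_for_ibdev is a hypothetical helper: it reads ibdev2netdev-style
# output on stdin and prints the Linux interface bound to the given device.
# Assumed line format: "<ibdev> port <n> ==> <netdev> (<state>)".
netdev_for_ibdev() {
    awk -v dev="$1" '
        $1 == dev && $4 == "==>" { print $5; found = 1; exit }
        END { exit !found }   # non-zero status when the device is absent
    '
}

# Example on sample text (on a real server: ibdev2netdev | netdev_for_ibdev mlx5_1).
printf '%s\n' \
    'mlx5_0 port 1 ==> cx5_0if0 (Up)' \
    'mlx5_1 port 1 ==> cx5_0if1 (Down)' | netdev_for_ibdev mlx5_0
```

The helper returns a non-zero status when the device is not listed, so a wrapper script can stop before touching the wrong interface.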
The server agent opens Queue-Pairs with a path MTU of 4096 bytes. The corresponding network interface MTU must be tweaked in order to work with this value. Set the MTU of the interface to 4200 bytes:
sudo ip link set dev cx5_0if0 mtu 4200
Assign an IPv4 address to the interface. The address may belong to an unreachable network: the server agent installs a fake ARP entry so that packets are sent correctly on the cx5_0if0 interface:
sudo ip addr add 192.168.40.13/24 dev cx5_0if0
At this point you can start the server agent with root privileges, for example:
sudo ./server mlx5_N 0 0 32 10 100
Consider the experimental setup depicted in the following figure.
The ConnectX-5 port, the idx, and the corresponding Tofino port of each RDMA Server are shown. To make the example less trivial, RDMA Server 4 is connected to port 8, while RDMA Server 3 is connected to port 12. The idx must match the QP-Restore Mirror Group configured in the Ribosome-P4 setup.py script. A mirror group is associated with each of the ports shown in the figure, and its identifier is computed as 200 + idx. Hence, in order to receive the QP Restore packets for the correct Queue-Pairs, the mirror group must correspond to the correct server, identified by its idx.
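As a quick cross-check against the setup.py configuration, the mapping can be listed for the four servers of this example. The 200 + idx rule is the one stated above; mirror_group is a hypothetical helper name used only for this sketch.

```shell
# mirror_group is a hypothetical helper: it applies the 200 + idx rule
# described above and prints the resulting mirror group identifier.
mirror_group() {
    echo $(( 200 + $1 ))
}

# List the mapping for the four indices used in this example.
for idx in 0 1 2 3; do
    echo "idx $idx -> mirror group $(mirror_group "$idx")"
done
```

For idx 0 to 3 this yields mirror groups 200 to 203, which should match the groups attached to the corresponding Tofino ports.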
Moreover, each process must be run with the same <qps> value (the one chosen in the Ribosome-P4 program). In this example we use 32 Queue-Pairs.
The commands to correctly run the server agents are (<numa>, <min-timer>, and <max-timer> values are just examples):
user@rdma-server-1: sudo ./server mlx5_1 0 0 32 10 100
user@rdma-server-2: sudo ./server mlx5_3 1 0 32 10 100
user@rdma-server-3: sudo ./server mlx5_2 2 0 32 10 100
user@rdma-server-4: sudo ./server mlx5_0 3 0 32 10 100
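Since each agent needs a distinct idx assigned sequentially from 0, a deployment script may want to verify the chosen indices before launching anything. The sketch below is one way to do that; check_indices is a hypothetical helper and is not part of the agent.

```shell
# check_indices is a hypothetical helper: it succeeds only if its arguments
# are exactly 0, 1, ..., n-1 in some order (unique and sequential from 0).
check_indices() {
    printf '%s\n' "$@" | sort -n |
        awk '$1 != NR - 1 { bad = 1 } END { exit bad + 0 }'
}

# The four indices used in the commands above pass the check.
if check_indices 0 1 2 3; then
    echo "indices are valid"
fi
```

After a numeric sort, the value on line N must equal N - 1, so a duplicate, a gap, or a sequence that does not start at 0 makes the check fail.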