
Throughput and CPU usage of Quiche code #2284

@Milind-Blaze

Hi all! I am interested in designing a low-latency transport protocol on top of QUIC for XR, and I want to build on Quiche for this. I therefore evaluated the throughput and CPU usage of Quiche. It would be great to receive some feedback from the Quiche community on whether
(a) the results obtained are what one would expect for Quiche, and
(b) the methodology makes sense: have I missed any optimizations? does my evaluation setup introduce unexpected behaviour? and so forth.

Here is my setup:

  1. Codebase: I used a modified version of the codebase from this commit. The Rust code is compiled in release mode and the C code is compiled with the -I. -Wall -pedantic -O3 -DNDEBUG flags.

  2. Network: The CPU and throughput measurements are performed over localhost, with tc setting the rate and buffer size as below:

sudo tc qdisc add dev lo root handle 1: tbf rate 100Mbit burst 2mbit latency 40ms

Rates from the set {10, 20, 50, 100, 200, 350, 500} Mbps are tested, along with a run without tc. Only the rate parameter is varied; burst and latency remain the same. There is no propagation delay in this setup.

  3. Server: Written using the C API. The server wakes up, stores 2 GB in memory, receives a client request, and then sends the requested amount from the 2 GB, keeping a reference to the unsent remainder so the response can be completed later if it can't all be sent at once (see the send-loop sketch after this list). It uses HTTP/0.9 to minimize any parsing overhead.

  4. Client: Written using the C API. The client wakes up, requests data of size 100 MB or 1 GB, receives it, and then closes.
    (a) It records timestamps before sending the request (before quiche_conn_stream_send) and after receiving the full response (after the final quiche_conn_stream_recv). The difference between the two timestamps is treated as the RTT (see the timing sketch after this list).
    (b) The throughput is calculated as (requested size)/RTT.

  5. CPU usage measurement:
    (a) The client and server are pinned to separate CPU cores using taskset -c. The client starts 10 s after the server.
    (b) The CPU usage of the client and server PIDs is monitored using top. It produces a new sample every second, i.e. it is run as top -b -d 1 -p $process_pid, and reports CPU usage as the percentage of one core used over the last second.
    (c) The client and server CPU usage patterns are plotted against time. Note that the server graphs have a roughly 10 s window at the start where CPU usage is almost 0%, as the client hasn't started yet.
    (d) This is repeated for 3 runs. As each experiment takes roughly the same amount of time, the plotted graph is the mean CPU usage of the client or server at each second across the three trials.

  6. Throughput measurement:
    (a) Throughput is measured as described in (4b).
    (b) The values are obtained and averaged across the three runs of (5).
    (c) iperf3 measurements are plotted for comparison.
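To make items 3 and 4 concrete, here are minimal sketches, not my exact code. The struct and buffer names are hypothetical; the quiche calls use the five-argument form of quiche_conn_stream_send (newer headers also add an error-code out-parameter to the stream calls, so adjust to your version).

First, the server-side send loop: quiche_conn_stream_send may accept fewer bytes than offered when flow-control or congestion windows are full, so the unsent remainder is kept for the next writable event.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#include <quiche.h>

/* Hypothetical per-stream state: the part of the response that has
 * not yet been accepted by quiche. buf points into the 2 GB blob. */
struct partial_resp {
    const uint8_t *buf;
    size_t left;
    uint64_t stream_id;
};

static void flush_response(quiche_conn *conn, struct partial_resp *r)
{
    while (r->left > 0) {
        ssize_t n = quiche_conn_stream_send(conn, r->stream_id,
                                            r->buf, r->left, false);
        if (n <= 0) {
            /* QUICHE_ERR_DONE or no capacity: retry on the next
             * writable event for this stream. */
            return;
        }
        r->buf  += (size_t)n;
        r->left -= (size_t)n;
    }

    /* All payload accepted: signal end-of-stream with an empty write. */
    quiche_conn_stream_send(conn, r->stream_id, (const uint8_t *)"", 0, true);
}
```

Second, the client-side timing from item 4, assuming POSIX clock_gettime with a monotonic clock:

```c
#include <stdint.h>
#include <time.h>

/* Seconds elapsed between two monotonic timestamps. */
static double elapsed_s(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) +
           (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* In the event loop (pseudostructure):
 *
 *   struct timespec t0, t1;
 *   clock_gettime(CLOCK_MONOTONIC, &t0);
 *   quiche_conn_stream_send(conn, 0, req, req_len, true);
 *   ...run the loop, draining quiche_conn_stream_recv until fin...
 *   clock_gettime(CLOCK_MONOTONIC, &t1);
 *
 *   double rtt_s = elapsed_s(t0, t1);           // "RTT" as defined above
 *   double mbps  = requested_bytes * 8.0 / rtt_s / 1e6;
 */
```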

CPU usage results

[Image: client and server CPU usage vs. time]

Some questions:

(a) Are these expected numbers? Have I missed any optimizations?
(b) Is there any way to improve these CPU usage numbers?

I repeated the experiment with the master branch of the code and obtained similar results, as seen below:

[Image: client and server CPU usage vs. time, master branch]

Throughput results

[Image: measured throughput vs. tc rate limit, compared with iperf3]

(a) Why does the throughput not measure up to iperf3? Does tc, as I have configured it, induce odd behaviour?
(b) Have I missed some configuration of congestion control, pacing, etc. that might be causing this? (See the configuration sketch below for the knobs I have in mind.)
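For (b), this is the shape of the configuration I would expect to matter. A sketch, assuming the setter names from the quiche C header at a recent commit; the window sizes and stream count below are illustrative values, not recommendations:

```c
#include <quiche.h>

static quiche_config *make_config(void)
{
    quiche_config *config = quiche_config_new(QUICHE_PROTOCOL_VERSION);
    if (config == NULL) {
        return NULL;
    }

    /* HTTP/0.9 ALPN, as in the quiche examples. */
    quiche_config_set_application_protos(config,
                                         (uint8_t *)"\x08http/0.9", 9);

    /* Flow-control windows. If these stay at example-sized defaults
     * (on the order of 1 MB), the windows rather than the link can cap
     * throughput once the path's bandwidth-delay product exceeds them. */
    quiche_config_set_initial_max_data(config, 100 * 1024 * 1024);
    quiche_config_set_initial_max_stream_data_bidi_local(config, 100 * 1024 * 1024);
    quiche_config_set_initial_max_stream_data_bidi_remote(config, 100 * 1024 * 1024);
    quiche_config_set_initial_max_streams_bidi(config, 100);

    /* Congestion control: CUBIC is the default; Reno and BBR are also
     * selectable via the quiche_cc_algorithm enum. */
    quiche_config_set_cc_algorithm(config, QUICHE_CC_CUBIC);

    return config;
}
```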

When I ran throughput measurements on a modified Mahimahi emulator (https://github.com/ravinet/mahimahi), I found that Quiche was able to achieve the full link rate and match iperf3.

[Image: throughput results on the Mahimahi emulator]

Any feedback, thoughts or comments would be very helpful! Thank you!
