@Velaciela Velaciela commented Oct 10, 2025

related PR: #987

Performance

  • fibonacci $\text{fib}(2^{20})$: runs 12M steps (warmup + rerun)
  • machine(node-19): AMD EPYC 7702 64-Core, 1 x RTX 3090
| metric | CPU(bb31) | GPU(bb31) | speedup | CPU(gl64) | GPU(gl64) | speedup |
|---|---|---|---|---|---|---|
| Create Proof(total) | 12.6s | 4.18s | 3.01x | 13.9s | 4.87s | 2.85x |
| batch commit | 2.31s | 514ms | 4.49x | 6.81s | 1.20s | 5.68x |
| transfer pk | - | 5.17ms | N/A | - | 8.20ms | N/A |
| main and tower proof | 9.56s | 3.48s | 2.74x | 6.11s | 3.45s | 1.77x |
| opening | 773ms | 179ms | 4.31x | 1.01s | 214ms | 4.71x |
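
The speedup columns above are the CPU time divided by the GPU time. A minimal sketch of that arithmetic follows; the helper names `to_seconds` and `speedup` are hypothetical and not part of this PR. Ratios recomputed from the rounded timings can differ from the listed values in the last digit (for example, 773ms / 179ms gives about 4.32 against the listed 4.31x), presumably because the listed values were derived from unrounded measurements.

```python
import re

# Unit suffixes as they appear in the timings; both Unicode spellings of
# "micro" are accepted because the log may use either code point.
_UNIT_TO_SECONDS = {
    "s": 1.0,
    "ms": 1e-3,
    "\u00b5s": 1e-6,  # micro sign
    "\u03bcs": 1e-6,  # Greek small letter mu
    "us": 1e-6,
    "ns": 1e-9,
}


def to_seconds(text: str) -> float:
    """Convert a duration such as '12.6s' or '514ms' to seconds."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([^\s0-9.]+)\s*", text)
    if match is None or match.group(2) not in _UNIT_TO_SECONDS:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(match.group(1)) * _UNIT_TO_SECONDS[match.group(2)]


def speedup(cpu: str, gpu: str) -> float:
    """CPU time divided by GPU time, both given as duration strings."""
    return to_seconds(cpu) / to_seconds(gpu)


if __name__ == "__main__":
    # Total proof time, BabyBear: 12.6s on CPU against 4.18s on GPU.
    print(f"{speedup('12.6s', '4.18s'):.2f}x")
```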

trace: Babybear

### CPU-BB31 ### ZKVM_create_proof [ 12.6s | 0.00% / 100.00% ]
┝━ batch commit to traces [ 2.31s | 18.29% ] profiling_1: true
┝━ transfer pk to device [ 2.99µs | 0.00% ] profiling_1: true
┝━ main_proofs [ 9.56s | 0.01% / 75.60% ] profiling_1: true
│  ┝━ create_chip_proof [ 2.03s | 0.35% / 16.04% ] table_name: "ADD"
│  │  ┝━ per_layer_gen_witness [ 111ms | 0.87% ] profiling_2: true
│  │  ┝━ prove_tower_relation [ 1.65s | 0.00% / 13.04% ] profiling_2: true
│  │  │  ┝━ build_tower_witness [ 398ms | 3.15% ] profiling_2: true
│  │  │  ┕━ prove_tower_relation [ 1.25s | 9.89% ] profiling_2: true
│  │  ┕━ layer_proof [ 224ms | 1.77% ] profiling_2: true
...
┕━ pcs_opening [ 773ms | 6.11% ] profiling_1: true


### GPU-BB31 ### ZKVM_create_proof [ 4.18s | 0.00% / 100.00% ]
┝━ batch commit to traces [ 514ms | 0.45% / 12.28% ] profiling_1: true
│  ┝━ [gpu] rmms_h2d  = 231.46ms
│  ┝━ [gpu] polys_d2d = 10.37ms
│  ┝━ [gpu] encode    = 42.93ms
│  ┕━ [gpu] mmcs      = 92.70ms
┝━ main_proofs [ 3.48s | 7.31% / 83.30% ] profiling_1: true
│  ┝━ create_chip_proof [ 677ms | 0.00% / 16.19% ] table_name: "ADD"
│  │  ┝━ per_layer_gen_witness [ 5.26ms | 0.13% ] profiling_2: true
│  │  ┝━ prove_tower_relation [ 357ms | 0.00% / 8.54% ] profiling_2: true
│  │  │  ┝━ build_tower_witness [ 62.3ms | 1.49% ] profiling_2: true
│  │  │  ┝━ extract_out_evals_from_gpu_towers [ 3.99ms | 0.10% ] profiling_2: true
│  │  │  ┕━ prove_tower_relation [ 291ms | 6.95% ] profiling_2: true
│  │  ┕━ layer_proof [ 315ms | 7.52% ] profiling_2: true
...
┕━ pcs_opening [ 179ms | 4.29% ] profiling_1: true

trace: Goldilocks

### CPU-GL64 ### ZKVM_create_proof [ 13.9s | 0.00% / 100.00% ]
┝━ batch commit to traces [ 6.81s | 48.85% ] profiling_1: true
┝━ main_proofs [ 6.11s | 0.41% / 43.87% ] profiling_1: true
│  ┝━ create_chip_proof [ 1.30s | 0.50% / 9.35% ] table_name: "ADD"
│  │  ┝━ per_layer_gen_witness [ 107ms | 0.77% ] profiling_2: true
│  │  ┝━ prove_tower_relation [ 945ms | 0.00% / 6.78% ] profiling_2: true
│  │  │  ┝━ build_tower_witness [ 334ms | 2.40% ] profiling_2: true
│  │  │  ┕━ prove_tower_relation [ 610ms | 4.38% ] profiling_2: true
│  │  ┕━ layer_proof [ 180ms | 1.29% ] profiling_2: true
...
┕━ pcs_opening [ 1.01s | 7.28% ] profiling_1: true

### GPU-GL64 ### ZKVM_create_proof [ 4.87s | 0.00% / 100.00% ]
┝━ batch commit to traces [ 1.20s | 0.90% / 24.58% ] profiling_1: true
│  ┝━ [gpu] rmms_h2d  = 420.56ms
│  ┝━ [gpu] polys_d2d = 15.84ms
│  ┝━ [gpu] encode    = 84.63ms
│  ┕━ [gpu] mmcs      = 431.68ms
┝━ transfer pk to device [ 8.20ms | 0.17% ] profiling_1: true
┝━ main_proofs [ 3.45s | 7.33% / 70.86% ] profiling_1: true
│  ┝━ create_chip_proof [ 657ms | 0.00% / 13.49% ] table_name: "ADD"
│  │  ┝━ per_layer_gen_witness [ 6.87ms | 0.14% ] profiling_2: true
│  │  ┝━ prove_tower_relation [ 346ms | 0.00% / 7.11% ] profiling_2: true
│  │  │  ┝━ build_tower_witness [ 61.4ms | 1.26% ] profiling_2: true
│  │  │  ┝━ extract_out_evals_from_gpu_towers [ 3.16ms | 0.06% ] profiling_2: true
│  │  │  ┕━ prove_tower_relation [ 282ms | 5.78% ] profiling_2: true
│  │  ┕━ layer_proof [ 304ms | 6.24% ] profiling_2: true
...
┕━ pcs_opening [ 214ms | 4.39% ] profiling_1: true
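
To compare individual spans between the CPU and GPU runs, the span lines in the traces above can be pulled apart with a small parser. This is a sketch under the assumption that every span line has the shape `<tree prefix> <name> [ <duration> | <percent> ... ]` as shown in this PR; `parse_span` is a hypothetical helper, and the `[gpu] ... = ...` sub-timing lines use a different shape and are deliberately skipped.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   "│  │  ┝━ per_layer_gen_witness [ 111ms | 0.87% ] profiling_2: true"
SPAN_LINE = re.compile(
    r"^[\s│┝┕━]*"                      # tree-drawing prefix, if any
    r"(?P<name>.+?)\s*"                # span name
    r"\[\s*(?P<value>[0-9.]+)"         # numeric part of the duration
    r"(?P<unit>[^\s|0-9.]+)\s*\|"      # unit suffix, then the '|' separator
)


def parse_span(line: str) -> Optional[Tuple[str, float, str]]:
    """Return (name, value, unit) for a span line, or None for other lines."""
    match = SPAN_LINE.match(line)
    if match is None:
        return None
    return match.group("name"), float(match.group("value")), match.group("unit")


if __name__ == "__main__":
    sample = "│  │  ┝━ per_layer_gen_witness [ 111ms | 0.87% ] profiling_2: true"
    print(parse_span(sample))
```

Lines that are not spans, such as the `...` elisions, return `None`, so a log can be filtered line by line without special-casing.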

e2e test

# fibonacci
RUST_LOG=debug cargo run --release --features gpu --package ceno_zkvm --bin e2e -- --platform=ceno --hints=15 --public-io=2400 --profiling=2 --field=baby-bear examples/target/riscv32im-ceno-zkvm-elf/release/examples/fibonacci 2>&1 | tee "fib_perf_gpu.log"

# keccak
RUST_LOG=debug cargo run --release --features gpu --package ceno_zkvm --bin e2e -- --platform=ceno --profiling=2 --field=baby-bear examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall 2>&1 | tee "keccak_perf_gpu.log"

@Velaciela Velaciela changed the base branch from temp/gpu-dev-base to master October 14, 2025 07:33