I'm trying to benchmark the latency of self-hosted PageIndex, focusing specifically on the tree generation stage.
I'm using a local fork that adds concurrent request handling, but I'd like to understand the baseline performance of the upstream self-hosted version for comparison, particularly for tree generation.
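For context, the concurrency change in my fork is essentially a bounded fan-out over the per-node LLM calls, along these lines (a simplified sketch; `call_llm` and `summarize_nodes` are illustrative names, not upstream PageIndex APIs):

```python
import asyncio

CONCURRENCY = 128  # matches the setting used in the numbers below

async def call_llm(prompt: str) -> str:
    # Placeholder for the actual model request (see Environment below).
    await asyncio.sleep(0.1)
    return "summary"

async def summarize_nodes(prompts: list[str]) -> list[str]:
    # Fan out all per-node calls, never keeping more than CONCURRENCY in flight.
    sem = asyncio.Semaphore(CONCURRENCY)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await call_llm(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))
```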
Environment
- Self-hosted deployment with concurrent LLM calls
- OCR input: scanned/image-based PDF
- Document length: 128 pages
- Model backend: Snowflake Cortex (Claude Sonnet 4.5)
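For the backend, each request goes through Snowflake Cortex; a minimal version of that call looks roughly like this (the model identifier string is an assumption — check which models your account exposes). Since `Complete` is synchronous, the async wrapper in the sketch above would run it via `asyncio.to_thread`:

```python
from snowflake.cortex import Complete  # from the snowflake-ml-python package

def cortex_call(prompt: str) -> str:
    # Model identifier is an assumption; list the Cortex models available
    # in your account to confirm the exact string for Claude Sonnet 4.5.
    return Complete("claude-sonnet-4-5", prompt)
```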
Observed Latency
- Tree generation with concurrent requests (concurrency set to 128): approximately 125–140 seconds
- Tree generation without concurrent requests (original implementation): approximately 400 seconds
I have also run the same document through the official chat page, which completed in around 60 seconds end-to-end. I understand that the official hosted version may implement many more optimizations, but I'm still curious whether the numbers above are in the expected range.
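For reference, I'm timing just the tree generation stage, roughly like this (`build_tree` is a stand-in for the actual entry point, which differs between upstream and my fork):

```python
import time
from typing import Callable

def benchmark_stage(stage: Callable[[], object], runs: int = 3) -> list[float]:
    # Wall-clock a single pipeline stage in isolation across several runs.
    timings = []
    for i in range(runs):
        start = time.perf_counter()
        stage()
        timings.append(time.perf_counter() - start)
        print(f"run {i + 1}: {timings[-1]:.1f}s")
    return timings

# Usage, with build_tree as a stand-in for the tree generation entry point:
# benchmark_stage(lambda: build_tree("scanned_128_pages.pdf"))
```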