## Description

The current script suggests using `--enforce-eager` in the vLLM server:

https://github.com/Project-MONAI/VLM-Surgical-Agent-Framework/blob/4fa04340248da6f8f913d35066b47abc5d1d51cf/scripts/run_vllm_server.sh#L16

However, this flag disables [CUDA graph acceleration](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/), and as a result token generation on an A6000 (Holoscan IGX, arm64) is very slow (11.2 tokens/s).

## Possible solution

We can add documentation about the flag. Also, removing the flag increases the speed roughly 4x (45.9 tokens/s).
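
As a rough sketch of the proposed change (the model name, port, and other flags below are placeholders and may differ from what `scripts/run_vllm_server.sh` actually uses), the launch command could simply drop `--enforce-eager` and carry a comment explaining the trade-off:

```bash
#!/bin/bash
# Hypothetical sketch of scripts/run_vllm_server.sh -- model name and other
# flags are assumptions, not the repository's actual values.

MODEL="${MODEL:-Qwen/Qwen2-VL-7B-Instruct}"   # placeholder model name

# --enforce-eager forces eager-mode PyTorch execution and disables CUDA graph
# capture. Leaving it out lets vLLM capture CUDA graphs, which raised
# generation throughput from ~11.2 to ~45.9 tokens/s on an A6000
# (Holoscan IGX, arm64) in the measurements above.
vllm serve "$MODEL" \
    --host 0.0.0.0 \
    --port 8000
    # --enforce-eager   # re-enable only for debugging or to reduce startup memory
```

Keeping the flag as a commented-out option with a short explanation would document the trade-off (startup time/memory vs. generation speed) without forcing the slower default.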