Update performance section in README.md (#100)

LucasWilkinson · web-flow · commit d9e577e1b9c6 · 2025-10-07T10:34:00.000-04:00
Clarify performance optimizations and upstream hesitance.
diff --git a/README.md b/README.md
@@ -7,4 +7,4 @@ We have the following customizations:
 - Build: Cmake, torch library (this package is bundled into vLLM).
 - Size: reduced templating and removal of (training) kernels
 - Features: Small page size support (FA2), DCP support (FA3)
-- Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. Upstream is hesitant on specializing for inference.
+- Performance: Some decode-specific optimizations for sizes we care about, as well as mixed-batch performance optimizations. (Upstream is understandably hesitant to specialize for inference, since it also needs to support training; we, on the other hand, compile out the backward-pass kernels and do not test whether our optimizations break them.)