You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,4 +7,4 @@ We have the following customizations:
7
7
- Build: Cmake, torch library (this package is bundled into vLLM).
8
8
- Size: reduced templating and removal of (training) kernels
9
9
- Features: Small page size support (FA2), DCP support (FA3)
10
-
- Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. Upstream is hesitant on specializing for inference.
10
+
- Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. (Upstream is understandably hesitant on specializing for inference as they also need to support training; we on the other hand compile out the backward pass kernels and do not test that our optimizations do not break them.)
0 commit comments