Commit d9e577e

Update performance section in README.md (#100)
Clarify performance optimizations and upstream hesitance.
1 parent b831072 commit d9e577e

1 file changed (+1, -1 lines)

README.md

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ We have the following customizations:
 - Build: Cmake, torch library (this package is bundled into vLLM).
 - Size: reduced templating and removal of (training) kernels
 - Features: Small page size support (FA2), DCP support (FA3)
-- Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. Upstream is hesitant on specializing for inference.
+- Performance: Some decode specific optimizations for sizes we care about; as well as mixed batch performance optimizations. (Upstream is understandably hesitant on specializing for inference as they also need to support training; we on the other hand compile out the backward pass kernels and do not test that our optimizations do not break them.)