[WIP] LM Workload #860
Conversation
Dev -> main
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
…to use int32 for tensor types and add dropout rate parameter
I might be wrong here, but I just noticed that we are using the standard default MLP expansion factor.
However, it seems to me that we are also using a gated linear unit, whose initial projection is twice as large (since its output is split into the gate and the value).
I believe the commonly used default expansion factor assumes a non-gated MLP. If you agree, we should either lower the default or rescale the hidden dimension when the GLU is used.
@fsschneider agree, we should definitely adjust the expansion factor as you suggested. This was probably an oversight, as in plainLM we rescale when using GLU.
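For context, here is a minimal sketch of a GLU feed-forward block in Flax showing why the first projection is twice the nominal hidden size and how the hidden dimension can be rescaled. The module name, parameter names, activation choice, and the 2/3 rescaling are illustrative assumptions, not the PR's actual code:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class GluMlp(nn.Module):
    """Hypothetical GLU feed-forward block (not the workload's real module)."""
    model_dim: int
    expansion_factor: float = 4.0  # nominal factor for a non-gated MLP
    use_glu: bool = True

    @nn.compact
    def __call__(self, x):
        hidden_dim = int(self.expansion_factor * self.model_dim)
        if self.use_glu:
            # Rescale so the gate + value projections together keep roughly the
            # same parameter count as a non-gated MLP with the same factor.
            hidden_dim = int(2 / 3 * hidden_dim)
            # A single projection produces both gate and value, so its output
            # width is 2 * hidden_dim (the "twice as large" initial expansion).
            gate, value = jnp.split(nn.Dense(2 * hidden_dim)(x), 2, axis=-1)
            h = jax.nn.silu(gate) * value
        else:
            h = jax.nn.relu(nn.Dense(hidden_dim)(x))
        return nn.Dense(self.model_dim)(h)
```

The 2/3 rescaling corresponds to the common 8/3 convention for gated MLPs (as in the GLU-variants literature), which keeps the total parameter count of the gated block close to that of a plain MLP with expansion factor 4.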
This is for the LM workload.