File: docs/source/paper_index.md (13 additions, 0 deletions)
@@ -470,6 +470,19 @@ training_args = SFTConfig(
 )
 ```

+To closely match the paper’s setup, you can use the following configuration (see Sec. 4.1). The authors also note that results are not very sensitive to these hyperparameters (Sec. 4.3):
+
+```python
+SFTConfig(
+    loss_type="dft",
+    learning_rate=5e-5,
+    max_length=2048,
+    # Target batch size 256; achieved via per-device batch 8 * grad accumulation 32
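The batch-size comment in the diff relies on the usual relation: effective batch size = per-device batch × gradient accumulation steps × number of devices. The following is a minimal sketch of that arithmetic, not part of the PR. The helper names `effective_batch_size` and `grad_accum_for_target` are hypothetical, and the single-device default is an assumption inferred from the comment (8 * 32 = 256). The truncated lines of the diff presumably set the corresponding config fields, but they are not visible here.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (hypothetical helper)."""
    return per_device * grad_accum * num_devices


def grad_accum_for_target(target: int, per_device: int, num_devices: int = 1) -> int:
    """Accumulation steps needed to reach `target` exactly (hypothetical helper)."""
    per_step = per_device * num_devices
    if target % per_step != 0:
        raise ValueError(
            f"target {target} is not a multiple of per-step batch {per_step}"
        )
    return target // per_step


# Values from the diff comment, assuming a single device.
print(effective_batch_size(per_device=8, grad_accum=32))  # 256

# With 4 devices (an assumed setup), fewer accumulation steps reach the same target.
print(grad_accum_for_target(target=256, per_device=8, num_devices=4))  # 8
```

When training on more than one device, the accumulation steps would need to shrink accordingly to keep the target of 256.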