RoPE-enhanced transformer that pens Shakespeare, attention-only
- training a 6-layer decoder-only transformer with RoPE on each attention head, operating on character-level tokens from 'tinyshakespeare'
-
Day 1 : Data pipeline
- Downloads tinyshakespeare.txt
- Character-level tokenizer
- Train/val split (important): kept at a 90:10 ratio (a rough sketch of the pipeline follows below)
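A rough sketch of this pipeline, assuming the commonly used karpathy/char-rnn mirror of tinyshakespeare; helper names like `get_batch`, `encode`, `decode` and the block/batch sizes are illustrative placeholders, not necessarily the repo's actual API.

```python
import os
import urllib.request
import torch

DATA_URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
DATA_PATH = "tinyshakespeare.txt"

# Download the corpus once.
if not os.path.exists(DATA_PATH):
    urllib.request.urlretrieve(DATA_URL, DATA_PATH)

with open(DATA_PATH, "r", encoding="utf-8") as f:
    text = f.read()

# Character-level tokenizer: every unique character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

# 90:10 train/val split on the encoded character stream.
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(split, block_size=256, batch_size=64):
    """Sample a random batch of contiguous character chunks and their next-char targets."""
    src = train_data if split == "train" else val_data
    ix = torch.randint(len(src) - block_size - 1, (batch_size,))
    x = torch.stack([src[i:i + block_size] for i in ix])
    y = torch.stack([src[i + 1:i + block_size + 1] for i in ix])
    return x, y
```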
-
Day 2 : Model configuration and pure-attention transformer implementation
- RoPE attention with tied Q/K projections
- Pure attention block (no FFN layers)
- Weight tying between input/output embeddings (these pieces are sketched below)
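A minimal sketch of these three pieces under stated assumptions: interleaved-pair RoPE with an even head dimension, "tied Q/K" read as a single shared projection producing both queries and keys (they then differ only by their per-position RoPE rotation), and placeholder sizes (d_model=384, 6 heads, 6 layers). Class and function names are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope_angles(seq_len, head_dim, device, base=10000.0):
    """Per-position rotation angles theta_{t,i} = t / base^(2i/head_dim); head_dim must be even."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len, device=device).float(), inv_freq)  # (T, head_dim/2)
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    """Rotate each (even, odd) pair of head dimensions by the position-dependent angle."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

class TiedQKRoPEAttention(nn.Module):
    """Causal self-attention with one shared Q/K projection; RoPE applied on every head."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.qk = nn.Linear(d_model, d_model, bias=False)   # tied: one projection for Q and K
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        B, T, C = x.shape
        qk = self.qk(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)  # (B, H, T, hd)
        v = self.v(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        cos, sin = rope_angles(T, self.head_dim, x.device)
        q = apply_rope(qk, cos, sin)
        k = q                                               # tied projection: keys are the rotated queries
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        att = att.masked_fill(causal, float("-inf"))
        out = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

class PureAttentionBlock(nn.Module):
    """Transformer block with attention + residual only, no FFN (yet)."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.attn = TiedQKRoPEAttention(d_model, n_heads)

    def forward(self, x):
        return x + self.attn(self.ln(x))

class CharTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=384, n_heads=6, n_layers=6):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.Sequential(*[PureAttentionBlock(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight           # weight tying between input/output embeddings

    def forward(self, idx):
        return self.lm_head(self.ln_f(self.blocks(self.tok_emb(idx))))
```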
-
Wrap-up : Complete training pipeline
- AdamW optimizer with weight decay
- Cosine learning rate schedule with warmup
- Gradient clipping
- Validation loss tracking
- Automatic checkpointing (training loop sketched below)
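A rough sketch of the training loop described by this list; it reuses `CharTransformer`, `get_batch`, and `chars` from the sketches above, and the hyperparameters, eval interval, and checkpoint file name are placeholders.

```python
import math
import torch
import torch.nn.functional as F

def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=200, max_steps=5000):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

model = CharTransformer(vocab_size=len(chars))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
best_val = float("inf")

for step in range(5000):
    for group in optimizer.param_groups:      # apply the scheduled learning rate
        group["lr"] = lr_at(step)

    x, y = get_batch("train")
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))

    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # gradient clipping
    optimizer.step()

    # Periodic validation-loss tracking; checkpoint whenever the val loss improves.
    if step % 250 == 0:
        model.eval()
        with torch.no_grad():
            vx, vy = get_batch("val")
            vlogits = model(vx)
            val_loss = F.cross_entropy(vlogits.view(-1, vlogits.size(-1)), vy.view(-1)).item()
        model.train()
        if val_loss < best_val:
            best_val = val_loss
            torch.save({"model": model.state_dict(), "step": step, "val_loss": val_loss}, "ckpt.pt")
```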
-
Text generation part
- Top-k and top-p (nucleus) sampling
- Temperature control
- Checkpoint loading
- Multiple prompt examples (sampling loop sketched below)
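A sampling sketch covering temperature, top-k, top-p, and checkpoint loading; it reuses `encode`, `decode`, `CharTransformer`, `chars`, and the `ckpt.pt` name from the sketches above, all of which are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, prompt, max_new_tokens=300, temperature=0.8, top_k=50, top_p=0.95):
    model.eval()
    idx = torch.tensor([encode(prompt)], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits = model(idx)[:, -1, :] / temperature          # last-position logits, temperature-scaled
        # Top-k: keep only the k most likely characters.
        if top_k is not None:
            kth = torch.topk(logits, min(top_k, logits.size(-1))).values[:, -1, None]
            logits = logits.masked_fill(logits < kth, float("-inf"))
        # Top-p (nucleus): keep the smallest set of characters whose cumulative prob reaches p.
        if top_p is not None:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True)
            probs = F.softmax(sorted_logits, dim=-1)
            cum = torch.cumsum(probs, dim=-1)
            remove = cum - probs > top_p                      # drop tokens past the nucleus
            sorted_logits[remove] = float("-inf")
            logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return decode(idx[0].tolist())

# Load the best checkpoint saved during training, then sample from a prompt.
ckpt = torch.load("ckpt.pt", map_location="cpu")
model = CharTransformer(vocab_size=len(chars))
model.load_state_dict(ckpt["model"])
print(generate(model, "ROMEO: "))
```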
Let's add a feed-forward network to the transformer block, learn and implement ALiBi (bias construction sketched below), and add a KV cache for faster generation, since it currently re-computes all activations every step
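Of the three planned items, ALiBi is the easiest to pin down in a few lines: instead of rotating Q/K, it adds a per-head linear penalty to the attention logits proportional to how far each key sits behind the query. The sketch below assumes the power-of-two slope schedule from the ALiBi paper; a 6-head model would need the paper's interleaving rule for non-powers of two, which this sketch skips.

```python
import torch

def alibi_slopes(n_heads):
    """Per-head slopes m_h = 2^(-8h/n_heads) for h = 1..n_heads (power-of-two head counts)."""
    ratio = 2 ** (-8.0 / n_heads)
    return torch.tensor([ratio ** (h + 1) for h in range(n_heads)])

def alibi_bias(n_heads, seq_len):
    """Bias added to attention logits: -m_h * (i - j) for query i attending to key j <= i."""
    slopes = alibi_slopes(n_heads).view(n_heads, 1, 1)
    pos = torch.arange(seq_len)
    dist = (pos.view(-1, 1) - pos.view(1, -1)).clamp(min=0)   # (T, T): i - j, zero for future keys
    return -slopes * dist                                     # (n_heads, T, T)

# Usage inside the attention forward pass (the causal mask is still applied separately):
#   att = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
#   att = att + alibi_bias(self.n_heads, T).to(att.device)
```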
later :
- Improve RoPE: support partial head dimensions (sketched below); fix: handle odd dimensions
- Add Residual Stream Dropout & Stochastic Depth ✅
- Add Dropout in Forward Pass
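A sketch of the partial-head-dimension improvement, which also covers the odd-dimension fix by rounding the rotary width down to an even number. It reuses `rope_angles` from the Day 2 sketch, called with `rotary_dim` in place of the full head dimension; the name `apply_partial_rope` is illustrative.

```python
import torch

def apply_partial_rope(x, cos, sin, rotary_dim):
    """x: (B, H, T, head_dim); cos/sin: (T, rotary_dim // 2). Rotates only the first rotary_dim dims."""
    rotary_dim -= rotary_dim % 2                              # guard against odd rotary_dim
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)
    return torch.cat((rotated, x_pass), dim=-1)               # pass the remaining dims through unchanged

# Example: rotate only half of each 64-dim head.
#   cos, sin = rope_angles(T, 32, x.device)
#   q = apply_partial_rope(q, cos, sin, rotary_dim=32)
```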