Minimal Transformer - Full Backprop in the Browser
This code does manual backprop through a 4-layer, 2-head Transformer entirely in JavaScript!
It's intended primarily as a demonstration and can be very slow, so use it with patience.
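To give a flavor of what "manual backprop" means here, the sketch below hand-derives the gradients of a single linear layer y = W x. The function names are hypothetical and are not taken from this script; the real code applies the same idea to every Transformer weight.

  // Forward pass: y = W x, with W as [out][in] and x as [in]
  function linearForward(W, x) {
    return W.map(row => row.reduce((s, w, j) => s + w * x[j], 0));
  }

  // Backward pass: given upstream gradient dY, compute dW and dX by hand
  function linearBackward(W, x, dY) {
    const dW = W.map((row, i) => row.map((_, j) => dY[i] * x[j]));              // dW[i][j] = dY[i] * x[j]
    const dX = x.map((_, j) => W.reduce((s, row, i) => s + row[j] * dY[i], 0)); // dX = W^T dY
    return { dW, dX };
  }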
Training
Select words.txt (vocab) and input.txt (training text).
We’ll train a small Transformer with:
4 layers
32-dimensional embeddings
2 attention heads
Feed-forward dimension of 64
Full manual backprop (tokenEmb, Q/K/V, feedforward, final FC all updated)
This is very slow in JS! We run for 100 epochs by default, but you can stop early if you're impatient.
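The hyperparameters above correspond roughly to a config object and a plain SGD update like the sketch below. The names, the learning rate, and the use of vanilla SGD are assumptions for illustration, not necessarily what this script uses internally.

  // Illustrative config (field names are assumptions, not this script's API)
  const config = {
    nLayers: 4,        // Transformer blocks
    dModel: 32,        // embedding dimension
    nHeads: 2,         // attention heads
    dFF: 64,           // feed-forward hidden dimension
    epochs: 100,       // default epoch count; stop early if it is too slow
    learningRate: 0.01 // assumed plain-SGD step size
  };

  // Once backprop has filled p.grad for every parameter, a plain SGD step:
  function sgdStep(params, lr) {
    for (const p of params) {                  // p.value and p.grad are flat Float64Arrays
      for (let i = 0; i < p.value.length; i++) {
        p.value[i] -= lr * p.grad[i];          // gradient descent update
        p.grad[i] = 0;                         // reset for the next batch
      }
    }
  }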
Inference
You can load a .json file (this script's parameter dump) or just use the
in-memory weights after training. Then type a context and generate text.
If you trained in this session, skip loading and use the in-memory model.
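Generation follows the usual autoregressive loop: run the model on the current token sequence, turn the next-token logits into probabilities, sample one token, append it, and repeat. The sketch below assumes a model.forward(tokens) method that returns next-token logits; that name, along with the softmax and sampling helpers, is illustrative rather than this script's actual API.

  // Illustrative generation loop (model.forward is an assumed method name)
  function generate(model, contextTokens, maxNewTokens) {
    const out = contextTokens.slice();
    for (let t = 0; t < maxNewTokens; t++) {
      const logits = model.forward(out);   // logits over the vocabulary for the next token
      const probs = softmax(logits);
      out.push(sampleFrom(probs));         // sample the next token id and append it
    }
    return out;
  }

  function softmax(logits) {
    const m = Math.max(...logits);         // subtract the max for numerical stability
    const exps = logits.map(v => Math.exp(v - m));
    const z = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / z);
  }

  function sampleFrom(probs) {
    let r = Math.random();
    for (let i = 0; i < probs.length; i++) {
      r -= probs[i];
      if (r <= 0) return i;
    }
    return probs.length - 1;
  }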