Minimal Transformer - Full Backprop in the Browser
This code does manual backprop through a 4-layer, 2-head Transformer entirely in JavaScript!
It's intended primarily as a demonstration and can be very slow, so use it with patience.
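To give a flavor of what "manual backprop" means here, the sketch below hand-derives the gradients of a single linear layer y = W x. The function names are hypothetical and are not taken from this script; the real code applies the same idea to every Transformer weight.

  // Forward pass: y = W x, with W as [out][in] and x as [in]
  function linearForward(W, x) {
    return W.map(row => row.reduce((s, w, j) => s + w * x[j], 0));
  }

  // Backward pass: given upstream gradient dY, compute dW and dX by hand
  function linearBackward(W, x, dY) {
    const dW = W.map((row, i) => row.map((_, j) => dY[i] * x[j]));              // dW[i][j] = dY[i] * x[j]
    const dX = x.map((_, j) => W.reduce((s, row, i) => s + row[j] * dY[i], 0)); // dX = W^T dY
    return { dW, dX };
  }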
Training
Select words.txt (vocab) and input.txt (training text).
We’ll train a small Transformer with:
4 layers
32-dimensional embeddings
2 attention heads
Feed-forward dimension of 64
Full manual backprop (tokenEmb, Q/K/V, feedforward, final FC all updated)
This is very slow in JS! We run for 100 epochs by default, but you can stop early if you're impatient.
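The hyperparameters above correspond roughly to a config object and a plain SGD update like the sketch below. The names, the learning rate, and the use of vanilla SGD are assumptions for illustration, not necessarily what this script uses internally.

  // Illustrative config (field names are assumptions, not this script's API)
  const config = {
    nLayers: 4,        // Transformer blocks
    dModel: 32,        // embedding dimension
    nHeads: 2,         // attention heads
    dFF: 64,           // feed-forward hidden dimension
    epochs: 100,       // default epoch count; stop early if it is too slow
    learningRate: 0.01 // assumed plain-SGD step size
  };

  // Once backprop has filled p.grad for every parameter, a plain SGD step:
  function sgdStep(params, lr) {
    for (const p of params) {                  // p.value and p.grad are flat Float64Arrays
      for (let i = 0; i < p.value.length; i++) {
        p.value[i] -= lr * p.grad[i];          // gradient descent update
        p.grad[i] = 0;                         // reset for the next batch
      }
    }
  }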
Inference
You can load a .json file (this script's parameter dump) or just use the
in-memory weights after training. Then type a context and generate text.
If you trained in this session, skip loading and use the in-memory model.
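Generation follows the usual autoregressive loop: run the model on the current token sequence, turn the next-token logits into probabilities, sample one token, append it, and repeat. The sketch below assumes a model.forward(tokens) method that returns next-token logits; that name, along with the softmax and sampling helpers, is illustrative rather than this script's actual API.

  // Illustrative generation loop (model.forward is an assumed method name)
  function generate(model, contextTokens, maxNewTokens) {
    const out = contextTokens.slice();
    for (let t = 0; t < maxNewTokens; t++) {
      const logits = model.forward(out);   // logits over the vocabulary for the next token
      const probs = softmax(logits);
      out.push(sampleFrom(probs));         // sample the next token id and append it
    }
    return out;
  }

  function softmax(logits) {
    const m = Math.max(...logits);         // subtract the max for numerical stability
    const exps = logits.map(v => Math.exp(v - m));
    const z = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / z);
  }

  function sampleFrom(probs) {
    let r = Math.random();
    for (let i = 0; i < probs.length; i++) {
      r -= probs[i];
      if (r <= 0) return i;
    }
    return probs.length - 1;
  }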