Minimal Transformer - Full Backprop in the Browser

This code performs manual backpropagation through a 4-layer, 2-head Transformer entirely in JavaScript!
It's primarily a demonstration and can be very slow, so use it cautiously.
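
To give a feel for what "manual backprop" means here, the following is a minimal sketch of a hand-written forward and backward pass for a single linear layer followed by softmax cross-entropy. It is not the code used by this demo: the function names (`softmax`, `forward`, `backward`) and the plain nested-array layout are assumptions chosen for illustration, and a real Transformer adds attention, layer norm, and residual paths on top of this pattern.

```javascript
// Illustrative sketch only (hypothetical names, not this demo's actual code).
// Forward: logits = W x + b, probs = softmax(logits), loss = -log(probs[target]).

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// W is an array of rows (one row per output class), b and x are plain arrays.
function forward(W, b, x, target) {
  const logits = W.map((row, i) =>
    row.reduce((acc, w, j) => acc + w * x[j], b[i])
  );
  const probs = softmax(logits);
  return { probs, loss: -Math.log(probs[target]) };
}

// Backward: gradients of the loss with respect to W, b, and x, derived by hand.
function backward(W, x, probs, target) {
  // For softmax + cross-entropy, dLoss/dlogits = probs - onehot(target).
  const dLogits = probs.map((p, i) => p - (i === target ? 1 : 0));
  // dLoss/dW[i][j] = dLogits[i] * x[j]
  const dW = dLogits.map((d) => x.map((xj) => d * xj));
  // dLoss/db[i] = dLogits[i]
  const db = dLogits.slice();
  // dLoss/dx[j] = sum_i W[i][j] * dLogits[i], passed on to the previous layer.
  const dx = x.map((_, j) =>
    dLogits.reduce((acc, d, i) => acc + d * W[i][j], 0)
  );
  return { dW, db, dx };
}
```

A common way to check hand-derived gradients like these is to compare them against finite differences: nudge one weight by a small epsilon, recompute the loss, and confirm the slope matches the corresponding entry of `dW`.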

Training

Select words.txt (vocab) and input.txt (training text). We’ll train a small Transformer with:

This is very slow in JS! We run for 100 epochs by default, but you can stop early if you're impatient.
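
One way an epoch loop with early stopping could be structured in the browser is sketched below. This is an assumption about the general pattern, not this demo's implementation: `trainWithEarlyStop`, `trainOneEpoch`, and the `control` object are hypothetical names. The loop yields to the event loop after each epoch so that a Stop button's click handler gets a chance to run and set the flag.

```javascript
// Illustrative sketch only (hypothetical names, not this demo's actual code).
// trainOneEpoch(epoch) is assumed to run one full pass and return the loss.
// control.stopRequested can be set to true (e.g. by a button handler) to stop.
// control.onEpoch, if present, is called with (epoch, loss) after each epoch.
async function trainWithEarlyStop(trainOneEpoch, control, maxEpochs = 100) {
  let epochsRun = 0;
  for (let epoch = 0; epoch < maxEpochs; epoch++) {
    if (control.stopRequested) break;
    const loss = trainOneEpoch(epoch);
    epochsRun++;
    if (control.onEpoch) control.onEpoch(epoch, loss);
    // Yield to the event loop so the page can repaint and handle clicks.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return epochsRun;
}
```

Without the yield between epochs, a long synchronous loop would block the page, and a click on a Stop button would not be processed until training had already finished.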