Some intuitions about transformers

Unless you have been living under a rock for the last five years, you have definitely (if possibly unknowingly) interacted with a machine learning model that uses the transformer architecture. I have spent a couple of months poking at little transformer models like GPT-2 and the 19-million-parameter version of Pythia, and yet after working at an interpretability startup for a week I realised that I actually don’t have a great understanding of how a transformer works....

December 24, 2022 · 6 min · 1108 words · Me