Aryaman Arora
About · Blog · Google Scholar · GitHub · Twitter · Email · CV
I am a second-year Ph.D. student at Stanford University, advised by Dan Jurafsky and Christopher Potts. I work on the interpretability of language models: I am curious about how language models work, and I want to discover principles that can enable better ones.
I completed my B.S. in Computer Science and Linguistics at Georgetown University, where I worked with Nathan Schneider on computational linguistics. I have interned at ETH Zürich with Ryan Cotterell, working on information theory, as well as at Apple and Redwood Research.
Greatest Hits [» more papers]
Mechanistic evaluation of Transformers and state space models
AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
ReFT: Representation finetuning for language models
CausalGym: Benchmarking causal interpretability methods on linguistic tasks