Aryaman Arora
I am a third-year Ph.D. student at Stanford University advised by Dan Jurafsky and Christopher Potts, funded by the NSF Graduate Research Fellowship Program. Concurrently, I am a researcher at Transluce.
I work on the interpretability of language models. I am curious about how language models work, and I want to discover principles that enable us to build better ones.
I completed my B.S. in Computer Science and Linguistics at Georgetown University, where I worked with Nathan Schneider on computational linguistics. I also interned at ETH Zürich with Ryan Cotterell, working on information theory, as well as at Apple and Redwood Research.
I am currently recruiting students to work on interpretability. [» more info]
Contact
Google Scholar · GitHub · Twitter · Email
Greatest Hits [» more papers]
Mechanistic evaluation of Transformers and state space models
AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
ReFT: Representation finetuning for language models
CausalGym: Benchmarking causal interpretability methods on linguistic tasks