Aryaman Arora


I am a third-year Ph.D. student at Stanford University advised by Dan Jurafsky and Christopher Potts, funded by the NSF Graduate Research Fellowship Program. Concurrently, I am a researcher at Transluce.

I work on interpretability of language models. I am curious about how language models work, and I want to discover principles that enable us to build better language models.

I completed my B.S. in Computer Science and Linguistics at Georgetown University, where I worked with Nathan Schneider on computational linguistics. I interned at ETH Zürich with Ryan Cotterell working on information theory, as well as at Apple and Redwood Research.

I am currently recruiting students to work on interpretability. [» more info]

Contact

Google Scholar · GitHub · Twitter · Email

Greatest Hits [» more papers]

Mechanistic evaluation of Transformers and state space models
Aryaman Arora, Neil Rathi, Nikil Roashan Selvam, Róbert Csordás, Dan Jurafsky, Christopher Potts
arXiv:2505.15105, 2025

AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
Zhengxuan Wu*, Aryaman Arora*, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
ICML, 2025 · Spotlight

ReFT: Representation finetuning for language models
Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
NeurIPS, 2024 · Spotlight

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora, Dan Jurafsky, Christopher Potts
ACL, 2024 · Outstanding Paper Award · Senior Area Chair Award