Aryaman Arora · आर्यमन अरोरा

I am a third-year Ph.D. student at Stanford University advised by Dan Jurafsky and Christopher Potts, and funded by the NSF Graduate Research Fellowship Program.

I work on interpretability of language models. I believe understanding how language models work is the scientific problem of our age.

I completed my B.S. in Computer Science and Linguistics at Georgetown University, where I worked with Nathan Schneider on computational linguistics. I interned at ETH Zürich with Ryan Cotterell, as well as at Apple, Redwood Research, and (most recently) Transluce.

Contact

Google Scholar · GitHub · Twitter · Email

Greatest Hits [» more papers]

ADAG: Automatically describing attribution graphs
Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt, Sarah Schwettmann
arXiv:2604.07615, 2026

Language model circuits are sparse in the neuron basis
Aryaman Arora*, Zhengxuan Wu*, Jacob Steinhardt, Sarah Schwettmann
ICML, 2026 Spotlight

Mechanistic evaluation of Transformers and state space models
Aryaman Arora, Neil Rathi, Nikil Roashan Selvam, Róbert Csordás, Dan Jurafsky, Christopher Potts
NeurIPS Mechanistic Interpretability Workshop, 2025 Spotlight

AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
Zhengxuan Wu*, Aryaman Arora*, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
ICML, 2025 Spotlight

ReFT: Representation finetuning for language models
Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
NeurIPS, 2024 Spotlight

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora, Dan Jurafsky, Christopher Potts
ACL, 2024 Outstanding Paper Award · Senior Area Chair Award