Aryaman Arora

First-year Ph.D. student at Stanford NLP.


At Stanford, I’m advised by Dan Jurafsky and Christopher Potts, and currently rotating with Noah D. Goodman. My research is focused on (mechanistic) interpretability.

I want to understand how neural networks, particularly language models, work. To that end, I’m excited about causal approaches and insights from linguistics. But mainly, I’m just here to learn cool stuff.

I completed my B.S. in Computer Science and Linguistics at Georgetown University, where I worked with Nathan Schneider. I interned at ETH Zürich with Ryan Cotterell, working on information theory, and spent time at Apple and Redwood Research. (See my CV for more.)

Hit me up on Twitter or at aryamana [at] stanford [dot] edu.

Greatest hits

ReFT: Representation Finetuning for Language Models.
Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts.
arXiv:2404.03592, 2024.
CausalGym: Benchmarking causal interpretability methods on linguistic tasks.
Aryaman Arora, Dan Jurafsky, Christopher Potts.
arXiv:2402.12560, 2024.
  • 2024-04-05 Our new interpretability-inspired, ultra-efficient finetuning method is out: ReFT (repo, tweet).
  • 2024-03-13 We released the paper for pyvene, a new library for intervening on the internal states of neural networks!
  • 2024-02-19 My first lead-author project as a Ph.D. student is out: CausalGym: Benchmarking causal interpretability methods on linguistic tasks.
  • 2023-09-14 Moved to the San Francisco Bay Area 🌉 to start my Ph.D. 🫡
  • 2023-07-31 Back from the Leiden University Summer School in Languages and Linguistics in the Netherlands!
  • 2023-02-08 Accepted to the Ph.D. program at Stanford CS!