Aryaman Arora » Recruiting

I'm Aryaman, a third-year Ph.D. student at Stanford University and a part-time researcher at Transluce. I'm looking for students who want to work on interpretability for language models! I'm most interested in coming up with new applications of interpretability across the entire LM training stack, with the goal of improving models through better understanding.

You might be a good fit for this if you are:

Logistics. We will meet at least once a week to discuss the project. I prefer in-person, but I'm willing to advise remote students as well. I will be responsible for getting you compute for the project. I won't be able to compensate you, but for Stanford students who do well, I will look into whether an RAship or course credit is possible.

Expected outcome. A publication in a top-tier AI/NLP conference (e.g. NeurIPS / ACL / ICML / ICLR).

Working style. I currently prefer students with high agency who can work on low-level implementation independently.
In the past, I've done more collaborative work with students, but that usually requires far more of my time than I have this quarter; I'd rather take on many students instead!

How to apply

You have two tasks:

  1. Create a small Jupyter notebook (ideally fewer than 10 code cells) that reproduces a single experiment or result from an existing interpretability paper; there's a minimal sketch of what I mean after this list. For inspiration, check out my pyvene tutorial replicating causal tracing from ROME, Zhengxuan Wu's quiz on memorisation subspaces, or the nnsight mini-replications of papers. If you don't have GPUs, feel free to use a small language model like gpt2. You're welcome to use LM assistance to write the code; I'm more interested in what you chose to investigate, and negative results are okay.
  2. Submit your notebook along with some info about yourself here.
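To make the ask concrete, here is a minimal sketch of one such experiment: activation patching on gpt2 with plain PyTorch forward hooks, in the spirit of causal tracing. The model, prompts, layer choice, and target token below are all illustrative assumptions on my part, not a required template; pyvene or nnsight would work just as well.

```python
# Minimal activation-patching sketch (assumptions: gpt2, layer 6, these prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = "The Eiffel Tower is located in the city of"
corrupt = "The Colosseum is located in the city of"
clean_ids = tok(clean, return_tensors="pt").input_ids
corrupt_ids = tok(corrupt, return_tensors="pt").input_ids
paris = tok.encode(" Paris")[0]  # target token for the clean prompt

def p_paris(ids):
    """Probability of ' Paris' as the next token."""
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return logits.softmax(-1)[paris].item()

layer, cache = 6, {}

def save_hook(module, inp, out):
    # Cache the residual stream coming out of this block on the clean run.
    cache["resid"] = (out[0] if isinstance(out, tuple) else out).detach()

def patch_hook(module, inp, out):
    # Overwrite the final position with the cached clean activation.
    hs = out[0] if isinstance(out, tuple) else out
    patched = hs.clone()
    patched[:, -1] = cache["resid"][:, -1]
    return (patched,) + out[1:] if isinstance(out, tuple) else patched

block = model.transformer.h[layer]

handle = block.register_forward_hook(save_hook)
with torch.no_grad():
    model(clean_ids)  # clean run: fills the cache
handle.remove()

handle = block.register_forward_hook(patch_hook)
patched_p = p_paris(corrupt_ids)  # corrupted run with the clean activation patched in
handle.remove()

print(f"clean={p_paris(clean_ids):.3f} corrupt={p_paris(corrupt_ids):.3f} "
      f"patched={patched_p:.3f}")
```

A notebook like this, plus a few sentences on what you expected and what you found, is exactly the scale I'm after.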

Project ideas

This is a sample of immediate ideas I have in mind, but I'm open to novel ideas or even just exploring a general direction for a bit to find interesting problems.