Aryaman Arora » Papers

These are all of my publications. I also like to keep track of and read papers I'm acknowledged in.

2025

Language model circuits are sparse in the neuron basis
Aryaman Arora*, Zhengxuan Wu*, Jacob Steinhardt, Sarah Schwettmann
Transluce Blog, 2025 [blogpost]

Mechanistic evaluation of Transformers and state space models
Aryaman Arora, Neil Rathi, Nikil Roashan Selvam, Róbert Csordás, Dan Jurafsky, Christopher Potts
NeurIPS Mechanistic Interpretability Workshop, 2025 Spotlight [paper] [code]

Improved representation steering for language models
Zhengxuan Wu*, Qinan Yu*, Aryaman Arora, Christopher D. Manning, Christopher Potts
NeurIPS, 2025 Spotlight [paper] [code]

Detecting foreign content in self-generated text: A recognition study of large language models
Shengyu Zhu, Tamika Bassman, Dat Tran, Aryaman Arora
NeurIPS LLM Evaluation Workshop, 2025 [paper]

Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
COLM, 2025 [paper] [code]

AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
Zhengxuan Wu*, Aryaman Arora*, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
ICML, 2025 Spotlight [paper] [code]

2024

Causal abstraction: A theoretical foundation for mechanistic interpretability
Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard
JMLR, 2025 [paper]

ReFT: Representation finetuning for language models
Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
NeurIPS, 2024 Spotlight [paper] [code]

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora, Dan Jurafsky, Christopher Potts
ACL, 2024 Outstanding Paper Award, Senior Area Chair Award [paper] [code]

pyvene: A library for understanding and improving PyTorch models via interventions
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts
NAACL: System Demonstrations, 2024 [paper] [code]

IruMozhi: Automatically classifying diglossia in Tamil
Kabilan Prasanna, Aryaman Arora
NAACL: Findings, 2024 [paper] [code]

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky
SIGTYP, 2024 [paper] [code]

A reply to Makelov et al. (2023)’s “interpretability illusion” arguments
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman
arXiv:2401.12631, 2024 [paper] [code]

2023

Towards vision-language mechanistic interpretability: A causal tracing tool for BLIP
Vedant Palit*, Rohan Pandey*, Aryaman Arora, Paul Pu Liang
5th Workshop on Closing the Loop Between Vision and Language (CLVL), 2023 [paper] [code]

SIGMORPHON–UniMorph 2023 Shared Task 0: Typologically diverse morphological inflection
Omer Goldman, Khuyagbaatar Batsuren, Salam Khalifa, Aryaman Arora, Garrett Nicolai, Reut Tsarfaty, Ekaterina Vylomova
SIGMORPHON, 2023 [paper] [code]

Jambu: A historical linguistic database for South Asian languages
Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala
SIGMORPHON, 2023 [paper] [code]

Unified syntactic annotation of English in the CGEL framework
Brett Reynolds, Aryaman Arora, Nathan Schneider
LAW, 2023 [paper] [code]

CGELBank Annotation Manual v1.0
Brett Reynolds, Nathan Schneider, Aryaman Arora
arXiv:2305.17347, 2023 [paper] [code]

Investigating induction heads in a small transformer language model
Aryaman Arora
MASC-SLL, 2023 [paper] [code]

Localizing model behavior with path patching
Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora
arXiv:2304.05969, 2023 [paper] [code]

2022

Information theory in linguistics: Methods and applications
Ryan Cotterell, Richard Futrell, Kyle Mahowald, Clara Meister, Tiago Pimentel, Adina Williams, Aryaman Arora
COLING: Tutorials, 2022 [paper]

CGELBank: CGEL as a framework for English syntax annotation
Brett Reynolds, Aryaman Arora, Nathan Schneider
arXiv:2210.00394, 2022 [paper] [code]

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and typologically diverse morphological inflection
Jordan Kodner, ..., Aryaman Arora, ..., Ekaterina Vylomova
SIGMORPHON, 2022 [paper] [code]

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, ..., Ryan Cotterell, Ekaterina Vylomova
SIGMORPHON, 2022 [paper] [code]

Universal Dependencies for Punjabi
Aryaman Arora
LREC, 2022 [paper] [code]

MASALA: Modelling and analysing the semantics of adpositions in linguistic annotation of Hindi
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
LREC, 2022 [paper] [code]

UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren, Omer Goldman, ..., Aryaman Arora, ..., Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
LREC, 2022 [paper] [code]

A CGEL-formalism English treebank
Aryaman Arora, Nathan Schneider, Brett Reynolds
MASC-SLL, 2022 [code]

Estimating the entropy of linguistic distributions
Aryaman Arora, Clara Meister, Ryan Cotterell
ACL, 2022 [paper] [code]

Computational historical linguistics and language diversity in South Asia
Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala
ACL, 2022 [paper]

DIPI: Dependency parsing for Ashokan Prakrit historical dialectology
Adam Farris*, Aryaman Arora*
Towards a comparative historical dialectology: evidence from morphology and syntax, Deutsche Gesellschaft für Sprachwissenschaft, 2022 [code]

2021

For the purpose of curry: A UD Treebank for Ashokan Prakrit
Adam Farris*, Aryaman Arora*
UDW, SyntaxFest, 2021 [paper] [code]

Bhāṣācitra: Visualising the dialect geography of South Asia
Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu
LChange, 2021 [paper] [code]

Kholosi Dictionary
Aryaman Arora, Ahmed Etebari
Zenodo, 2021 [paper] [code]

Adposition and case supersenses v1.0: Guidelines for Hindi–Urdu
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
arXiv:2103.01399, 2021 [paper] [code]

SNACS annotation of case markers and adpositions in Hindi
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
SCiL, 2021 [paper] [code]

2020

PASTRIE: A corpus of prepositions annotated with supersense tags in Reddit International English
Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Nathan Schneider
LAW, 2020 [paper] [code]

SNACS annotation of case markers and adpositions in Hindi
Aryaman Arora, Nathan Schneider
SIGTYP, 2020 [paper] [code]

Supervised grapheme-to-phoneme conversion of orthographic schwas in Hindi and Punjabi
Aryaman Arora, Luke Gessler, Nathan Schneider
ACL, 2020 [paper] [code]

2019

Quasi-passive lower and upper extremity robotic exoskeleton for strengthening human locomotion
Aryaman Arora, John R. McIntyre
Sustainable Innovation, 2019 [paper]