Aryaman Arora » Papers

These are all of my publications. I also like to keep track of and read papers I'm acknowledged in.

2025

Mechanistic evaluation of Transformers and state space models
Aryaman Arora, Neil Rathi, Nikil Roashan Selvam, Róbert Csordás, Dan Jurafsky, Christopher Potts
arXiv:2505.15105, 2025 [paper] [code]

Improved representation steering for language models
Zhengxuan Wu*, Qinan Yu*, Aryaman Arora, Christopher D. Manning, Christopher Potts
arXiv:2505.20809, 2025 [paper] [code]

Mirror test for large language models: A self-recognition evaluation framework
Shengyu Zhu, Tamika Bassman, Dat Tran, Aryaman Arora
under review, 2025

AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
Zhengxuan Wu*, Aryaman Arora*, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts
ICML, 2025 Spotlight [paper] [code]

2024

Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts, Noah D. Goodman
arXiv:2410.16531, 2024 [paper] [code]

Causal abstraction: A theoretical foundation for mechanistic interpretability
Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard
JMLR, 2025 [paper]

ReFT: Representation finetuning for language models
Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
NeurIPS, 2024 Spotlight [paper] [code]

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Aryaman Arora, Dan Jurafsky, Christopher Potts
ACL, 2024 Outstanding Paper Award, Senior Area Chair Award [paper] [code]

pyvene: A library for understanding and improving PyTorch models via interventions
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts
NAACL: System Demonstrations, 2024 [paper] [code]

IruMozhi: Automatically classifying diglossia in Tamil
Kabilan Prasanna, Aryaman Arora
NAACL: Findings, 2024 [paper] [code]

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky
SIGTYP, 2024 [paper] [code]

A reply to Makelov et al. (2023)’s “interpretability illusion” arguments
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman
arXiv:2401.12631, 2024 [paper] [code]

2023

Towards vision-language mechanistic interpretability: A causal tracing tool for BLIP
Vedant Palit*, Rohan Pandey*, Aryaman Arora, Paul Pu Liang
5th Workshop on Closing the Loop Between Vision and Language (CLVL), 2023 [paper] [code]

SIGMORPHON–UniMorph 2023 Shared Task 0: Typologically diverse morphological inflection
Omer Goldman, Khuyagbaatar Batsuren, Salam Khalifa, Aryaman Arora, Garrett Nicolai, Reut Tsarfaty, Ekaterina Vylomova
SIGMORPHON, 2023 [paper] [code]

Jambu: A historical linguistic database for South Asian languages
Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala
SIGMORPHON, 2023 [paper] [code]

Unified syntactic annotation of English in the CGEL framework
Brett Reynolds, Aryaman Arora, Nathan Schneider
LAW, 2023 [paper] [code]

CGELBank Annotation Manual v1.0
Brett Reynolds, Nathan Schneider, Aryaman Arora
arXiv:2305.17347, 2023 [paper] [code]

Investigating induction heads in a small transformer language model
Aryaman Arora
MASC-SLL, 2023 [paper] [code]

Localizing model behavior with path patching
Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora
arXiv:2304.05969, 2023 [paper] [code]

2022

Information theory in linguistics: Methods and applications
Ryan Cotterell, Richard Futrell, Kyle Mahowald, Clara Meister, Tiago Pimentel, Adina Williams, Aryaman Arora
COLING: Tutorials, 2022 [paper]

CGELBank: CGEL as a framework for English syntax annotation
Brett Reynolds, Aryaman Arora, Nathan Schneider
arXiv:2210.00394, 2022 [paper] [code]

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and typologically diverse morphological inflection
Jordan Kodner, ..., Aryaman Arora, ..., Ekaterina Vylomova
SIGMORPHON, 2022 [paper] [code]

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, ..., Ryan Cotterell, Ekaterina Vylomova
SIGMORPHON, 2022 [paper] [code]

Universal Dependencies for Punjabi
Aryaman Arora
LREC, 2022 [paper] [code]

MASALA: Modelling and analysing the semantics of adpositions in linguistic annotation of Hindi
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
LREC, 2022 [paper] [code]

UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren, Omer Goldman, ..., Aryaman Arora, ..., Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
LREC, 2022 [paper] [code]

A CGEL-formalism English treebank
Aryaman Arora, Nathan Schneider, Brett Reynolds
MASC-SLL, 2022 [code]

Estimating the entropy of linguistic distributions
Aryaman Arora, Clara Meister, Ryan Cotterell
ACL, 2022 [paper] [code]

Computational historical linguistics and language diversity in South Asia
Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala
ACL, 2022 [paper]

DIPI: Dependency parsing for Ashokan Prakrit historical dialectology
Adam Farris*, Aryaman Arora*
Towards a comparative historical dialectology: evidence from morphology and syntax, Deutschen Gesellschaft für Sprachwissenschaft, 2022 [code]

2021

For the purpose of curry: A UD Treebank for Ashokan Prakrit
Adam Farris*, Aryaman Arora*
UDW, SyntaxFest, 2021 [paper] [code]

Bhāṣācitra: Visualising the dialect geography of South Asia
Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu
LChange, 2021 [paper] [code]

Kholosi Dictionary
Aryaman Arora, Ahmed Etebari
Zenodo, 2021 [paper] [code]

Adposition and case supersenses v1.0: Guidelines for Hindi–Urdu
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
arXiv:2103.01399, 2021 [paper] [code]

SNACS annotation of case markers and adpositions in Hindi
Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
SCiL, 2021 [paper] [code]

2020

PASTRIE: A corpus of prepositions annotated with supersense tags in Reddit International English
Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Nathan Schneider
LAW, 2020 [paper] [code]

SNACS annotation of case markers and adpositions in Hindi
Aryaman Arora, Nathan Schneider
SIGTYP, 2020 [paper] [code]

Supervised grapheme-to-phoneme conversion of orthographic schwas in Hindi and Punjabi
Aryaman Arora, Luke Gessler, Nathan Schneider
ACL, 2020 [paper] [code]

2019

Quasi-passive lower and upper extremity robotic exoskeleton for strengthening human locomotion
Aryaman Arora, John R. McIntyre
Sustainable Innovation, 2019 [paper]