Aryaman Arora Papers
These are all of my publications. I also like to keep track of and read papers I'm acknowledged in.
ADAG: Automatically describing attribution graphs
Verbalizing LLMs' assumptions to explain and control sycophancy
Language model circuits are sparse in the neuron basis
Mechanistic evaluation of Transformers and state space models
Improved representation steering for language models
Detecting foreign content in self-generated text: A recognition study of large language models
Bayesian scaling laws for in-context learning
AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders
Causal abstraction: A theoretical foundation for mechanistic interpretability
ReFT: Representation finetuning for language models
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
pyvene: A library for understanding and improving PyTorch models via interventions
IruMozhi: Automatically classifying diglossia in Tamil
Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
A reply to Makelov et al. (2023)’s “interpretability illusion” arguments
Towards vision-language mechanistic interpretability: A causal tracing tool for BLIP
SIGMORPHON–UniMorph 2023 Shared Task 0: Typologically diverse morphological inflection
Jambu: A historical linguistic database for South Asian languages
Unified syntactic annotation of English in the CGEL framework
CGELBank Annotation Manual v1.0
Investigating induction heads in a small transformer language model
Localizing model behavior with path patching
Information theory in linguistics: Methods and applications
CGELBank: CGEL as a framework for English syntax annotation
SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and typologically diverse morphological inflection
The SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Universal Dependencies for Punjabi
MASALA: Modelling and analysing the semantics of adpositions in linguistic annotation of Hindi
UniMorph 4.0: Universal Morphology
A CGEL-formalism English treebank
Estimating the entropy of linguistic distributions
Computational historical linguistics and language diversity in South Asia
DIPI: Dependency parsing for Ashokan Prakrit historical dialectology
For the purpose of curry: A UD Treebank for Ashokan Prakrit
Bhāṣācitra: Visualising the dialect geography of South Asia
Kholosi Dictionary
Adposition and case supersenses v1.0: Guidelines for Hindi–Urdu
SNACS annotation of case markers and adpositions in Hindi
PASTRIE: A corpus of prepositions annotated with supersense tags in Reddit International English
SNACS annotation of case markers and adpositions in Hindi
Supervised grapheme-to-phoneme conversion of orthographic schwas in Hindi and Punjabi
Quasi-passive lower and upper extremity robotic exoskeleton for strengthening human locomotion