Aryaman Arora


Publications

Last updated Feb 27, 2023

Venue key:

  1. Paper at a major NLP/CL conference or journal (e.g. *CL, LREC, TACL).
  2. Paper at an NLP/CL workshop or smaller conference (e.g. SIGMORPHON, SyntaxFest).
  3. Non-archival talk (e.g. SIGTYP abstract).
  4. (none) Preprint.

# 2022

Summary

Productive year. Got involved in a lot of different groups working on bigger projects. Took a lot more initiative in working on new things (evidenced by my first single-author paper).

Brett Reynolds, Aryaman Arora, Nathan Schneider. CGELBank: CGEL as a framework for English syntax annotation.

Jordan Kodner, …, Aryaman Arora, …, Ekaterina Vylomova. SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection. SIGMORPHON.

Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, …, Ekaterina Vylomova. The SIGMORPHON 2022 Shared Task on Morpheme Segmentation. SIGMORPHON.

Aryaman Arora. Universal Dependencies for Punjabi. LREC.

Aryaman Arora, Nitin Venkateswaran, Nathan Schneider. MASALA: Modelling and analysing the semantics of adpositions in linguistic annotation of Hindi. LREC.

Khuyagbaatar Batsuren*, Omer Goldman*, …, Aryaman Arora, …, Ekaterina Vylomova. UniMorph 4.0: Universal Morphology. LREC.

Aryaman Arora, Clara Isabel Meister, Ryan Cotterell. Estimating the entropy of linguistic distributions. ACL.

Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala. Computational historical linguistics and language diversity in South Asia. ACL.

Aryaman Arora, Nathan Schneider, Brett Reynolds. A CGEL-formalism English treebank. MASC-SLL, Philadelphia, USA (April 30, 2022).

Adam Farris*, Aryaman Arora*. DIPI: Dependency Parsing for Ashokan Prakrit Historical Dialectology. Towards a comparative historical dialectology: evidence from morphology and syntax @ DGfS, Tübingen, Germany (February 23–25, 2022).

# 2021

Summary

Didn’t write a lot of papers; mostly focused on taking more advanced classes in my area and exploring new ideas (this is the year I learned about BERT, for example).

Adam Farris*, Aryaman Arora*. For the purpose of curry: A UD Treebank for Ashokan Prakrit. Universal Dependencies Workshop.

Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu. Bhāṣācitra: Visualising the dialect geography of South Asia. Workshop on Computational Approaches to Historical Language Change.

Aryaman Arora, Ahmed Etebari. Kholosi Dictionary.

Aryaman Arora, Nitin Venkateswaran, Nathan Schneider. Adposition and case supersenses v1.0: Guidelines for Hindi–Urdu.

Aryaman Arora, Nitin Venkateswaran, Nathan Schneider. SNACS annotation of case markers and adpositions in Hindi. SCiL.

# 2020

Summary

Started doing NLP research in August 2020.

Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Nathan Schneider. PASTRIE: A corpus of prepositions annotated with supersense tags in Reddit International English. Linguistic Annotation Workshop.

Aryaman Arora, Nathan Schneider. SNACS annotation of case markers and adpositions in Hindi. SIGTYP.

Aryaman Arora, Luke Gessler, Nathan Schneider. Supervised grapheme-to-phoneme conversion of orthographic schwas in Hindi and Punjabi. ACL.

  • code
  • slides
  • Story

    This was the first real project I worked on. I had no practical ML experience going in (beyond having watched a few lectures by Andrew Ng), had cold-emailed this professor, and it turned out his Ph.D. student Luke knew Hindi and had a project idea.

    I had actually been dealing with the basic problem for a while. In Hindi orthography, every consonant carries an inherent schwa unless it is suppressed by the vowel-killer (halant) or by a conjunct consonant. In speech, however, this schwa is sometimes not pronounced; the most obvious instance is at the end of a word, e.g. orthographic ⟨ka-ra⟩ is pronounced /kar/. As an editor on Wiktionary I had been trying to write a rule-based converter from Hindi orthography to pronunciation, and schwa deletion was a really annoying problem. Linguists don’t have a good story about schwa deletion; it is a prosodic phenomenon but also definitely interacts with morphology.
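    To make this concrete, here is a toy sketch (written for this post, not code from the paper; the symbol tables are tiny and illustrative) of the naive rule-based approach: insert the inherent schwa after every plain consonant, then apply the one uncontroversial rule, word-final deletion. It handles ⟨ka-ra⟩ correctly but overgenerates medial schwas:

    ```python
    # Toy Devanagari-to-phoneme pass. CONSONANTS/VOWEL_SIGNS are truncated
    # illustrative tables, not a full inventory.
    CONSONANTS = {"क": "k", "र": "r", "न": "n", "म": "m", "स": "s"}
    VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "े": "e", "ो": "o"}
    HALANT = "्"  # the vowel-killer: suppresses the inherent schwa

    def naive_g2p(word: str) -> str:
        chars, phones = list(word), []
        for i, ch in enumerate(chars):
            if ch in CONSONANTS:
                phones.append(CONSONANTS[ch])
                nxt = chars[i + 1] if i + 1 < len(chars) else None
                # Inherent schwa unless a vowel sign or halant follows.
                if nxt not in VOWEL_SIGNS and nxt != HALANT:
                    phones.append("ə")
            elif ch in VOWEL_SIGNS:
                phones.append(VOWEL_SIGNS[ch])
            # the halant itself contributes nothing
        # The easy rule: delete the word-final schwa.
        if phones and phones[-1] == "ə":
            phones.pop()
        return "".join(phones)

    print(naive_g2p("कर"))    # kər (the post's /kar/): correct, final schwa deleted
    print(naive_g2p("करना"))  # kərənā: wrong, should be kərnā (medial deletion)
    ```

    Deciding which medial schwas to delete is exactly the part that resists simple rules, which is what made it a natural ML problem.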

    So we just threw machine learning at the problem and got an insanely good score; schwa deletion is solved! But I do still think about this problem occasionally. In ~2021 I was thinking “why didn’t I use a biLSTM instead, to get an unbounded context window for the prediction?” And in ~2023 I’m thinking “why didn’t I try harder to understand what algorithm the decision tree had learned?”
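    For flavor, this is roughly how one might frame the task as classification; the context-window features and the two training rows below are hypothetical, not the paper's actual setup. Each orthographic schwa becomes one example, its surrounding characters become categorical features, and a decision tree predicts keep vs. delete; dumping the fitted tree is the “what algorithm was learned?” question above:

    ```python
    # Sketch of the classification framing (assumed details, not the paper's data).
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier, export_text

    # One row per candidate schwa: characters to its left (l1, l2) and right
    # (r1, r2), and a label for whether the schwa is deleted (1) or kept (0).
    train = [
        ({"l1": "क", "r1": "र", "r2": "</s>"}, 0),          # कर: schwa after क kept
        ({"l1": "र", "l2": "क", "r1": "न", "r2": "ा"}, 1),  # करना: schwa after र deleted
    ]
    feats, labels = zip(*train)
    vec = DictVectorizer()
    tree = DecisionTreeClassifier(max_depth=4).fit(vec.fit_transform(feats), labels)

    # Print the learned rules -- the interpretability question from above.
    print(export_text(tree, feature_names=vec.get_feature_names_out().tolist()))
    ```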

Aryaman Arora, John R. McIntyre. Quasi-passive lower and upper extremity robotic exoskeleton for strengthening human locomotion. Sustainable Innovation.