Aryaman Arora

B.S. in Computer Science and Linguistics
Georgetown University, Class of 2024


  • Nov 16: Attending EMNLP and presenting at SIGTYP.
  • Aug 27: My research is in the news.
  • Aug 3: Starting undergrad at Georgetown.
  • Jul 21: Presented at ACL (my first conference!) and wrote about it.
  • Apr 22: Officially registered our nonprofit, Washingtutors, an online tutoring service for DC Public School students.

Who am I?

I'm Aryaman [आर्यमन /ˈäːɾjəmən/], a freshman at Georgetown! I'm doing computational linguistics research as a member of NERT, Dr. Nathan Schneider's research group at Georgetown. I'm excited about technologies for South Asian languages and linguistic work on Indo-Aryan languages.

I'm an admin on the English Wiktionary where I manage South Asian language documentation. I also do programming competitions and hackathons on occasion.

I like to bicycle (though not very well), read (in English and Hindi, but the latter not very well), and eat unreasonable amounts of food relative to my size. I'd also say I like to go places, but we're in the middle of a pandemic.

The background image is from here. It's from one of the edicts of the Mauryan emperor Aśoka of the 3rd century BCE written in Early Middle Indo-Aryan (specifically Aśokan Prakrit).


My research work has included adposition and case supersenses applied to various languages, grapheme-to-phoneme conversion of Indo-Aryan languages, and other work on Hindi from a computational perspective.

I've worked with unique coauthors on publications.

  • PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Nathan Schneider LAW @ COLING 2020

    We present the Prepositions Annotated with Supersense Tags in Reddit International English (“PASTRIE”) corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish. The annotations are comprehensive, covering all preposition types and tokens in the sample. Along with the corpus, we provide analysis of distributional patterns across the included L1s and a discussion of the influence of L1s on L2 preposition choice.

  • SNACS Annotation of Case Markers and Adpositions in Hindi Aryaman Arora, Nathan Schneider SIGTYP @ EMNLP 2020

    The use of specific case markers and adpositions for particular semantic roles is idiosyncratic to every language. This poses problems in many natural language processing tasks such as machine translation and semantic role labelling. Models for these tasks rely on human-annotated corpora as training data. There is a lack of corpora in South Asian languages for such tasks. Even Hindi, despite being a resource-rich language, is limited in available labelled data. This extended abstract presents the in-progress annotation of case markers and adpositions in a Hindi corpus, employing the cross-lingual scheme proposed by Schneider et al. (2017), Semantic Network of Adposition and Case Supersenses (SNACS). The SNACS guidelines we developed also apply to Urdu. We hope to finalize this corpus and develop NLP tools making use of the dataset, as well as promote NLP for typologically similar South Asian languages.

  • Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi Aryaman Arora, Luke Gessler, Nathan Schneider ACL 2020

    Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted). Previous work has attempted to predict schwa deletion in a rule-based fashion using prosodic or phonetic analysis. We present the first statistical schwa deletion classifier for Hindi, which relies solely on the orthography as the input and outperforms previous approaches. We trained our model on a newly compiled pronunciation lexicon extracted from various online dictionaries. Our best Hindi model achieves state-of-the-art performance, and also achieves good performance on a closely related language, Punjabi, without modification.

        title = "Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in {H}indi and {P}unjabi",
        author = "Arora, Aryaman  and
            Gessler, Luke  and
            Schneider, Nathan",
        booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
        month = jul,
        year = "2020",
        address = "Online",
        publisher = "Association for Computational Linguistics",
        url = "",
        doi = "10.18653/v1/2020.acl-main.696",
        pages = "7791--7795",

  • Quasi-Passive Lower and Upper Extremity Robotic Exoskeleton for Strengthening Human Locomotion Aryaman Arora, John R. McIntyre Sustainable Innovation

    Most of the robotic exoskeletons available today are either lower extremity or upper extremity devices targeting individual orthotic (elbow, knee, and ankle) joints. However, there are a few which target both lower and upper extremities. This chapter aims to propose a design for a wearable quasi-passive lower and upper extremity robotic exoskeleton (QLUE-REX) system, targeting disabled users and aged seniors. This exoskeleton system aims to improve mobility, assist walking, improve and enhance muscle strength, and help people with leg/arm disabilities. QLUE-REX combines elbow, knee, and ankle joints with options to synchronize individual joints’ movements to achieve the following: (1) assist in lifting loads of 30–40 kilograms, (2) assist in walking, (3) be easy and flexible to wear without any discomfort, and (4) learn and adapt while storing time-stamped sensor data on the exoskeleton’s storage media, both to predict and correct users’ movements and to share data with health professionals. The research’s main objective is to conceptualize a design for the QLUE-REX system. QLUE-REX will be a feasible modular-type wearable system that incorporates orthotic elbow, knee, and ankle joints effectively in either synchronous or asynchronous modes depending on the users’ needs. It will utilize human-walking analysis, data sensing and estimation technology, and measurement of the electromyography signals of the user’s muscles, exploiting biomechanical principles of human-machine interface.


Writeups on various things that I want to be public.