Greatest hits
- Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts. 2024. ReFT: Representation Finetuning for Language Models. arXiv:2404.03592.
- Aryaman Arora, Dan Jurafsky, Christopher Potts. 2024. CausalGym: Benchmarking causal interpretability methods on linguistic tasks. arXiv:2402.12560.
News
2024-04-05
New interp-inspired ultra-efficient finetuning method out: ReFT (repo, tweet).
2024-03-13
We released the paper for pyvene, a new library for intervening on the internal states of neural networks!
2024-02-19
My first lead-author project as a Ph.D. student is out: CausalGym: Benchmarking causal interpretability methods on linguistic tasks.
2023-09-14
Moved to the San Francisco Bay Area 🌉 to start my Ph.D. 🫡
2023-07-31
Back from the Leiden University Summer School in Languages and Linguistics in the Netherlands!
2023-02-08
Accepted to the Ph.D. program at Stanford CS!