Greatest hits
- Zhengxuan Wu*, Aryaman Arora*, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts. 2024. ReFT: Representation Finetuning for Language Models. arXiv:2404.03592.
- Aryaman Arora, Dan Jurafsky, Christopher Potts. 2024. CausalGym: Benchmarking causal interpretability methods on linguistic tasks. arXiv:2402.12560.
News
2024-04-05
New interp-inspired ultra-efficient finetuning method out: ReFT (repo, tweet).
2024-03-13
We released the paper for pyvene, a new library for intervening on the internal states of neural networks!
2024-02-19
My first lead-author project as a Ph.D. student is out: CausalGym: Benchmarking causal interpretability methods on linguistic tasks.
2023-09-14
Moved to the San Francisco Bay Area 🌉 to start my Ph.D. 🫡
2023-07-31
Back from the Leiden University Summer School in Languages and Linguistics in the Netherlands!
2023-02-08
Accepted to the Ph.D. program at Stanford CS!