I am a postdoctoral research fellow in computer science at Harvard University, where I work with Hima Lakkaraju. I completed my PhD at the Idiap Research Institute and EPFL, Switzerland, advised by François Fleuret.

I previously completed my Master's (by Research) at the Indian Institute of Science, Bangalore, advised by R. Venkatesh Babu. During my PhD, I interned with the model efficiency team at Qualcomm AI Research in Amsterdam.

I am interested in mathematically and scientifically understanding deep learning, usually through the lens of interpretability, robustness, and computational efficiency. I typically study these in the context of computer vision and, more recently, natural language processing.

  • In interpretability, my work has identified flaws with feature attribution methods (aka “heatmap” explanations), proposed unifying mathematical frameworks to better understand them, and explained why they appear human-aligned for robust models. I have also worked on interpreting representations of CLIP models via sparse concept decompositions.

  • In robustness, I am motivated by Occam’s razor: building low-curvature models that are “as simple as possible” in a functional sense, and models that are robust in the average case rather than the “adversarial” worst case. I have also worked on certified robustness to large language model jailbreaks.

  • In model efficiency, I have worked on building neural networks with sparse weights, usually by pruning redundant weights from dense models. I also developed a method to prune duplicate neurons using minimal training data.

For more details, please see my latest publications.

I am also an organizer of the Theory of Interpretable AI online seminar series, and previously of the XAI in Action workshop at NeurIPS 2023. I was also involved in teaching an interpretable ML course at Harvard in spring 2023.