I am a research scientist at Bosch Research (Sunnyvale, USA), where I work on computer vision problems for autonomous driving. My research involves developing interpretability tools for model understanding, debugging, and editing.

Some representative methods:
  • SpLiCE: A dictionary-learning-style method for interpreting CLIP models
  • Discriminative feature attributions: A method for building discriminative models whose saliency maps are faithful by design
  • FullGrad saliency: Layer-wise saliency maps for ReLU neural nets with cool mathematical properties (namely, completeness)


I am also interested in the “science” of deep learning, i.e., systematic investigations of deep learning phenomena. Examples include studying forgetting dynamics in LLM training and explaining observed links between robustness and gradient interpretability. For more information, please see my research themes and publications.

I was previously a postdoctoral research fellow with Hima Lakkaraju at Harvard University. I completed my PhD with François Fleuret at the Idiap Research Institute & EPFL, Switzerland.

I am/was an organizer of:


Note: If you are looking for mentorship or research collaborations on interpretability, feel free to reach out!