I am a research scientist at Bosch Research (Sunnyvale, USA), where I work on computer vision problems for autonomous driving. My research focuses on developing interpretability tools for model understanding, debugging, and editing.
Some representative methods:
- SpLiCE: A dictionary-learning-style method for interpreting CLIP models
- Discriminative feature attributions: A method for training discriminative models whose saliency maps are faithful by design
- FullGrad saliency: Layer-wise saliency maps for ReLU neural nets with a cool mathematical property called completeness (sketched below the list)
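To give a flavor of the completeness property mentioned above: for a ReLU network $f$ with biases $b_l$ at layer $l$, the output decomposes exactly into an input-gradient term plus layer-wise bias-gradient terms (the notation below is my shorthand, not a verbatim statement from the paper):

$$
f(x) \;=\; \nabla_x f(x)^\top x \;+\; \sum_{l}\sum_{c} \big[\nabla_{b_l} f(x) \odot b_l\big]_c
$$

The layer-wise bias-gradient terms are the ingredients that get aggregated into the FullGrad saliency map.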
I am also interested in the “science” of deep learning, i.e., systematic investigations of deep learning phenomena. Examples include studying forgetting dynamics in LLM training and explaining observed links between robustness and gradient interpretability. For more information, please see my research themes and publications.
I was previously a postdoctoral research fellow with Hima Lakkaraju at Harvard University. I completed my PhD with François Fleuret at Idiap Research Institute & EPFL, Switzerland.
I am or have been an organizer of:
- the Theory of Interpretable AI online seminar series
- the XAI in Action: Past, Present and Future workshop at NeurIPS 2023
- the Interpretable AI: Past, Present and Future workshop at NeurIPS 2024
- the Interpretable ML course at Harvard, Spring 2023
Note: If you are looking for mentorship / research collaborations on interpretability, feel free to reach out!