For more information, please see my Google Scholar page.

conference papers

ICML
2026

Position: Explainability Research Must Prioritize Foundations over Ad-hoc Methods
M. Moshkovitz*, S. Srinivas*, L. Semenova*, N. Frost, C. Rashtchian, V. Boreiko, S. Zhang, H. Lakkaraju, C. Rudin, J.W. Vaughan
pdf · summary
interpretability Despite the proliferation of XAI techniques, explanations rarely influence real-world workflows. To address this, this paper argues the ML community must pivot from developing ad-hoc XAI methods toward addressing foundational challenges: unclear problem formulations, underspecified evaluation objectives, and the absence of pipelines for explanation-driven feedback.

ACL
2026

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
A.J. Li, S. Srinivas, U. Bhalla, H. Lakkaraju
pdf · summary
interpretability Sparse autoencoder (SAE) concepts lack robustness: minimal input perturbations can substantially distort SAE concept representations without affecting the underlying model's behavior, raising concerns about their reliability.

ICML
2025

How much can we forget about Data Contamination?
S. Bordt, S. Srinivas, V. Boreiko, U. Luxburg
pdf · summary
data-centric AI Are LLM benchmarks rendered invalid by accidental data contamination? It turns out not always, because models also naturally forget examples seen during training.

NeurIPS
2024

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
U. Bhalla*, A. Oesterling*, S. Srinivas, F. Calmon, H. Lakkaraju
pdf · code · summary
interpretability We convert dense uninterpretable CLIP embeddings to overcomplete sparse interpretable ones, with a minimal loss in fidelity.

CoLM
2024

Certifying LLM Safety against Adversarial Prompting
A. Kumar, C. Agarwal, S. Srinivas, A. Li, S. Feizi, H. Lakkaraju
pdf · code · summary
robustness We present a simple method to detect LLM adversarial attacks, by systematically deleting tokens until the underlying string is labelled harmful.
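The token-deletion idea in this summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes tokens are erased only from the end of the prompt, and `toy_filter`, `max_erase`, and the example prompt are hypothetical stand-ins for a real safety classifier and its settings.

```python
from typing import Callable, List


def erase_and_check_suffix(
    tokens: List[str],
    is_harmful: Callable[[List[str]], bool],
    max_erase: int,
) -> bool:
    """Flag a prompt if it, or any version with up to `max_erase`
    trailing tokens deleted, is labelled harmful by the filter."""
    # Never erase the whole prompt: keep at least one token.
    limit = min(max_erase, len(tokens) - 1)
    for n_erased in range(0, limit + 1):
        candidate = tokens[: len(tokens) - n_erased]
        if is_harmful(candidate):
            return True
    return False


def toy_filter(tokens: List[str]) -> bool:
    """Hypothetical filter that only recognizes one exact harmful prompt."""
    return tokens == ["how", "to", "build", "a", "bomb"]


# A harmful prompt followed by a two-token adversarial suffix.
prompt = ["how", "to", "build", "a", "bomb", "xq!", "zz#"]

flagged = erase_and_check_suffix(prompt, toy_filter, max_erase=3)
missed = erase_and_check_suffix(prompt, toy_filter, max_erase=1)
```

Here the filter alone misses the full prompt, but deleting two trailing tokens exposes the harmful string, so the check succeeds once `max_erase` is at least the suffix length.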

UAI
2024

Characterizing Data Point Vulnerability as Average-Case Robustness
T. Han*, S. Srinivas*, H. Lakkaraju
pdf · summary
robustness We consider a relaxation of adversarial robustness, i.e., average-case robustness, and provide efficient estimators to compute this quantity.

NeurIPS
2023

Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
S. Srinivas*, S. Bordt*, H. Lakkaraju
pdf · code · summary
robustness Previous work finds gradients of robust models to be "perceptually aligned". We explain this phenomenon by observing that robust models in practice are not robust in all directions; in fact, they are mostly robust only off the data manifold. Their gradients therefore align with the manifold, which makes them perceptually aligned.
Spotlight presentation (Top 3%)

NeurIPS
2023

Discriminative Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability
U. Bhalla*, S. Srinivas*, H. Lakkaraju
pdf · code · summary
interpretability Given a pre-trained model, adapt this model to be robust to the perturbations introduced by feature attribution methods. Doing so results in models that recover ground truth attributions!

UAI
2023

On Minimizing the Impact of Dataset Shifts on Actionable Explanations
A. Meyer*, D. Ley*, S. Srinivas, H. Lakkaraju
pdf · summary
data-centric AI How can we train classifiers so that they are unaffected by small shifts in the dataset? We show theoretically and experimentally that weight decay, model curvature, and robustness are all important factors that can help minimize the impact of such dataset shifts.
Oral presentation (Top 5%)

NeurIPS
2022

Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
T. Han, S. Srinivas, H. Lakkaraju
pdf · code · workshop · summary
interpretability Several popular post-hoc explanations such as LIME, SHAP, and gradient-based explanations can be viewed as performing local function approximation (LFA). Thinking of LFA as a framework for explanations enables us to make useful statements about explanations, such as a no-free-lunch theorem, and to identify which explanations to use.
Best paper award at ICML "Interpretable ML for Healthcare" workshop, 2022

NeurIPS
2022

Efficiently Training Low-Curvature Neural Networks
S. Srinivas*, K. Matoba*, H. Lakkaraju, F. Fleuret
pdf · slides · poster · code · summary
robustness We train low-curvature neural networks that are "as linear as possible" by (1) replacing ReLU with a variant of softplus, (2) applying spectral normalization to linear layers, and (3) optionally using gradient-norm regularization, and by minimizing the curvatures and spectral norms of each layer independently. This approach rivals adversarial training without training with adversarial examples.

NeurIPS
2022

Data-Efficient Structured Pruning via Submodular Optimization
M. El-Halabi, S. Srinivas, S. Lacoste-Julien
pdf · summary
efficiency Pruning neurons in neural networks can be cast as a submodular optimization problem, which enables principled algorithms with rigorous theoretical guarantees that perform well when pruning with a small number of data points.

ICLR
2021

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability
S. Srinivas, F. Fleuret
pdf · slides · poster · code · workshop · summary
interpretability Commonly used input-gradient saliency maps for explaining discriminative neural nets capture information about an implicit density model, rather than about the underlying discriminative model that they are intended to explain.
Oral presentation (Top 1%)

NeurIPS
2019

Full-Gradient Representation for Neural Network Visualization
S. Srinivas, F. Fleuret
pdf · poster · code · summary
interpretability Compute saliency information from all intermediate layers in neural networks, rather than just from the input, as is done commonly. This provably captures two desirable properties (sensitivity and completeness) which typical saliency maps cannot capture.

ICML
2018

Knowledge Transfer with Jacobian Matching
S. Srinivas, F. Fleuret
pdf · slides · poster · workshop · summary
efficiency Perform sample-efficient distillation by requiring that the student model mimic the input-gradients of the teacher model. This is equivalent (in expectation) to performing classical distillation with data augmentation via additive input noise.
Best paper award at NeurIPS "Learning with Limited Data" workshop, 2017
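The equivalence stated in the summary of this paper can be seen from a first-order expansion. The following is a sketch under assumptions not spelled out above: scalar-output student $S$ and teacher $T$, a squared-error distillation loss, and small Gaussian input noise $\xi \sim \mathcal{N}(0, \sigma^2 I)$; the symbols are chosen here for illustration.

```latex
\begin{align*}
d(x) &:= S(x) - T(x) \\
d(x + \xi) &\approx d(x) + \xi^\top \nabla_x d(x)
  && \text{(first-order Taylor expansion)} \\
\mathbb{E}_\xi\!\left[ d(x + \xi)^2 \right]
  &\approx d(x)^2
   + 2\, d(x)\, \mathbb{E}[\xi]^\top \nabla_x d(x)
   + \mathbb{E}\!\left[ \left( \xi^\top \nabla_x d(x) \right)^2 \right] \\
  &= \left( S(x) - T(x) \right)^2
   + \sigma^2 \left\lVert \nabla_x S(x) - \nabla_x T(x) \right\rVert_2^2
\end{align*}
```

The cross term vanishes because $\mathbb{E}[\xi] = 0$, and the last term follows from $\mathbb{E}[\xi \xi^\top] = \sigma^2 I$. Under these assumptions, distilling on noisy inputs is, in expectation, classical distillation plus a penalty on the mismatch between student and teacher input-gradients.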

BMVC
2016

Learning Neural Network Architectures using Backpropagation
S. Srinivas, R.V. Babu
pdf · poster · summary
efficiency Automatically prune unimportant neurons during neural network training by introducing a multiplicative binary gating variable for each neuron, and encouraging the gate variables to be as sparse as possible via regularization.

Frontiers in Robotics and AI
2015

A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
S. Srinivas, R. Sarvadevabhatla, K.R. Mopuri, N. Prabhu, S.S. Kruthiventi, R.V. Babu
pdf · summary
misc A recipe-style survey of pre-2015 deep neural networks as applied to computer vision.
Top 25% of all research outputs scored on Altmetric

BMVC
2015

Data-free Parameter Pruning for Deep Neural Networks
S. Srinivas, R.V. Babu
pdf · poster · summary
efficiency Prune neurons in neural networks by (1) identifying duplicate neuron pairs, (2) removing one and performing a "surgery" step to compensate for removal.
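The two steps in this summary can be illustrated for the idealized case of exact duplicates. This is a sketch under assumptions, not the paper's procedure: it uses a single hidden ReLU layer, assumes the two neurons have identical incoming weights and biases, and the function names and example weights are hypothetical.

```python
def forward(W_in, b_in, W_out, x):
    """One hidden ReLU layer: y = W_out * relu(W_in * x + b_in)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_in, b_in)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_out]


def remove_duplicate_neuron(W_in, b_in, W_out, keep, drop):
    """Remove hidden neuron `drop`, a duplicate of neuron `keep`.

    The "surgery" step adds the outgoing weights of the removed neuron
    to those of the kept neuron, so the layer output is unchanged.
    """
    new_W_out = []
    for row in W_out:
        row = list(row)
        row[keep] += row[drop]  # surgery: fold outgoing weight into `keep`
        del row[drop]           # then delete the removed neuron's column
        new_W_out.append(row)
    new_W_in = [list(r) for i, r in enumerate(W_in) if i != drop]
    new_b_in = [b for i, b in enumerate(b_in) if i != drop]
    return new_W_in, new_b_in, new_W_out


# Hidden neurons 0 and 2 share incoming weights and bias (exact duplicates).
W_in = [[1.0, -2.0], [0.5, 0.3], [1.0, -2.0]]
b_in = [0.1, -0.2, 0.1]
W_out = [[2.0, 1.0, -0.5], [0.0, 3.0, 1.5]]
x = [0.7, 0.2]

pruned = remove_duplicate_neuron(W_in, b_in, W_out, keep=0, drop=2)
before = forward(W_in, b_in, W_out, x)
after = forward(*pruned, x)
```

For exact duplicates the pruned network computes the same function with one fewer hidden neuron; for neurons that are only approximately similar, the same surgery would introduce some error.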

workshop papers / tech reports

Tech Report
2024

Towards Unifying Interpretability and Control: Evaluation via Intervention
U. Bhalla, S. Srinivas, A. Ghandeharioun, H. Lakkaraju
pdf · summary
interpretability Popular mechanistic interpretability methods such as sparse autoencoders underperform simpler alternatives such as prompting and logit lens on the task of controlling model outputs, raising questions about the faithfulness of such methods.

Tech Report
2024

Generalized Group Data Attribution
D. Ley, S. Srinivas, S. Zhang, G. Rusak, H. Lakkaraju
pdf · summary
data-centric AI Data attribution methods, such as influence functions, can be made drastically more efficient (10-50x) by attributing to groups rather than individual data points. This can be used for fast dataset pruning and noisy label identification.

ICML Workshops
2024

All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models
C. Badrinath, U. Bhalla, A. Oesterling, S. Srinivas, H. Lakkaraju
pdf · summary
interpretability We find that many generative image models recover approximately similar representations.
Published at the Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM)

ICML Workshops
2023

Word-Level Explanations for Analyzing Bias in Text-to-Image Models
A. Lin, L.M. Paes, S.H. Tanneru, S. Srinivas, H. Lakkaraju
pdf · summary
interpretability For text-to-image models, we find which input words contribute to bias in output images. For example, we find that the word "doctor" in the input leads to an over-representation of males in the output.
Published at the Workshop on Challenges in Deploying Generative AI

ICML Workshops
2023

Consistent Explanations in the Face of Model Indeterminacy via Ensembling
D. Ley, L. Tang, M. Nazari, H. Lin, S. Srinivas, H. Lakkaraju
pdf · summary
interpretability With model ensembles, feature attributions are fairly consistent. We find strategies that lead to efficient construction of such ensembles.
Published at the Workshop on Interpretable Machine Learning for Healthcare (IMLH)

CVPR Workshops
2022

Cyclical Pruning for Sparse Neural Networks
S. Srinivas, A. Kuzmin, M. Nagel, M. van Baalen, A. Skliar, T. Blankevoort
pdf · slides · summary
efficiency Algorithms for training sparse neural networks should be more like projected gradient descent / iterative hard thresholding, which alternates between sparsification (i.e., projection step) and densification (i.e., gradient step), as opposed to common pruning approaches which do not perform densification.
Oral presentation at the Workshop on Efficient Computer Vision for Deep Learning (ECV)

SPCOM
2018

Estimating Confidence for Deep Neural Networks through Density modelling
A. Subramanya, S. Srinivas, R.V. Babu
pdf · slides · summary
robustness Model the density of intermediate features in a neural network using a high-dimensional Gaussian distribution. If features for a test point fall outside the "typical set" for such a Gaussian, then declare that test point to be out-of-distribution.

CVPR Workshops
2017

Training Sparse Neural Networks
S. Srinivas, A. Subramanya, R.V. Babu
pdf · slides · summary
efficiency Encourage weight sparsity in neural networks by introducing multiplicative binary gating variables along with each weight, and regularizing gates to be sparse.
Oral presentation at Embedded Vision Workshop

Tech Report
2016

Generalized Dropout
S. Srinivas, R.V. Babu
pdf · summary
efficiency A generalized version of dropout where dropout probabilities are automatically tuned during training. This is done by introducing a multiplicative Bernoulli gating variable for each neuron within a neural network, and penalizing the Bernoulli probability using a beta distribution.

ICVGIP
2016

Compensating for Large In-plane Rotations in Natural Images
L. Boominathan, S. Srinivas, R.V. Babu
pdf · poster · summary
misc Correct for large in-plane rotation in images by (1) detecting the presence of rotation using a CNN, and (2) correcting it iteratively using Bayesian optimization.

phd thesis

EPFL
2021

Gradient-based Methods for Deep Model Interpretability
pdf
EPFL thesis distinction award (top 8%, EE dept.)