phd thesis
Gradient-based Methods for Deep Model Interpretability
École Polytechnique Fédérale de Lausanne (EPFL), 2021
Recipient of the EPFL thesis distinction award (top 8% of theses) in the EE department for 2021
master thesis
Learning Compact Architectures for Deep Neural Networks
Indian Institute of Science (IISc), 2017
For more information, see my Google Scholar page. Representative papers are highlighted below.
preprints
2024
How much can we forget about Data Contamination?
S. Bordt,
S. Srinivas,
V. Boreiko,
U. von Luxburg
pdf
· summary
Are LLM benchmarks rendered invalid by any amount of test-set contamination in the pre-training data? It turns out not always: models also naturally forget examples seen during training.
long papers
NeurIPS
2024
CoLM
2024
Certifying LLM Safety against Adversarial Prompting
A. Kumar,
C. Agarwal,
S. Srinivas,
A. Li,
S. Feizi,
H. Lakkaraju
pdf
· code
· summary
We present a simple method to detect adversarial attacks on LLMs by systematically deleting tokens and checking whether the underlying string is labelled harmful.
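The core check is simple enough to sketch. Below is a minimal Python sketch of this erase-and-check idea; the safety classifier `is_harmful` and the `max_erase` budget are illustrative assumptions, not the paper's exact API.

```python
from itertools import combinations

def erase_and_check(tokens, is_harmful, max_erase=3):
    """Flag a prompt as harmful if it, or any version of it with up to
    `max_erase` tokens deleted, is labelled harmful by a safety classifier."""
    if is_harmful(tokens):
        return True
    for k in range(1, max_erase + 1):
        for erased in combinations(range(len(tokens)), k):
            subseq = [t for i, t in enumerate(tokens) if i not in erased]
            if is_harmful(subseq):
                return True
    return False
```

Roughly, exhaustively checking every subsequence within the erasure budget is what yields the certificate: a harmful prompt cannot evade detection as long as the adversarial tokens fit within that budget.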
UAI
2024
NeurIPS
2023
Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
S. Srinivas*,
S. Bordt*,
H. Lakkaraju
pdf
· code
· summary
Previous work finds the gradients of robust models to be "perceptually aligned". We explain this phenomenon by observing that robust models in practice are not robust in all directions; in fact, they are mostly robust only outside the data manifold. This causes their gradients to align with the manifold, making them perceptually aligned.
Spotlight presentation (Top 3%)
NeurIPS
2023
Discriminative Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability
U. Bhalla*,
S. Srinivas*,
H. Lakkaraju
pdf
· code
· summary
Given a pre-trained model, we adapt it to be robust to the perturbations introduced by feature attribution methods. Doing so results in models that recover ground-truth attributions!
UAI
2023
On Minimizing the Impact of Dataset Shifts on Actionable Explanations
A. Meyer*,
D. Ley*,
S. Srinivas,
H. Lakkaraju
pdf
· summary
How can we train classifiers so that they are unaffected by small shifts in the dataset? We show theoretically and experimentally that weight decay, model curvature, and robustness are all important factors that help minimize the impact of such dataset shifts.
Oral presentation (Top 5%)
NeurIPS
2022
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
T. Han,
S. Srinivas,
H. Lakkaraju
pdf
· code
· workshop
· summary
Several popular post-hoc explanations, such as LIME, SHAP, and gradient-based explanations, can be viewed as performing local function approximation (LFA). Viewing LFA as a framework for explanations enables us to make useful statements, such as a no-free-lunch theorem, and to identify which explanations to use.
Best paper award at ICML "Interpretable ML for Healthcare" workshop, 2022
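As a concrete instance of the LFA view, here is a minimal Python sketch of a LIME-style explanation as a local linear fit to a black-box function; the function names and the Gaussian perturbation scheme are illustrative assumptions.

```python
import numpy as np

def local_linear_explanation(f, x, n_samples=1000, sigma=0.1, seed=0):
    """Fit a linear model to the black box `f` on perturbations around `x`;
    the fitted coefficients serve as feature attributions."""
    rng = np.random.default_rng(seed)
    X = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = np.array([f(z) for z in X])
    A = np.hstack([X, np.ones((n_samples, 1))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # local least-squares fit
    return coef[:-1]                              # drop the intercept
```

Under the LFA view, swapping the perturbation distribution and the fitting loss in this template recovers different popular explanation methods.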
NeurIPS
2022
Efficiently Training Low-Curvature Neural Networks
S. Srinivas*,
K. Matoba*,
H. Lakkaraju,
F. Fleuret
pdf
· slides
· poster
· code
· summary
We train low-curvature neural networks that are "as linear as possible" by (1) replacing ReLU with a variant of softplus, (2) spectrally normalizing linear layers, and (3) (optionally) using gradient-norm regularization, minimizing the curvature and spectral norm of each layer independently. This approach rivals adversarial training without ever training on adversarial examples.
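A minimal PyTorch sketch of ingredients (1) and (2), assuming a centered softplus as the ReLU replacement and illustrative layer sizes and β; the paper's exact parameterization and the gradient-norm penalty are omitted.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import spectral_norm

class CenteredSoftplus(nn.Module):
    """Softplus shifted to pass through the origin; lower beta means
    lower curvature, while higher beta approaches ReLU."""
    def __init__(self, beta=10.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        # softplus(0) = log(2) / beta, so subtract it to center the activation
        return F.softplus(x, beta=self.beta) - math.log(2.0) / self.beta

# Spectral normalization bounds each linear layer's Lipschitz constant.
model = nn.Sequential(
    spectral_norm(nn.Linear(784, 256)), CenteredSoftplus(),
    spectral_norm(nn.Linear(256, 10)),
)
```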
NeurIPS
2022
Data-Efficient Structured Pruning via Submodular Optimization
M. El-Halabi,
S. Srinivas,
S. Lacoste-Julien
pdf
· summary
Pruning neurons in neural networks can be cast as a submodular optimization problem, enabling principled algorithms with rigorous theoretical guarantees that perform well even when pruning with a small number of data points.
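A generic greedy sketch of the submodular selection template, not the paper's exact algorithm; `score` is a hypothetical monotone submodular set function, e.g. measuring how well the kept neurons reconstruct the layer's output.

```python
def greedy_select(neurons, score, budget):
    """Greedily pick `budget` neurons to keep, maximizing a monotone
    submodular `score`; greedy enjoys the classic (1 - 1/e) guarantee."""
    selected = set()
    for _ in range(budget):
        best = max((n for n in neurons if n not in selected),
                   key=lambda n: score(selected | {n}))
        selected.add(best)
    return selected
```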
ICLR
2021
Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability
S. Srinivas,
F. Fleuret
pdf
· slides
· poster
· code
· workshop
· summary
Commonly used input-gradient saliency maps for explaining discriminative neural nets capture information about an implicit density model, rather than about the underlying discriminative model they are intended to explain.
Oral presentation (Top 1%)
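The key identity is easy to state in code. A minimal PyTorch sketch: under the implicit density model p(x|y) ∝ exp(f_y(x)) associated with the logits, the input-gradient of a logit equals the score ∇x log p(x|y) exactly, since the normalizer does not depend on x.

```python
import torch

def logit_input_gradient(f, x, y):
    """Input-gradient of logit y; equivalently, the score of the implicit
    class-conditional density p(x|y) proportional to exp(f_y(x))."""
    x = x.clone().requires_grad_(True)
    logit = f(x)[..., y].sum()
    return torch.autograd.grad(logit, x)[0]
```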
NeurIPS
2019
Full-Gradient Representation for Neural Network Visualization
S. Srinivas,
F. Fleuret
pdf
· poster
· code
· summary
Compute saliency information from all intermediate layers in neural networks, rather than just from the input, as is commonly done. This provably captures two desirable properties (sensitivity and completeness) that typical saliency maps cannot capture.
190+ GitHub stars · 200+ citations
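A simplified PyTorch sketch of the underlying decomposition, which splits a prediction into an input-gradient term and per-layer bias-gradient terms; the aggregation and postprocessing of the full method are omitted, so see the linked code for the faithful version.

```python
import torch

def full_gradient_terms(model, x, target):
    """Input-gradient term plus per-layer bias-gradient terms:
    f(x) = grad_x f . x + sum_b grad_b f . b for piecewise-linear nets."""
    x = x.clone().requires_grad_(True)
    out = model(x)[..., target].sum()
    biases = [p for name, p in model.named_parameters() if "bias" in name]
    grads = torch.autograd.grad(out, [x] + biases)
    input_term = grads[0] * x                        # input-gradient x input
    bias_terms = [g * b for g, b in zip(grads[1:], biases)]
    return input_term, bias_terms
```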
ICML
2018
Knowledge Transfer with Jacobian Matching
S. Srinivas,
F. Fleuret
pdf
· slides
· poster
· workshop
· summary
Perform sample-efficient distillation by requiring that the student model mimic the input-gradients of the teacher model. This is equivalent (in expectation) to performing classical distillation with data augmentation via additive input noise.
Best paper award at NeurIPS "Learning with Limited Data" workshop, 2017
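A minimal PyTorch sketch of the matching term; the loss weights and the simplified choice of matching only the teacher's top logit's gradient are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def jacobian_matching_loss(student, teacher, x, lam=1.0, T=4.0):
    x = x.clone().requires_grad_(True)
    s_logits, t_logits = student(x), teacher(x)
    # Standard soft-label distillation term.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean")
    # Match input-gradients of the teacher's top logit.
    top = t_logits.argmax(dim=-1, keepdim=True)
    gs = torch.autograd.grad(s_logits.gather(1, top).sum(), x,
                             create_graph=True)[0]
    gt = torch.autograd.grad(t_logits.gather(1, top).sum(), x,
                             retain_graph=True)[0]
    return kd + lam * (gs - gt).pow(2).mean()
```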
BMVC
2016
Learning Neural Network Architectures using Backpropagation
S. Srinivas,
R.V. Babu
pdf
· poster
· summary
Automatically prune unimportant neurons during neural network training by introducing multiplicative binary gating variables for each neuron and encouraging the gates to be as sparse as possible via regularization.
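A minimal PyTorch sketch of the gating mechanism, with the binary gates relaxed to [0, 1] via a sigmoid for differentiability; the paper's exact parameterization of the gates differs.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose output neurons are multiplied by learnable gates;
    an L1 penalty on the gates drives unimportant neurons toward zero."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.gate_logits = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        return self.linear(x) * torch.sigmoid(self.gate_logits)

    def sparsity_penalty(self):
        return torch.sigmoid(self.gate_logits).sum()
```

Gates that end up near zero mark neurons that can be removed after training.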
Frontiers in Robotics and AI
2015
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
S. Srinivas,
R. Sarvadevabhatla,
K.R. Mopuri,
N. Prabhu,
S.S. Kruthiventi,
R.V. Babu
pdf
· summary
A recipe-style survey of pre-2015 deep neural networks as applied to computer vision.
300+ citations · Top 25% of all research outputs scored on Altmetric
short conference / workshop papers
ICML Workshops
2024
All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models
C. Badrinath,
U. Bhalla,
A. Oesterling,
S. Srinivas,
H. Lakkaraju
pdf
· summary
We find that many generative image models learn approximately similar latent representations.
Published at the Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM)
ICML Workshops
2023
Word-Level Explanations for Analyzing Bias in Text-to-Image Models
A. Lin,
L.M. Paes,
S.H. Tanneru,
S. Srinivas,
H. Lakkaraju
pdf
· summary
For text-to-image models, we identify which input words contribute to bias in the output images. For example, we find that the word "doctor" in the input leads to an over-representation of males in the output.
Published at the Workshop on Challenges in Deploying Generative AI
ICML Workshops
2023
Consistent Explanations in the Face of Model Indeterminacy via Ensembling
D. Ley,
L. Tang,
M. Nazari,
H. Lin,
S. Srinivas,
H. Lakkaraju
pdf
· summary
We find that ensembling models yields fairly consistent feature attributions, and identify strategies for constructing such ensembles efficiently.
Published at the Workshop on Interpretable Machine Learning for Healthcare (IMLH)
CVPR Workshops
2022
Cyclical Pruning for Sparse Neural Networks
S. Srinivas,
A. Kuzmin,
M. Nagel,
M. van Baalen,
A. Skliar,
T. Blankevoort
pdf
· slides
· summary
Algorithms for training sparse neural networks should look more like projected gradient descent / iterative hard thresholding, which alternates between sparsification (i.e., a projection step) and densification (i.e., gradient steps), unlike common pruning approaches, which do not perform densification.
Oral presentation at the Workshop on Efficient Computer Vision for Deep Learning (ECV)
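A minimal PyTorch sketch of one such alternation, in the style of iterative hard thresholding; the schedule, optimizer, and sparsity level are illustrative assumptions, not the paper's settings.

```python
import torch

def magnitude_project_(model, sparsity=0.9):
    """Sparsification: zero out the smallest-magnitude weights in place."""
    for p in model.parameters():
        if p.dim() < 2:          # skip biases and norm parameters
            continue
        k = max(1, int(sparsity * p.numel()))
        thresh = p.abs().flatten().kthvalue(k).values
        p.data.mul_((p.abs() > thresh).float())

def train_with_cyclic_projection(model, loss_fn, loader, steps_per_cycle=100):
    """Densification: ordinary dense gradient steps between projections,
    letting pruned weights regrow before the next sparsification."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for step, (x, y) in enumerate(loader, start=1):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if step % steps_per_cycle == 0:
            magnitude_project_(model)
```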
SPCOM
2018
Estimating Confidence for Deep Neural Networks through Density Modelling
A. Subramanya,
S. Srinivas,
R.V. Babu
pdf
· slides
· summary
Model the density of intermediate features in a neural network using a high-dimensional Gaussian distribution. If features for a test point fall outside the "typical set" for such a Gaussian, then declare that test point to be out-of-distribution.
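A minimal NumPy sketch of the idea; the threshold is an illustrative assumption, and in practice it would be calibrated on held-out in-distribution features.

```python
import numpy as np

def fit_gaussian(feats):
    """Fit a Gaussian to in-distribution intermediate features, shape (n, d)."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def is_out_of_distribution(feat, mu, precision, thresh=50.0):
    """Flag a test feature whose squared Mahalanobis distance is atypical."""
    diff = feat - mu
    return diff @ precision @ diff > thresh
```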
CVPR Workshops
2017
Training Sparse Neural Networks
S. Srinivas,
A. Subramanya,
R.V. Babu
pdf
· slides
· summary
Encourage weight sparsity in neural networks by introducing multiplicative binary gating variables for each weight and regularizing the gates to be sparse.
200+ citations · Oral presentation at Embedded Vision Workshop
Tech Report
2016
Generalized Dropout
S. Srinivas,
R.V. Babu
pdf
· summary
A generalized version of dropout where dropout probabilities are automatically tuned during training. This is done by introducing multiplicative Bernoulli gating variables for each neuron in a neural network and placing a beta prior on the Bernoulli probabilities.
ICVGIP
2016
Compensating for Large In-plane Rotations in Natural Images
L. Boominathan,
S. Srinivas,
R.V. Babu
pdf
· poster
· summary
Correct for large in-plane rotation in images by (1) detecting the presence of rotation using a CNN, and (2) correcting it iteratively using Bayesian optimization.
ICIP
2014
Controlled blurring for improving image reconstruction quality in flutter-shutter acquisition
S. Srinivas,
A. Adiga,
C.S. Seelamantula
pdf
· summary
Deliberately shaking a camera's sensor in a 2D plane during acquisition results in a well-defined blur kernel that can be used to deblur even in the presence of external camera shake.