This project presents a comparative study of the image-based explainability methods Grad-CAM and SHAP (Deep SHAP), applied to a DenseNet121 model trained for pneumonia detection from chest X-rays. The focus is not only on predictive performance but also on the interpretability, stability, and clinical usefulness of the explanations each method produces.
Deep learning models achieve strong performance in medical imaging tasks, yet their decisions often remain opaque. This work investigates a central trade-off in explainable AI:
Should we prioritize visually intuitive explanations (Grad-CAM) or mathematically complete but complex explanations (SHAP)?
Understanding this balance is crucial for clinical trust, accountability, and safe deployment of AI systems in healthcare.
Grad-CAM is a CNN-specific visual explanation technique that uses the gradients of a target class with respect to the feature maps of a convolutional layer. These gradients are global-average-pooled into per-channel importance weights, which are combined with the feature maps and passed through a ReLU activation to form a heatmap (a minimal code sketch follows the limitations list below).
Key characteristics
- Model-specific (CNNs only)
- Produces smooth, class-discriminative heatmaps
- Highlights only positive evidence supporting a prediction
Advantages
- Highly interpretable
- Aligns well with radiological reasoning
Limitations
- Ignores negative evidence
- Not model-agnostic
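As a minimal sketch of this mechanism, assuming a PyTorch DenseNet121 from torchvision (the choice of target layer and the helper name are illustrative assumptions, not necessarily the project's exact implementation):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class):
    """Compute a Grad-CAM heatmap for `target_class`.
    `image` is a preprocessed tensor of shape (1, 3, H, W)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output           # feature maps A^k
    def bwd_hook(_, __, grad_output):
        gradients["value"] = grad_output[0]     # gradient of the class score w.r.t. A^k

    # Last dense block of DenseNet121 as the target convolutional layer (an assumption)
    layer = model.features.denseblock4
    h_fwd = layer.register_forward_hook(fwd_hook)
    h_bwd = layer.register_full_backward_hook(bwd_hook)

    model.eval()
    scores = model(image)                       # (1, num_classes) logits
    model.zero_grad()
    scores[0, target_class].backward()          # backprop the target-class score
    h_fwd.remove()
    h_bwd.remove()

    # Global-average-pool the gradients to get per-channel weights alpha_k,
    # then take a ReLU of the weighted sum of the feature maps.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))

    # Upsample to the input resolution and normalise to [0, 1] for overlaying.
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach().cpu().numpy()
```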
Deep SHAP approximates Shapley values for deep neural networks by combining ideas from SHAP and DeepLIFT. It measures how each pixel contributes to the prediction by comparing the input image against a baseline reference (e.g., a black image); a minimal code sketch follows the limitations list below.
Key characteristics
- Provides per-pixel attributions
- Includes both positive and negative contributions
- Theoretically grounded and faithful to the model
Advantages
- Strong explanation completeness
- Suitable for auditing and accountability
Limitations
- Computationally expensive
- Produces noisy and fragmented visualizations
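For comparison, a minimal sketch of obtaining Deep SHAP attributions with the `shap` library's `DeepExplainer` (`model` is the trained DenseNet121; the background size, batch size, and `test_images` tensor are illustrative assumptions):

```python
import shap
import torch

# Background set approximating the baseline distribution; an all-zero (black)
# batch is the simplest choice, as noted above, and a sample of training images
# is a common alternative. Model and tensors are assumed to be on the same device.
background = torch.zeros((8, 3, 224, 224))
explainer = shap.DeepExplainer(model, background)

# Per-pixel attributions for a few test images (`test_images` is a hypothetical
# preprocessed tensor). Depending on the shap version, the result is either a
# list with one array per class or a single array with a trailing class axis.
test_batch = test_images[:4]
shap_values = explainer.shap_values(test_batch)

# Plot positive (red) and negative (blue) evidence; shap expects NHWC arrays.
shap.image_plot([sv.transpose(0, 2, 3, 1) for sv in shap_values],
                test_batch.numpy().transpose(0, 2, 3, 1))
```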
Model: DenseNet121
Task: Binary classification (Pneumonia vs Normal)
Test Set Size: 624 chest X-ray images
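The full training and evaluation code lives in the project notebooks; the following is only a rough sketch of the setup, assuming PyTorch/torchvision and using illustrative hyperparameters rather than the ones actually used:

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start from an ImageNet-pretrained DenseNet121 and replace the 1000-way
# classifier head with a two-class head (Normal vs Pneumonia).
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()                          # two-class cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative learning rate
```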
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Pneumonia | 0.98 | 0.86 | 0.91 | 234 |
| Normal | 0.92 | 0.99 | 0.95 | 390 |
| Accuracy | | | 0.94 | 624 |
The model shows strong overall performance, particularly in identifying healthy cases. Pneumonia recall remains the most critical metric, as false negatives may delay treatment.
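For reference, a table like the one above can be produced with scikit-learn's `classification_report`; this sketch assumes the model and `device` from the setup sketch above and a hypothetical `test_loader` over the 624 test images:

```python
import torch
from sklearn.metrics import classification_report

model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:          # DataLoader over the 624 test images
        logits = model(images.to(device))
        y_pred.extend(logits.argmax(dim=1).cpu().tolist())
        y_true.extend(labels.tolist())

# The class-to-index mapping is an assumption; adjust target_names to match the dataset.
print(classification_report(y_true, y_pred, target_names=["Normal", "Pneumonia"]))
```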
To compare Grad-CAM and SHAP, we evaluated their explanations using the following metrics (a computation sketch follows the list):
- Sparsity (Top 90% Attribution Mass)
- Area Over the Perturbation Curve (AOPC), measured as the mean drop in predicted probability
- Prediction Probability Degradation
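A rough sketch of how the first two metrics can be computed from an attribution map (`attr`, an H×W NumPy array) and a preprocessed input tensor is given below; the step size and number of steps are illustrative, not necessarily what the notebooks use. The measured values follow.

```python
import numpy as np
import torch

def sparsity_top_mass(attr, mass=0.90):
    """Fraction of pixels needed to cover `mass` of the total absolute attribution."""
    flat = np.sort(np.abs(attr).ravel())[::-1]       # largest attributions first
    cumulative = np.cumsum(flat) / flat.sum()
    pixels_needed = np.searchsorted(cumulative, mass) + 1
    return pixels_needed / flat.size

def aopc(model, image, attr, target_class, steps=10, frac_per_step=0.01):
    """Mean drop in the target-class probability as the most important pixels
    (according to `attr`) are progressively zeroed out in `image` (1x3xHxW)."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(image), dim=1)[0, target_class].item()
    order = np.argsort(-np.abs(attr).ravel())         # most important pixels first
    per_step = max(1, int(frac_per_step * attr.size))
    perturbed = image.clone()
    drops = []
    for step in range(1, steps + 1):
        rows, cols = np.unravel_index(order[: step * per_step], attr.shape)
        # Cumulatively remove the attributed evidence across all channels.
        perturbed[0, :, torch.as_tensor(rows), torch.as_tensor(cols)] = 0.0
        with torch.no_grad():
            prob = torch.softmax(model(perturbed), dim=1)[0, target_class].item()
        drops.append(base - prob)
    return float(np.mean(drops))
```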
| Metric | SHAP | Grad-CAM |
|---|---|---|
| Sparsity (Top 90%) | 5.35 × 10⁻⁶ | 5.35 × 10⁻⁶ |
| AOPC (Mean Drop) | 0.0151 | 0.0221 |
| Original True Probability | 0.4013 | 0.4013 |
Both SHAP and Grad-CAM concentrated over 90% of their attribution mass on roughly 0.0005% of the image pixels (a fraction of about 5.35 × 10⁻⁶). This indicates that the model relies on a small number of highly localized, informative lung regions for its decision. Notably, both methods converged on the same spatial regions.
SHAP achieved a slightly lower AOPC (mean confidence drop), meaning the prediction degraded less when its top-attributed pixels were removed, which we read as more stable explanations under perturbation. Grad-CAM produced a smoother but faster confidence degradation as important regions were removed.
- Grad-CAM produces a smooth, monotonic decrease in confidence.
- SHAP exhibits non-monotonic drops because negative attributions can temporarily increase confidence when removed (both curves can be plotted from the per-step probabilities, as sketched below).
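A minimal plotting sketch for these degradation curves, assuming hypothetical `shap_curve` and `gradcam_curve` lists of per-step probabilities (e.g. from a variant of the `aopc()` sketch above that returns them):

```python
import matplotlib.pyplot as plt

# Hypothetical per-step true-class probabilities for each method.
steps = range(len(shap_curve))
plt.plot(steps, shap_curve, marker="o", label="SHAP")
plt.plot(steps, gradcam_curve, marker="s", label="Grad-CAM")
plt.xlabel("Perturbation step (top-attributed pixels removed)")
plt.ylabel("True-class probability")
plt.legend()
plt.show()
```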
- Grad-CAM: Smooth, localized heatmaps highlighting diagnostically relevant regions.
- SHAP: Pixel-level attribution maps showing both positive (supporting) and negative (opposing) evidence.
From a clinical perspective, Grad-CAM mirrors how radiologists identify focal abnormalities, whereas SHAP provides deeper insight into model reasoning at the cost of visual clarity.
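A simple way to produce such overlays, assuming matplotlib and a normalised 2-D heatmap such as the one returned by the `grad_cam()` sketch above (the colormap and transparency are arbitrary choices):

```python
import matplotlib.pyplot as plt

def show_overlay(xray, heatmap, title="Grad-CAM"):
    """Overlay a normalised attribution heatmap on a grayscale chest X-ray.
    `xray` and `heatmap` are 2-D arrays with the same height and width."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(xray, cmap="gray")
    ax.imshow(heatmap, cmap="jet", alpha=0.4)   # semi-transparent heatmap on top
    ax.set_title(title)
    ax.axis("off")
    plt.show()
```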
| Modality | Preferred Method | Observation |
|---|---|---|
| Tabular | SHAP | Captures complex feature interactions |
| Text | None (SHAP vs Integrated Gradients) | Low agreement and unstable explanations |
| Image | Context-dependent | SHAP for fidelity, Grad-CAM for usability |
Explainability methods often agree on where to look but disagree on what matters within that region.
In medical imaging, interpretability and clarity may outweigh full mathematical completeness, making Grad-CAM particularly suitable for clinical workflows, while SHAP remains valuable for auditing and forensic analysis.
The repository includes:
- Model training and evaluation notebooks
- Grad-CAM and SHAP implementations
- Explainability metric calculations
- Visualizations and plots
- Research-oriented analysis
This project is intended for research and educational purposes only and should not be used as a standalone medical diagnostic system.