swaekaa/SHAP_VS_GradCam


SHAP vs Grad-CAM: Image Explainability in Medical AI

Overview

This project presents a comparative study of two image-based explainability methods, Grad-CAM and SHAP (Deep SHAP), applied to a DenseNet121 model trained for pneumonia detection from chest X-rays. The focus is not only on predictive performance, but also on the interpretability, stability, and clinical usefulness of the explanations generated by each method.


Motivation

Deep learning models achieve strong performance in medical imaging tasks, yet their decisions often remain opaque. This work investigates a central trade-off in explainable AI:

Should we prioritize visually intuitive explanations (Grad-CAM) or mathematically complete but complex explanations (SHAP)?

Understanding this balance is crucial for clinical trust, accountability, and safe deployment of AI systems in healthcare.


Explainability Methods

Grad-CAM (Gradient-weighted Class Activation Mapping)

Grad-CAM is a CNN-specific visual explanation technique that uses the gradients of a target class score with respect to the feature maps of a convolutional layer (typically the last one). These gradients are global-average-pooled into per-channel importance weights, which are combined with the feature maps and passed through a ReLU to produce a class-discriminative heatmap.
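
The steps above can be sketched in a few lines of PyTorch. This is a minimal illustration on a randomly initialized toy CNN, not the project's DenseNet121 pipeline; the model, layer, and input shapes are hypothetical:

```python
# Minimal Grad-CAM sketch on a toy CNN with random weights (illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class TinyCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 8, 3, padding=1)  # stand-in "last conv layer"
        self.fc = torch.nn.Linear(8, 2)                  # Pneumonia vs Normal

    def forward(self, x):
        self.fmap = self.conv(x)                         # keep feature maps for Grad-CAM
        pooled = F.adaptive_avg_pool2d(F.relu(self.fmap), 1).flatten(1)
        return self.fc(pooled)

model = TinyCNN()
x = torch.randn(1, 1, 32, 32)
logits = model(x)
score = logits[0, 1]                                     # target class score

# Gradients of the class score w.r.t. the conv feature maps
grads = torch.autograd.grad(score, model.fmap)[0]

weights = grads.mean(dim=(2, 3), keepdim=True)           # global-average-pooled gradients
cam = F.relu((weights * model.fmap).sum(dim=1))          # weighted combination + ReLU
cam = cam / (cam.max() + 1e-8)                           # normalize to [0, 1]
print(cam.shape)                                         # torch.Size([1, 32, 32])
```

The ReLU at the end is what restricts Grad-CAM to positive evidence, which is exactly the limitation noted below.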

Key characteristics

  • Model-specific (CNNs only)
  • Produces smooth, class-discriminative heatmaps
  • Highlights only positive evidence supporting a prediction

Advantages

  • Highly interpretable
  • Aligns well with radiological reasoning

Limitations

  • Ignores negative evidence
  • Not model-agnostic

SHAP for Images (Deep SHAP)

Deep SHAP approximates Shapley values for deep neural networks by combining concepts from SHAP and DeepLIFT. It measures how each pixel contributes to the prediction by comparing the input image to a baseline (e.g., a black image).
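
What Deep SHAP approximates can be made concrete by computing exact Shapley values for a toy model. The three-feature linear model and its weights below are hypothetical; for real images this brute-force enumeration over coalitions is infeasible, which is why Deep SHAP approximates it:

```python
# Exact Shapley values via brute-force coalition enumeration (toy model).
from itertools import combinations
from math import factorial

def f(z):                        # hypothetical linear model
    w = [2.0, -1.0, 0.5]
    return sum(wi * zi for wi, zi in zip(w, z))

x = [1.0, 1.0, 1.0]              # input to explain
baseline = [0.0, 0.0, 0.0]       # reference input (cf. the black image)
n = len(x)

def shapley(i):
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(n):
        for S in combinations(others, k):
            # Model value with feature i present vs absent, coalition S fixed
            with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
            without = [x[j] if j in S else baseline[j] for j in range(n)]
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (f(with_i) - f(without))
    return total

phi = [shapley(i) for i in range(n)]
print(phi)                       # [2.0, -1.0, 0.5] for this linear model
# Efficiency property: attributions sum to f(x) - f(baseline)
assert abs(sum(phi) - (f(x) - f(baseline))) < 1e-9
```

Note the negative attribution for the second feature: unlike Grad-CAM, Shapley-based attributions capture evidence both for and against the prediction.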

Key characteristics

  • Provides per-pixel attributions
  • Includes both positive and negative contributions
  • Theoretically grounded and faithful to the model

Advantages

  • Strong explanation completeness
  • Suitable for auditing and accountability

Limitations

  • Computationally expensive
  • Produces noisy and fragmented visualizations

Model Performance

Model: DenseNet121
Task: Binary classification (Pneumonia vs Normal)
Test Set Size: 624 chest X-ray images

| Class     | Precision | Recall | F1-score | Support |
|-----------|-----------|--------|----------|---------|
| Pneumonia | 0.98      | 0.86   | 0.91     | 234     |
| Normal    | 0.92      | 0.99   | 0.95     | 390     |
| Accuracy  |           |        | 0.94     | 624     |

The model shows strong overall performance, particularly in identifying healthy cases. Pneumonia recall remains the most critical metric, as false negatives may delay treatment.


Explainability Evaluation Metrics

To compare Grad-CAM and SHAP, we evaluated explanations using:

  • Sparsity (Top 90% Attribution Mass)
  • Area Over the Perturbation Curve (AOPC)
  • Prediction Probability Degradation
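
The perturbation-based metric can be sketched as follows: remove the top-attributed pixels step by step and average the resulting drop in the model's probability. The tiny linear "model" and attribution map here are synthetic stand-ins, not the project's evaluation code:

```python
# AOPC sketch: average probability drop as top-attributed pixels are removed.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))                 # weights of a toy linear scorer

def prob(img):                              # sigmoid of a linear score
    return 1.0 / (1.0 + np.exp(-(w * img).sum()))

img = rng.normal(size=(8, 8))
attribution = w * img                       # exact attributions for this toy model

def aopc(img, attribution, steps=10, per_step=4):
    order = np.argsort(attribution, axis=None)[::-1]   # most important first
    p0, perturbed, drops = prob(img), img.copy(), []
    for k in range(steps):
        idx = order[k * per_step:(k + 1) * per_step]
        perturbed.flat[idx] = 0.0           # "remove" pixels via a zero baseline
        drops.append(p0 - prob(perturbed))
    return float(np.mean(drops))

print(aopc(img, attribution))               # positive: removal hurts confidence
```

A higher mean drop indicates that the attribution map ranked genuinely influential pixels first.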

Quantitative Comparison

| Metric                    | SHAP        | Grad-CAM    |
|---------------------------|-------------|-------------|
| Sparsity (Top 90%)        | 5.35 × 10⁻⁶ | 5.35 × 10⁻⁶ |
| AOPC (Mean Drop)          | 0.0151      | 0.0221      |
| Original True Probability | 0.4013      | 0.4013      |

Key Findings

Sparsity Analysis

Both SHAP and Grad-CAM concentrated over 90% of their attribution mass on roughly 0.0005% of the image pixels (a sparsity of 5.35 × 10⁻⁶). This indicates that the model relies on highly localized, informative lung regions for diagnosis. Notably, both methods converged on the same spatial regions.
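
The top-90% sparsity measure can be sketched as the smallest fraction of pixels whose attribution magnitudes cover 90% of the total mass. The highly localized attribution map below is synthetic, for illustration only:

```python
# Sparsity sketch: fraction of pixels holding 90% of the attribution mass.
import numpy as np

def sparsity_top_mass(attr, mass=0.9):
    mags = np.sort(np.abs(attr).ravel())[::-1]       # largest magnitudes first
    cum = np.cumsum(mags)
    n_needed = int(np.searchsorted(cum, mass * cum[-1]) + 1)
    return n_needed / mags.size                      # fraction of all pixels

rng = np.random.default_rng(0)
attr = np.zeros((224, 224))
attr[100:105, 100:105] = rng.random((5, 5))          # synthetic, highly localized map
print(sparsity_top_mass(attr))                       # at most 25 / 50176 pixels
```

A value this small means the explanation is concentrated on a handful of pixels, mirroring the localized lung regions observed in the study.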


Stability and AOPC

SHAP achieved a slightly lower AOPC score, indicating more stable explanations under pixel removal. Grad-CAM showed a smoother but faster confidence degradation as important regions were removed.


Probability Degradation Behavior

  • Grad-CAM produces a smooth, monotonic decrease in confidence.
  • SHAP exhibits non-monotonic drops due to the presence of negative attributions that may temporarily increase confidence when removed.
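
The non-monotonic behavior is easy to reproduce with a toy example: removing a pixel with negative attribution raises the model's confidence. The linear scorer below is hypothetical:

```python
# Toy demonstration: removing a negatively attributed pixel raises confidence.
import numpy as np

w = np.array([3.0, -2.0, 1.0])           # hypothetical linear scorer
x = np.array([1.0, 1.0, 1.0])
attr = w * x                             # attributions: [3.0, -2.0, 1.0]

def prob(z):
    return 1.0 / (1.0 + np.exp(-(w * z).sum()))

p_before = prob(x)
x_removed = x.copy()
x_removed[1] = 0.0                       # remove the negative-attribution pixel
p_after = prob(x_removed)
print(p_after > p_before)                # True: confidence increased
```

This is why a SHAP-ordered deletion curve can temporarily bend upward, while Grad-CAM's positive-only maps yield a monotonic decline.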

Visual Comparison

  • Grad-CAM: Smooth, localized heatmaps highlighting diagnostically relevant regions.
  • SHAP: Pixel-level attribution maps showing both positive (supporting) and negative (opposing) evidence.

From a clinical perspective, Grad-CAM mirrors how radiologists identify focal abnormalities, whereas SHAP provides deeper insight into model reasoning at the cost of interpretability.


Cross-Modality Insights

| Modality | Preferred Method  | Observation                               |
|----------|-------------------|-------------------------------------------|
| Tabular  | SHAP              | Captures complex feature interactions     |
| Text     | None (SHAP vs IG) | Low agreement and unstable explanations   |
| Image    | Context-dependent | SHAP for fidelity, Grad-CAM for usability |

Key Takeaway

Explainability methods often agree on where to look but disagree on what matters within that region.

In medical imaging, interpretability and clarity may outweigh full mathematical completeness, making Grad-CAM particularly suitable for clinical workflows, while SHAP remains valuable for auditing and forensic analysis.


Repository Structure

  • Model training and evaluation notebooks
  • Grad-CAM and SHAP implementations
  • Explainability metric calculations
  • Visualizations and plots
  • Research-oriented analysis

Disclaimer

This project is intended for research and educational purposes only and should not be used as a standalone medical diagnostic system.
