This project presents a comparative study of the image-based explainability methods Grad-CAM and SHAP (Deep SHAP), applied to a DenseNet121 model trained for pneumonia detection from chest X-rays. The focus is not only on predictive performance but also on the interpretability, stability, and clinical usefulness of the explanations each method produces.
Deep learning models achieve strong performance in medical imaging tasks, yet their decisions often remain opaque. This work investigates a central trade-off in explainable AI:
Should we prioritize visually intuitive explanations (Grad-CAM) or mathematically complete but complex explanations (SHAP)?
Understanding this balance is crucial for clinical trust, accountability, and safe deployment of AI systems in healthcare.
Grad-CAM is a CNN-specific visual explanation technique that uses the gradients of a target class with respect to the feature maps of a convolutional layer. These gradients are global-average-pooled into per-channel importance weights, which are combined with the feature maps and passed through a ReLU activation to form a heatmap (a minimal code sketch follows the limitations list below).
Key characteristics
- Model-specific (CNNs only)
- Produces smooth, class-discriminative heatmaps
- Highlights only positive evidence supporting a prediction
Advantages
- Highly interpretable
- Aligns well with radiological reasoning
Limitations
- Ignores negative evidence
- Not model-agnostic
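As a minimal sketch of this mechanism, assuming a PyTorch DenseNet121 from torchvision (the choice of target layer and the helper name are illustrative assumptions, not necessarily the project's exact implementation):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class):
    """Compute a Grad-CAM heatmap for `target_class`.
    `image` is a preprocessed tensor of shape (1, 3, H, W)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output           # feature maps A^k
    def bwd_hook(_, __, grad_output):
        gradients["value"] = grad_output[0]     # gradient of the class score w.r.t. A^k

    # Last dense block of DenseNet121 as the target convolutional layer (an assumption)
    layer = model.features.denseblock4
    h_fwd = layer.register_forward_hook(fwd_hook)
    h_bwd = layer.register_full_backward_hook(bwd_hook)

    model.eval()
    scores = model(image)                       # (1, num_classes) logits
    model.zero_grad()
    scores[0, target_class].backward()          # backprop the target-class score
    h_fwd.remove()
    h_bwd.remove()

    # Global-average-pool the gradients to get per-channel weights alpha_k,
    # then take a ReLU of the weighted sum of the feature maps.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))

    # Upsample to the input resolution and normalise to [0, 1] for overlaying.
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0].detach().cpu().numpy()
```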
Deep SHAP approximates Shapley values for deep neural networks by combining ideas from SHAP and DeepLIFT. It measures how each pixel contributes to the prediction by comparing the input image against a baseline reference (e.g., a black image); a minimal code sketch follows the limitations list below.
Key characteristics
- Provides per-pixel attributions
- Includes both positive and negative contributions
- Theoretically grounded and faithful to the model
Advantages
- Strong explanation completeness
- Suitable for auditing and accountability
Limitations
- Computationally expensive
- Produces noisy and fragmented visualizations
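For comparison, a minimal sketch of obtaining Deep SHAP attributions with the `shap` library's `DeepExplainer` (`model` is the trained DenseNet121; the background size, batch size, and `test_images` tensor are illustrative assumptions):

```python
import shap
import torch

# Background set approximating the baseline distribution; an all-zero (black)
# batch is the simplest choice, as noted above, and a sample of training images
# is a common alternative. Model and tensors are assumed to be on the same device.
background = torch.zeros((8, 3, 224, 224))
explainer = shap.DeepExplainer(model, background)

# Per-pixel attributions for a few test images (`test_images` is a hypothetical
# preprocessed tensor). Depending on the shap version, the result is either a
# list with one array per class or a single array with a trailing class axis.
test_batch = test_images[:4]
shap_values = explainer.shap_values(test_batch)

# Plot positive (red) and negative (blue) evidence; shap expects NHWC arrays.
shap.image_plot([sv.transpose(0, 2, 3, 1) for sv in shap_values],
                test_batch.numpy().transpose(0, 2, 3, 1))
```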
Model: DenseNet121
Task: Binary classification (Pneumonia vs Normal)
Test Set Size: 624 chest X-ray images
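The full training and evaluation code lives in the project notebooks; the following is only a rough sketch of the setup, assuming PyTorch/torchvision and using illustrative hyperparameters rather than the ones actually used:

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start from an ImageNet-pretrained DenseNet121 and replace the 1000-way
# classifier head with a two-class head (Normal vs Pneumonia).
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()                          # two-class cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative learning rate
```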
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Pneumonia | 0.98 | 0.86 | 0.91 | 234 |
| Normal | 0.92 | 0.99 | 0.95 | 390 |
| Accuracy | | | 0.94 | 624 |
The model shows strong overall performance, particularly in identifying healthy cases. Pneumonia recall remains the most critical metric, as false negatives may delay treatment.
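For reference, a table like the one above can be produced with scikit-learn's `classification_report`; this sketch assumes the model and `device` from the setup sketch above and a hypothetical `test_loader` over the 624 test images:

```python
import torch
from sklearn.metrics import classification_report

model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:          # DataLoader over the 624 test images
        logits = model(images.to(device))
        y_pred.extend(logits.argmax(dim=1).cpu().tolist())
        y_true.extend(labels.tolist())

# The class-to-index mapping is an assumption; adjust target_names to match the dataset.
print(classification_report(y_true, y_pred, target_names=["Normal", "Pneumonia"]))
```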
To compare Grad-CAM and SHAP, we evaluated their explanations using the following metrics (a computation sketch follows the list):
- Sparsity (Top 90% Attribution Mass)
- Area Over the Perturbation Curve (AOPC), measured as the mean drop in predicted probability
- Prediction Probability Degradation
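A rough sketch of how the first two metrics can be computed from an attribution map (`attr`, an H×W NumPy array) and a preprocessed input tensor is given below; the step size and number of steps are illustrative, not necessarily what the notebooks use. The measured values follow.

```python
import numpy as np
import torch

def sparsity_top_mass(attr, mass=0.90):
    """Fraction of pixels needed to cover `mass` of the total absolute attribution."""
    flat = np.sort(np.abs(attr).ravel())[::-1]       # largest attributions first
    cumulative = np.cumsum(flat) / flat.sum()
    pixels_needed = np.searchsorted(cumulative, mass) + 1
    return pixels_needed / flat.size

def aopc(model, image, attr, target_class, steps=10, frac_per_step=0.01):
    """Mean drop in the target-class probability as the most important pixels
    (according to `attr`) are progressively zeroed out in `image` (1x3xHxW)."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(image), dim=1)[0, target_class].item()
    order = np.argsort(-np.abs(attr).ravel())         # most important pixels first
    per_step = max(1, int(frac_per_step * attr.size))
    perturbed = image.clone()
    drops = []
    for step in range(1, steps + 1):
        rows, cols = np.unravel_index(order[: step * per_step], attr.shape)
        # Cumulatively remove the attributed evidence across all channels.
        perturbed[0, :, torch.as_tensor(rows), torch.as_tensor(cols)] = 0.0
        with torch.no_grad():
            prob = torch.softmax(model(perturbed), dim=1)[0, target_class].item()
        drops.append(base - prob)
    return float(np.mean(drops))
```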
| Metric | SHAP | Grad-CAM |
|---|---|---|
| Sparsity (Top 90%) | 5.35 × 10⁻⁶ | 5.35 × 10⁻⁶ |
| AOPC (Mean Drop) | 0.0151 | 0.0221 |
| Original True Probability | 0.4013 | 0.4013 |
Both SHAP and Grad-CAM concentrated over 90% of their attribution mass on roughly 0.0005% of the image pixels (a fraction of about 5.35 × 10⁻⁶). This indicates that the model relies on a small number of highly localized, informative lung regions for its decision. Notably, both methods converged on the same spatial regions.
SHAP achieved a slightly lower AOPC (mean confidence drop), meaning the prediction degraded less when its top-attributed pixels were removed, which we read as more stable explanations under perturbation. Grad-CAM produced a smoother but faster confidence degradation as important regions were removed.
- Grad-CAM produces a smooth, monotonic decrease in confidence.
- SHAP exhibits non-monotonic drops because negative attributions can temporarily increase confidence when removed (both curves can be plotted from the per-step probabilities, as sketched below).
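A minimal plotting sketch for these degradation curves, assuming hypothetical `shap_curve` and `gradcam_curve` lists of per-step probabilities (e.g. from a variant of the `aopc()` sketch above that returns them):

```python
import matplotlib.pyplot as plt

# Hypothetical per-step true-class probabilities for each method.
steps = range(len(shap_curve))
plt.plot(steps, shap_curve, marker="o", label="SHAP")
plt.plot(steps, gradcam_curve, marker="s", label="Grad-CAM")
plt.xlabel("Perturbation step (top-attributed pixels removed)")
plt.ylabel("True-class probability")
plt.legend()
plt.show()
```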
- Grad-CAM: Smooth, localized heatmaps highlighting diagnostically relevant regions.
- SHAP: Pixel-level attribution maps showing both positive (supporting) and negative (opposing) evidence.
From a clinical perspective, Grad-CAM mirrors how radiologists identify focal abnormalities, whereas SHAP provides deeper insight into model reasoning at the cost of visual clarity.
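A simple way to produce such overlays, assuming matplotlib and a normalised 2-D heatmap such as the one returned by the `grad_cam()` sketch above (the colormap and transparency are arbitrary choices):

```python
import matplotlib.pyplot as plt

def show_overlay(xray, heatmap, title="Grad-CAM"):
    """Overlay a normalised attribution heatmap on a grayscale chest X-ray.
    `xray` and `heatmap` are 2-D arrays with the same height and width."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(xray, cmap="gray")
    ax.imshow(heatmap, cmap="jet", alpha=0.4)   # semi-transparent heatmap on top
    ax.set_title(title)
    ax.axis("off")
    plt.show()
```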
| Modality | Preferred Method | Observation |
|---|---|---|
| Tabular | SHAP | Captures complex feature interactions |
| Text | None (SHAP vs Integrated Gradients) | Low agreement and unstable explanations |
| Image | Context-dependent | SHAP for fidelity, Grad-CAM for usability |
Explainability methods often agree on where to look but disagree on what matters within that region.
In medical imaging, interpretability and clarity may outweigh full mathematical completeness, making Grad-CAM particularly suitable for clinical workflows, while SHAP remains valuable for auditing and forensic analysis.
The repository includes:
- Model training and evaluation notebooks
- Grad-CAM and SHAP implementations
- Explainability metric calculations
- Visualizations and plots
- Research-oriented analysis
This project is intended for research and educational purposes only and should not be used as a standalone medical diagnostic system.