name: Feature request
about: Suggest an idea or feature
title: 'Add Prometheus Metrics and Database Availability Monitoring'
labels: enhancement, monitoring, prometheus
assignees: ''
Description of the feature
Add comprehensive Prometheus metrics integration and database availability monitoring to the docker-db-backup project. This feature provides real-time monitoring capabilities for backup operations and database connectivity status.
Core Components
1. Prometheus Metrics Integration
- Metrics Endpoint: HTTP server exposing Prometheus-formatted metrics
- Metric Types: Counters for job statistics, Gauges for current status
- File-based Storage: Secure metric storage with file locking
- HTTP Server: Python-based server with netcat fallback
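The metrics endpoint described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the code from the proposed change: the function names `make_handler` and `start_metrics_server` are hypothetical, and serving the file at `/metrics` is an assumption (the conventional Prometheus path).

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_handler(metrics_file: Path):
    """Build a handler class that serves the metrics file at /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            # A missing file only means that no job has reported yet.
            body = metrics_file.read_bytes() if metrics_file.exists() else b""
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep scrape requests out of the container log

    return MetricsHandler


def start_metrics_server(metrics_file: str, port: int) -> ThreadingHTTPServer:
    """Start the server in a daemon thread and return it for later shutdown."""
    server = ThreadingHTTPServer(("0.0.0.0", port), make_handler(Path(metrics_file)))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_metrics_server("/tmp/prometheus_metrics", 9090)` would expose the file on port 9090; `server.shutdown()` followed by `server.server_close()` stops it cleanly on container shutdown.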
2. Database Availability Monitoring
- Real-time Status: Monitor database connectivity for all supported types
- Availability Metric: dbbackup_database_availability gauge (1=available, 0=unavailable)
- Database Type Support: MySQL, PostgreSQL, MongoDB, Redis, CouchDB, InfluxDB, MSSQL, SQLite3
- Non-intrusive: Availability checks don't affect backup operations
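As an illustration of the availability gauge, the sketch below uses a plain TCP connection attempt as a stand-in for a real health check. That is an assumption: the actual implementation would more likely call each database's native client, and a file-based database such as SQLite3 has no port to probe. Both function names are hypothetical.

```python
import socket


def check_tcp_availability(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection succeeds within the timeout, else 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return 0


def availability_sample(db_type: str, host: str, name: str, value: int) -> str:
    """Format one sample line for the dbbackup_database_availability gauge."""
    return (
        f'dbbackup_database_availability{{db_host="{host}",'
        f'db_name="{name}",db_type="{db_type}"}} {value}'
    )
```

Because the check only opens and closes a connection, it runs independently of the backup itself, which matches the non-intrusive requirement.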
3. Enhanced Backup Metrics
- Job Counters: Total, successful, and failed backup jobs
- Performance Metrics: Backup duration, size, and status
- Timestamp Tracking: Last backup completion time
- Proper Counter/Gauge Behavior: Metrics update correctly without duplication
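The counter and gauge behavior can be sketched with a store keyed by metric name and label set, so that repeated updates overwrite or increment one series instead of appending duplicates. `MetricStore` and `record_backup` are hypothetical names, and treating `dbbackup_backup_status` as 1 for success and 0 for failure is an assumption, since the request does not define its values.

```python
from typing import Dict, Mapping, Tuple

# (metric name, sorted label pairs) identifies exactly one time series.
Key = Tuple[str, Tuple[Tuple[str, str], ...]]


class MetricStore:
    def __init__(self) -> None:
        self.samples: Dict[Key, float] = {}

    @staticmethod
    def _key(name: str, labels: Mapping[str, str]) -> Key:
        return (name, tuple(sorted(labels.items())))

    def inc(self, name: str, labels: Mapping[str, str], amount: float = 1.0) -> None:
        """Counter semantics: the value only ever increases."""
        if amount < 0:
            raise ValueError("counters cannot decrease")
        key = self._key(name, labels)
        self.samples[key] = self.samples.get(key, 0.0) + amount

    def set(self, name: str, labels: Mapping[str, str], value: float) -> None:
        """Gauge semantics: the latest value replaces the previous one."""
        self.samples[self._key(name, labels)] = float(value)


def record_backup(store: MetricStore, labels: Mapping[str, str], ok: bool,
                  duration_s: float, size_bytes: int, finished_at: float) -> None:
    """Update all per-job metrics after one backup run."""
    store.inc("dbbackup_jobs_total", labels)
    store.inc("dbbackup_jobs_success_total" if ok else "dbbackup_jobs_failed_total", labels)
    store.set("dbbackup_backup_status", labels, 1 if ok else 0)  # assumed encoding
    store.set("dbbackup_backup_duration_seconds", labels, duration_s)
    store.set("dbbackup_backup_size_bytes", labels, size_bytes)
    store.set("dbbackup_backup_timestamp", labels, finished_at)
```

After two runs for the same database, `dbbackup_jobs_total` holds 2 while each gauge still has a single series carrying the most recent value.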
Technical Implementation
Metrics Available
# Database availability
dbbackup_database_availability{db_host="mysql-host",db_name="testdb",db_type="mysql"}
# Backup status and performance
dbbackup_backup_status{db_host="mysql-host",db_name="testdb"}
dbbackup_backup_duration_seconds{db_host="mysql-host",db_name="testdb"}
dbbackup_backup_size_bytes{db_host="mysql-host",db_name="testdb"}
dbbackup_backup_timestamp{db_host="mysql-host",db_name="testdb"}
# Job counters
dbbackup_jobs_total{db_host="mysql-host",db_name="testdb"}
dbbackup_jobs_success_total{db_host="mysql-host",db_name="testdb"}
dbbackup_jobs_failed_total{db_host="mysql-host",db_name="testdb"}
# Upload metrics (if applicable)
dbbackup_upload_duration_seconds{db_host="mysql-host",db_name="testdb"}
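The series listed above omit sample values and metadata. A hedged sketch of rendering one metric in the Prometheus text exposition format, including the `# HELP` and `# TYPE` lines, follows. The function names and the help text are illustrative assumptions, and the help text itself is not escaped here.

```python
from typing import Iterable, Mapping, Tuple, Union

Number = Union[int, float]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, help_text: str,
                  samples: Iterable[Tuple[Mapping[str, str], Number]]) -> str:
    """Render one metric family: HELP line, TYPE line, then one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"
```

For example, rendering `dbbackup_database_availability` as a gauge with the labels `db_host="mysql-host"`, `db_name="testdb"`, `db_type="mysql"` and the value 1 produces the two comment lines followed by `dbbackup_database_availability{db_host="mysql-host",db_name="testdb",db_type="mysql"} 1`.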
Configuration Variables
# Enable monitoring
CONTAINER_ENABLE_MONITORING=TRUE
CONTAINER_MONITORING_BACKEND=prometheus
# Prometheus configuration
PROMETHEUS_PORT=9090
PROMETHEUS_METRICS_FILE=/tmp/prometheus_metrics
PROMETHEUS_METRICS_LOCK=/tmp/prometheus_metrics.lock
DEBUG_PROMETHEUS=FALSE
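One way these variables might be read and validated is sketched below. The fallback values mirror the ones shown above, but whether they are the real defaults is an assumption, as is monitoring being off when `CONTAINER_ENABLE_MONITORING` is unset; `MonitoringConfig`, `parse_bool`, and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Accept TRUE/true/1/yes as true; anything else is false."""
    return value.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class MonitoringConfig:
    enabled: bool
    backend: str
    port: int
    metrics_file: str
    lock_file: str
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> MonitoringConfig:
    """Build the monitoring configuration from environment variables."""
    port = int(env.get("PROMETHEUS_PORT", "9090"))
    if not 1 <= port <= 65535:
        raise ValueError(f"PROMETHEUS_PORT out of range: {port}")
    return MonitoringConfig(
        enabled=parse_bool(env.get("CONTAINER_ENABLE_MONITORING", "FALSE")),  # assumed default
        backend=env.get("CONTAINER_MONITORING_BACKEND", "prometheus"),
        port=port,
        metrics_file=env.get("PROMETHEUS_METRICS_FILE", "/tmp/prometheus_metrics"),
        lock_file=env.get("PROMETHEUS_METRICS_LOCK", "/tmp/prometheus_metrics.lock"),
        debug=parse_bool(env.get("DEBUG_PROMETHEUS", "FALSE")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise in isolation.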
Example Alerts
groups:
  - name: db-backup
    rules:
      - alert: DatabaseUnavailable
        expr: dbbackup_database_availability == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database {{ $labels.db_host }}/{{ $labels.db_name }} ({{ $labels.db_type }}) is unavailable"
      - alert: BackupFailed
        expr: increase(dbbackup_jobs_failed_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Backup failed for {{ $labels.db_host }}/{{ $labels.db_name }}"
      - alert: BackupTooSlow
        expr: dbbackup_backup_duration_seconds > 300
        labels:
          severity: warning
        annotations:
          summary: "Backup taking too long for {{ $labels.db_host }}/{{ $labels.db_name }}"
Benefits of feature
Operational Benefits
- Real-time Monitoring: Immediate visibility into backup operations and database health
- Proactive Issue Detection: Identify problems before they affect data integrity
- Performance Optimization: Track backup duration and size trends
- Capacity Planning: Monitor backup storage usage and growth patterns
Security Benefits
- Database Health Monitoring: Detect connectivity issues that could affect backup reliability
- Audit Trail: Track backup success/failure rates over time
- Compliance: Meet monitoring requirements for backup systems
Developer Benefits
- Observability: Comprehensive metrics for debugging and optimization
- Integration: Seamless integration with existing Prometheus/Grafana stacks
- Flexibility: Configurable metrics endpoint and monitoring options
- Standards Compliance: Follows Prometheus metrics best practices
Business Benefits
- Reduced Downtime: Early detection of backup failures
- Improved Reliability: Monitor database availability for backup readiness
- Cost Optimization: Identify inefficient backup configurations
- Compliance: Meet regulatory requirements for backup monitoring
Additional context
Architecture Overview
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Database     │    │    DB Backup     │    │   Prometheus    │
│  (MySQL, etc.)  │◄──►│    Container     │───►│     Server      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │   Metrics File   │
                       │   (with locks)   │
                       └──────────────────┘
Integration Points
- Container Initialization: Metrics server starts during container startup
- Backup Operations: Metrics updated after each backup job
- Database Connectivity: Availability checked during backup preparation
- Graceful Shutdown: Metrics server stops cleanly on container shutdown
Performance Considerations
- Minimal Overhead: Metrics collection adds <1ms to backup operations
- Efficient Storage: File-based metrics with automatic cleanup
- Concurrent Safety: File locking prevents race conditions
- Memory Efficient: No persistent in-memory storage
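The concurrent-safety point can be illustrated with an advisory lock plus an atomic rename, so that a scrape never reads a half-written file. This is a sketch of the general technique on a Unix system, not the proposed shell implementation; `metrics_lock` and `write_metrics` are hypothetical names.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def metrics_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def write_metrics(metrics_path: str, lock_path: str, text: str) -> None:
    """Replace the metrics file atomically while holding the lock."""
    tmp_path = f"{metrics_path}.tmp"
    with metrics_lock(lock_path):
        with open(tmp_path, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, metrics_path)
```

The lock serializes writers, while the rename protects readers that do not take the lock, such as an HTTP handler reading the file during a scrape.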
Compatibility
- Backward Compatible: Existing configurations continue to work
- Optional Feature: Can be enabled/disabled per deployment
- Multi-Database Support: Works with all supported database types
- Cloud Agnostic: Works in any environment with network access
Example Use Cases
- Production Monitoring: Monitor backup health in production environments
- DevOps Integration: Integrate with CI/CD pipelines for backup validation
- Compliance Reporting: Generate backup success rate reports
- Capacity Planning: Track backup size growth over time
- Troubleshooting: Identify patterns in backup failures
Files Added/Modified
install/assets/functions/08-prometheus
- Core Prometheus functionsinstall/assets/defaults/08-prometheus
- Default configurationinstall/etc/cont-init.d/10-db-backup
- Integration with initializationinstall/assets/functions/10-db-backup
- Enhanced with availability metricsexamples/prometheus/
- Complete examples and documentationREADME.md
- Updated with new environment variables
This feature significantly enhances the observability and reliability of the docker-db-backup system, making it suitable for production environments with strict monitoring requirements.