| 1 | +# Design Document |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The Docker socket permission error (EACCES) occurs because the homelabarr-backend container lacks permission to access the Docker daemon socket at `/var/run/docker.sock`. The current implementation attempts to change the socket's permissions with `chmodSync()`, but this fails because the container runs as a non-root user (`node`) that does not own the socket and therefore cannot change its mode. |
| 6 | + |
| 7 | +The solution involves configuring proper Docker group membership and socket access through container orchestration rather than runtime permission changes. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +The fix operates at multiple layers: |
| 12 | + |
| 13 | +1. **Container Build Layer** - Ensure proper user/group configuration in Dockerfile |
| 14 | +2. **Container Runtime Layer** - Configure group membership and socket mounting in docker-compose |
| 15 | +3. **Application Layer** - Implement robust error handling and retry logic |
| 16 | +4. **Monitoring Layer** - Add proper logging and health checks for Docker connectivity |
| 17 | + |
| 18 | +## Components and Interfaces |
| 19 | + |
| 20 | +### Docker Group Configuration |
| 21 | +- **Host Group Mapping**: Map the host's docker group ID to the container |
| 22 | +- **User Group Membership**: Add the container user to the docker group |
| 23 | +- **Socket Permissions**: Ensure socket is accessible to docker group members |
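
The group-based access rule above can be illustrated with a small check. This is only a sketch under assumptions: `SocketStat` and `canReadWriteSocket` are hypothetical names, the check covers classic owner/group/other mode bits only, and it ignores the root bypass and ACLs.

```typescript
// Minimal shape of the fields needed; Node's fs.Stats exposes all three.
interface SocketStat {
  uid: number;  // owner user ID of the socket
  gid: number;  // owner group ID of the socket (the host's docker group)
  mode: number; // file mode, including permission bits
}

// Returns true when a process with the given uid and group list may read and
// write the socket according to Unix owner/group/other permission bits.
// Hypothetical helper: root bypass and ACLs are deliberately not modeled.
function canReadWriteSocket(stat: SocketStat, uid: number, groups: number[]): boolean {
  const perms = stat.mode & 0o777;
  if (uid === stat.uid) {
    return (perms & 0o600) === 0o600; // owner bits apply
  }
  if (groups.indexOf(stat.gid) !== -1) {
    return (perms & 0o060) === 0o060; // group bits apply
  }
  return (perms & 0o006) === 0o006;   // other bits apply
}
```

In a Node process the inputs could plausibly come from `fs.statSync(socketPath)`, `process.getuid()`, and `process.getgroups()`. A `false` result at startup would point to a missing group mapping rather than a transient failure.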
| 24 | + |
| 25 | +### Container Configuration |
| 26 | +- **Dockerfile Updates**: Configure proper group membership during build |
| 27 | +- **Docker Compose Updates**: Set `group_add` to the host's docker group ID |
| 28 | +- **Environment Variables**: Configure socket path and connection settings |
| 29 | + |
| 30 | +### Application Layer Improvements |
| 31 | +- **Connection Retry Logic**: Implement exponential backoff for failed connections |
| 32 | +- **Error Handling**: Graceful degradation when Docker is unavailable |
| 33 | +- **Health Monitoring**: Enhanced health checks for Docker connectivity |
| 34 | + |
| 35 | +### Security Considerations |
| 36 | +- **Least Privilege**: Grant minimal necessary permissions |
| 37 | +- **Group-based Access**: Use group membership instead of privileged mode |
| 38 | +- **Socket Protection**: Maintain socket security while enabling access |
| 39 | + |
| 40 | +## Data Models |
| 41 | + |
| 42 | +### Docker Connection Configuration |
| 43 | +```typescript |
| 44 | +interface DockerConfig { |
| 45 | + socketPath: string; |
| 46 | + timeout: number; |
| 47 | + retryAttempts: number; |
| 48 | + retryDelay: number; |
| 49 | + healthCheckInterval: number; |
| 50 | +} |
| 51 | +``` |
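
One way such a configuration might be populated is from environment variables. The sketch below is illustrative only: the variable names (`DOCKER_SOCKET_PATH`, `DOCKER_TIMEOUT_MS`, and so on), the default values, and the assumption that all durations are in milliseconds are hypothetical and not taken from the project.

```typescript
interface DockerConfig {
  socketPath: string;
  timeout: number;
  retryAttempts: number;
  retryDelay: number;
  healthCheckInterval: number;
}

type Env = { [key: string]: string | undefined };

// Parse a non-negative integer, falling back when the value is missing or invalid.
function intOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") {
    return fallback;
  }
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Build a DockerConfig from an environment map.
// All variable names and defaults below are assumptions for illustration.
function loadDockerConfig(env: Env): DockerConfig {
  return {
    socketPath: env["DOCKER_SOCKET_PATH"] || "/var/run/docker.sock",
    timeout: intOr(env["DOCKER_TIMEOUT_MS"], 30000),
    retryAttempts: intOr(env["DOCKER_RETRY_ATTEMPTS"], 5),
    retryDelay: intOr(env["DOCKER_RETRY_DELAY_MS"], 1000),
    healthCheckInterval: intOr(env["DOCKER_HEALTH_INTERVAL_MS"], 15000),
  };
}
```

Passing the environment in as a parameter (for example `loadDockerConfig(process.env)`) keeps the function easy to test without touching the real process environment.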
| 52 | + |
| 53 | +### Connection State Management |
| 54 | +```typescript |
| 55 | +interface DockerConnectionState { |
| 56 | + isConnected: boolean; |
| 57 | + lastError: Error | null; |
| 58 | + lastSuccessfulConnection: Date | null; |
| 59 | + retryCount: number; |
| 60 | + nextRetryAt: Date | null; |
| 61 | +} |
| 62 | +``` |
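
As a rough illustration of how this state might evolve, the sketch below models success and failure as pure transitions. The names `initialState`, `onConnectSuccess`, and `onConnectFailure` are hypothetical, and the retry delay is assumed to be supplied by the caller in milliseconds.

```typescript
interface DockerConnectionState {
  isConnected: boolean;
  lastError: Error | null;
  lastSuccessfulConnection: Date | null;
  retryCount: number;
  nextRetryAt: Date | null;
}

// Starting point before any connection attempt has been made.
const initialState: DockerConnectionState = {
  isConnected: false,
  lastError: null,
  lastSuccessfulConnection: null,
  retryCount: 0,
  nextRetryAt: null,
};

// A successful connection clears the error and resets the retry bookkeeping.
function onConnectSuccess(state: DockerConnectionState, now: Date): DockerConnectionState {
  return {
    ...state,
    isConnected: true,
    lastError: null,
    lastSuccessfulConnection: now,
    retryCount: 0,
    nextRetryAt: null,
  };
}

// A failed attempt records the error and schedules the next retry.
function onConnectFailure(
  state: DockerConnectionState,
  error: Error,
  now: Date,
  retryDelayMs: number
): DockerConnectionState {
  return {
    ...state,
    isConnected: false,
    lastError: error,
    retryCount: state.retryCount + 1,
    nextRetryAt: new Date(now.getTime() + retryDelayMs),
  };
}
```

Returning a new object instead of mutating in place makes each transition easy to log and to assert on in tests.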
| 63 | + |
| 64 | +### Error Classification |
| 65 | +```typescript |
| 66 | +interface DockerError { |
| 67 | + type: 'permission' | 'connection' | 'timeout' | 'unknown'; |
| 68 | + code: string; |
| 69 | + message: string; |
| 70 | + recoverable: boolean; |
| 71 | + retryAfter?: number; |
| 72 | +} |
| 73 | +``` |
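
A classifier that maps low-level error codes onto this shape could look roughly like the following. The function name `classifyDockerError`, the choice of which codes count as recoverable, and the `retryAfter` values (assumed to be milliseconds) are assumptions made for the sketch.

```typescript
interface DockerError {
  type: 'permission' | 'connection' | 'timeout' | 'unknown';
  code: string;
  message: string;
  recoverable: boolean;
  retryAfter?: number;
}

// Map a Node-style error (with an optional `code` property) to a DockerError.
// Hypothetical helper; the recoverable flags and delays are illustrative.
function classifyDockerError(err: { code?: string; message?: string }): DockerError {
  const code = err.code || "UNKNOWN";
  const message = err.message || "Unknown Docker error";
  switch (code) {
    case "EACCES":
    case "EPERM":
      // Retrying cannot fix missing group membership; the container needs reconfiguring.
      return { type: "permission", code, message, recoverable: false };
    case "ENOENT":
    case "ECONNREFUSED":
    case "ECONNRESET":
      // The socket is missing or the daemon is not accepting connections yet.
      return { type: "connection", code, message, recoverable: true, retryAfter: 5000 };
    case "ETIMEDOUT":
      return { type: "timeout", code, message, recoverable: true, retryAfter: 1000 };
    default:
      return { type: "unknown", code, message, recoverable: false };
  }
}
```

Treating `EACCES` as non-recoverable keeps the retry loop from masking a configuration problem that only a container change can fix.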
| 74 | + |
| 75 | +## Error Handling |
| 76 | + |
| 77 | +### Permission Errors (EACCES) |
| 78 | +- **Root Cause**: Container user lacks docker group membership |
| 79 | +- **Detection**: Monitor for EACCES errors on socket connection |
| 80 | +- **Resolution**: Configure proper group membership in container |
| 81 | +- **Fallback**: Log error and continue with degraded functionality |
| 82 | + |
| 83 | +### Connection Failures |
| 84 | +- **Retry Strategy**: Exponential backoff with maximum retry limit |
| 85 | +- **Circuit Breaker**: Temporarily stop retrying after consecutive failures |
| 86 | +- **Recovery**: Automatic reconnection when Docker becomes available |
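
The retry strategy and circuit breaker described above might be sketched as follows. `backoffDelay` and `CircuitBreaker` are hypothetical names, and the threshold and cooldown are parameters rather than values taken from the project.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Stops connection attempts after `threshold` consecutive failures and allows
// a single probe again once `cooldownMs` has elapsed since the breaker opened.
class CircuitBreaker {
  private failures: number = 0;
  private openedAt: number | null = null;
  private threshold: number;
  private cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // True when an attempt is allowed at time `now` (milliseconds).
  canAttempt(now: number): boolean {
    if (this.openedAt === null) {
      return true;
    }
    return now - this.openedAt >= this.cooldownMs;
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failure at or beyond the threshold opens (or re-opens) the breaker.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openedAt = now;
    }
  }
}
```

Taking `now` as an argument instead of calling `Date.now()` internally keeps the breaker deterministic under test. A caller would typically pass `Date.now()`.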
| 87 | + |
| 88 | +### Socket Unavailability |
| 89 | +- **Detection**: Monitor socket file existence and permissions |
| 90 | +- **Logging**: Detailed error messages for troubleshooting |
| 91 | +- **Graceful Degradation**: Continue serving non-Docker endpoints |
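
Graceful degradation could be expressed as a routing decision made before any Docker call. In this sketch the route prefixes in `DOCKER_ROUTE_PREFIXES` and the function name `decideRoute` are hypothetical; the real endpoint layout of the backend is not described in this document.

```typescript
interface RouteDecision {
  allowed: boolean;
  status: number;
  body?: { error: string };
}

// Hypothetical list of route prefixes that require a working Docker connection.
const DOCKER_ROUTE_PREFIXES: string[] = ["/containers", "/deploy"];

// Decide whether a request can be served given the current Docker connectivity.
// Docker-dependent routes get 503 while Docker is down; everything else proceeds.
function decideRoute(path: string, dockerConnected: boolean): RouteDecision {
  const needsDocker = DOCKER_ROUTE_PREFIXES.some(
    (prefix) => path === prefix || path.indexOf(prefix + "/") === 0
  );
  if (needsDocker && !dockerConnected) {
    return {
      allowed: false,
      status: 503,
      body: { error: "Docker is currently unavailable" },
    };
  }
  return { allowed: true, status: 200 };
}
```

A 503 response tells clients the condition is temporary, while non-Docker endpoints keep responding normally.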
| 92 | + |
| 93 | +## Testing Strategy |
| 94 | + |
| 95 | +### Permission Testing |
| 96 | +- Verify container can access Docker socket after configuration changes |
| 97 | +- Test that docker group membership is properly configured |
| 98 | +- Validate socket permissions are correctly set |
| 99 | + |
| 100 | +### Connection Resilience Testing |
| 101 | +- Test behavior when Docker daemon is stopped/started |
| 102 | +- Verify retry logic works correctly with various failure scenarios |
| 103 | +- Test graceful degradation when Docker is unavailable |
| 104 | + |
| 105 | +### Security Testing |
| 106 | +- Ensure container doesn't run with unnecessary privileges |
| 107 | +- Verify socket access is limited to required operations |
| 108 | +- Test that security boundaries are maintained |
| 109 | + |
| 110 | +### Integration Testing |
| 111 | +- Test full container deployment with fixed configuration |
| 112 | +- Verify Docker operations work correctly after fix |
| 113 | +- Test health check endpoints report correct Docker status |
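
To make the health check assertions concrete, here is one possible payload builder. The `HealthReport` shape, the `"ok"`/`"degraded"` labels, and the function name `buildHealthReport` are assumptions for illustration, not the backend's actual response format.

```typescript
// Subset of the connection state that the health endpoint needs.
interface DockerStatus {
  isConnected: boolean;
  lastError: Error | null;
  lastSuccessfulConnection: Date | null;
}

interface HealthReport {
  status: "ok" | "degraded";
  docker: {
    connected: boolean;
    lastError: string | null;
    lastSuccessfulConnection: string | null;
  };
}

// Build a JSON-serializable health payload from the current Docker status.
// The service itself is still up when Docker is down, so it reports "degraded".
function buildHealthReport(s: DockerStatus): HealthReport {
  return {
    status: s.isConnected ? "ok" : "degraded",
    docker: {
      connected: s.isConnected,
      lastError: s.lastError ? s.lastError.message : null,
      lastSuccessfulConnection: s.lastSuccessfulConnection
        ? s.lastSuccessfulConnection.toISOString()
        : null,
    },
  };
}
```

An integration test can then assert on `status` and `docker.connected` after stopping and restarting the daemon.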
| 114 | + |
| 115 | +## Implementation Approach |
| 116 | + |
| 117 | +### Phase 1: Container Configuration Fix |
| 118 | +1. Update Dockerfile to properly configure docker group |
| 119 | +2. Modify docker-compose.yml so that `group_add` contains the host's docker group ID |
| 120 | +3. Remove ineffective chmod attempts from application code |
| 121 | + |
| 122 | +### Phase 2: Application Layer Improvements |
| 123 | +1. Implement robust Docker connection management |
| 124 | +2. Add retry logic with exponential backoff |
| 125 | +3. Enhance error logging and monitoring |
| 126 | + |
| 127 | +### Phase 3: Health and Monitoring |
| 128 | +1. Improve health check to properly report Docker status |
| 129 | +2. Add connection state monitoring |
| 130 | +3. Implement graceful degradation for Docker unavailability |
| 131 | + |
| 132 | +### Phase 4: Security Hardening |
| 133 | +1. Verify minimal privilege configuration |
| 134 | +2. Add security validation for Docker operations |
| 135 | +3. Implement proper error boundaries |
| 136 | + |
| 137 | +## Security Considerations |
| 138 | + |
| 139 | +### Docker Socket Access |
| 140 | +- Use group-based permissions instead of privileged mode |
| 141 | +- Limit Docker API usage to necessary operations only, enforced in application code, since access to the socket exposes the full Docker API |
| 142 | +- Monitor and log all Docker API calls |
| 143 | + |
| 144 | +### Container Security |
| 145 | +- Run as non-root user with minimal required permissions |
| 146 | +- Use specific group membership rather than broad privileges |
| 147 | +- Implement proper input validation for Docker operations |
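
Input validation for Docker operations might start with an allowlist and a name check, as in the sketch below. The operation names in `ALLOWED_OPERATIONS` and the function `validateDockerRequest` are hypothetical, and the name pattern is modeled on the character set Docker accepts for container names.

```typescript
// Hypothetical set of operations the backend is willing to perform.
const ALLOWED_OPERATIONS: string[] = ["list", "inspect", "start", "stop", "restart"];

// Modeled on Docker's container name rule: an alphanumeric first character
// followed by at least one alphanumeric, underscore, dot, or hyphen.
const CONTAINER_NAME_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]+$/;

// Returns null when the request is acceptable, otherwise a reason for rejection.
function validateDockerRequest(operation: string, containerName: string): string | null {
  if (ALLOWED_OPERATIONS.indexOf(operation) === -1) {
    return "Operation not allowed: " + operation;
  }
  if (!CONTAINER_NAME_PATTERN.test(containerName)) {
    return "Invalid container name";
  }
  return null;
}
```

Rejecting anything outside the allowlist means a newly exposed Docker API call has to be added deliberately and cannot become reachable by accident.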
| 148 | + |
| 149 | +### Host System Protection |
| 150 | +- Ensure container cannot escape to host system |
| 151 | +- Limit Docker operations to safe, necessary functions |
| 152 | +- Implement proper audit logging for security monitoring |