Complete developer guide for contributing to Discogsography
This guide covers the development workflow, tools, and best practices for working on Discogsography. Whether you're fixing bugs, adding features, or improving performance, this guide will help you get started.
Discogsography leverages cutting-edge Python tooling for maximum developer productivity and code quality.
| Tool | Purpose | Configuration |
|---|---|---|
| uv | 10-100x faster package management | pyproject.toml |
| ruff | Lightning-fast linting & formatting | pyproject.toml |
| mypy | Strict static type checking | pyproject.toml |
| bandit | Security vulnerability scanning | pyproject.toml |
| pre-commit | Git hooks for code quality | .pre-commit-config.yaml |
| just | Task runner (like make, but better) | justfile |
- uv: 10-100x faster than pip, with better dependency resolution
- ruff: Replaces flake8, isort, pyupgrade, and more - all in one fast tool
- mypy: Catch type errors before runtime
- bandit: Find security vulnerabilities automatically
- pre-commit: Ensure code quality before every commit
- just: Simple, cross-platform task automation
discogsography/
├── 🔐 api/ # User auth, graph queries, OAuth, sync
│ ├── api.py # FastAPI application entry point
│ ├── auth.py # JWT helpers and OAuth token encryption
│ ├── limiter.py # Shared slowapi rate-limiter instance
│ ├── setup.py # discogs-setup CLI tool
│ ├── routers/ # FastAPI routers (auth, explore, sync, user, snapshot, oauth, nlq, rarity, etc.)
│ ├── README.md
│ └── __init__.py
├── 🧠 brainzgraphinator/ # MusicBrainz Neo4j enrichment service
│ ├── brainzgraphinator.py # Enriches Neo4j nodes with MusicBrainz metadata
│ ├── README.md
│ └── __init__.py
├── 🧬 brainztableinator/ # MusicBrainz PostgreSQL storage service
│ ├── brainztableinator.py # Stores MusicBrainz data in PostgreSQL
│ ├── README.md
│ └── __init__.py
├── 📦 common/ # Shared utilities and configuration
│ ├── config.py # Centralized configuration management
│ ├── health_server.py # Health check endpoint server
│ └── __init__.py
├── 📊 dashboard/ # Real-time monitoring dashboard
│ ├── dashboard.py # FastAPI backend with WebSocket
│ ├── admin_proxy.py # Admin panel proxy to API service
│ ├── tailwind.config.js # Tailwind CLI configuration (content paths, plugins)
│ ├── tailwind.input.css # Tailwind source directives (@tailwind base/…)
│ ├── static/ # Frontend HTML/CSS/JS (Tailwind, SVG gauges)
│ │ ├── index.html
│ │ ├── tailwind.css # Generated at Docker build time by css-builder stage
│ │ ├── styles.css
│ │ └── dashboard.js
│ ├── README.md
│ └── __init__.py
├── 📥 extractor/ # Rust-based high-performance extractor
│ ├── src/
│ │ └── main.rs # Rust processing logic
│ ├── benches/ # Rust benchmarks
│ ├── tests/ # Rust unit tests
│ ├── Cargo.toml # Rust dependencies
│ └── README.md
├── 🔍 explore/ # Static frontend for graph exploration UI
│ ├── explore.py # FastAPI static file server (health check only)
│ ├── tailwind.config.js # Tailwind CLI configuration (content paths, plugins)
│ ├── tailwind.input.css # Tailwind source directives (@tailwind base/…)
│ ├── static/ # Frontend HTML/CSS/JS (Tailwind, Alpine.js, D3.js, Plotly.js)
│ │ ├── index.html
│ │ ├── tailwind.css # Generated at Docker build time by css-builder stage
│ │ ├── css/styles.css
│ │ └── js/ # Modular JS (app, graph, trends, auth, etc.)
│ ├── README.md
│ └── __init__.py
├── 🔗 graphinator/ # Neo4j graph database service
│ ├── graphinator.py # Graph relationship builder
│ ├── README.md
│ └── __init__.py
├── 📈 insights/ # Precomputed analytics and music trends
│ ├── insights.py # Insights service entry point (scheduler + endpoints)
│ ├── computations.py # Computation orchestration (fetches from API over HTTP)
│ ├── cache.py # Redis cache-aside layer
│ ├── models.py # Pydantic response models
│ ├── README.md
│ └── __init__.py
├── 🤖 mcp-server/ # AI assistant MCP server
│ ├── server.py # FastMCP server exposing knowledge graph
│ ├── README.md
│ └── __init__.py
├── 🔧 schema-init/ # One-shot database schema initializer
│ ├── schema_init.py # Entry point — creates Neo4j + PostgreSQL schema
│ ├── neo4j_schema.py # Neo4j constraints and indexes
│ ├── postgres_schema.py # PostgreSQL tables and indexes
│ ├── Dockerfile
│ └── __init__.py
├── 🐘 tableinator/ # PostgreSQL storage service
│ ├── tableinator.py # Relational data management
│ ├── README.md
│ └── __init__.py
├── 🔧 utilities/ # Operational tools
│ ├── check_errors.py # Log analysis
│ ├── monitor_queues.py # Real-time queue monitoring
│ ├── system_monitor.py # System health dashboard
│ └── __init__.py
├── 🧪 tests/ # Comprehensive test suite
│ ├── api/ # API service tests
│ ├── brainzgraphinator/ # Brainzgraphinator tests
│ ├── brainztableinator/ # Brainztableinator tests
│ ├── common/ # Common module tests
│ ├── dashboard/ # Dashboard tests (including E2E)
│ ├── explore/ # Explore service tests
│ ├── graphinator/ # Graphinator tests
│ ├── insights/ # Insights service tests
│ ├── mcp-server/ # MCP server tests
│ ├── schema-init/ # Schema initializer tests
│ └── tableinator/ # Tableinator tests
├── 📝 docs/ # Documentation
├── 📜 scripts/ # Utility scripts
│ ├── update-project.sh # Dependency upgrade automation
│ └── README.md
├── 🐋 docker-compose.yml # Container orchestration
├── 📄 .env.example # Environment variable template
├── 📄 .pre-commit-config.yaml # Pre-commit hooks configuration
├── 📄 justfile # Task automation
├── 📄 pyproject.toml # Project configuration (root)
└── 📄 README.md # Project overview
# Install uv (package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install just (task runner)
brew install just # macOS
# or: cargo install just
# or: https://just.systems/install.sh
# Verify installations
uv --version
just --version
# Clone repository
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
# Install all dependencies (including dev dependencies)
just install
# Or using uv directly
uv sync --all-extras
# Install pre-commit hooks
just init
# Or using uv directly
uv run pre-commit install
# Start databases and message queue
docker-compose up -d neo4j postgres rabbitmq redis
# Verify they're running
docker-compose ps
# Copy example environment file
cp .env.example .env
# Edit for local development (or use defaults)
nano .env
See Configuration Guide for all environment variables.
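Services read these variables at startup through the shared config module. As an illustration only, a minimal settings loader; the variable names and defaults below are hypothetical, so check the Configuration Guide for the real ones:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Immutable snapshot of environment-driven configuration."""

    log_level: str
    rabbitmq_url: str

    @classmethod
    def from_env(cls) -> "Settings":
        # Variable names here are illustrative, not the project's actual keys.
        return cls(
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
            rabbitmq_url=os.environ.get("RABBITMQ_URL", "amqp://localhost:5672/"),
        )
```

Reading everything once into a frozen dataclass keeps configuration testable: tests can construct `Settings(...)` directly instead of patching the environment.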
# Dashboard (monitoring UI)
just dashboard
# Explore (graph exploration & trends)
just explore
# Extractor (Rust-based data ingestion - requires cargo)
just extractor-run
# Graphinator (Neo4j builder)
just graphinator
# Insights (precomputed analytics & trends)
just insights
# Tableinator (PostgreSQL builder)
just tableinator
# Brainzgraphinator (MusicBrainz → Neo4j enrichment)
just brainzgraphinator
# Brainztableinator (MusicBrainz → PostgreSQL)
just brainztableinator
# MCP Server (AI assistant integration)
just mcp-server
# Run all quality checks
just lint # Linting with ruff
just format # Code formatting with ruff
just lint-python # Linting with ruff + type checking with mypy
just security # Security scan with bandit
# Or run everything at once
uv run pre-commit run --all-files
1. Create a branch:
   git checkout -b feature/my-feature
2. Make your changes:
   - Follow coding standards (see below)
   - Add type hints
   - Write docstrings
   - Update tests
3. Test your changes:
   just test      # Run tests
   just test-cov  # With coverage
4. Check code quality:
   just lint
   just format
   just typecheck
   just security
5. Commit changes:
   git add .
   git commit -m "feat: add amazing feature"
   # Pre-commit hooks will run automatically
6. Push and create PR:
   git push origin feature/my-feature
   # Create pull request on GitHub
tests/
├── api/ # API service tests (auth, routers, queries)
├── brainzgraphinator/ # Brainzgraphinator tests
├── brainztableinator/ # Brainztableinator tests
├── common/ # Common module tests
├── dashboard/ # Dashboard tests
│ └── test_dashboard_ui.py # E2E tests with Playwright
├── explore/ # Explore service tests
├── graphinator/ # Graphinator tests
├── insights/ # Insights service tests
├── mcp-server/ # MCP server tests
├── schema-init/ # Schema initializer tests
└── tableinator/ # Tableinator tests
Tests run in parallel by default using pytest-xdist (-n auto --dist loadfile is set in pyproject.toml). This reduces the full suite from ~15 minutes to ~5 minutes.
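The parallel defaults live in the root pyproject.toml; the relevant fragment looks roughly like this (a sketch; check the repo's pyproject.toml for the authoritative values):

```toml
[tool.pytest.ini_options]
# -n auto: one worker per CPU core.
# --dist loadfile: keep each test file on a single worker, so
# module-scoped fixtures are not set up once per worker.
addopts = "-n auto --dist loadfile"
```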
# All tests (excluding E2E) — runs in parallel automatically
just test
# With coverage report (parallel)
just test-cov
# Specific test file
uv run pytest tests/api/test_neo4j_queries.py
# Specific test function
uv run pytest tests/api/test_neo4j_queries.py::test_search_artists
# Sequential execution (for debugging, shows cleaner output)
uv run pytest -n 0 -s
# With verbose output
uv run pytest -v
# One-time setup
uv run playwright install chromium
uv run playwright install-deps chromium
# Run E2E tests
just test-e2e
# Or directly
uv run pytest tests/dashboard/test_dashboard_ui.py -m e2e
# With specific browser
uv run pytest tests/dashboard/test_dashboard_ui.py -m e2e --browser firefox
# Run in headed mode (see browser)
uv run pytest tests/dashboard/test_dashboard_ui.py -m e2e --headed
The Explore frontend's modular JavaScript files are tested using Vitest:
# Install JS dependencies (one-time)
just install-js
# Run JS tests
just test-js
# Run JS tests with coverage
just test-js-cov
JavaScript tests are also included in the CI pipeline (test.yml) and in just test-parallel.
See Testing Guide for comprehensive testing documentation.
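As a shape reference, a typical unit test is a plain function with type hints and direct assertions. Everything below (the helper and its test) is hypothetical, not code from the repo:

```python
# Hypothetical helper; real ones live in common/, api/, etc.
def dedupe_preserving_order(items: list[str]) -> list[str]:
    """Remove duplicates while keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def test_dedupe_preserving_order() -> None:
    assert dedupe_preserving_order(["a", "b", "a", "c"]) == ["a", "b", "c"]
    assert dedupe_preserving_order([]) == []
```

Place such a test in the mirror directory under tests/ (here, tests/common/) so pytest-xdist's loadfile distribution keeps related tests on one worker.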
Follow PEP 8 with these tools:
- ruff: Linting and formatting (replaces flake8, isort, pyupgrade, black — 150 character line length)
- mypy: Static type checking
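Both tools are configured in the root pyproject.toml; a rough sketch of the relevant sections (everything beyond the 150-character line length is illustrative, not the project's actual rule set):

```toml
[tool.ruff]
line-length = 150  # stated project convention

[tool.ruff.lint]
# Illustrative rule groups: pycodestyle, pyflakes, isort, pyupgrade.
select = ["E", "F", "I", "UP"]

[tool.mypy]
strict = true  # strict mode, per the tooling table above
```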
# Auto-format code
just format
# Check for issues
just lint
# Type check
just typecheck
Always use type hints for function parameters and return values:
# ✅ Good
def process_artist(artist_id: str, data: dict) -> bool:
    """Process artist data."""
    ...

# ❌ Bad
def process_artist(artist_id, data):
    ...
Write docstrings for all public functions and classes:
def calculate_similarity(artist1: str, artist2: str) -> float:
    """Calculate similarity score between two artists.

    Args:
        artist1: Name of first artist
        artist2: Name of second artist

    Returns:
        Similarity score between 0.0 and 1.0

    Raises:
        ValueError: If artist names are empty
    """
    ...
Use emoji-prefixed logging for consistency (with structlog — see Logging Guide):
import structlog
logger = structlog.get_logger(__name__)
# Startup
logger.info("🚀 Starting service...")
# Success
logger.info("✅ Operation completed successfully")
# Error
logger.error("❌ Failed to connect to database")
# Warning
logger.warning("⚠️ Connection timeout, retrying...")
# Progress
logger.info("📊 Processed 1000 records")
See Logging Guide and Emoji Guide for complete logging standards.
Always handle errors gracefully:
# ✅ Good
try:
    result = perform_operation()
except ValueError as e:
    logger.error(f"❌ Invalid value: {e}")
    raise
except ConnectionError as e:
    logger.warning(f"⚠️ Connection failed: {e}, retrying...")
    retry_operation()

# ❌ Bad
try:
    result = perform_operation()
except:  # Don't use bare except
    pass  # Don't silently ignore errors
Never log sensitive data:
# ✅ Good
logger.info(f"🔗 Connecting to database at {host}")

# ❌ Bad
logger.info(f"🔗 Connecting with password: {password}")
Use parameterized queries:
# ✅ Good
cursor.execute(
    "SELECT * FROM artists WHERE name = %s",
    (artist_name,),
)

# ❌ Bad (SQL injection vulnerability)
cursor.execute(f"SELECT * FROM artists WHERE name = '{artist_name}'")
# Set environment variable
export LOG_LEVEL=DEBUG
# Run service
uv run python dashboard/dashboard.py
# Or with Docker
LOG_LEVEL=DEBUG docker-compose up
# Add breakpoint
import pdb; pdb.set_trace()
# Or Python 3.7+
breakpoint()
Create .vscode/launch.json:
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Dashboard",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/dashboard/dashboard.py",
"console": "integratedTerminal",
"env": {
"LOG_LEVEL": "DEBUG"
}
}
]
}
import cProfile
import pstats
# Profile function
profiler = cProfile.Profile()
profiler.enable()
# Your code here
process_data()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions
# Install memory profiler
uv add --dev memory-profiler
# Run with profiler
uv run python -m memory_profiler script.py
# Scan for vulnerabilities
just security
# Or directly
uv run bandit -r . -ll
# Check for known vulnerabilities
uv run pip-audit
# Update dependencies
./scripts/update-project.sh
- Use Markdown for all documentation
- Follow the Emoji Guide for consistency
- Add Mermaid diagrams where helpful
- Include code examples
- Keep documentation up-to-date
# Add new documentation to docs/
# Update docs/README.md
# Link from main README.md
- Write tests first (TDD when possible)
- Keep functions small and focused
- Use descriptive variable names
- Avoid magic numbers - use constants
- Handle errors explicitly
- Log important events
- Document complex logic
- Optimize only when needed (measure first)
Use Conventional Commits:
feat: add new feature
fix: correct bug
docs: update documentation
style: format code
refactor: restructure code
test: add tests
chore: update dependencies
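These prefixes are easy to check mechanically; a minimal validator sketch (hypothetical — the project does not necessarily ship such a hook):

```python
import re

# Types mirror the list above; an optional "(scope)" and a "!" for
# breaking changes are part of the Conventional Commits spec.
_PATTERN = re.compile(r"^(feat|fix|docs|style|refactor|test|chore)(\([\w-]+\))?(!)?: .+")


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message is conventional."""
    first_line = message.splitlines()[0] if message else ""
    return bool(_PATTERN.match(first_line))
```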
- Code follows style guide
- Tests are included and pass
- Type hints are complete
- Documentation is updated
- No security vulnerabilities
- Performance is acceptable
- Error handling is robust
- Logging is appropriate
The project uses GitHub Actions for CI/CD:
- Build: Verify Docker images build correctly
- Code Quality: Run linters and type checkers
- Tests: Run unit and integration tests
- E2E Tests: Run Playwright tests
- Security: Scan for vulnerabilities
See GitHub Actions Guide for details.
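For orientation, a lint job in one of these workflows looks roughly like the sketch below (job names and action versions are illustrative; the real definitions live in .github/workflows/):

```yaml
name: code-quality
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5  # installs uv; version illustrative
      - run: uv sync --all-extras
      - run: uv run ruff check .
      - run: uv run mypy .
```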
Local checks before commit:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    hooks:
      - id: mypy
# Clear cache
uv cache clean
# Reinstall dependencies
rm -rf .venv
uv sync --all-extras
# Update pre-commit
uv run pre-commit autoupdate
# Re-install hooks
uv run pre-commit install --install-hooks
# Run single test with verbose output
uv run pytest tests/path/to/test.py::test_name -vv
# Show stdout
uv run pytest -s
# Debug with pdb
uv run pytest --pdb
- Testing Guide - Comprehensive testing documentation
- Contributing Guide - How to contribute
- GitHub Actions Guide - CI/CD workflows
- Logging Guide - Logging standards
- Emoji Guide - Emoji conventions
Last Updated: 2026-03-27