Skip to content

Conversation

@douglance
Copy link
Contributor

Add production-ready observability infrastructure:

  • Grafana stack (Prometheus, Loki, Tempo, Grafana)
  • OpenTelemetry instrumentation (HTTP, Fastify, Pino, Viem)
  • Grafana Alloy for Docker log collection (pull-based)
  • Pre-configured Grafana datasources with trace/log correlation

All telemetry is optional (ALTO_ENABLE_TELEMETRY flag) and disabled
by default to avoid affecting existing deployments.

Run 'npm run lint:fix' to apply automatic formatting fixes.
This is a pure formatting commit with no functional changes.

- Fix import ordering
- Fix code formatting and indentation
- Apply Biome's style guidelines

341 files automatically formatted by Biome linter.
The build was failing because zkSync's `zks_getFeeParams` RPC method
is not included in viem's standard RPC method types.

Fixed by using type assertions (`as any`) for the zkSync-specific
method and params, allowing the code to compile while maintaining
runtime correctness.

This was the only actual CI blocker - Docker build runs pnpm install
which triggers the prepare hook that runs pnpm build.
Add production-ready observability infrastructure:
- Grafana stack (Prometheus, Loki, Tempo, Grafana)
- OpenTelemetry instrumentation (HTTP, Fastify, Pino, Viem)
- Grafana Alloy for Docker log collection (pull-based)
- Pre-configured Grafana datasources with trace/log correlation

All telemetry is optional (ALTO_ENABLE_TELEMETRY flag) and disabled
by default to avoid affecting existing deployments.

Technical changes:
- Add Alloy config for Docker container log discovery
- Add docker-compose.observability.yml with full stack
- Add docker-compose.yml for application deployment
- Add instrumentation with custom sampler (excludes /metrics, /health)
- Add CLI options for telemetry configuration

Security & Production:
- Grafana requires authentication (admin/admin)
- All Docker images pinned to specific versions
- Cluster name configurable via ALTO_CLUSTER_NAME env var
- Network auto-created (no external dependency)
Add CLI options for observability:
- --enable-telemetry: Enable/disable OpenTelemetry tracing
- --otlp-endpoint: Configure OTLP trace export endpoint

Wrap OTel SDK initialization in ALTO_ENABLE_TELEMETRY check to ensure
instrumentation only runs when explicitly enabled. Update start script
to preload instrumentation module.

Integrates with Tempo backend configured in observability stack for
distributed tracing of HTTP, Fastify, Pino, Undici, and Viem operations.
Add comprehensive test script that validates:
- Service health (Prometheus, Tempo, Loki, Alloy, Grafana)
- Metrics collection and scraping
- Log aggregation via Alloy
- Distributed tracing with OpenTelemetry
- Grafana datasource configuration

Script provides clear pass/fail output and keeps stack running on
success for manual exploration. Automatically cleans up on exit.
- Fix Alloy config: move loki.relabel block before loki.source.docker to resolve forward reference issue
- Update test script to gracefully handle Alto startup failures
- Make Alto validation optional while still validating observability infrastructure
- Automatically start Anvil (local Ethereum node) if available
- Add utility private key configuration for Alto
- Improve step numbering (now 1-10 with Anvil as step 4)
SahilVasava
SahilVasava previously approved these changes Oct 22, 2025
@douglance douglance changed the base branch from dl/biome-formatting to main October 23, 2025 17:03
@douglance douglance dismissed SahilVasava’s stale review October 23, 2025 17:03

The base branch was changed.

- Remove observabilityArgsSchema and related types
- Remove observabilityOptions from CLI configuration
- Keep environment variable approach which works correctly
- CLI options were non-functional due to instrumentation loading before arg parsing
- Add explicit UIDs to Grafana datasources (prometheus, tempo, loki)
  Fixes cross-datasource links (Tempo→Loki logs, service map, derived fields)
- Change Prometheus target from host.docker.internal to ultra-relay-provider
  Ensures scraping works on Linux and when bundler runs in Docker network
- Add jq dependency check to test script with installation instructions
  Prevents cryptic failures when jq is missing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants