Implement observability stack with Alloy-based logging #9
Open
douglance wants to merge 12 commits into main from dl/monitoring
Run 'npm run lint:fix' to apply automatic formatting fixes. This is a pure formatting commit with no functional changes.
- Fix import ordering
- Fix code formatting and indentation
- Apply Biome's style guidelines

341 files automatically formatted by Biome linter.
The build was failing because zkSync's `zks_getFeeParams` RPC method is not included in viem's standard RPC method types. Fixed by using type assertions (`as any`) for the zkSync-specific method and params, which lets the code compile without changing runtime behavior. This was the only actual CI blocker: the Docker build runs `pnpm install`, which triggers the prepare hook, which in turn runs `pnpm build`.
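To illustrate the kind of fix described in this commit, here is a minimal sketch that does not import viem. `TypedClient`, `KnownMethod`, and `createFakeClient` are hypothetical stand-ins for a client whose `request()` only accepts a fixed set of method names, and the canned response is a placeholder, not the real shape returned by zkSync.

```typescript
// Hypothetical stand-in for a client typed to a closed set of RPC methods.
type KnownMethod = "eth_chainId" | "eth_blockNumber";

interface TypedClient {
  request(args: { method: KnownMethod; params?: unknown[] }): Promise<unknown>;
}

// Fake transport for illustration only: answers from a lookup table.
function createFakeClient(responses: Record<string, unknown>): TypedClient {
  return {
    async request(args) {
      const method: string = args.method;
      if (!(method in responses)) {
        throw new Error(`Method not found: ${method}`);
      }
      return responses[method];
    },
  };
}

// "zks_getFeeParams" is not in KnownMethod, so without the assertions this
// call is a compile error. The assertions only silence the type checker;
// the request sent at runtime is unchanged.
async function getZkSyncFeeParams(client: TypedClient): Promise<unknown> {
  return client.request({
    method: "zks_getFeeParams" as any,
    params: [] as any,
  });
}
```

The trade-off is that `as any` also disables checking of the result, so callers would need to validate the response shape themselves.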
Add production-ready observability infrastructure:
- Grafana stack (Prometheus, Loki, Tempo, Grafana)
- OpenTelemetry instrumentation (HTTP, Fastify, Pino, Viem)
- Grafana Alloy for Docker log collection (pull-based)
- Pre-configured Grafana datasources with trace/log correlation

All telemetry is optional (ALTO_ENABLE_TELEMETRY flag) and disabled by default to avoid affecting existing deployments.

Technical changes:
- Add Alloy config for Docker container log discovery
- Add docker-compose.observability.yml with full stack
- Add docker-compose.yml for application deployment
- Add instrumentation with custom sampler (excludes /metrics, /health)
- Add CLI options for telemetry configuration

Security & Production:
- Grafana requires authentication (admin/admin)
- All Docker images pinned to specific versions
- Cluster name configurable via ALTO_CLUSTER_NAME env var
- Network auto-created (no external dependency)
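The custom sampler mentioned above could make its decision roughly as follows. This is a sketch of the path filter only: it does not implement the OpenTelemetry `Sampler` interface, and `decideSampling`, `SamplingDecision`, and the exact-match rule are assumptions rather than the PR's actual code.

```typescript
// Paths excluded from tracing, taken from the commit message.
const EXCLUDED_PATHS: readonly string[] = ["/metrics", "/health"];

// Hypothetical decision type; a real sampler would return the SDK's own type.
type SamplingDecision = "RECORD_AND_SAMPLED" | "NOT_RECORD";

// Decide whether a request should be traced, based on its URL path.
// Assumption: exact match on the path, with any query string ignored.
function decideSampling(url: string | undefined): SamplingDecision {
  if (url === undefined) {
    return "RECORD_AND_SAMPLED";
  }
  const path = url.split("?", 1)[0] ?? "";
  return EXCLUDED_PATHS.includes(path) ? "NOT_RECORD" : "RECORD_AND_SAMPLED";
}
```

Dropping scrape and health-check requests at the sampler keeps high-frequency, low-value spans out of Tempo.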
Add CLI options for observability:
- --enable-telemetry: Enable/disable OpenTelemetry tracing
- --otlp-endpoint: Configure OTLP trace export endpoint

Wrap OTel SDK initialization in ALTO_ENABLE_TELEMETRY check to ensure instrumentation only runs when explicitly enabled. Update start script to preload instrumentation module.

Integrates with Tempo backend configured in observability stack for distributed tracing of HTTP, Fastify, Pino, Undici, and Viem operations.
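A minimal sketch of the ALTO_ENABLE_TELEMETRY guard, assuming the flag is read from the environment when the preloaded module runs. `isTelemetryEnabled`, `initTelemetry`, and the set of accepted values ("true", "1") are hypothetical; the PR does not say which values count as enabled.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only "true" or "1" (case-insensitive) enable telemetry.
function isTelemetryEnabled(env: Env): boolean {
  const raw = env["ALTO_ENABLE_TELEMETRY"];
  if (raw === undefined) {
    return false; // disabled by default
  }
  return ["true", "1"].includes(raw.trim().toLowerCase());
}

// Start the SDK only when the flag is set. The SDK start is passed in as a
// callback so this sketch does not depend on any OpenTelemetry package.
function initTelemetry(env: Env, startSdk: () => void): boolean {
  if (!isTelemetryEnabled(env)) {
    return false;
  }
  startSdk();
  return true;
}
```

In a Node process the caller would pass `process.env` and a callback that starts the real SDK. Because the check runs in a preloaded module, it executes before any CLI argument parsing.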
Add comprehensive test script that validates:
- Service health (Prometheus, Tempo, Loki, Alloy, Grafana)
- Metrics collection and scraping
- Log aggregation via Alloy
- Distributed tracing with OpenTelemetry
- Grafana datasource configuration

Script provides clear pass/fail output and keeps stack running on success for manual exploration. Automatically cleans up on exit.
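The service-health step could look something like the sketch below. The PR's script itself is not shown here, so this is a TypeScript illustration of the same pass/fail idea. The ports and readiness paths are assumptions based on each tool's common defaults, and `checkServices` and `Probe` are hypothetical names.

```typescript
interface ServiceCheck {
  name: string;
  url: string;
}

// A probe returns true when the service at `url` reports healthy.
type Probe = (url: string) => Promise<boolean>;

// Assumed default ports and readiness endpoints; adjust to the compose file.
const SERVICES: ServiceCheck[] = [
  { name: "Prometheus", url: "http://localhost:9090/-/ready" },
  { name: "Tempo", url: "http://localhost:3200/ready" },
  { name: "Loki", url: "http://localhost:3100/ready" },
  { name: "Alloy", url: "http://localhost:12345/-/ready" },
  { name: "Grafana", url: "http://localhost:3000/api/health" },
];

// Probe every service in order, print PASS/FAIL per service, and return
// the names of the services that failed.
async function checkServices(
  services: ServiceCheck[],
  probe: Probe,
): Promise<string[]> {
  const failed: string[] = [];
  for (const service of services) {
    let ok = false;
    try {
      ok = await probe(service.url);
    } catch {
      ok = false; // a thrown error (for example, connection refused) is a failure
    }
    console.log(`${ok ? "PASS" : "FAIL"} ${service.name}`);
    if (!ok) {
      failed.push(service.name);
    }
  }
  return failed;
}
```

Injecting the probe keeps the loop testable without a running stack; a real probe could wrap an HTTP GET and return whether the status was 2xx.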
- Fix Alloy config: move loki.relabel block before loki.source.docker to resolve forward reference issue
- Update test script to gracefully handle Alto startup failures
- Make Alto validation optional while still validating observability infrastructure
- Automatically start Anvil (local Ethereum node) if available
- Add utility private key configuration for Alto
- Improve step numbering (now 1-10 with Anvil as step 4)
SahilVasava previously approved these changes Oct 22, 2025
- Remove observabilityArgsSchema and related types
- Remove observabilityOptions from CLI configuration
- Keep environment variable approach which works correctly
- CLI options were non-functional due to instrumentation loading before arg parsing
- Add explicit UIDs to Grafana datasources (prometheus, tempo, loki)
  Fixes cross-datasource links (Tempo→Loki logs, service map, derived fields)
- Change Prometheus target from host.docker.internal to ultra-relay-provider
  Ensures scraping works on Linux and when bundler runs in Docker network
- Add jq dependency check to test script with installation instructions
  Prevents cryptic failures when jq is missing