SPIFFE Native Support for Temporal #9152

@vedosis

Is your feature request related to a problem? Please describe.

SPIFFE (Secure Production Identity Framework for Everyone) is a CNCF graduated project that provides a universal identity framework for distributed systems. It's widely adopted across the cloud-native ecosystem with proven integration patterns in Envoy, Istio, Kubernetes, and many other platforms. Native SPIFFE support in Temporal would strengthen its security posture and make it significantly easier for users to deploy secure, production-ready Temporal clusters.

Currently, Temporal supports TLS/mTLS via static certificate files. While functional, this approach requires users to either rotate certificates manually or integrate external tooling (cert-manager, spiffe-helper), which adds operational complexity. Additionally, when users deploy Temporal with SPIFFE certificates today, the SPIFFE IDs in those certificates (URI SANs) are not extracted or used for authorization decisions: the workload identity information is present but unused.

Native SPIFFE support would provide:

  • Automatic certificate lifecycle management via the Workload API - zero manual rotation or bolt-on automation
  • Identity-based authorization using SPIFFE IDs to drive authorization decisions
  • Simplified deployments for users already running SPIFFE/SPIRE infrastructure
  • Enhanced security story aligning Temporal with cloud-native zero-trust best practices
  • Smooth migration path from traditional mTLS without requiring full cutover

This would improve Temporal's security positioning and make it easier for enterprises to trust and deploy Temporal in production environments.

Describe the solution you'd like

Native SPIFFE support integrated directly into Temporal server with:

1. Automatic Certificate Management

  • Direct integration with SPIFFE Workload API (Unix socket)
  • Automatic X.509-SVID fetching and rotation via GetCertificate callbacks
  • Trust bundle management from Workload API
  • Fallback to static TLS when the Workload API is unavailable (for mTLS -> SPIFFE migration)

2. SPIFFE-Based Authorization

  • New ClaimMapper implementation that extracts SPIFFE IDs from client certificates (extends existing ClaimMapper interface)
  • Regex-based pattern matching to map SPIFFE ID paths to Temporal's existing role model
  • Trust domain validation (allowlist)
  • Maps to existing namespace-level and system-level permissions via the Claims struct
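
The extraction and mapping steps above could look roughly like the following sketch. The `role` and `pathRule` types and all function names are hypothetical placeholders, not Temporal's actual Claims or Role types; the example trust domain and path layout are assumptions as well.

```go
package main

import (
	"crypto/x509"
	"errors"
	"net/url"
	"regexp"
)

// role is a hypothetical stand-in for Temporal's role type.
type role string

// pathRule maps SPIFFE ID paths that match pattern to a role.
type pathRule struct {
	pattern *regexp.Regexp
	role    role
}

// newPathRule compiles a rule from its configured (string) form.
func newPathRule(pattern string, r role) (pathRule, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return pathRule{}, err
	}
	return pathRule{pattern: re, role: r}, nil
}

// spiffeIDFromCert returns the first URI SAN that uses the spiffe scheme.
func spiffeIDFromCert(cert *x509.Certificate) (string, error) {
	for _, u := range cert.URIs {
		if u != nil && u.Scheme == "spiffe" {
			return u.String(), nil
		}
	}
	return "", errors.New("certificate has no SPIFFE ID in its URI SANs")
}

// mapSPIFFEID checks the trust domain against the allowlist, then returns
// the role of the first rule whose pattern matches the ID's path.
func mapSPIFFEID(id string, trustDomains []string, rules []pathRule) (role, error) {
	u, err := url.Parse(id)
	if err != nil || u.Scheme != "spiffe" || u.Host == "" {
		return "", errors.New("not a valid SPIFFE ID")
	}
	allowed := false
	for _, td := range trustDomains {
		if u.Host == td {
			allowed = true
			break
		}
	}
	if !allowed {
		return "", errors.New("trust domain is not in the allowlist")
	}
	for _, r := range rules {
		if r.pattern.MatchString(u.Path) {
			return r.role, nil
		}
	}
	return "", errors.New("no rule matches the SPIFFE ID path")
}
```

Rules are evaluated in order, so more specific patterns should come first. Anchoring patterns with `^` and `$` avoids accidental substring matches, which is one of the risks of regex-based authorization.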

3. Smooth Migration Path

  • Claim mapper chains that try multiple authentication methods in order (extends GetClaimMapperFromConfig to support multiple mappers)
  • Support both SPIFFE and traditional mTLS during migration
  • Backward compatible configuration (extends Authorization config struct)

4. Full Mesh Coverage

  • All services: Frontend, History, Matching, Internal Workers
  • Both internode and client-facing connections

How This Integrates with Existing Architecture

This proposal builds on Temporal's existing TLS and authorization infrastructure:

  • Extends existing TLS configuration: Adds SPIFFE support to RootTLS, GroupTLS, and ServerTLS structs without breaking existing static certificate configurations
  • Implements existing interfaces: New SPIFFE and TLS claim mappers implement the existing ClaimMapper interface, so they integrate seamlessly with Interceptor
  • Uses existing authorization model: SPIFFE IDs map to the same Claims struct and Role types that JWT-based auth uses today
  • Leverages existing TLS helpers: Uses TLSInfoFromContext and PeerCert functions to extract certificate information
  • No breaking changes: All existing configurations continue to work; SPIFFE is purely additive

Technical Approach

Key design considerations:

  1. Workload API integration via go-spiffe SDK - Leverages upstream caching, automatic rotation
  2. GetCertificate callbacks - No rotation goroutines needed in Temporal itself; each TLS handshake is served the current cert (standard Go tls.Config pattern)
  3. Claim mapper chains - Extends existing ClaimMapper interface with ordered fallback (SPIFFE → TLS → JWT) for migration flexibility
  4. Per-service configuration - Extends existing RootTLS config with global SPIFFE defaults and per-service overrides
  5. Trust domain validation - Optional allowlist for multi-tenant environments
  6. Integrates with existing authorization - Works with existing Interceptor and Authorizer without breaking changes
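
To illustrate item 4, here is one way global defaults and per-service overrides might be resolved. The `spiffeSettings` struct and its field names are purely hypothetical and do not reflect Temporal's existing RootTLS schema; the socket path used in the usage note is an example value only.

```go
package main

// spiffeSettings is a hypothetical configuration block. A nil Enabled,
// an empty SocketPath, or an empty TrustDomains means "not set here".
type spiffeSettings struct {
	Enabled      *bool
	SocketPath   string
	TrustDomains []string
}

// resolveSPIFFE returns the global defaults with every field that the
// service explicitly sets replaced by the service's value.
func resolveSPIFFE(global, service spiffeSettings) spiffeSettings {
	out := global
	if service.Enabled != nil {
		out.Enabled = service.Enabled
	}
	if service.SocketPath != "" {
		out.SocketPath = service.SocketPath
	}
	if len(service.TrustDomains) > 0 {
		out.TrustDomains = service.TrustDomains
	}
	return out
}

// isEnabled treats an unset flag as disabled, which keeps existing
// static-certificate configurations unchanged by default.
func isEnabled(s spiffeSettings) bool {
	return s.Enabled != nil && *s.Enabled
}
```

For example, a global block could enable SPIFFE with a socket path such as `unix:///run/spire/sockets/agent.sock`, while a single service sets `Enabled` to false to stay on static certificates during a phased rollout. Using a pointer for `Enabled` is what lets a service distinguish "explicitly off" from "inherit the default".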

I have a detailed design covering architecture, configuration schema, implementation phases, testing strategy, migration patterns, and security considerations. However, I'm looking for alignment on the approach before investing in a full implementation, not approval of specific implementation details.

Seeking Alignment

I'm looking for confirmation that this type of contribution would be accepted before dedicating engineering resources to implementation. Specific questions:

  1. Is SPIFFE support something Temporal would accept? - Does native SPIFFE integration align with Temporal's roadmap and architectural direction?

  2. Architecture approach - Is Workload API integration via GetCertificate callbacks acceptable? Any concerns with adding the go-spiffe SDK dependency (it's currently a transitive dependency)?

  3. Configuration extension - Does extending RootTLS and Authorization structs align with Temporal's configuration philosophy? Any preference for how config should be structured? I have a recommendation if that would be helpful.

  4. Authorization integration - Is extending ClaimMapper with a chain mechanism the right pattern? Any concerns with SPIFFE ID regex pattern matching for authz decisions?

  5. Contribution scope - Should this be:

    • One comprehensive PR?
    • Split into multiple PRs (Core infra → Authorization → Configuration → Hardening)? This would be my preference, because the change touches many small places.
    • Feature-flagged initially for gradual rollout, or is a graceful fallback config sufficient?
  6. Testing and documentation expectations - Beyond standard unit/integration tests, are there specific test scenarios, performance benchmarks, or documentation requirements? My expectation is to adhere to the STUNNING quality of Temporal's existing end-user documentation.

  7. Temporal Cloud - I've primarily focused on the Temporal Server installations, but are there any Temporal Cloud considerations that need to be added?

I'm not asking for commitment to merge or review a specific design - just alignment that this type of contribution fits Temporal's direction and would be considered if implemented according to Temporal's standards and preferences.

Commitment and Resources

This is a business priority for us. We're allocating dedicated engineering resources to implement this feature properly if Temporal confirms alignment. Our commitment includes:

  • Full implementation according to Temporal's architectural standards
  • Comprehensive test coverage (unit, integration, migration scenarios)
  • Documentation (configuration guides, migration guides, operational runbooks)
  • Long-term maintenance and support for the feature

We're not looking for Temporal to implement this - we will do the work. We just need confirmation that the general approach aligns with Temporal's direction before committing resources to a potentially incompatible implementation.

Describe alternatives you've considered

Alternative 1: Continue using external tools

  • spiffe-helper to fetch SVIDs and write to disk, Temporal reads static files
  • Downsides: Extra process to manage, delayed rotation, no SPIFFE-based authz

Alternative 2: Custom sidecar container

  • Custom sidecar that fetches SVIDs (e.g. Istio) and updates Temporal config dynamically
  • Downsides: Similar to spiffe-helper, complex deployment, still no SPIFFE-based authz

Alternative 3: Proxy-based approach (Envoy + SPIRE integration)

  • Use Envoy with SPIRE SDS for certificate management
  • Downsides: Additional infrastructure layer, latency overhead, doesn't solve authz problem

Why native integration is better:

  • Zero external dependencies - No sidecars, no external processes
  • Immediate rotation - Certificates fetched on-demand, no disk writes
  • SPIFFE-aware authorization - Can make authz decisions based on workload identity
  • Simpler operations - One less moving part to manage and debug

Additional context

Production context

  • We're running Temporal in Kubernetes with Defakto deployed (SPIRE enhanced server + agent DaemonSet)
  • All workloads in our cluster already receive SPIFFE identities
  • We want Temporal to be a native citizen in this zero-trust environment

Community interest

This feature would benefit any organization using SPIFFE/SPIRE infrastructure and wanting to integrate Temporal into their zero-trust architecture. SPIFFE is a CNCF graduated project with growing adoption.

Timeline and next steps

If Temporal confirms this approach aligns with their direction, we can:

  • Provide a detailed design document for technical review
  • Begin implementation immediately (this is a business priority for us and we'll be committing resources to it)
  • Target initial PR(s) within 4-6 weeks
  • Iterate based on code review feedback

Related work


Looking forward to hearing whether this aligns with Temporal's roadmap and architectural direction. Happy to discuss any aspects in more detail, provide additional technical details, or adjust the approach based on Temporal's preferences and requirements.

Labels: enhancement (New feature or request)
