Tracking ticket: Kubernetes CRD & operator #18

@danopia

Description

It's pretty awkward to iteratively configure and manage a DNS-Sync deployment in a live cluster: the process involves updating ConfigMaps, redeploying the pod, and inspecting its health directly. A Kubernetes operator pattern could fix this and also support multi-tenancy.

Problems to be addressed by Operator pattern

  • DnsSync faults are currently best surfaced through CrashLoopBackOff
  • Granular DnsSync status not visible outside of logs
    • a single provider not being available / invalid credential
    • a desired FQDN already being managed by someone else
    • a source has an invalid TTL for the provider
  • DnsSync only accepts one configuration per process
  • Dry-run just prints any changes to pod logs
    • Enabling changes is all-or-nothing
    • Could instead allow for one-off applies as needed

Features provided by Operator pattern

  • Independent DnsSync resources can be created to manage per-team configs, or split-horizon DNS
  • DnsSync resources can have extra table columns:
    • Health such as Ready, Pending, Degraded
    • Strategy such as FullSync, DryRun, Disabled
  • Refer to same-namespace Secrets to load provider API keys
  • Approve an existing DryRun result for a one-off apply, without also authorizing future syncs, by copying a status field into the spec (sketched below)
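
As a rough sketch only, a DnsSync resource covering the features above might look something like this; the API group, version, and every field name here are hypothetical and only illustrate per-namespace configs, Secret references for provider credentials, printable Health/Strategy columns, and the dry-run approval idea:

    # Hypothetical DnsSync resource; group/version and field names are not final.
    apiVersion: dns-sync.example.com/v1alpha1
    kind: DnsSync
    metadata:
      name: team-a-public
      namespace: team-a
    spec:
      strategy: DryRun              # FullSync | DryRun | Disabled
      # To approve the current dry-run result as a one-off apply, a field from
      # status (e.g. a plan hash) could be copied into the spec here.
      sources:
        - type: Ingress
      providers:
        - type: cloudflare
          credentialsSecretRef:     # Secret must live in the DnsSync's own namespace
            name: cloudflare-api-token
            key: token
    status:
      health: Pending               # Ready | Pending | Degraded (printed as a column)
      conditions:
        - type: Ready
          status: "False"
          reason: ProviderUnavailable

The Health and Strategy columns would come from additionalPrinterColumns in the CRD definition, so they show up in plain kubectl get output without any extra tooling.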

Problems introduced by Operator pattern

  • Pain of changing an existing CRD: Helm doesn't update CRDs, so the operator would need to keep working against outdated CRD specs.
  • Relatively inflexible kubectl behavior, which has led other projects to build their own CLIs (cmctl, etc.)
  • The operator is supposed to run leader election in case multiple operator pods are running
  • Multiple DnsSync resources observing a cluster source should reuse the same Watcher stream
  • RBAC: to let DnsSync resources reference API-key Secrets from their own namespace, does the operator need get access to every Secret? (see the Role sketch after this list)
  • Lack of Namespace isolation: even if the operator is installed into one specific Namespace, the CRD itself must be cluster-scoped
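
One way to keep the Secret access narrow, as a sketch only: grant the operator a namespaced Role restricted to specific Secret names rather than a cluster-wide read on all Secrets. The Role name, namespace, and Secret name below are hypothetical:

    # Sketch: per-tenant Role restricted to named Secrets, instead of a
    # cluster-wide grant on every Secret. The resourceNames list would have to
    # be kept in sync with whatever the DnsSync specs reference.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: dns-sync-secret-reader
      namespace: team-a
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        resourceNames: ["cloudflare-api-token"]
        verbs: ["get"]

The trade-off is the bookkeeping: either something generates these Roles per namespace, or the operator falls back to a broad grant on all Secrets, which is exactly the concern above.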

Alternative Solutions

For zone isolation:

  • Accept multiple TOML config files mounted similarly to the existing config file
  • Enable targeting a subset of records to specific zones, e.g. annotation filtering
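
Annotation filtering could look roughly like the sketch below; the annotation key is hypothetical and not an existing DNS-Sync option, but the idea is that each config only picks up sources carrying its own marker:

    # Hypothetical annotation filter: only the config selecting "team-a"
    # would manage records derived from this Ingress.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: team-a-web
      namespace: team-a
      annotations:
        dns-sync.example.com/config: team-a
    spec:
      rules:
        - host: web.team-a.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: web
                    port:
                      number: 80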

For status visibility:

  • Emit warnings as Kubernetes "Event" resources next to the source resources (Ingress, etc)
  • Publish overall status as plain text inside one ConfigMap (like GKE cluster autoscaler)
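
For the ConfigMap option, a sketch in the spirit of the GKE cluster autoscaler's cluster-autoscaler-status ConfigMap; the name, namespace, and text format here are all hypothetical:

    # Sketch: overall status published as plain text in a single ConfigMap,
    # similar to how the GKE cluster autoscaler reports its status.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: dns-sync-status
      namespace: dns-sync
    data:
      status: |
        lastSync: 2024-05-01T12:00:00Z
        providers:
          cloudflare: Ready
          route53: Degraded (invalid credential)
        pendingChanges: 3 (dry-run; not applied)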
