More Robust Autoconfiguration/Service Discovery #367

@nf-brentsaner

Description

Overview

Apologies if this isn't the right repo, but it isn't clear how to suggest an improvement to the architecture itself.

When using NATS, it is beneficial not to require any prior knowledge of server addresses. This allows wider scaling of swarms and other clustering services, and it reduces maintenance and configuration overhead. (I'll admit this is a selfish request. I do maintain configuration management, but autodiscovery via the mechanisms below gives me much more flexible provisioning options. I suspect it would be broadly useful as well, however.)

Generally speaking, over the years this has been accomplished with great success in a platform- and language-agnostic way via the following technologies:

  • SRV DNS Records (RFC 2782)

    • This allows one to publish multiple destinations (servers) under a single name, much as NATS currently does with multi-record A/AAAA names. It also attaches a priority to each destination, which sets the preference order (lower values are tried first), and a weight, which sets the relative share of traffic among targets of equal priority. A service administrator cannot express this kind of explicit preference ordering with multi-record A/AAAA names.
    • This would also make NATS clustering easier. For example, priority 0 / weight 0 could indicate the set of servers forming the Raft quorum, with other priority/weight combinations indicating additional addresses for load balancing and fault tolerance.
    • Because NATS services are not registered with IANA, the suggested service names are:
      • NATS (Plaintext/"Opportunistic" TLS): nats (_nats)
      • NATS (WebSockets): natsws (_natsws)
      • NATS Cluster traffic: natsclst (_natsclst)
      • NATS Gateways: natsgw (_natsgw)
      • NATS Leafnodes: natslf (_natslf)
      • MQTT: mqtt (_mqtt) (This is a registered IANA service, so per RFC it must use the registered service name. Note, though, that most MQTT clients don't attempt SRV resolution.)
      • For all of the above services, the enforced/explicit TLS variant would be named by appending -tls (e.g. NATS with enforced TLS would be nats-tls / _nats-tls), with these exceptions:
        • Explicitly plaintext WebSockets would be natsws-plain / _natsws-plain, as the default is TLS.
        • MQTT with TLS transport would be secure-mqtt / _secure-mqtt, as that is also an explicitly registered IANA service.
      • The opportunistic behavior of the core service with TLS enabled (to support legacy clients, as mentioned here) may cause some awkwardness. Technically it should be advertised as nats-tls, but that would defeat the purpose of negotiating TLS via INFO in the first place. Such services are therefore recommended to be advertised as nats instead.
  • mDNS (RFC 6762)

    • mDNS allows name resolution and service discovery with no explicit configuration; not even DNS search lists are needed. This essentially allows a "drop-in-and-go" service/client.
    • Combined with the SRV service names above, mDNS and DNS-Based Service Discovery (RFC 6763) let clients, cluster members, etc. discover servers and other cluster members purely from their environment. No additional setup is required (configuration management, Consul, etc. are not needed), and in many cases no explicit configuration is needed at all.
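The naming scheme above can be summarized as a small mapping. What follows is a minimal Go sketch of one reading of the proposal. In particular, it assumes that TLS WebSockets are published as plain natsws rather than natsws-tls. The helper names srvService and srvOwnerName are hypothetical and are not part of any NATS library. Go's net.LookupSRV already builds the _service._proto.name query from its arguments, so srvOwnerName only shows the record name an administrator would publish.

```go
package main

import (
	"fmt"
	"strings"
)

// srvService returns the proposed SRV service label for a NATS-related
// listener. Hypothetical helper: only mqtt and secure-mqtt are IANA-registered.
func srvService(listener string, requireTLS bool) (string, error) {
	switch listener {
	case "nats", "natsclst", "natsgw", "natslf":
		// Enforced TLS is marked with a -tls suffix. A core listener with
		// opportunistic TLS stays "nats", as recommended above.
		if requireTLS {
			return listener + "-tls", nil
		}
		return listener, nil
	case "natsws":
		// WebSockets default to TLS, so the plaintext variant is the marked one.
		if requireTLS {
			return "natsws", nil
		}
		return "natsws-plain", nil
	case "mqtt":
		// Registered IANA names are used as-is.
		if requireTLS {
			return "secure-mqtt", nil
		}
		return "mqtt", nil
	default:
		return "", fmt.Errorf("unknown listener type %q", listener)
	}
}

// srvOwnerName builds the RFC 2782 owner name _Service._Proto.Name that an
// administrator would publish, e.g. _nats-tls._tcp.nats.example.com.
func srvOwnerName(service, name string) string {
	return "_" + service + "._tcp." + strings.TrimSuffix(name, ".")
}
```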

Suggested behavior would be the following:

  1. If explicit URLs are specified in configuration, only use the specified URLs; do not attempt mDNS resolution. This guarantees backwards compatibility.
  2. If the hostname of a specified URL resolves to A or AAAA records, build the URL list from those records using the current logic; do not attempt an SRV lookup.
  3. If no A/AAAA record exists, attempt SRV lookups using the service names above, preferring the TLS variants first. Follow the RFC 2782 SRV ordering rules (Go's standard-library net.Resolver already returns records in this order, as do most (all?) system/OS-backed resolvers).
    * I'm unsure how to handle a client-provided port number in the case of SRV resolution. I recommend that it take precedence over SRV-provided port numbers.
  4. If no URLs are explicitly configured, use mDNS service discovery for the relevant service names above.

Some practical examples of what this looks like:

  • NATS client has nats://nats.example.com specified as a URL:

    1. If A/AAAA records for nats.example.com exist, use those addresses to build a selection pool (current behavior). If not,
    2. If an SRV record for _nats-tls._tcp.nats.example.com exists, build a selection pool from its entries in resolved SRV order. If not,
    3. If an SRV record for _nats._tcp.nats.example.com exists, build a selection pool from its entries in resolved SRV order. If not,
    4. (Error condition; no servers could be found)
  • NATS client has no URL specified:

    1. Perform a DNS-SD browse over mDNS for _nats-tls._tcp.local. If any instances are found, build the selection pool from their SRV records. If not,
    2. Perform a DNS-SD browse over mDNS for _nats._tcp.local. If any instances are found, build the selection pool from their SRV records. If not,
    3. If an SRV record for _nats-tls._tcp.nats.example.com exists, where example.com is the first domain in the host's DNS search list, build a selection pool from its entries in resolved SRV order. If not,
    4. If an SRV record for _nats._tcp.nats.example.com exists, where example.com is the first domain in the host's DNS search list, build a selection pool from its entries in resolved SRV order. If not,
    5. Repeat steps 3 and 4 for each remaining domain in the host's DNS search list until a record is found. If the search list is exhausted without finding a record, then
    6. (Error condition; no servers could be found)

Metadata

Assignees

Labels

client (Client related work), enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
