More Robust Autoconfiguration/Service Discovery #367

## Overview

Apologies if this isn't the right repo, but it isn't clear how to suggest an improvement to the architecture itself.

When using NATS, it is beneficial for clients not to need any prior knowledge of servers -- this allows for wider scaling of swarms/other clustering services, and reduces maintenance/configuration overhead. (I'll admit this is a selfish request; while I maintain configuration management, autodiscovery via the mechanisms below provides much more flexible provisioning options for me. I suspect it'd be globally useful as well, however.)

Generally speaking, over the years this has been accomplished with great success in a platform- and language-agnostic way via the following technologies:

* SRV DNS Records ([RFC 2782](https://datatracker.ietf.org/doc/html/rfc2782))
  * This allows one to specify not just many destinations (servers) under a single name (much as NATS currently does with multi-record single-name A/AAAA records), but also a priority (preference; lower values are tried first) and a weight (relative share of selection among targets of equal priority) for each. This explicit preferential ordering by a service administrator is not possible with multi-record A/AAAA names.
  * This also allows for easier NATS clustering; priority 0, weight 0 could/would indicate a set for Raft quorum, with other priority/weight mixes indicating additional addresses for additional load-balancing/fault tolerance.
  * Being that NATS services are [not registered with IANA](https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml), suggested service names are:
    * NATS (Plaintext/"Opportunistic" TLS): `nats` (`_nats`)
    * NATS (WebSockets): `natsws` (`_natsws`)
    * NATS Cluster traffic: `natsclst` (`_natsclst`)
    * NATS Gateways: `natsgw` (`_natsgw`)
    * NATS Leafnodes: `natslf` (`_natslf`)
    * MQTT: `mqtt` (`_mqtt`) (This is a registered IANA service and thus per RFC **must** use the registered service name, though note most MQTT senders don't attempt SRV resolution.)
    * For all of the above services, the enforced/explicit TLS variant would be indicated by an appended `-tls` (e.g. NATS with enforced TLS would be `nats-tls` / `_nats-tls`), with two exceptions:
      * Explicitly *plaintext* WebSockets would be `natsws-plain` / `_natsws-plain`, as the default is TLS.
      * MQTT with TLS transport would be `secure-mqtt` / `_secure-mqtt`, as that is also an explicitly registered IANA service.
    * The opportunistic behavior of the Core service with TLS enabled (to support legacy clients, as mentioned [here](https://docs.nats.io/running-a-nats-service/configuration/securing_nats/tls#tls-first-handshake)) may cause a bit of awkwardness, as it technically **should** be `nats-tls` but this then defeats the purpose of the TLS `INFO` negotiation in the first place -- as such, these are recommended to be instead presented as `nats`.

* mDNS ([RFC 6762](https://datatracker.ietf.org/doc/html/rfc6762))
  * mDNS allows for record resolution and service discovery with *no* explicit configuration -- not even DNS search lists are needed. This then essentially allows for a "drop-in-and-go" service/client.
  * mDNS, combined with the above SRV record services, allows for clients, cluster members, etc. to discover servers/other cluster members/etc. *purely from environmental discovery* ([RFC 6763](https://datatracker.ietf.org/doc/html/rfc6763)) with *no* requirement for additional turnup (configuration management, Consul, etc. are not needed) or even without any explicit configuration at all in many cases.
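
To make the naming scheme above concrete, here is a minimal Go sketch of how a client might derive the TLS and non-TLS service names for a base service, and the SRV owner name to query. The function names `serviceNames` and `srvQueryName` are hypothetical (nothing like them exists in NATS today), and reading `natsws` as the TLS form is my interpretation of the "default is TLS" note above.

```go
package main

// serviceNames returns the proposed SRV service names for a base service:
// the explicit-TLS form first, then the plaintext/opportunistic form.
// Hypothetical helper; it only encodes the naming rules suggested in this issue.
func serviceNames(base string) (tlsName, otherName string) {
	switch base {
	case "mqtt":
		// Both names are IANA-registered, so they are used as-is.
		return "secure-mqtt", "mqtt"
	case "natsws":
		// WebSockets default to TLS; only the plaintext form gets a suffix.
		return "natsws", "natsws-plain"
	default:
		// nats, natsclst, natsgw, natslf: append "-tls" for enforced TLS.
		return base + "-tls", base
	}
}

// srvQueryName builds the RFC 2782 owner name "_service._proto.domain".
func srvQueryName(service, proto, domain string) string {
	return "_" + service + "._" + proto + "." + domain
}
```

For example, `serviceNames("nats")` yields `nats-tls` and `nats`, and `srvQueryName("nats-tls", "tcp", "example.com")` yields `_nats-tls._tcp.example.com`, which is the name a resolver would be asked for.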

Suggested behavior would be the following:

1. If explicit URLs are specified in configuration, only use the specified URLs; do not attempt mDNS resolution. This guarantees backwards compatibility.
   1. If the hostname of a specified URL resolves to an A or AAAA record, follow the current logic of URL list building via those records; do not attempt SRV lookup.
   2. If no A/AAAA record exists, attempt SRV lookup based on the above service names, preferring TLS first. Follow the SRV ordering specification (Go's standard-library `net.Resolver` sorts SRV results by priority and randomizes by weight within a priority automatically, as do most (all?) system/OS-backed resolvers).
      * I'm unsure how to handle a client-provided port number in the case of SRV resolution. I *recommend* that it take precedence over SRV-provided port numbers.
2. If no URLs are explicitly configured, then use mDNS service discovery for the relevant SRV records above.
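
A rough Go sketch of the explicit-URL branch (A/AAAA first, then SRV with TLS preferred) might look like the following. This is only an illustration under assumptions: `lookups`, `srvTarget`, `resolvePool`, and `stdlibLookups` are hypothetical names, the fallback port 4222 is the conventional NATS client port, and the explicit-port-wins rule is the recommendation from the list above rather than settled behavior. The DNS calls are injected so the ordering logic can be exercised without a network.

```go
package main

import (
	"errors"
	"net"
	"strconv"
	"strings"
)

// defaultPort is the conventional NATS client port, used when the URL has none.
const defaultPort = "4222"

var errNoServers = errors.New("no servers could be found")

// srvTarget is a hypothetical, trimmed-down view of one SRV answer.
type srvTarget struct {
	Host string
	Port uint16
}

// lookups bundles the two DNS queries so they can be stubbed.
type lookups struct {
	host func(name string) ([]string, error)
	srv  func(service, name string) ([]srvTarget, error)
}

// stdlibLookups wires the sketch to Go's resolver. net.LookupSRV already
// returns records sorted by priority and randomized by weight.
func stdlibLookups() lookups {
	return lookups{
		host: net.LookupHost,
		srv: func(service, name string) ([]srvTarget, error) {
			_, recs, err := net.LookupSRV(service, "tcp", name)
			if err != nil {
				return nil, err
			}
			out := make([]srvTarget, 0, len(recs))
			for _, r := range recs {
				out = append(out, srvTarget{Host: strings.TrimSuffix(r.Target, "."), Port: r.Port})
			}
			return out, nil
		},
	}
}

// resolvePool builds the "host:port" selection pool for one configured URL.
// explicitPort is the port from the URL, or "" if none was given.
func resolvePool(l lookups, host, explicitPort string) ([]string, error) {
	// Step 1.1: A/AAAA records win; SRV is not consulted.
	if addrs, err := l.host(host); err == nil && len(addrs) > 0 {
		port := explicitPort
		if port == "" {
			port = defaultPort
		}
		pool := make([]string, 0, len(addrs))
		for _, a := range addrs {
			pool = append(pool, net.JoinHostPort(a, port))
		}
		return pool, nil
	}
	// Step 1.2: SRV lookup, explicit TLS first, keeping the resolver's order.
	for _, service := range []string{"nats-tls", "nats"} {
		targets, err := l.srv(service, host)
		if err != nil || len(targets) == 0 {
			continue
		}
		pool := make([]string, 0, len(targets))
		for _, t := range targets {
			port := strconv.Itoa(int(t.Port))
			if explicitPort != "" {
				port = explicitPort // recommended: client-provided port wins
			}
			pool = append(pool, net.JoinHostPort(t.Host, port))
		}
		return pool, nil
	}
	return nil, errNoServers
}
```

A caller would use something like `resolvePool(stdlibLookups(), "nats.example.com", "")`. The sketch does not record which service name matched, which a real client would need in order to decide whether to require TLS on the resulting connections.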


What this looks like as some practical examples:

* NATS client has `nats://nats.example.com` specified as a URL:
  1. If A/AAAA for `nats.example.com` exists, then use those addresses to build a selection pool (current behavior). If not,
  2. If SRV for `_nats-tls._tcp.nats.example.com` exists, then use its entries, in the resolved order, to build a selection pool. If not,
  3. If SRV for `_nats._tcp.nats.example.com` exists, then use its entries, in the resolved order, to build a selection pool. If not,
  4. (Error condition; no servers could be found)

* NATS client has *no* URL specified:
  1. Perform mDNS DNS-SD for `_nats-tls._tcp.local.`. If it exists, follow SRV selection pool building. If not,
  2. Perform mDNS DNS-SD for `_nats._tcp.local.`. If it exists, follow SRV selection pool building. If not,
  3. If SRV for `_nats-tls._tcp.nats.example.com` exists, where `example.com` is the first name specified in the DNS search list for a host, then use its entries, in the resolved order, to build a selection pool. If not,
  4. If SRV for `_nats._tcp.nats.example.com` exists, where `example.com` is the first name specified in the DNS search list for a host, then use its entries, in the resolved order, to build a selection pool. If not,
  5. Repeat steps 3 and 4 for each domain in the host's DNS search list until a record is found. If no record is found and the DNS search list is exhausted, then
  6. (Error condition; no servers could be found)
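
The no-URL case above boils down to an ordered list of names to try. A small Go sketch of building that list, assuming the search list has already been read from the host's resolver configuration, could be the following. `candidateQueries` is a hypothetical name, and note that Go's standard library has no mDNS client, so the two `.local.` names would need a separate mDNS-capable resolver.

```go
package main

// candidateQueries lists the SRV owner names to try, in order, when no URL
// is configured. searchList is the host's DNS search list, first entry first.
func candidateQueries(searchList []string) []string {
	services := []string{"nats-tls", "nats"} // explicit TLS is preferred
	queries := make([]string, 0, len(services)*(1+len(searchList)))

	// Steps 1-2: mDNS DNS-SD names under the link-local domain.
	for _, s := range services {
		queries = append(queries, "_"+s+"._tcp.local.")
	}
	// Steps 3-5: unicast SRV names for each search domain, in order.
	for _, domain := range searchList {
		for _, s := range services {
			queries = append(queries, "_"+s+"._tcp.nats."+domain)
		}
	}
	return queries
}
```

A client would walk this list, stop at the first name that returns records, and report the "no servers could be found" error only once the list is exhausted.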
 

