Skip to content

Conversation

enggnr
Copy link
Contributor

@enggnr enggnr commented Jun 30, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    Provides a script to add the host to etcd cluster

  • What is the new behavior (if this is a feature change)?
    Fixes Add logic that implements etcd #29

@ProfessorManhattan, please note that the IP addresses that etcd advertises/listens to are taken from hostname command. In cases where there may be multiple IPs assigned to the host, and a specific IP is to be used for etcd, then that can be made an input. Please share your thoughts on this.

@github-actions
Copy link

🤖 OpenAI


Chat with 🤖 OpenAI Bot (@openai)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging @openai in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Ignoring further reviews

  • Type @openai: ignore anywhere in the PR description to ignore further reviews from the bot.

Files not summarized due to errors (2)

Failed to summarize

  • home/.chezmoi.yaml.tmpl (nothing obtained from openai)
  • home/.chezmoiscripts/universal/run_onchange_before_11-join-etcd-cluster.tmpl (nothing obtained from openai)

In the recent run, only the files that changed from the base of the PR and between d6e9d2d434f14f429d4b15af9ce23e236f69d1e6 and fa3b21e55b28b39719aaef472ed50c33228b1640 commits were reviewed.

Files not reviewed due to errors in the recent run (2)

Failed to review in the last run

  • home/.chezmoiscripts/universal/run_onchange_before_11-join-etcd-cluster.tmpl (no response)
  • home/.chezmoi.yaml.tmpl (no response)

Copy link
Contributor

@ProfessorManhattan ProfessorManhattan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and suggestions

docker:
doRegion: nyc1
domain: "{{ $domain }}"
etcd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this? See comments below

Copy link
Contributor Author

@enggnr enggnr Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to use the public_services_domain and the etcd cluster member available at etcd.{{ service_domain }} has to be the only member in the cluster. The host where the script runs will be the 2nd member. If there are more members in the cluster, then this variable is needed. This is because, cluster configuration is a 2 step process. More details below.


The initial_cluster_url is the value returned by etcdctl on adding a new host to the cluster. The command returns 3 values, which can be set as environment variables and a new member bootstrapped. This member will join the cluster using the information in these variables.

The value of initial_cluster_urls is set to the value of ETCD_INITIAL_CLUSTER (one of the 3 values returned).

An example is below, taken from here.

etcdctl member add infra3 --peer-urls=http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster

ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE=existing

Start the new etcd member:
export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd --listen-client-urls http://10.0.1.13:2379 --advertise-client-urls http://10.0.1.13:2379 --listen-peer-urls http://10.0.1.13:2380 --initial-advertise-peer-urls http://10.0.1.13:2380 --data-dir %data_dir%

Note: The ETCD_INITIAL_CLUSTER value includes the name and advertised peer URLs of existing members and the new member. This is used by the new member to verify the configuration.

While it is possible to use a discovery service (an internal one) to add a member (after the cluster is up and running), using runtime configuration is the recommended way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comments below. We need to use the CloudFlare API to get a list of all of the subdomains for the PUBLIC_SERVICES_DOMAIN and then piece together the network like that. We can store data in TXT records, for instance and assume the master node of etcd for instance is available at etcd.megabyte.space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @enggnr --- can we pull the DNS records for the PUBLIC_SERVICES_DOMAIN and return all the DNS entries that look something like etcd.{{ .host.hostname }}.{{ .host.domain }}? This should give all the existing nodes which we can then use to populate everything.

ETCD_NAME should just equal the {{ .host.hostname }}

Can you take a swing at this implementation? I think the cfcli should be able to provide all the functionality we need to get this working.

#
# | Variable | Description |
# |------------------------------|------------------------------------------------------------|
# | `etcd.initial_cluster_urls` | Appropriately formatted initial cluster configuration |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this with an address generated based on the domain address that gets populated in .chezmoi.yaml.tmpl? That way we can provision etcd even before we know the final IP address of the etcd server. Also, what is the node part for? Shouldn't we just need a server to connect to? Also, we should be using https. And where did you get the two IP addresses from listed above?

I think we should run this over CloudFlare tunnels which I'll be setting up so you can assume that etcd.{{ service_domain }} points to the server. Just not sure what the node1 is for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script is adding a member to an etcd cluster. If we are bootstrapping a new cluster, in which the host will be a member, then a discover mechanism can be used. In that case, the variable can be removed. This will let the cluster be created before the list of members and their addresses are known. This is different from adding a member to a cluster.

The node part is the name of the member in the cluster. I am not sure why there is a 'mapping' between the name and its advertise URLs. The value of advertise-peer-urls holds this 'map' of all the members that are part of the cluster and it changes when the membership changes.

The IPs are just examples. I knew this would need changes, so I put it in to show what is expected.

Yes, we should be using HTTPS. Certificates can be passed as inputs when starting a member to encrypt communication between peers and with clients. It is also possible to have the members automatically generate certificates for communication among peers. Certificates can be used for authentication as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @enggnr -- we need to come up with a clever plan to use the PUBLIC_SERVICES_DOMAIN's DNS record on CloudFlare to figure out all the other peers, masters and then either provision the master node or join to it with a peer.

Each network provisioned by Install Doctor will use the PUBLIC_SERVICES_DOMAIN to classify which network to join to.

Copy link
Contributor Author

@enggnr enggnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments/clarifications. Please review.

Copy link
Contributor

@ProfessorManhattan ProfessorManhattan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I added a few comments / change requests to this one. Let's redesign this and some of the other PRs to get rid of the need for IP addresses and just use Cloudflare DNS / their API to use a single PUBLIC_SERVICES_DOMAIN to group together networks.

#
# | Variable | Description |
# |------------------------------|------------------------------------------------------------|
# | `etcd.initial_cluster_urls` | Appropriately formatted initial cluster configuration |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @enggnr -- we need to come up with a clever plan to use the PUBLIC_SERVICES_DOMAIN's DNS record on CloudFlare to figure out all the other peers, masters and then either provision the master node or join to it with a peer.

Each network provisioned by Install Doctor will use the PUBLIC_SERVICES_DOMAIN to classify which network to join to.

docker:
doRegion: nyc1
domain: "{{ $domain }}"
etcd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comments below. We need to use the CloudFlare API to get a list of all of the subdomains for the PUBLIC_SERVICES_DOMAIN and then piece together the network like that. We can store data in TXT records, for instance and assume the master node of etcd for instance is available at etcd.megabyte.space.

@ProfessorManhattan
Copy link
Contributor

Hey @enggnr --- left a few comments on other threads --- please let me know if you have any ideas on how to set up network discovery / auto-joining for these various services.

docker:
doRegion: nyc1
domain: "{{ $domain }}"
etcd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @enggnr --- can we pull the DNS records for the PUBLIC_SERVICES_DOMAIN and return all the DNS entries that look something like etcd.{{ .host.hostname }}.{{ .host.domain }}? This should give all the existing nodes which we can then use to populate everything.

ETCD_NAME should just equal the {{ .host.hostname }}

Can you take a swing at this implementation? I think the cfcli should be able to provide all the functionality we need to get this working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add logic that implements etcd
2 participants