For etcd IO issues, the WAL fsync graph on the etcd Grafana dashboard provides a useful measure of IO performance over time, in contrast to the point-in-time check that fio provides. Suggest collecting this data via the log collector where rancher-monitoring is installed, to save asking users to check the Grafana dashboard manually. Below is an example of collecting this data from the Prometheus service, generated by Gemini from the Grafana dashboard query, which I tested in a lab:
# Define variables for clarity
PROMETHEUS_URL="http://10.43.250.198:9090"
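# The IP above is a lab-specific ClusterIP. A hypothetical alternative is to
# resolve it dynamically; this assumes the rancher-monitoring chart's default
# namespace and service name, which may differ per install:
# PROMETHEUS_URL="http://$(kubectl -n cattle-monitoring-system get svc rancher-monitoring-prometheus -o jsonpath='{.spec.clusterIP}'):9090"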
PROMQL_QUERY='histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job="kube-etcd"}[1m])) by (instance, le))'
# No separate URL-encoding step is needed; curl's --data-urlencode below handles it
# Set time range (e.g., the last hour)
END_TIME=$(date +%s)
START_TIME=$((END_TIME - 3600))
# Make the API call
curl -G "${PROMETHEUS_URL}/api/v1/query_range" \
  --data-urlencode "query=${PROMQL_QUERY}" \
  --data "start=${START_TIME}" \
  --data "end=${END_TIME}" \
  --data "step=1m"