Pull etcd WAL fsync data from prometheus in clusters with rancher-monitoring installed

For etcd I/O issues, the WAL fsync graph on the etcd Grafana dashboard is a useful measure of I/O performance over time, in contrast to the point-in-time check that fio provides. I suggest having the log collector gather this data wherever rancher-monitoring is installed, so that users do not have to be asked to check the Grafana dashboard manually. Below is an example of pulling this data from the Prometheus service. It was generated by Gemini, is based on the Grafana dashboard query, and I tested it in a lab:

```bash
# Prometheus endpoint; this ClusterIP is from the lab cluster, so substitute your own
PROMETHEUS_URL="http://10.43.250.198:9090"
# p99 WAL fsync latency per etcd instance, based on the Grafana dashboard query
PROMQL_QUERY='histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job="kube-etcd"}[1m])) by (instance, le))'

# Set time range (e.g., the last hour) as Unix timestamps
END_TIME=$(date +%s)
START_TIME=$((END_TIME - 3600))

# Make the API call. -G sends the parameters as a GET query string and
# --data-urlencode encodes the PromQL, so no separate encoding step is needed.
curl -sS -G "${PROMETHEUS_URL}/api/v1/query_range" \
  --data-urlencode "query=${PROMQL_QUERY}" \
  --data "start=${START_TIME}" \
  --data "end=${END_TIME}" \
  --data "step=1m"
```
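As a rough sketch of how this might be wrapped for the log collector, the function below looks up the Prometheus ClusterIP instead of hardcoding it and skips collection when the service is absent. The namespace `cattle-monitoring-system`, the service name `rancher-monitoring-prometheus`, the function name `collect_etcd_wal_fsync`, and the output path are assumptions for illustration, not values confirmed in this issue, and it assumes the node can reach ClusterIP addresses as in the lab test.

```bash
#!/usr/bin/env bash
# Sketch only: namespace and service name are assumed defaults and can be
# overridden through the environment.
PROM_NAMESPACE="${PROM_NAMESPACE:-cattle-monitoring-system}"
PROM_SERVICE="${PROM_SERVICE:-rancher-monitoring-prometheus}"

# collect_etcd_wal_fsync OUTFILE [LOOKBACK_SECONDS]
# Writes the raw query_range JSON for the p99 WAL fsync latency to OUTFILE.
collect_etcd_wal_fsync() {
  local outfile="$1" lookback="${2:-3600}"
  local cluster_ip query end start

  # Return quietly when the Prometheus service cannot be found,
  # e.g. when rancher-monitoring is not installed.
  cluster_ip=$(kubectl -n "$PROM_NAMESPACE" get svc "$PROM_SERVICE" \
    -o jsonpath='{.spec.clusterIP}' 2>/dev/null) || return 0
  [ -n "$cluster_ip" ] || return 0

  query='histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket{job="kube-etcd"}[1m])) by (instance, le))'
  end=$(date +%s)
  start=$((end - lookback))

  # --max-time keeps a slow or unreachable Prometheus from stalling the collector
  curl -sS -G --max-time 30 "http://${cluster_ip}:9090/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data "start=${start}" \
    --data "end=${end}" \
    --data "step=1m" > "$outfile"
}
```

A call such as `collect_etcd_wal_fsync /tmp/etcd-wal-fsync.json 3600` would store the last hour of samples. The values are in seconds, and the etcd documentation suggests the 99th percentile of WAL fsync duration should stay below 10ms on a sufficiently fast disk.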

Pull etcd WAL fsync data from prometheus in clusters with rancher-monitoring installed #386
