Merged
25 commits
ef2676f
Update monitor-built-in-alerting.md
huoyao1125 Mar 2, 2026
2042954
Apply suggestion from @gemini-code-assist[bot]
huoyao1125 Mar 2, 2026
e278773
Update monitor-datadog-integration.md
huoyao1125 Mar 2, 2026
d3acbc7
Update monitor-new-relic-integration.md
huoyao1125 Mar 2, 2026
ed83eed
Update monitor-prometheus-and-grafana-integration.md
huoyao1125 Mar 2, 2026
f182e31
Update monitor-datadog-integration.md
huoyao1125 Mar 2, 2026
e02841d
Update monitor-new-relic-integration.md
huoyao1125 Mar 2, 2026
a323295
Update monitor-prometheus-and-grafana-integration.md
huoyao1125 Mar 2, 2026
3da01d4
Apply suggestions from code review
hfxsd Mar 2, 2026
9d5ea71
Apply suggestions from code review
hfxsd Mar 2, 2026
eef6bf2
Apply suggestions from code review
hfxsd Mar 2, 2026
859e133
Apply suggestions from code review
hfxsd Mar 3, 2026
5afb3b3
Apply suggestions from code review
hfxsd Mar 3, 2026
c475d39
Update tidb-cloud/monitor-prometheus-and-grafana-integration.md
huoyao1125 Mar 3, 2026
9d31ef5
Update tidb-cloud/monitor-built-in-alerting.md
huoyao1125 Mar 3, 2026
166b75c
Update tidb-cloud/monitor-datadog-integration.md
huoyao1125 Mar 3, 2026
6f4392e
Update tidb-cloud/monitor-new-relic-integration.md
huoyao1125 Mar 3, 2026
39fc563
Apply suggestions from code review
hfxsd Mar 3, 2026
40f3b3f
Update built-in-monitoring.md
huoyao1125 Mar 3, 2026
d309864
Update built-in-monitoring.md
huoyao1125 Mar 3, 2026
154aeb0
Update monitor-datadog-integration.md
huoyao1125 Mar 3, 2026
11fb339
Update monitor-prometheus-and-grafana-integration.md
huoyao1125 Mar 3, 2026
a1b0e29
Update monitor-new-relic-integration.md
huoyao1125 Mar 3, 2026
2fec899
Update monitor-prometheus-and-grafana-integration.md
huoyao1125 Mar 3, 2026
83773c6
Apply suggestions from code review
hfxsd Mar 3, 2026
4 changes: 2 additions & 2 deletions tidb-cloud/built-in-monitoring.md
@@ -75,12 +75,12 @@ The following sections illustrate the metrics on the **Metrics** page for TiDB C
| TiKV CPU Usage | node, limit | The CPU usage statistics or upper limit of each TiKV node. |
| TiKV Memory Usage | node, limit | The memory usage statistics or upper limit of each TiKV node. |
| TiKV IO Bps | node-write, node-read | The total input/output bytes per second of read and write in each TiKV node. |
-| TiKV Storage Usage | node, limit | The storage usage statistics or upper limit of each TiKV node. |
+| TiKV Storage Usage | node, limit | The storage usage statistics or upper limit of each TiKV node. The storage usage includes the logical data size in the storage engine, as well as WAL files and temporary files. |
| TiFlash Uptime | node | The runtime of each TiFlash node since last restart. |
| TiFlash CPU Usage | node, limit | The CPU usage statistics or upper limit of each TiFlash node. |
| TiFlash Memory Usage | node, limit | The memory usage statistics or upper limit of each TiFlash node. |
| TiFlash IO MBps | node-write, node-read | The total bytes of read and write in each TiFlash node. |
-| TiFlash Storage Usage | node, limit | The storage usage statistics or upper limit of each TiFlash node. |
+| TiFlash Storage Usage | node, limit | The storage usage statistics or upper limit of each TiFlash node. The storage usage includes the logical data size in the storage engine, as well as WAL files and temporary files. |
| TiProxy CPU Usage | node | The CPU usage statistics of each TiProxy node. The upper limit is 100%. |
| TiProxy Connections | node | The number of connections on each TiProxy node. |
| TiProxy Throughput | node | The bytes transferred per second on each TiProxy node. |
4 changes: 2 additions & 2 deletions tidb-cloud/monitor-built-in-alerting.md
@@ -74,8 +74,8 @@ TiDB Cloud provides different alert rules for each cluster plan, based on the fe
| Total TiDB node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload.|
| Total TiKV node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiKV to reduce the CPU usage percentage of the current workload. |
| Total TiFlash node CPU utilization exceeded 80% for 10 minutes | Consider increasing the node number or node size for TiFlash to reduce the CPU usage percentage of the current workload. |
-| TiKV storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiKV to increase your storage capacity. |
-| TiFlash storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiFlash to increase your storage capacity. |
+| TiKV storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiKV to increase your storage capacity. When the storage usage of TiKV exceeds 80%, latency spikes might occur, and higher usage might cause requests to fail. |
+| TiFlash storage utilization exceeds 80% | Consider increasing the node number or node storage size for TiFlash to increase your storage capacity. When the storage usage of all TiFlash nodes reaches 80%, any DDL statement that adds a TiFlash replica hangs indefinitely. |
| Max memory utilization across TiDB nodes exceeded 70% for 10 minutes | Consider checking if there is any [hotspot](/tidb-cloud/tidb-cloud-sql-tuning-overview.md#hotspot-issues) in the cluster or increasing the node number or node size for TiDB to reduce the memory usage percentage of the current workload. |
| Max memory utilization across TiKV nodes exceeded 70% for 10 minutes | Consider checking if there is any [hotspot](/tidb-cloud/tidb-cloud-sql-tuning-overview.md#hotspot-issues) in the cluster or increasing the node number or node size for TiKV to reduce the memory usage percentage of the current workload. |
| Max CPU utilization across TiDB nodes exceeded 80% for 10 minutes | Consider checking if there is any [hotspot](/tidb-cloud/tidb-cloud-sql-tuning-overview.md#hotspot-issues) in the cluster or increasing the node number or node size for TiDB to reduce the CPU usage percentage of the current workload. |
2 changes: 1 addition & 1 deletion tidb-cloud/monitor-datadog-integration.md
@@ -104,7 +104,7 @@ Datadog tracks the following metrics for your TiDB clusters.
| tidb_cloud.db_command_per_second| gauge | type: Query\|StmtPrepare\|...<br/>cluster_name: `<cluster name>`<br/>instance: tidb-0\|tidb-1…<br/>component: `tidb` | The number of commands processed by TiDB per second, which is classified according to the success or failure of command execution results. |
| tidb_cloud.db_queries_using_plan_cache_ops| gauge | cluster_name: `<cluster name>`<br/>instance: tidb-0\|tidb-1…<br/>component: `tidb` | The statistics of queries using [Plan Cache](/sql-prepared-plan-cache.md) per second. The execution plan cache only supports the prepared statement command. |
| tidb_cloud.db_transaction_per_second| gauge | txn_mode: pessimistic\|optimistic<br/>type: abort\|commit\|...<br/>cluster_name: `<cluster name>`<br/>instance: tidb-0\|tidb-1…<br/>component: `tidb` | The number of transactions executed per second. |
-| tidb_cloud.node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/>component: tikv\|tiflash | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes filesystem overhead, WAL files, and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. |
+| tidb_cloud.node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/>component: tikv\|tiflash | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes WAL files and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. When the storage usage of TiKV exceeds 80%, latency spikes might occur, and higher usage might cause requests to fail. When the storage usage of all TiFlash nodes reaches 80%, any DDL statement that adds a TiFlash replica hangs indefinitely. |
| tidb_cloud.node_storage_capacity_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/>component: tikv\|tiflash | The disk capacity of TiKV/TiFlash nodes, in bytes. |
| tidb_cloud.node_cpu_seconds_total | count | cluster_name: `<cluster name>`<br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The CPU usage of TiDB/TiKV/TiFlash nodes. |
| tidb_cloud.node_cpu_capacity_cores | gauge | cluster_name: `<cluster name>`<br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/>component: tidb\|tikv\|tiflash | The limit on CPU cores of TiDB/TiKV/TiFlash nodes. |
2 changes: 1 addition & 1 deletion tidb-cloud/monitor-new-relic-integration.md
@@ -144,7 +144,7 @@ New Relic tracks the following metrics for your TiDB clusters.
| tidb_cloud.db_command_per_second| gauge | type: Query\|StmtPrepare\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The number of commands processed by TiDB per second, which is classified according to the success or failure of command execution results. |
| tidb_cloud.db_queries_using_plan_cache_ops| gauge | cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The statistics of queries using [Plan Cache](/sql-prepared-plan-cache.md) per second. The execution plan cache only supports the prepared statement command. |
| tidb_cloud.db_transaction_per_second| gauge | txn_mode: pessimistic\|optimistic<br/><br/>type: abort\|commit\|...<br/><br/>cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…<br/><br/>component: `tidb` | The number of transactions executed per second. |
-| tidb_cloud.node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/><br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/><br/>component: tikv\|tiflash | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes filesystem overhead, WAL files, and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. |
+| tidb_cloud.node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/><br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/><br/>component: tikv\|tiflash | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes WAL files and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. When the storage usage of TiKV exceeds 80%, latency spikes might occur, and higher usage might cause requests to fail. When the storage usage of all TiFlash nodes reaches 80%, any DDL statement that adds a TiFlash replica hangs indefinitely. |
| tidb_cloud.node_storage_capacity_bytes | gauge | cluster_name: `<cluster name>`<br/><br/>instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…<br/><br/>component: tikv\|tiflash | The disk capacity of TiKV/TiFlash nodes, in bytes. |
| tidb_cloud.node_cpu_seconds_total (Beta only) | count | cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/><br/>component: tidb\|tikv\|tiflash | The CPU usage of TiDB/TiKV/TiFlash nodes. |
| tidb_cloud.node_cpu_capacity_cores | gauge | cluster_name: `<cluster name>`<br/><br/>instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…<br/><br/>component: tidb\|tikv\|tiflash | The limit on CPU cores of TiDB/TiKV/TiFlash nodes. |
2 changes: 1 addition & 1 deletion tidb-cloud/monitor-prometheus-and-grafana-integration.md
@@ -116,7 +116,7 @@ Prometheus tracks the following metric data for your TiDB clusters.
| tidbcloud_changefeed_latency | gauge | changefeed_id | The data replication latency between the upstream and the downstream of a changefeed |
| tidbcloud_changefeed_checkpoint_ts | gauge | changefeed_id | The checkpoint timestamp of a changefeed, representing the largest TSO (Timestamp Oracle) successfully written to the downstream |
| tidbcloud_changefeed_replica_rows | gauge | changefeed_id | The number of replicated rows that a changefeed writes to the downstream per second |
-| tidbcloud_node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: `tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…`<br/>component: `tikv\|tiflash` | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes filesystem overhead, WAL files, and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. |
+| tidbcloud_node_storage_used_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: `tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…`<br/>component: `tikv\|tiflash` | The disk usage, in bytes, for TiKV or TiFlash nodes. This metric primarily represents the logical data size in the storage engine, and excludes WAL files and temporary files. To calculate the actual disk usage rate, use `(capacity - available) / capacity` instead. When the storage usage of TiKV exceeds 80%, latency spikes might occur, and higher usage might cause requests to fail. When the storage usage of all TiFlash nodes reaches 80%, any DDL statement that adds a TiFlash replica hangs indefinitely. |
| tidbcloud_node_storage_capacity_bytes | gauge | cluster_name: `<cluster name>`<br/>instance: `tikv-0\|tikv-1…\|tiflash-0\|tiflash-1…`<br/>component: `tikv\|tiflash` | The disk capacity bytes of TiKV/TiFlash nodes |
| tidbcloud_node_cpu_seconds_total | count | cluster_name: `<cluster name>`<br/>instance: `tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…`<br/>component: `tidb\|tikv\|tiflash` | The CPU usage of TiDB/TiKV/TiFlash nodes |
| tidbcloud_node_cpu_capacity_cores | gauge | cluster_name: `<cluster name>`<br/>instance: `tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…`<br/>component: `tidb\|tikv\|tiflash` | The CPU limit cores of TiDB/TiKV/TiFlash nodes |
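Reader note on the `(capacity - available) / capacity` formula in the updated `node_storage_used_bytes` descriptions: the arithmetic can be sketched in Python as follows. This is an illustration only, not part of the changed files. The function names and the `capacity_bytes` / `available_bytes` parameters are hypothetical; the diff names a capacity metric but does not name a metric for available bytes, so how that value is obtained is an assumption. The 0.80 default mirrors the 80% storage threshold mentioned in the changed rows.

```python
def disk_usage_rate(capacity_bytes: float, available_bytes: float) -> float:
    """Return the fraction of disk in use: (capacity - available) / capacity.

    Both arguments are hypothetical inputs; the diff does not say which
    metric supplies the available bytes.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if not 0 <= available_bytes <= capacity_bytes:
        raise ValueError("available_bytes must be between 0 and capacity_bytes")
    return (capacity_bytes - available_bytes) / capacity_bytes


def exceeds_storage_threshold(
    capacity_bytes: float,
    available_bytes: float,
    threshold: float = 0.80,  # 80% threshold taken from the changed rows
) -> bool:
    """Return True when the usage rate is strictly above the threshold."""
    return disk_usage_rate(capacity_bytes, available_bytes) > threshold


if __name__ == "__main__":
    # Example values are made up for illustration.
    rate = disk_usage_rate(capacity_bytes=1000, available_bytes=100)
    print(f"usage rate: {rate:.0%}")
    print("above threshold:", exceeds_storage_threshold(1000, 100))
```

The strict comparison (`>`) follows the "exceeds 80%" wording of the TiKV row; the TiFlash row says "reaches 80%", so a `>=` check might be more appropriate there.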