Commit f8625c5

rr404 and jdv authored
stack health v1 (#954)
Co-authored-by: jdv <[email protected]>
1 parent 571ecb0 commit f8625c5

7 files changed: +129 -0 lines changed

crowdsec-docs/sidebarsUnversioned.ts

Lines changed: 5 additions & 0 deletions
@@ -422,6 +422,11 @@ const sidebarsUnversionedConfig: SidebarConfig = {
             },
         ],
     },
+    {
+        type: "doc",
+        label: "🩺 Stack Health",
+        id: "console/stackhealth",
+    },
 ],
 remediationSideBar: [
     {
5 binary image files: 66.8 KB, 99.9 KB, 603 KB, 688 KB, 465 KB
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
---
id: stackhealth
title: Stack Health
---

The **Stack Health** feature is a monitoring tool within the CrowdSec Console that helps you keep your infrastructure operational and properly configured.
Its primary goal is to identify configuration issues and connectivity problems that could impact your detection capabilities.

---

## Key Features

- **Issue Detection**: Identifies problems with Security Engines, Log Processors, and blocklist integrations
- **Severity-Based Prioritization**: Issues are categorized by criticality *(Critical, Important, Recommended, Bonus)*
- **Contextual Troubleshooting**: Each issue points to a dedicated troubleshooting page with detailed diagnosis steps and resolution guidance
- **Notification Support**: Get notified about critical issues through the [Console notification system](/u/console/notification_integrations/overview)

---

## Accessing Stack Health

**Stack Health** is available in the CrowdSec Console for all authenticated users.
It appears in three places:
- A dedicated dashboard accessible from the left-hand menu of the **Security Stack** space
- An issue counter badge on the Security Engine cards *(circle in the top-right corner)*
- A list of issues in the Security Engine details view

### Stack Health Dashboard

The dashboard shows:
- A list of all detected issues in your organization, grouped by criticality
- A filter to focus on a specific Security Engine
- Issue cards, each displaying:
  - Issue title and description
  - Affected Security Engine(s)
  - Buttons to mark the issue as resolved, ignore it, or open the troubleshooting guide

![Stack Health Overview](/img/console/stackhealth/stackhealth-overview.png)
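
The grouping and filtering behaviour described for the dashboard can be pictured with a minimal TypeScript sketch. The `Issue` shape, the `groupIssues` helper, and the sample engine IDs are all hypothetical names chosen for illustration; they are not part of the Console or any CrowdSec API.

```typescript
// Hypothetical data model: illustrative only, not the Console's actual API.
type Criticality = "Critical" | "Important" | "Recommended" | "Bonus";

interface Issue {
  title: string;
  description: string;
  criticality: Criticality;
  engineIds: string[]; // affected Security Engine(s)
}

// Group issues by criticality. When `engineId` is given, keep only the
// issues that affect that Security Engine (the dashboard's engine filter).
function groupIssues(
  issues: Issue[],
  engineId?: string
): Record<Criticality, Issue[]> {
  const groups: Record<Criticality, Issue[]> = {
    Critical: [],
    Important: [],
    Recommended: [],
    Bonus: [],
  };
  for (const issue of issues) {
    if (engineId !== undefined && issue.engineIds.indexOf(engineId) === -1) {
      continue; // issue does not affect the selected engine
    }
    groups[issue.criticality].push(issue);
  }
  return groups;
}

// Made-up sample data to show the call shape.
const sampleIssues: Issue[] = [
  { title: "Sample issue 1", description: "", criticality: "Critical", engineIds: ["engine-a"] },
  { title: "Sample issue 2", description: "", criticality: "Bonus", engineIds: ["engine-b"] },
];
console.log(groupIssues(sampleIssues, "engine-a"));
```

Every criticality level gets a list up front, so a level with no issues yields an empty list rather than a missing key.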

### Issues in Security Engine view

A badge on the Security Engine card indicates the number of active issues affecting that engine.
Click the Security Engine card to open its details, where a dedicated section lists all active issues for that engine.

| Security Engine Card with Badge | Security Engine Details with Issues |
|---------------------------------|-------------------------------------|
| ![Issues Badge](/img/console/stackhealth/stackhealth-engine-badge.png) | ![Issues in Engine Details](/img/console/stackhealth/stackhealth-engine-details.png) |

---

## Understanding Issue Criticality

Stack Health categorizes issues into four severity levels:

| Severity | Description |
|----------|-------------|
| **Critical** | Immediate attention required - core functionality is impaired |
| **Important** | Should be addressed soon - may impact protection effectiveness |
| **Recommended** | Additional actions to improve your security posture |
| **Bonus** | Optimization advice and premium feature recommendations |

Focus on resolving Critical and Important issues first to ensure your security stack is functioning properly.

---

## Issue Details and Resolution

Click on any issue to view detailed information and step-by-step troubleshooting guidance.

![Issue Details](/img/console/stackhealth/stackhealth-issue-details.png)

Each issue detail page includes:

- **Trigger Condition**: Why the issue was raised
- **Criticality Level**: Severity and priority
- **Impact**: What functionality is affected
- **Engine Information**: Affected Security Engine details (ID, OS, IP address)
- **Contextual Troubleshooting**: Specific diagnosis steps for your situation

### Example: Security Engine No Alerts

When a Security Engine hasn't generated alerts in 48 hours, Stack Health provides:

- Possible root causes (simulation mode, missing collections, low traffic)
- Commands to verify scenario status
- Steps to check log acquisition and parsing
- Links to related documentation

![Security Engine No Alerts](/img/console/stackhealth/stackhealth-no-alerts-troubleshooting.png)
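
To make the trigger condition of this example concrete, here is a minimal TypeScript sketch of a "no alerts in 48 hours" check. The function name, its parameters, and the decision to treat an engine that has never reported an alert as triggering are assumptions made for illustration; the Console's actual detection logic is not described in this page and may differ.

```typescript
// Assumption: the 48-hour window comes from the example above.
const NO_ALERT_THRESHOLD_HOURS = 48;
const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical helper: returns true when no alert was seen within the window.
// `lastAlertAt` is null when the engine has never reported an alert, which
// this sketch also treats as "no recent alerts" (an assumption).
function hasNoRecentAlerts(
  lastAlertAt: Date | null,
  now: Date,
  thresholdHours: number = NO_ALERT_THRESHOLD_HOURS
): boolean {
  if (lastAlertAt === null) {
    return true;
  }
  const elapsedMs = now.getTime() - lastAlertAt.getTime();
  return elapsedMs >= thresholdHours * MS_PER_HOUR;
}

// Example call with made-up timestamps.
const checkedAt = new Date("2024-01-03T00:00:00Z");
console.log(hasNoRecentAlerts(new Date("2024-01-02T12:00:00Z"), checkedAt));
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.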

### List of Issues

Refer to the [**Console Health Check Issues**](/u/troubleshooting/console_issues) documentation page for a comprehensive **list of all Stack Health issues**, their **trigger** conditions, and links to **troubleshooting** guides.
This page is regularly updated as new issues are added.

---

## Notifications and Alerts

Stack Health integrates with the Console notification system to alert you when critical issues occur.

To receive notifications:

1. Navigate to **Notification Settings** in the Console
2. Configure your preferred notification channels (Email, Slack, Discord, Webhook)
3. Set up notification rules for Stack Health events

Learn more about [Console Notification Integrations](/u/console/notification_integrations/overview).

---

## Best Practices

### Regular Monitoring
- Check the Stack Health dashboard regularly, especially after infrastructure changes
- Set up [notifications](/u/console/notification_integrations/overview) for **Critical** and **Important** issues
- Review the full list of issues at least weekly

### Prioritize by Severity
1. Address **Critical** issues immediately: they indicate broken functionality
2. Plan to fix **Important** issues within 24 to 48 hours
3. Schedule **Recommended** improvements during maintenance windows
4. Explore **Bonus** optimizations when fine-tuning your setup
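
The priority order above can be expressed as a small triage helper. This is a sketch under stated assumptions: `TriageItem`, `SEVERITY_RANK`, `RESPONSE_GUIDANCE`, and `triage` are hypothetical names, and the guidance strings simply paraphrase the list above rather than reflect any Console setting.

```typescript
type Severity = "Critical" | "Important" | "Recommended" | "Bonus";

// Lower rank means "handle sooner".
const SEVERITY_RANK: Record<Severity, number> = {
  Critical: 0,
  Important: 1,
  Recommended: 2,
  Bonus: 3,
};

// Guidance paraphrased from the best-practice list (not a Console setting).
const RESPONSE_GUIDANCE: Record<Severity, string> = {
  Critical: "address immediately",
  Important: "fix within 24 to 48 hours",
  Recommended: "schedule during a maintenance window",
  Bonus: "explore when fine-tuning the setup",
};

// Hypothetical shape for an item in a triage queue.
interface TriageItem {
  title: string;
  severity: Severity;
}

// Returns a new array ordered most severe first; the input is not mutated.
function triage(items: TriageItem[]): TriageItem[] {
  return items
    .slice()
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Made-up queue to show the call shape.
const queue: TriageItem[] = [
  { title: "Sample bonus item", severity: "Bonus" },
  { title: "Sample critical item", severity: "Critical" },
];
for (const item of triage(queue)) {
  console.log(item.severity + ": " + item.title + " (" + RESPONSE_GUIDANCE[item.severity] + ")");
}
```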
