Commit e68a6cd

Merge remote-tracking branch 'origin/dev' into DT-420-aws-security-lake-documentation
2 parents f3f4bc7 + 80e302f commit e68a6cd

4 files changed: 273 additions and 6 deletions
Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
# Microsoft Sentinel data lake

<span className="theme-doc-version-badge badge badge--secondary">Microsoft Azure</span><span className="theme-doc-version-badge badge badge--secondary">SIEM</span>

## Synopsis

Creates a target that ingests log messages into Microsoft Sentinel data lake tables with lower ingestion costs and extended retention capabilities. Optimized for high-volume, high-fidelity log types such as firewall logs, DNS logs, and network traffic that require long-term storage.

:::tip
For more details on Microsoft Sentinel integration, refer to <Topic id="sentinel-overview">Microsoft Sentinel Overview</Topic> and <Topic id="sentinel-integration">Microsoft Sentinel Integration</Topic>. For Director Proxy deployment, see <Topic id="about-director-proxy">VirtualMetric Director Proxy</Topic>.
:::

## Schema

```yaml {1,3}
- name: <string>
  description: <string>
  type: sentineldatalake
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    tenant_id: <string>
    client_id: <string>
    client_secret: <string>
    function_app: <string>
    function_token: <string>
    rule_id: <string>
    endpoint: <string>
    streams:
      - name: <string>
        rule_id: <string>
    stream: <string[]>
    buffer_size: <numeric>
    batch_size: <numeric>
    keep_phantom_fields: <boolean>
    drop_unknown_stream_events: <boolean>
    cache:
      timeout: <numeric>
    field_format: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
```

## Configuration

The following fields are used to define the target:

### Core Settings

|Field|Required|Default|Description|
|---|---|---|---|
|`name`|Y||Target name|
|`description`|N|-|Optional description|
|`type`|Y||Must be `sentineldatalake`|
|`pipelines`|N|-|Optional post-processor pipelines|
|`status`|N|`true`|Enable/disable the target|

### Authentication

|Field|Required|Default|Description|
|---|---|---|---|
|`tenant_id`|N*|-|Azure tenant ID (required for direct authentication)|
|`client_id`|N*|-|Azure client ID (required for direct authentication)|
|`client_secret`|N*|-|Client secret (required for direct authentication)|
|`function_app`|N*|-|Director Proxy endpoint URL (required for proxy forwarding)|
|`function_token`|N*|-|Director Proxy authentication token (required with `function_app`)|

\* = Conditionally required. Use either direct authentication (`tenant_id`, `client_id`, `client_secret`) OR Director Proxy forwarding (`function_app`, `function_token`).

### Stream Configuration

|Field|Required|Default|Description|
|---|---|---|---|
|`endpoint`|Y||Data Collection Endpoint URL or Resource ID|
|`rule_id`|N|-|Default Data Collection Rule (DCR) ID|
|`streams`|N|-|Array of stream configurations, each with a `name` and an optional `rule_id`|
|`stream`|N|-|Legacy string array of stream names|
|`buffer_size`|N|`1048576`|Buffer size in bytes (1 MiB)|
|`batch_size`|N|`1000`|Maximum messages per batch|
|`keep_phantom_fields`|N|`false`|Keep fields not defined in the DCR schema|
|`drop_unknown_stream_events`|N|`true`|Silently drop events for undefined streams|
|`cache.timeout`|N|`300`|Stream cache timeout in seconds|
|`field_format`|N|-|Data normalization format. See the applicable <Topic id="normalization-mapping">Normalization</Topic> section|

### Debug Options

|Field|Required|Default|Description|
|---|---|---|---|
|`debug.status`|N|`false`|Enable debug logging|
|`debug.dont_send_logs`|N|`false`|Process logs but don't send to Sentinel (testing)|

## Details

The Microsoft Sentinel data lake target provides cost-optimized ingestion for high-volume telemetry with extended retention requirements. Data lake ingestion costs significantly less than standard analytics ingestion, making it well suited to firewall logs, DNS queries, network flows, and other high-fidelity telemetry that requires long-term storage.

### Data Lake Benefits

**Cost Efficiency** - Data lake ingestion costs are substantially lower than standard analytics ingestion, which makes it practical to process telemetry volumes that would be prohibitively expensive with traditional methods.

**High Fidelity** - Preserves complete log detail without sampling or field reduction, maintaining full forensic capability for security investigations and compliance auditing.

**Extended Retention** - Optimized for long-term storage of high-volume logs, supporting retention periods spanning months or years for compliance requirements and historical analysis.

### Director Proxy Integration

The target supports two deployment models:

**Direct Authentication** - Director connects directly to Azure using service principal credentials (`tenant_id`, `client_id`, `client_secret`). This model requires Director to have network connectivity to Azure endpoints and credentials for the target subscription.

**Director Proxy Forwarding** - Director sends processed data to a VirtualMetric Director Proxy (an Azure Function) deployed in the customer's environment. Director Proxy uses Azure Managed Identity for credential-free access to Microsoft Sentinel data lake, eliminating the need to share Azure credentials with Director.

The Director Proxy model is particularly valuable for MSSP deployments, where customers keep complete control over their Azure credentials while the MSSP's Director infrastructure handles centralized data processing and routing.

### Stream Discovery

When `endpoint` is specified as a Resource ID (rather than an HTTPS URL), the target automatically discovers the available Data Collection Rules and their associated streams. This autodiscovery removes the need to enumerate streams manually.

Use the `streams` array to limit ingestion to specific tables. Each entry can carry its own DCR ID in the `rule_id` field, so different streams can be routed to different data collection rules.
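
As a sketch of the alternative to autodiscovery, the fragment below passes `endpoint` as an HTTPS URL and names the DCR and stream explicitly. The target name, URL, DCR ID, and stream name are placeholders. That `rule_id` and `streams` must be supplied by hand in this case is an assumption inferred from the description above, not confirmed behavior.

```yaml
targets:
  - name: explicit_data_lake   # hypothetical target name
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      # HTTPS URL form (placeholder): assumed to skip autodiscovery,
      # so the DCR and stream are listed explicitly
      endpoint: "https://my-dce-abcd.westeurope-1.ingest.monitor.azure.com"
      rule_id: "dcr-00000000000000000000000000000000"
      streams:
        - name: "Custom-FirewallLogs"
```

With the Resource ID form shown in the examples later in this page, `rule_id` and `streams` can be omitted and are used only to narrow what autodiscovery finds.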

### Field Management

The target automatically detects table schemas and validates incoming data against the defined columns. When `keep_phantom_fields` is `false` (default), fields not defined in the target schema are removed before ingestion, preventing schema validation errors.

:::warning
With `keep_phantom_fields` disabled, undefined fields are removed. Ensure all required fields are included in your DCR schema.
:::

Data is buffered until batch size limits are reached or an explicit flush occurs. The `drop_unknown_stream_events` setting (default: `true`) silently discards events for streams not configured in the target, preventing processing failures for unexpected data types.

:::warning
With `drop_unknown_stream_events` enabled, unmatched events are discarded silently. Monitor data flow to ensure expected streams are properly configured.
:::
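
When validating a new DCR, it may help to make these settings explicit and run without uploading anything. The fragment below is one possible dry-run setup that uses only fields from the schema above. The target name and stream name are placeholders, and the expectation that unmatched events then surface in the debug log instead of disappearing is an assumption, not documented behavior.

```yaml
targets:
  - name: dry_run_data_lake   # hypothetical target name
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      streams:
        - name: "Custom-FirewallLogs"
      keep_phantom_fields: false         # default: strip fields missing from the DCR schema
      drop_unknown_stream_events: false  # non-default: do not silently discard unmatched events
      debug:
        status: true
        dont_send_logs: true             # process events but skip the upload
```

Once the expected streams are confirmed, removing the `debug` block and restoring `drop_unknown_stream_events: true` returns the target to its default behavior.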

### Field Normalization

The `field_format` property normalizes log data to a standard format before ingestion:

- `csl` - Common Security Log format
- `asim` - Advanced Security Information Model

Normalization ensures consistent field naming and structure across diverse log sources, improving query efficiency and security analytics capabilities.
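
For illustration, a minimal target that applies `csl` normalization to a firewall stream might look like the sketch below. The target name and stream name are placeholders, and pairing `csl` with firewall logs is an assumption made for the example.

```yaml
targets:
  - name: csl_data_lake   # hypothetical target name
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      field_format: "csl"   # normalize field names before ingestion
      streams:
        - name: "Custom-FirewallLogs"
```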

## Examples

### Basic Configuration

Minimal configuration using direct Azure authentication:

```yaml
targets:
  - name: sentinel_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
```

### Director Proxy

Configuration using Director Proxy for credential-free forwarding:

```yaml
targets:
  - name: proxy_data_lake
    type: sentineldatalake
    properties:
      function_app: "https://my-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "your-proxy-authentication-token"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
```

### Filtered Streams

Configuration with specific stream filtering and custom settings:

```yaml
targets:
  - name: filtered_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      streams:
        - name: "Custom-FirewallLogs"
        - name: "Custom-DNSLogs"
      keep_phantom_fields: false
      drop_unknown_stream_events: true
      cache:
        timeout: 600
```

### High-Volume Processing

Optimized configuration for high-volume log ingestion:

```yaml
targets:
  - name: high_volume_data_lake
    type: sentineldatalake
    pipelines:
      - normalization
    properties:
      function_app: "https://my-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "your-proxy-authentication-token"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      buffer_size: 5242880 # 5 MiB
      batch_size: 5000
      field_format: "asim"
      streams:
        - name: "Custom-FirewallLogs"
          rule_id: "dcr-00000000000000000000000000000000"
        - name: "Custom-DNSLogs"
          rule_id: "dcr-11111111111111111111111111111111"
```

### Debug Configuration

Testing configuration with debug enabled:

```yaml
targets:
  - name: debug_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      debug:
        status: true
        dont_send_logs: true # Test mode - doesn't actually upload
```

docs/configuration/targets/microsoft-sentinel.mdx

Lines changed: 33 additions & 6 deletions
@@ -7,7 +7,7 @@
 Creates a target that ingests log messages into Microsoft Sentinel workspace tables using Data Collection Rules (DCRs). Supports automatic table selection, field normalization, and filtering options.
 
 :::tip
-For more details, refer to our Microsoft Sentinel Overview and Microsoft Sentinel Integration chapters.
+For more details on Microsoft Sentinel integration, refer to <Topic id="sentinel-overview">Microsoft Sentinel Overview</Topic> and <Topic id="sentinel-integration">Microsoft Sentinel Integration</Topic>. For Director Proxy deployment, see <Topic id="about-director-proxy">VirtualMetric Director Proxy</Topic>. For cost-optimized ingestion with extended retention, see <Topic id="targets-microsoft-sentinel-data-lake">Microsoft Sentinel data lake</Topic>.
 :::
 
 ## Schema
@@ -22,6 +22,8 @@ For more details, refer to our Microsoft Sentinel Overview and Microsoft Sentine
     tenant_id: <string>
     client_id: <string>
     client_secret: <string>
+    function_app: <string>
+    function_token: <string>
     rule_id: <string>
     endpoint: <string>
     streams:
@@ -58,11 +60,13 @@ The following fields are used to define the target:
 
 |Field|Required|Default|Description|
 |---|---|---|---|
-|`tenant_id`|N*|-|Azure tenant ID (required unless using managed identity)|
-|`client_id`|N*|-|Azure client ID (required unless using managed identity)|
-|`client_secret`|N*|-|Client secret (required unless using managed identity)|
+|`tenant_id`|N*|-|Azure tenant ID (required for direct authentication)|
+|`client_id`|N*|-|Azure client ID (required for direct authentication)|
+|`client_secret`|N*|-|Client secret (required for direct authentication)|
+|`function_app`|N*|-|Director Proxy endpoint URL (required for proxy forwarding)|
+|`function_token`|N*|-|Director Proxy authentication token (required with `function_app`)|
 
-\* = Conditionally required (see authentication methods above)
+\* = Conditionally required. Use either direct authentication (`tenant_id`, `client_id`, `client_secret`) OR Director Proxy forwarding (`function_app`, `function_token`).
 
 ### Stream Configuration
 
@@ -112,6 +116,16 @@ When `streams` is not specified, tables are automatically selected based on inpu
 
 The Microsoft Sentinel target enables direct ingestion into Microsoft Sentinel tables with flexible configuration options. It supports using the `SystemS3` field to route messages to specific stream tables, using the format `Custom-TableName`.
 
+### Deployment Models
+
+The target supports two deployment models:
+
+**Direct Authentication** - Director connects directly to Azure using service principal credentials (`tenant_id`, `client_id`, `client_secret`). This model requires Director to have network connectivity to Azure endpoints and credentials for the target subscription.
+
+**Director Proxy Forwarding** - Director sends processed data to a VirtualMetric Director Proxy (an Azure Function) deployed in the customer's environment. Director Proxy uses Azure Managed Identity for credential-free access to Microsoft Sentinel, eliminating the need to share Azure credentials with Director.
+
+The Director Proxy model is particularly valuable for MSSP deployments, where customers keep complete control over their Azure credentials while the MSSP's Director infrastructure handles centralized data processing and routing.
+
 The target automatically detects table schemas and can clean messages to remove phantom fields that aren't defined in the schema when `keep_phantom_fields` is set to `false`.
 
 :::warning
@@ -133,7 +147,6 @@ The `field_format` property allows normalizing log data to standard formats:
 
 Field normalization is applied before the logs are sent to Sentinel, ensuring consistent indexing and search capabilities.
 
-
 ### Preconfigured Schemas
 
 The target includes built-in schema definitions for standard tables like:
@@ -204,6 +217,20 @@ targets:
       endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
 ```
 
+### Director Proxy
+
+Configuration using Director Proxy for credential-free forwarding:
+
+```yaml
+targets:
+  - name: proxy_sentinel
+    type: sentinel
+    properties:
+      function_app: "https://my-director-proxy.azurewebsites.net/api/Sentinel"
+      function_token: "your-proxy-authentication-token"
+      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
+```
+
 ### Filtered
 
 Using specific stream filtering and custom cache timeout:

sidebars.ts

Lines changed: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ const sidebars: SidebarsConfig = {
     "configuration/targets/event-hubs",
     "configuration/targets/file",
     "configuration/targets/microsoft-sentinel",
+    "configuration/targets/microsoft-sentinel-data-lake",
     "configuration/targets/splunk-hec",
     "configuration/targets/syslog",
   ],

topics.json

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,6 @@
 {
   "about-director": "/about/applications#virtualmetric-director",
+  "about-director-proxy": "/about/applications#virtualmetric-director-proxy",
   "about-agent": "/about/applications#virtualmetric-agent",
   "sentinel-overview": "/microsoft-sentinel/overview",
   "sentinel-integration": "/microsoft-sentinel/integration",
@@ -20,6 +21,7 @@
   "targets-console": "/configuration/targets/console",
   "targets-file": "/configuration/targets/file",
   "targets-microsoft-sentinel": "/configuration/targets/microsoft-sentinel",
+  "targets-microsoft-sentinel-data-lake": "/configuration/targets/microsoft-sentinel-data-lake",
   "pipelines-overview": "/configuration/pipelines/overview",
   "pipelines-overview-config": "/configuration/pipelines/overview#configuration",
   "routes": "/configuration/routes",
