# Microsoft Sentinel data lake

<span className="theme-doc-version-badge badge badge--secondary">Microsoft Azure</span><span className="theme-doc-version-badge badge badge--secondary">SIEM</span>

## Synopsis

Creates a target that ingests log messages into Microsoft Sentinel data lake tables, which offer lower ingestion costs and extended retention. It is designed for high-volume, high-fidelity log types, such as firewall, DNS, and network traffic logs, that require long-term storage.

:::tip
For more details on Microsoft Sentinel integration, refer to <Topic id="sentinel-overview">Microsoft Sentinel Overview</Topic> and <Topic id="sentinel-integration">Microsoft Sentinel Integration</Topic>. For Director Proxy deployment, see <Topic id="about-director">VirtualMetric Director Proxy</Topic>.
:::

## Schema

```yaml {1,3}
- name: <string>
  description: <string>
  type: sentineldatalake
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    tenant_id: <string>
    client_id: <string>
    client_secret: <string>
    function_app: <string>
    function_token: <string>
    rule_id: <string>
    endpoint: <string>
    streams:
      - name: <string>
        rule_id: <string>
    stream: <string[]>
    buffer_size: <numeric>
    batch_size: <numeric>
    keep_phantom_fields: <boolean>
    drop_unknown_stream_events: <boolean>
    cache:
      timeout: <numeric>
    field_format: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
```

## Configuration

The following fields are used to define the target:

### Core Settings

|Field|Required|Default|Description|
|---|---|---|---|
|`name`|Y|-|Target name|
|`description`|N|-|Optional description|
|`type`|Y|-|Must be `sentineldatalake`|
|`pipelines`|N|-|Optional post-processor pipelines|
|`status`|N|`true`|Enable/disable the target|

### Authentication

|Field|Required|Default|Description|
|---|---|---|---|
|`tenant_id`|N*|-|Azure tenant ID (required for direct authentication)|
|`client_id`|N*|-|Azure application (client) ID (required for direct authentication)|
|`client_secret`|N*|-|Client secret of the service principal (required for direct authentication)|
|`function_app`|N*|-|Director Proxy endpoint URL (required for proxy forwarding)|
|`function_token`|N*|-|Director Proxy authentication token (required with `function_app`)|

\* = Conditionally required. Use either direct authentication (`tenant_id`, `client_id`, `client_secret`) or Director Proxy forwarding (`function_app`, `function_token`).

### Stream Configuration

|Field|Required|Default|Description|
|---|---|---|---|
|`endpoint`|Y|-|Data Collection Endpoint (DCE) URL or Resource ID|
|`rule_id`|N|-|Default Data Collection Rule (DCR) ID|
|`streams`|N|-|Array of stream configurations, each with a `name` and an optional `rule_id`|
|`stream`|N|-|Legacy string array of stream names|
|`buffer_size`|N|`1048576`|Buffer size in bytes (1 MiB)|
|`batch_size`|N|`1000`|Maximum number of messages per batch|
|`keep_phantom_fields`|N|`false`|Keep fields not defined in the DCR schema|
|`drop_unknown_stream_events`|N|`true`|Silently drop events for streams not configured in the target|
|`cache.timeout`|N|`300`|Stream cache timeout in seconds|
|`field_format`|N|-|Data normalization format (`csl` or `asim`). See the applicable <Topic id="normalization-mapping">Normalization</Topic> section|

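The fragment below sketches how a default `rule_id` and per-stream `rule_id` values might be combined. It assumes that a stream without its own `rule_id` falls back to the default. The DCR IDs are placeholders and authentication settings are omitted. The commented-out `stream` list shows the legacy form, which can carry only stream names.

```yaml
properties:
  # Authentication settings omitted for brevity
  endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
  rule_id: "dcr-00000000000000000000000000000000"      # default DCR (placeholder ID)
  streams:
    - name: "Custom-FirewallLogs"                       # no rule_id: assumed to use the default above
    - name: "Custom-DNSLogs"
      rule_id: "dcr-11111111111111111111111111111111"  # per-stream DCR (placeholder ID)
  # Legacy alternative to `streams`: stream names only, no per-stream rule_id
  # stream:
  #   - "Custom-FirewallLogs"
  #   - "Custom-DNSLogs"
```
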
### Debug Options

|Field|Required|Default|Description|
|---|---|---|---|
|`debug.status`|N|`false`|Enable debug logging|
|`debug.dont_send_logs`|N|`false`|Process logs but don't send to Sentinel (testing)|

## Details

The Microsoft Sentinel data lake target provides cost-optimized ingestion for high-volume telemetry with extended retention requirements. Data lake ingestion costs significantly less than standard ingestion into analytics tables. This makes it well suited to firewall logs, DNS queries, network flows, and other high-fidelity telemetry that requires long-term storage.

### Data Lake Benefits

**Cost Efficiency** - Data lake ingestion costs are substantially lower than standard analytics ingestion. This makes it affordable to process telemetry volumes that would be prohibitively expensive to ingest into analytics tables.

**High Fidelity** - Preserves complete log detail without sampling or field reduction, maintaining full forensic capability for security investigations and compliance auditing.

**Extended Retention** - Optimized for long-term storage of high-volume logs, with retention periods spanning months or years for compliance requirements and historical analysis.

### Director Proxy Integration

The target supports two deployment models:

**Direct Authentication** - Director connects directly to Azure using service principal credentials (`tenant_id`, `client_id`, `client_secret`). This model requires Director to have network connectivity to Azure endpoints and credentials for the target subscription.

**Director Proxy Forwarding** - Director sends processed data to the VirtualMetric Director Proxy, an Azure Function deployed in the customer's environment. The Director Proxy uses an Azure managed identity to access Microsoft Sentinel data lake, so Azure credentials never need to be shared with Director.

The Director Proxy model is particularly valuable for MSSP deployments. Customers retain complete control over their Azure credentials, while the MSSP's Director infrastructure handles centralized data processing and routing.

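The sketch below illustrates the MSSP model by defining one target per customer, each forwarding through that customer's own Director Proxy. Target names, Function App URLs, tokens, and resource IDs are hypothetical placeholders. How events are routed to each target is configured elsewhere and is not shown.

```yaml
targets:
  # Customer A: data is forwarded to the Director Proxy in Customer A's Azure environment
  - name: customer_a_data_lake
    type: sentineldatalake
    properties:
      function_app: "https://customer-a-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "customer-a-proxy-token"
      endpoint: "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/customerA-rg/providers/Microsoft.Insights/dataCollectionEndpoints/customerA-dce"
  # Customer B: separate proxy, token, and DCE; no Azure credentials are configured in Director
  - name: customer_b_data_lake
    type: sentineldatalake
    properties:
      function_app: "https://customer-b-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "customer-b-proxy-token"
      endpoint: "/subscriptions/22222222-2222-2222-2222-222222222222/resourceGroups/customerB-rg/providers/Microsoft.Insights/dataCollectionEndpoints/customerB-dce"
```
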
### Stream Discovery

When `endpoint` is specified as a Resource ID rather than an HTTPS URL, the target automatically discovers the available Data Collection Rules and their associated streams. Streams therefore do not have to be listed manually.

Use the `streams` array to limit ingestion to specific tables. Each stream configuration can specify its own DCR ID through its `rule_id` field, so different streams can be routed to different Data Collection Rules.

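For comparison, the sketch below sets `endpoint` to an HTTPS URL instead of a Resource ID. Autodiscovery applies only to Resource IDs, so this sketch assumes the DCR ID and stream names must then be supplied explicitly. The URL and DCR ID are placeholders: copy the real logs ingestion URL from your DCE and the immutable ID from your DCR.

```yaml
targets:
  - name: url_endpoint_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      # HTTPS endpoint URL (placeholder): no automatic stream discovery
      endpoint: "https://mydce-abcd.eastus-1.ingest.monitor.azure.com"
      # Assumed to be needed here, since rules are not discovered automatically
      rule_id: "dcr-00000000000000000000000000000000"
      streams:
        - name: "Custom-FirewallLogs"
```
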
### Field Management

The target automatically detects table schemas and validates incoming data against the defined columns. When `keep_phantom_fields` is `false` (the default), fields not defined in the target schema are removed before ingestion, preventing schema validation errors.

:::warning
With `keep_phantom_fields` set to `false`, fields missing from the DCR schema are dropped. Ensure that all required fields are included in your DCR schema.
:::

Data is buffered until the batch size limit is reached or an explicit flush occurs. The `drop_unknown_stream_events` setting (default: `true`) silently discards events for streams not configured in the target, preventing processing failures for unexpected data types.

:::warning
When `drop_unknown_stream_events` is `true` (the default), unmatched events are discarded without notice. Monitor data flow to ensure that the expected streams are properly configured.
:::

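Both field-management defaults can be overridden. The fragment below keeps phantom fields and stops silently discarding events for unconfigured streams; authentication and endpoint settings are omitted. Based on the behavior described above, this may surface schema validation errors or processing failures instead of quietly dropping data. That can be useful while troubleshooting. The exact error handling in this mode is not described here.

```yaml
properties:
  # Authentication and endpoint settings omitted for brevity
  keep_phantom_fields: true           # forward fields that are not defined in the DCR schema
  drop_unknown_stream_events: false   # do not silently discard events for unconfigured streams
```
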
### Field Normalization

The `field_format` property normalizes log data to standard formats before ingestion:

- `csl` - Common Security Log format
- `asim` - Advanced Security Information Model

Normalization ensures consistent field naming and structure across diverse log sources, improving query efficiency and security analytics capabilities.

## Examples

### Basic Configuration

Minimum configuration using direct Azure authentication:

```yaml
targets:
  - name: sentinel_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
```

### Director Proxy

Configuration using Director Proxy for credential-free forwarding:

```yaml
targets:
  - name: proxy_data_lake
    type: sentineldatalake
    properties:
      function_app: "https://my-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "your-proxy-authentication-token"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
```

### Filtered Streams

Configuration with specific stream filtering and custom settings:

```yaml
targets:
  - name: filtered_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      streams:
        - name: "Custom-FirewallLogs"
        - name: "Custom-DNSLogs"
      keep_phantom_fields: false
      drop_unknown_stream_events: true
      cache:
        timeout: 600
```

### High-Volume Processing

Optimized configuration for high-volume log ingestion:

```yaml
targets:
  - name: high_volume_data_lake
    type: sentineldatalake
    pipelines:
      - normalization
    properties:
      function_app: "https://my-director-proxy.azurewebsites.net/api/Sentinel"
      function_token: "your-proxy-authentication-token"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      buffer_size: 5242880  # 5 MiB
      batch_size: 5000
      field_format: "asim"
      streams:
        - name: "Custom-FirewallLogs"
          rule_id: "dcr-00000000000000000000000000000000"
        - name: "Custom-DNSLogs"
          rule_id: "dcr-11111111111111111111111111111111"
```

### Debug Configuration

Testing configuration with debug enabled:

```yaml
targets:
  - name: debug_data_lake
    type: sentineldatalake
    properties:
      tenant_id: "00000000-0000-0000-0000-000000000000"
      client_id: "00000000-0000-0000-0000-000000000000"
      client_secret: "your-client-secret"
      endpoint: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/myResourceGroup/providers/Microsoft.Insights/dataCollectionEndpoints/myDCE"
      debug:
        status: true
        dont_send_logs: true  # Test mode - doesn't actually upload
```