# BigQuery

<span className="theme-doc-version-badge badge badge--secondary">Google Cloud</span><span className="theme-doc-version-badge badge badge--secondary">Analytics</span>

## Synopsis

Creates a BigQuery target that streams data directly into BigQuery tables using the streaming insert API. Supports multiple tables, custom schemas, and field normalization.

## Schema

```yaml {1,3}
- name: <string>
  description: <string>
  type: bigquery
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    project_id: <string>
    dataset_id: <string>
    credentials_json: <string>
    table: <string>
    batch_size: <numeric>
    timeout: <numeric>
    drop_unknown_table_events: <boolean>
    ignore_unknown_values: <boolean>
    skip_invalid_rows: <boolean>
    max_bad_records: <numeric>
    field_format: <string>
    tables:
      - name: <string>
        schema: <string>
  debug:
    status: <boolean>
    dont_send_logs: <boolean>
```

## Configuration

The following fields are used to define the target:

|Field|Required|Default|Description|
|---|---|---|---|
|`name`|Y|-|Target name|
|`description`|N|-|Optional description|
|`type`|Y|-|Must be `bigquery`|
|`pipelines`|N|-|Optional post-processor pipelines|
|`status`|N|`true`|Enable/disable the target|

### Google Cloud

|Field|Required|Default|Description|
|---|---|---|---|
|`project_id`|Y|-|Google Cloud project ID|
|`dataset_id`|Y|-|BigQuery dataset ID|
|`credentials_json`|N|-|Service account credentials JSON (uses default credentials if not provided)|
|`table`|N|-|Default table name|

### Streaming Options

|Field|Required|Default|Description|
|---|---|---|---|
|`batch_size`|N|`1000`|Maximum number of rows per batch|
|`timeout`|N|`30`|Connection timeout in seconds|
|`drop_unknown_table_events`|N|`true`|Ignore events for undefined tables|
|`ignore_unknown_values`|N|`false`|Accept rows with values that don't match the schema|
|`skip_invalid_rows`|N|`false`|Skip rows with errors and insert valid rows|
|`max_bad_records`|N|`0`|Maximum number of bad records allowed (0 = no limit)|
|`field_format`|N|-|Data normalization format. See the applicable <Topic id="normalization-mapping">Normalization</Topic> section|

### Multiple Tables

You can define multiple tables to stream data into:
```yaml
targets:
  - name: bigquery_multiple_tables
    type: bigquery
    properties:
      tables:
        - name: "security_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,severity:STRING"
        - name: "system_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,level:STRING"
```

### Schema Format

The schema format follows the pattern: `field1:type1,field2:type2,...`

Supported types:
- `STRING` - Variable-length character data
- `INTEGER` or `INT64` - 64-bit integer
- `FLOAT` or `FLOAT64` - 64-bit floating point
- `BOOLEAN` or `BOOL` - True or false
- `TIMESTAMP` - Absolute point in time
- `DATE` - Calendar date
- `TIME` - Time of day
- `DATETIME` - Date and time
- `BYTES` - Binary data
- `NUMERIC` - Exact decimal value
- `BIGNUMERIC` - Exact decimal value with greater precision and range
- `GEOGRAPHY` - Geographic data
- `JSON` - JSON data
- `RECORD` or `STRUCT` - Nested structure
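
For example, a hypothetical `web_requests` table could combine several of these types in a single schema string (table and field names here are illustrative):
```yaml
targets:
  - name: typed_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "traffic"
      tables:
        # Hypothetical table showing the field1:type1,field2:type2,... pattern
        - name: "web_requests"
          schema: "timestamp:TIMESTAMP,url:STRING,status_code:INTEGER,latency_ms:FLOAT,cache_hit:BOOLEAN,headers:JSON"
```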

### Debug Options

|Field|Required|Default|Description|
|---|---|---|---|
|`debug.status`|N|`false`|Enable debug logging|
|`debug.dont_send_logs`|N|`false`|Process logs but don't send to BigQuery (testing)|

## Details

The BigQuery target uses streaming inserts to send data in near real time. Data is batched locally until `batch_size` is reached or until an explicit flush is triggered during finalization.
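
As a sketch, the two settings that govern this batching can be tuned together; the target name and values below are illustrative, not recommendations:
```yaml
targets:
  - name: tuned_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "events"
      batch_size: 2000 # rows buffered locally before a streaming insert is issued
      timeout: 45      # seconds to wait for the insert request to complete
```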

If a log event carries the `SystemS3` field, its value is used to route the event to the appropriate table. If no table is specified, the default table (if configured) is used.
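
For instance, assuming the routing field carries the destination table name, an event like the following (shown here in YAML form; all field values are hypothetical) would be streamed into the `firewall_events` table:
```yaml
# Hypothetical incoming event; SystemS3 selects the destination table
SystemS3: "firewall_events"
timestamp: "2024-01-01T00:00:00Z"
src_ip: "10.0.0.1"
action: "deny"
```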

The target automatically parses JSON messages. If the message is not valid JSON, it creates a structured event with `message` and `timestamp` fields.
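
As an illustration of this fallback, a plain-text line that fails JSON parsing would be wrapped roughly as follows (the exact timestamp value is assumed):
```yaml
# Input line (not valid JSON):
#   ERROR disk /dev/sda1 is full
# Resulting structured event (sketch):
message: "ERROR disk /dev/sda1 is full"
timestamp: "2024-01-01T00:00:00Z"
```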

### Authentication

The target supports two authentication methods:

1. **Service Account JSON**: Provide credentials directly in the configuration using `credentials_json`
2. **Default Credentials**: If `credentials_json` is not provided, the target uses Google Cloud's default credential chain (environment variables, gcloud CLI, GCE metadata service); see the sketch below
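
A minimal sketch of the second method, assuming the host exposes credentials through the standard default credential chain (for example via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable):
```yaml
# No credentials_json here: the target falls back to the default credential
# chain (environment variables, gcloud CLI, or GCE metadata service).
targets:
  - name: adc_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "events"
```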

### Error Handling

The target provides flexible error handling:

- `ignore_unknown_values`: Allows inserting rows with extra fields not in the schema
- `skip_invalid_rows`: Continues inserting valid rows even if some rows fail
- `max_bad_records`: Limits the number of failed rows allowed before returning an error

When `skip_invalid_rows` is enabled and some rows fail, the individual row errors are logged if debug mode is enabled.

:::warning
Streaming inserts have cost implications. Consider batch loading for high-volume historical data.
:::

:::note
BigQuery streaming inserts have quotas and limits. Ensure your project has adequate quota for your ingestion rate.
:::

## Examples

### Basic

A minimal configuration using default credentials:
```yaml
targets:
  - name: basic_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "system_events"
```

### With Credentials

Configuration with explicit service account credentials:
```yaml
targets:
  - name: auth_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "application_logs"
      credentials_json: |
        {
          "type": "service_account",
          "project_id": "my-project",
          "private_key_id": "key-id",
          "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
          "client_email": "my-service-account@my-project.iam.gserviceaccount.com",
          "client_id": "123456789",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://oauth2.googleapis.com/token"
        }
```

### Multiple Tables

Configuration with multiple target tables and schemas:
```yaml
targets:
  - name: multi_table_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security_data"
      batch_size: 500
      tables:
        - name: "firewall_events"
          schema: "timestamp:TIMESTAMP,src_ip:STRING,dst_ip:STRING,action:STRING,bytes:INTEGER"
        - name: "authentication_events"
          schema: "timestamp:TIMESTAMP,username:STRING,success:BOOLEAN,source:STRING"
        - name: "dns_queries"
          schema: "timestamp:TIMESTAMP,query:STRING,response:STRING,client_ip:STRING"
```

### High-Volume

Configuration optimized for high-volume streaming:
```yaml
targets:
  - name: highvol_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "metrics"
      table: "performance_data"
      batch_size: 5000
      timeout: 60
      skip_invalid_rows: true
      max_bad_records: 100
```

### With Error Handling

Configuration with flexible error handling:
```yaml
targets:
  - name: flexible_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "app_logs"
      ignore_unknown_values: true
      skip_invalid_rows: true
      max_bad_records: 50
```

### Normalized

Using field normalization to map events to a common schema (ECS in this example):
```yaml
targets:
  - name: normalized_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security"
      table: "normalized_events"
      field_format: "ecs"
```

### With Debugging

Configuration with debug options for testing:
```yaml
targets:
  - name: debug_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "test_events"
      debug:
        status: true
        dont_send_logs: true
```

### Environment Variables

Using environment variables for sensitive data:
```yaml
targets:
  - name: secure_bigquery
    type: bigquery
    properties:
      project_id: "${GCP_PROJECT_ID}"
      dataset_id: "${BIGQUERY_DATASET}"
      table: "secure_logs"
      credentials_json: "${GCP_CREDENTIALS_JSON}"
```