Commit 8ddb98e

Merge branch 'dev' of https://github.com/VirtualMetric/virtualmetric-docs into DT-426-1-5-0-release-notes-edit

2 parents 40b3d2c + 999410e
File tree

2 files changed: +279 -0 lines changed

Lines changed: 278 additions & 0 deletions

@@ -0,0 +1,278 @@
# BigQuery

<span className="theme-doc-version-badge badge badge--secondary">Google Cloud</span><span className="theme-doc-version-badge badge badge--secondary">Analytics</span>

## Synopsis

Creates a BigQuery target that streams data directly into BigQuery tables using the streaming insert API. Supports multiple tables, custom schemas, and field normalization.

## Schema

```yaml {1,3}
- name: <string>
  description: <string>
  type: bigquery
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    project_id: <string>
    dataset_id: <string>
    credentials_json: <string>
    table: <string>
    batch_size: <numeric>
    timeout: <numeric>
    drop_unknown_table_events: <boolean>
    ignore_unknown_values: <boolean>
    skip_invalid_rows: <boolean>
    max_bad_records: <numeric>
    field_format: <string>
    tables:
      - name: <string>
        schema: <string>
    debug:
      status: <boolean>
      dont_send_logs: <boolean>
```
## Configuration

The following fields are used to define the target:

|Field|Required|Default|Description|
|---|---|---|---|
|`name`|Y|-|Target name|
|`description`|N|-|Optional description|
|`type`|Y|-|Must be `bigquery`|
|`pipelines`|N|-|Optional post-processor pipelines|
|`status`|N|`true`|Enable/disable the target|
### Google Cloud

|Field|Required|Default|Description|
|---|---|---|---|
|`project_id`|Y|-|Google Cloud project ID|
|`dataset_id`|Y|-|BigQuery dataset ID|
|`credentials_json`|N|-|Service account credentials JSON (uses default credentials if not provided)|
|`table`|N|-|Default table name|
### Streaming Options

|Field|Required|Default|Description|
|---|---|---|---|
|`batch_size`|N|`1000`|Maximum number of rows per batch|
|`timeout`|N|`30`|Connection timeout in seconds|
|`drop_unknown_table_events`|N|`true`|Ignore events for undefined tables|
|`ignore_unknown_values`|N|`false`|Accept rows with values that don't match the schema|
|`skip_invalid_rows`|N|`false`|Skip rows with errors and insert valid rows|
|`max_bad_records`|N|`0`|Maximum number of bad records allowed (0 = no limit)|
|`field_format`|N|-|Data normalization format. See the applicable <Topic id="normalization-mapping">Normalization</Topic> section|
### Multiple Tables

You can define multiple tables to stream data into:

```yaml
targets:
  - name: bigquery_multiple_tables
    type: bigquery
    properties:
      tables:
        - name: "security_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,severity:STRING"
        - name: "system_logs"
          schema: "timestamp:TIMESTAMP,message:STRING,level:STRING"
```
### Schema Format

The schema format follows the pattern `field1:type1,field2:type2,...`; a combined example follows the list below.

Supported types:

- `STRING` - Variable-length character data
- `INTEGER` or `INT64` - 64-bit integer
- `FLOAT` or `FLOAT64` - 64-bit floating point
- `BOOLEAN` or `BOOL` - True or false
- `TIMESTAMP` - Absolute point in time
- `DATE` - Calendar date
- `TIME` - Time of day
- `DATETIME` - Date and time
- `BYTES` - Binary data
- `NUMERIC` - Exact numeric value
- `BIGNUMERIC` - Larger numeric value
- `GEOGRAPHY` - Geographic data
- `JSON` - JSON data
- `RECORD` or `STRUCT` - Nested structure
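As an illustration, a single schema string can mix several of these types; the table and field names here are hypothetical:

```yaml
tables:
  - name: "audit_events"
    # Flat field1:type1,field2:type2,... string, as described above
    schema: "timestamp:TIMESTAMP,actor:STRING,action:STRING,duration_ms:INTEGER,success:BOOLEAN,details:JSON"
```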
### Debug Options

|Field|Required|Default|Description|
|---|---|---|---|
|`debug.status`|N|`false`|Enable debug logging|
|`debug.dont_send_logs`|N|`false`|Process logs but don't send to BigQuery (testing)|
## Details

The BigQuery target uses streaming inserts to send data in near real-time. Data is batched locally until `batch_size` is reached or an explicit flush is triggered during finalization. At a steady 2,000 events per second, for example, the default `batch_size` of 1000 works out to roughly two insert requests per second.

When a log record includes the `SystemS3` field, its value is used to route the message to the appropriate table. If no table is specified, the default table (if configured) is used.

The target automatically parses JSON messages. If a message is not valid JSON, it creates a structured event with `message` and `timestamp` fields. Both behaviors are sketched below.
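As a sketch, assuming events arrive as raw strings, a JSON payload carrying the routing field would be streamed into the `security_logs` table defined earlier (the field values are illustrative):

```json
{"SystemS3": "security_logs", "timestamp": "2025-01-01T12:00:00Z", "severity": "WARN", "message": "login failed"}
```

A non-JSON line such as `disk full on /dev/sda1` would instead be wrapped into a structured event of the form `{"message": "disk full on /dev/sda1", "timestamp": ...}`, with the timestamp source left to the target.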
### Authentication

The target supports two authentication methods:

1. **Service Account JSON**: Provide credentials directly in the configuration using `credentials_json`.
2. **Default Credentials**: If `credentials_json` is not provided, the target uses Google Cloud's default credential chain (environment variables, gcloud CLI, GCE metadata service). A minimal sketch follows this list.
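For the second method, a common setup is to point the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable at a key file and omit `credentials_json` entirely; the target name and key path below are hypothetical:

```yaml
# Assumes GOOGLE_APPLICATION_CREDENTIALS=/etc/keys/bq-writer.json is
# exported in the process environment before the service starts.
targets:
  - name: adc_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "system_events"
      # credentials_json omitted; the default credential chain is used
```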
### Error Handling

The target provides flexible error handling:

- `ignore_unknown_values`: Allows inserting rows with extra fields not in the schema
- `skip_invalid_rows`: Continues inserting valid rows even if some rows fail
- `max_bad_records`: Limits the number of failed rows before returning an error

When `skip_invalid_rows` is enabled, the target logs each failing row individually, provided debug mode is also enabled (see the With Error Handling and With Debugging examples below).
:::warning
Streaming inserts have cost implications. Consider batch loading for high-volume historical data.
:::

:::note
BigQuery streaming inserts have quotas and limits. Ensure your project has adequate quota for your ingestion rate.
:::
## Examples

### Basic

Minimal configuration using default credentials:

```yaml
targets:
  - name: basic_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "system_events"
```
### With Credentials

Configuration with explicit service account credentials:

```yaml
targets:
  - name: auth_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "application_logs"
      credentials_json: |
        {
          "type": "service_account",
          "project_id": "my-project",
          "private_key_id": "key-id",
          "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
          "client_email": "service-account@my-project.iam.gserviceaccount.com",
          "client_id": "123456789",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://oauth2.googleapis.com/token"
        }
```
### Multiple Tables

Configuration with multiple target tables and schemas:

```yaml
targets:
  - name: multi_table_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security_data"
      batch_size: 500
      tables:
        - name: "firewall_events"
          schema: "timestamp:TIMESTAMP,src_ip:STRING,dst_ip:STRING,action:STRING,bytes:INTEGER"
        - name: "authentication_events"
          schema: "timestamp:TIMESTAMP,username:STRING,success:BOOLEAN,source:STRING"
        - name: "dns_queries"
          schema: "timestamp:TIMESTAMP,query:STRING,response:STRING,client_ip:STRING"
```
### High-Volume

Configuration optimized for high-volume streaming:

```yaml
targets:
  - name: highvol_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "metrics"
      table: "performance_data"
      batch_size: 5000
      timeout: 60
      skip_invalid_rows: true
      max_bad_records: 100
```
### With Error Handling

Configuration with flexible error handling:

```yaml
targets:
  - name: flexible_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "app_logs"
      ignore_unknown_values: true
      skip_invalid_rows: true
      max_bad_records: 50
```
### Normalized

Using field normalization for enhanced compatibility:

```yaml
targets:
  - name: normalized_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "security"
      table: "normalized_events"
      field_format: "ecs"
```
### With Debugging

Configuration with debug options for testing:

```yaml
targets:
  - name: debug_bigquery
    type: bigquery
    properties:
      project_id: "my-project"
      dataset_id: "logs"
      table: "test_events"
      debug:
        status: true
        dont_send_logs: true
```
### Environment Variables

Using environment variables for sensitive data:

```yaml
targets:
  - name: secure_bigquery
    type: bigquery
    properties:
      project_id: "${GCP_PROJECT_ID}"
      dataset_id: "${BIGQUERY_DATASET}"
      table: "secure_logs"
      credentials_json: "${GCP_CREDENTIALS_JSON}"
```

sidebars.ts

Lines changed: 1 addition & 0 deletions

@@ -118,6 +118,7 @@ const sidebars: SidebarsConfig = {
       "configuration/targets/aws-s3",
       "configuration/targets/azure-blob-storage",
       "configuration/targets/azure-data-explorer",
+      "configuration/targets/bigquery",
       "configuration/targets/clickhouse",
       "configuration/targets/console",
       "configuration/targets/discard",
