Merged
3 changes: 2 additions & 1 deletion .linkcheck.json
@@ -22,6 +22,7 @@
{ "pattern": "^#bigquery-materialized-views$" },
{ "pattern": "^#looker-pdts--aggregate-awareness$" },
{ "pattern": "^#experiment-unpacking$" },
-{ "pattern": "/v2-system-addon/data_events.html" }
+{ "pattern": "/v2-system-addon/data_events.html" },
+{ "pattern": "^https://console.cloud.google.com/gcr/images/" }
]
}
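
For reviewers checking what the new entry covers, here is a minimal Python sketch of how such ignore patterns behave. It assumes the link checker treats each `pattern` as a regular expression searched against the link text; that is an assumption about the tool, not something this diff confirms. The `is_ignored` helper and the sample URLs are hypothetical.

```python
import re

# Three patterns copied from the hunk above. Treating them as regular
# expressions searched against each link is an assumption about the checker.
IGNORE_PATTERNS = [
    r"^#experiment-unpacking$",
    r"/v2-system-addon/data_events.html",
    r"^https://console.cloud.google.com/gcr/images/",
]


def is_ignored(link: str) -> bool:
    """Return True if any ignore pattern matches somewhere in the link."""
    return any(re.search(pattern, link) for pattern in IGNORE_PATTERNS)
```

Under that assumption, the `^` anchor limits the new pattern to links that start with the Container Registry console prefix, while the unanchored `data_events.html` pattern matches on any host.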
52 changes: 26 additions & 26 deletions src/cookbooks/bigquery/querying.md
@@ -27,32 +27,32 @@ projects to maintain BigQuery [datasets](https://cloud.google.com/bigquery/docs/

### Projects with BigQuery datasets

| Project | Dataset | Purpose |
| ------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mozdata` | | The primary home for user analysis; it has a short name that is easy to type and is filled with views that reference underlying tables in `moz-fx-data-shared-prod`; the default project for STMO and Looker |
| | `analysis` | User-generated tables for analysis; please prefix tables with your username |
| | `tmp` | User-generated tables for ephemeral analysis results; tables created here are automatically deleted after 7 days. |
| | `telemetry` | Views into legacy desktop telemetry pings and many derived tables; see _user-facing (unsuffixed) datasets_ below |
| | `<namespace>` | See _user-facing (unsuffixed) datasets_ below |
| | `search` | Search data imported from parquet (_restricted_) |
| | `static` | Static tables, often useful for data-enriching joins |
| | `udf` | Internal persistent user-defined functions defined in SQL; see [Using UDFs](#using-udfs) |
| | `udf_js` | Internal user-defined functions defined in JavaScript; see [Using UDFs](#using-udfs) |
| `mozfun` | | The primary home for user-defined functions; see [Using UDFs](#using-udfs) |
| `moz-fx-data-bq-<team-name>` | | Some teams have specialized needs and can be provisioned a team-specific project |
| `moz-fx-data-shared-prod` | | All production data including full pings and derived datasets defined in [bigquery-etl](https://github.com/mozilla/bigquery-etl) |
| | `<namespace>_live` | See _live datasets_ below |
| | `<namespace>_stable` | See _stable datasets_ below |
| | `<namespace>_derived` | See _derived datasets_ below |
| | `<product>_external` | Tables that reference external resources; these may be native BigQuery tables populated by a job that queries an third-party API, or they may be [federated data sources](https://cloud.google.com/bigquery/external-data-sources) that pull data from other GCP services like GCS at query time. |
| | `backfill` | Temporary staging area for back-fills |
| | `blpadi` | Blocklist ping derived data(_restricted_) |
| | `payload_bytes_raw` | Raw JSON payloads as received from clients, used for reprocessing scenarios, a.k.a. "landfill" (_restricted_) |
| | `payload_bytes_error` | `gzip`-compressed JSON payloads that were rejected in some phase of the pipeline; particularly useful for investigating schema validation errors |
| | `tmp` | Temporary staging area for parquet data loads |
| | `validation` | Temporary staging area for validation |
| `moz-fx-data-derived-datasets` | | Legacy project that was a precursor to `mozdata` |
| `moz-fx-data-shar-nonprod-efed` | | Non-production data produced by stage ingestion infrastructure |
| Project | Dataset | Purpose |
| ------------------------------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mozdata` | | The primary home for user analysis; it has a short name that is easy to type and is filled with views that reference underlying tables in `moz-fx-data-shared-prod`; the default project for STMO and Looker |
| | `analysis` | User-generated tables for analysis; please prefix tables with your username |
| | `tmp` | User-generated tables for ephemeral analysis results; tables created here are automatically deleted after 7 days. |
| | `telemetry` | Views into legacy desktop telemetry pings and many derived tables; see _user-facing (unsuffixed) datasets_ below |
| | `<namespace>` | See _user-facing (unsuffixed) datasets_ below |
| | `search` | Search data imported from parquet (_restricted_) |
| | `static` | Static tables, often useful for data-enriching joins |
| | `udf` | Internal persistent user-defined functions defined in SQL; see [Using UDFs](#using-udfs) |
| | `udf_js` | Internal user-defined functions defined in JavaScript; see [Using UDFs](#using-udfs) |
| `mozfun` | | The primary home for user-defined functions; see [Using UDFs](#using-udfs) |
| `moz-fx-data-bq-<team-name>` | | Some teams have specialized needs and can be provisioned a team-specific project |
| `moz-fx-data-shared-prod` | | All production data including full pings and derived datasets defined in [bigquery-etl](https://github.com/mozilla/bigquery-etl) |
| | `<namespace>_live` | See _live datasets_ below |
| | `<namespace>_stable` | See _stable datasets_ below |
| | `<namespace>_derived` | See _derived datasets_ below |
| | `<product>_external` | Tables that reference external resources; these may be native BigQuery tables populated by a job that queries an third-party API, or they may be [federated data sources](https://docs.cloud.google.com/bigquery/docs/external-data-sources) that pull data from other GCP services like GCS at query time. |
| | `backfill` | Temporary staging area for back-fills |
| | `blpadi` | Blocklist ping derived data(_restricted_) |
| | `payload_bytes_raw` | Raw JSON payloads as received from clients, used for reprocessing scenarios, a.k.a. "landfill" (_restricted_) |
| | `payload_bytes_error` | `gzip`-compressed JSON payloads that were rejected in some phase of the pipeline; particularly useful for investigating schema validation errors |
| | `tmp` | Temporary staging area for parquet data loads |
| | `validation` | Temporary staging area for validation |
| `moz-fx-data-derived-datasets` | | Legacy project that was a precursor to `mozdata` |
| `moz-fx-data-shar-nonprod-efed` | | Non-production data produced by stage ingestion infrastructure |
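
The suffix convention in the table can be sketched in Python as follows. This is only an illustration of the naming scheme: `dataset_kind` and `qualified_name` are hypothetical helpers that do not come from the documented tooling, and the dataset and table names in the usage lines are placeholders.

```python
# Suffix-to-kind mapping taken from the Dataset column of the table above.
SUFFIX_KINDS = {
    "_live": "live",
    "_stable": "stable",
    "_derived": "derived",
    "_external": "external",
}


def dataset_kind(dataset: str) -> str:
    """Classify a dataset name by its suffix; names without one are 'unsuffixed'."""
    for suffix, kind in SUFFIX_KINDS.items():
        if dataset.endswith(suffix):
            return kind
    return "unsuffixed"


def qualified_name(project: str, dataset: str, table: str) -> str:
    """Build a backtick-quoted project.dataset.table reference for use in SQL."""
    return f"`{project}.{dataset}.{table}`"


# Placeholder names, for illustration only.
print(dataset_kind("telemetry_stable"))
print(qualified_name("mozdata", "telemetry", "example_table"))
```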

### Table Layout and Naming

2 changes: 0 additions & 2 deletions src/datasets/fxa.md
@@ -20,8 +20,6 @@ The [Mozilla accounts documentation](https://mozilla.github.io/ecosystem-platfor
- Requires FxA.
- [AMO](https://addons.mozilla.org/en-US/firefox/)
- For developer accounts; not required by end-users to use or download addons.
- [Pocket](https://getpocket.com/login/?ep=1)
- FxA is an optional authentication method among others.
- [Monitor](https://monitor.firefox.com)
- Required to receive email alerts. Not required for email scans.
- [Relay](https://relay.firefox.com/)