From 59d9145b088b7c44370cd2e88440300bd651b9bf Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Thu, 23 Oct 2025 13:56:47 -0400 Subject: [PATCH 1/8] initial commit --- .../docs/docs/fusion/about-fusion-caching.md | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 website/docs/docs/fusion/about-fusion-caching.md diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md new file mode 100644 index 00000000000..2b3e03d1d70 --- /dev/null +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -0,0 +1,28 @@ +--- +title: "Caching and the dbt Fusion engine" +id: "about-fusion-caching" +sidebar_label: "About Fusion Caching" +description: "Caching is a big source of Fusion's improved Developer Experience." +pagination_next: null +pagination_prev: null +--- + +# Caching and the dbt Fusion engine + + + +import FusionLifecycle from '/snippets/_fusion-lifecycle-callout.md'; + + + + + + + +Caching is large part of how delivers a vastly impoved developer experience. The goal for Fusion is to enable analytics engineers to meaningful feedback as fast as possible. 
+ +## Kinds of Caching + +### Source Schema Cache + +In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is \ No newline at end of file From 82276d6b16d472c8d499044e7ec68e461b709f12 Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Thu, 23 Oct 2025 14:16:18 -0400 Subject: [PATCH 2/8] feature matrix --- .../docs/docs/fusion/about-fusion-caching.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index 2b3e03d1d70..e9528b977eb 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -21,8 +21,24 @@ import FusionLifecycle from '/snippets/_fusion-lifecycle-callout.md'; Caching is large part of how delivers a vastly impoved developer experience. The goal for Fusion is to enable analytics engineers to meaningful feedback as fast as possible. +## Feature Matrix + +Where 🚧 indicates a feature that is still in beta + +| **Flavor of Caching** | **what it enables** | **dbt Core**
(self-hosted) | **Fusion CLI**
(self-hosted) | **VS Code
+ Fusion** | ***** | +| :--------------------- | -------------------------- | :--------------------------------------------: | :---------------------------------------------: | :------------------------: | :-----------------------------------: | +| Relation Cache | knowing what's in your DWH | ✅ | ✅ | ✅ | ✅ | +| Source Schema Cache | offline SQL understanding | ❌ | ✅ | ✅ | ✅ | +| Query Cache | faster subsequent compiles | ❌ | 🚧 | 🚧 | 🚧 | +| LSP Compile Cache | incremental compilation | ❌ | ❌ | ✅ | ✅ | +| Source Freshness Cache | State-Aware Orchestration | ❌ | ❌ | ❌ | ✅ | + ## Kinds of Caching ### Source Schema Cache -In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is \ No newline at end of file +In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is + +## Frequently Asked Questions + +### Do the CLI and LSP share the same cache? \ No newline at end of file From 1260802a75aa1e5decef45b56b14e44f525f30c7 Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Thu, 23 Oct 2025 14:23:05 -0400 Subject: [PATCH 3/8] getting started --- .../docs/docs/fusion/about-fusion-caching.md | 52 +++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index e9528b977eb..bdc316d9043 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -21,6 +21,13 @@ import FusionLifecycle from '/snippets/_fusion-lifecycle-callout.md'; Caching is large part of how delivers a vastly impoved developer experience. The goal for Fusion is to enable analytics engineers to meaningful feedback as fast as possible. +At the same time, caching is famously one of the two hardest problems in computer science! + +dbt's Caching falls into the following three buckets: +1. a user never has to think about +2. 
a user should sometimes have to think about +3. a user pays dbt Labs so that they need not think about it + ## Feature Matrix Where 🚧 indicates a feature that is still in beta @@ -35,10 +42,55 @@ Where 🚧 indicates a feature that is still in beta ## Kinds of Caching +### Relation Cache + ### Source Schema Cache In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is +### (BETA) Query Cache + +The biggest performance bottleneck in dbt isn’t the language the engine is written in: it’s actually the times that dbt needs to query the data warehouse in order to render jinja into SQL! + +We call this “introspection” and it really slows down local development! See [New Concepts: Rendering introspective queries](new-concepts#rendering-introspective-queries) + + So we’ve shipped a query cache that’s now in beta. + +**How it works** + +During a dbt compile, every time a DWH query is executed to render jinja into SQL, dbt now caches the result locally. The next time a dbt command needs to compile, it doesn’t have to make a round trip to the DWH for the same results; it just uses the previously hydrated cache. + +Try it out. I've seen some impressive results on internal projects. + +**Where is the cache?** + +If you have query caching enabled, you will notice a new folder `target/query_cache/` that contains many parquet files. + +**How to invalidate the cache** + +Inevitably, the local cache will go out of date. For example, the remote DWH might have a new column on a certain table that the query cache doesn’t yet reflect. + +While query cache objects expire after 12 hours, you can also refresh the cache manually by either: + +- deleting the `target/query_cache/` folder +- using the “Clear Cache” button in the VS Code sidebar + + image 3 + + +**How to opt into this beta feature** + +1. Add `--beta-use-query-cache` to all your dbt CLI commands +2. 
Enable the VS Code extension setting “Use Query Cache” + +image 4 + + +### LSP compile cache + +### Source Freshness Cache + + ## Frequently Asked Questions ### Do the CLI and LSP share the same cache? \ No newline at end of file From 4aa05b99b5e80db8dc211dd1a5df2a8197faa2fc Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Fri, 24 Oct 2025 11:19:53 -0400 Subject: [PATCH 4/8] relation cache --- .../docs/docs/fusion/about-fusion-caching.md | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index bdc316d9043..bf7310e4ef7 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -44,6 +44,31 @@ Where 🚧 indicates a feature that is still in beta ### Relation Cache +#### What is the relation cache? + +Before dbt creates, modifies, or drops any table or view in the target data platform, it first needs to know what's already there! The fundamental reason is simple: make sure that the name of the model you're about to materialize is not already taken! + +However, it doesn't make sense to make these metadata queries to the warehouse for every model; the better answer is for dbt to initially cache all the relations, then update the cache as it runs. We call this the relation cache. + + +An additional benefit of this cache arises when a dbt model makes use of an introspective query. Introspective queries are queries that a dbt model's jinja requires in order to be rendered to SQL. While they are often convenient, they can have a sizable impact on dbt's ability to compile a project quickly, especially with the dbt Fusion engine, which also performs static analysis. 
+ +One example of this benefit for projects with introspective queries is the `dbt_utils.get_relations_by_pattern()` ([docs](https://github.com/dbt-labs/dbt-utils?tab=readme-ov-file#get_relations_by_pattern-source)) macro. If you use it in a model, dbt needs to know which relations exist before it can turn the macro into SQL! It could ask the data warehouse every time the model is compiled or run. However, it can simply use the relation cache. + +#### When to know about the relation cache and how to troubleshoot it + +The relation cache has been a part of dbt for years now and is quite stable, so you likely will not need to think about it unless you are contributing to the dbt codebase or developing a custom materialization. + +In Fusion, there is currently a `logs/beta_cache.log` artifact which provides some information on the initial population of the cache, such as: +- which schemas were cached +- how many relations were found in each schema +- how long the metadata queries took + + +As the filename suggests, this file is in a beta state, and likely to evolve and be integrated into `logs/dbt.log`. + + + ### Source Schema Cache In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is From 1bdfdcf0d2495ed7b51bc2579995c5da21bcda05 Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Fri, 24 Oct 2025 11:22:16 -0400 Subject: [PATCH 5/8] initial source schema cache docs --- .../docs/docs/fusion/about-fusion-caching.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index bf7310e4ef7..78b23a6d65e 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -71,7 +71,24 @@ As the filename suggest, this file is in a beta state, and likely to evolve and ### 
Source Schema Cache -In order to perform offline [static analysis](new-concepts) of your project, the first thing that's required is +#### What is the source schema cache? + +In order to perform offline [static analysis](new-concepts) of your project and validate that all the datatypes are correct, the dbt Fusion engine first needs to know the column datatypes of all of your source tables. + +To accomplish this, the first thing Fusion does is make metadata queries to your data platform to get all the column names and datatypes of all of the relevant source tables. The result is saved to `target/db/` as parquet files. + +The parquet files have no rows, but the columns and datatypes correspond to those of the source tables in the data warehouse. + +#### When to know about the source schema cache and how to troubleshoot it? + +As an end user, you'll likely come across the cache when: +- you're migrating from Core to Fusion, but you don't have permission to get the schema of some of the source tables defined in your project +- Fusion tells you it can't find a column in your source table, but it's actually there + + + + + ### (BETA) Query Cache From 6ae8ebd3669b512afaf17694f5a7f456ff41d995 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Fri, 24 Oct 2025 17:16:26 +0100 Subject: [PATCH 6/8] Update website/docs/docs/fusion/about-fusion-caching.md --- website/docs/docs/fusion/about-fusion-caching.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index 78b23a6d65e..60b441b6987 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -94,7 +94,7 @@ As an end user, you'll likely come across the cache when: The biggest performance bottleneck in dbt isn’t the language the engine is written in: it’s actually the times that dbt needs to query the data
warehouse in order to render jinja into SQL! -We call this “introspection” and it really slows down local development! See [New Concepts: Rendering introspective queries](new-concepts#rendering-introspective-queries) +We call this “introspection” and it really slows down local development! See [New Concepts: Rendering introspective queries](/docs/fusion/new-concepts#static-analysis-and-introspective-queries) So we’ve shipped a query cache that’s now in beta. From c8a88b681323d8644cfdfa40b46c19d4d4d179e6 Mon Sep 17 00:00:00 2001 From: Anders Date: Fri, 24 Oct 2025 13:18:31 -0400 Subject: [PATCH 7/8] Apply suggestion from @mirnawong1 Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/fusion/about-fusion-caching.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index 60b441b6987..fff3ea66f3e 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -73,7 +73,7 @@ As the filename suggest, this file is in a beta state, and likely to evolve and #### What is the source schema cache? -In order to perform offline [static analysis](new-concepts) of your project and validate that all the datatypes are correct, the dbt Fusion engine first needs to know the column datatypes of all of your source tables. +In order to perform offline [static analysis](/docs/fusion/new-concepts) of your project and validate that all the datatypes are correct, the dbt Fusion engine first needs to know the column datatypes of all of your source tables. To accomplish this, the first thing Fusion does is make metadata queries to your data platform to get all the column names and datatypes of all of the relevant source tables. The result is saved to `target/db/` as parquet files. 
From 40d29d9ca6f0b9a89b847ab52f6686b44591456c Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Fri, 24 Oct 2025 13:20:17 -0400 Subject: [PATCH 8/8] add close bracket --- website/docs/docs/fusion/about-fusion-caching.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/fusion/about-fusion-caching.md b/website/docs/docs/fusion/about-fusion-caching.md index fff3ea66f3e..f182f9523c7 100644 --- a/website/docs/docs/fusion/about-fusion-caching.md +++ b/website/docs/docs/fusion/about-fusion-caching.md @@ -21,7 +21,9 @@ import FusionLifecycle from '/snippets/_fusion-lifecycle-callout.md'; Caching is large part of how delivers a vastly impoved developer experience. The goal for Fusion is to enable analytics engineers to meaningful feedback as fast as possible. -At the same time, caching is famously one of the two hardest problems in computer science! +At the same time, caching is famously one of the two hardest problems in computer science! So let's look at the different ways that dbt caches information and the situations in which you need to reason about it as an end user. + +
dbt's Caching falls into the following three buckets: 1. a user never has to think about