Conversation


@ericj-db ericj-db commented Oct 16, 2025

What are you changing in this pull request and why?

Document two new behavior flags that will be released with dbt-databricks 1.11.0

  • use_managed_iceberg
  • use_replace_on_for_insert_overwrite
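
For orientation, a minimal sketch of how these flags could be set, assuming they are ordinary project-level behavior flags configured under `flags:` in `dbt_project.yml` (flag names are taken from this PR; check the 1.11.0 release notes for actual defaults and placement):

```yaml
# dbt_project.yml -- illustrative only; flag names come from this PR and the
# `flags:` placement is assumed to match other dbt behavior flags.
flags:
  use_managed_iceberg: true
  use_replace_on_for_insert_overwrite: true   # documented below as defaulting to True
```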

Screenshots of updated sections

[Three screenshots of the updated documentation sections]

@ericj-db ericj-db requested a review from a team as a code owner October 16, 2025 17:32

vercel bot commented Oct 16, 2025

@ericj-db is attempting to deploy a commit to the dbt-labs Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added the content Improvements or additions to content label Oct 16, 2025
### The `insert_overwrite` strategy

This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy.
This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `insert into .. replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` or runtime is older than 17.1, this strategy will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) instead.
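
For readers skimming the diff, a minimal model-config sketch of the setup this paragraph assumes (model, source, and column names are hypothetical; `incremental_strategy` and `partition_by` are the standard dbt-databricks config keys):

```sql
-- models/daily_events.sql -- hypothetical example model
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = 'event_date'
  )
}}

select
  event_date,
  count(*) as event_count
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- re-select *all* rows for every partition this run touches
  where event_date >= date_sub(current_date(), 3)
{% endif %}
group by event_date
```

Each run then replaces only the `event_date` partitions produced by the query, via `insert into ... replace on` on newer runtimes or `insert overwrite` otherwise, per the paragraph above.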
Contributor

How does liquid_clustered_by interact with insert_overwrite?

Contributor Author

It's the same as partition_by handling. The liquid_clustered_by columns will be used in the insert into .. replace on clause


Have we updated the dbt insert overwrite docs to capture that liquid_clustered_by is also supported?


+1, I think we should document explicitly that for liquid_clustered_by, the REPLACE ON keys will be the same as the liquid_clustered_by columns, and likewise for partition_by

Contributor Author

Added a sentence in 00b5b11. Feel free to suggest edits if the wording seems off

Collaborator

@amychen1776 amychen1776 left a comment

:shipit:

@amychen1776 amychen1776 enabled auto-merge October 20, 2025 20:37

## Use `replace on` for `insert_overwrite` strategy

The `use_replace_on_for_insert_overwrite` flag is only relevant when using incremental models with the `insert_overwrite` strategy on SQL warehouses. The flag is `True` by default and results in using the `replace on` syntax to perform partition overwrites. When the flag is set to `False`, partition overwrites will be performed via `insert overwrite` with dynamic partition overwrite. The latter is only officially supported for cluster computes, and will truncate the entire table when used with SQL warehouses.
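
For contrast, a rough sketch of the classic dynamic-partition-overwrite pattern that the `False` setting falls back to (object names are hypothetical and this is not literal adapter output; the `replace on` path instead issues an `insert into ... replace on` statement keyed on the partition or liquid clustering columns):

```sql
-- classic path: dynamic partition overwrite via insert overwrite
-- (hypothetical object names, shown only to contrast with replace on)
set spark.sql.sources.partitionOverwriteMode = dynamic;

-- with dynamic mode on, only the partitions present in the select are replaced
insert overwrite table analytics.daily_events
select * from analytics.daily_events__dbt_tmp;
```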
@longvu-db longvu-db Oct 27, 2025

  1. Should we focus on the fact that there will be no more table truncation on SQL Warehouses, and that dynamic partition overwrite (DPO) is used instead?

For Cluster Computes, nothing changes. If we mention Cluster Computes, then we should at least emphasize that REPLACE ON and the old Classic dynamic INSERT OVERWRITE syntax both perform a dynamic partition overwrite.


  2. Also, should we still mention partition if we also have liquid clusters?

Contributor Author

@longvu-db feel free to directly suggest edits to address these points. Since your team owns the feature, you can probably provide the most appropriate wording (sorry I should have shared this PR earlier)

auto-merge was automatically disabled October 27, 2025 16:42

Head branch was pushed to by a user without write access

### The `insert_overwrite` strategy

This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy.
This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `insert into .. replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` or runtime is older than 17.1, this strategy will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) instead.
Contributor Author

This is referring to the Databricks insert overwrite syntax, so does it need the underscore? insert_overwrite would be referring to the dbt incremental strategy

### The `insert_overwrite` strategy

This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select _all_ of the relevant data for a partition when using this incremental strategy.
This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `insert into .. replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query. Be sure to re-select _all_ of the relevant data for a partition/cluster when using this incremental strategy. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` or runtime is older than 17.1, this strategy will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) instead.


Suggested change
This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `insert into .. replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query. Be sure to re-select _all_ of the relevant data for a partition/cluster when using this incremental strategy. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` or runtime is older than 17.1, this strategy will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) instead.
This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `INSERT INTO .. REPLACE ON` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query. Be sure to re-select _all_ of the relevant data for a partition/cluster when using this incremental strategy. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` or runtime is older than 17.1, this strategy will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-dml-insert-overwrite-table.html) instead.

@ericj-db ericj-db Oct 27, 2025

SQL syntax is intentionally lowercase to align with the rest of dbt documentation


Let's leave it lowercase

@longvu-db longvu-db left a comment

LGTM with 2 small comments

This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config. dbt will run an [atomic `insert into ... replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query. Be sure to re-select _all_ of the relevant data for a partition/cluster when using this incremental strategy.

If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
When using `liquid_clustered_by`, the `replace on` keys used will be equivalent to the `liquid_clustered_by` value (same as `partition_by` behavior).


Suggested change
When using `liquid_clustered_by`, the `replace on` keys used will be equivalent to the `liquid_clustered_by` value (same as `partition_by` behavior).
When using `liquid_clustered_by`, the `replace on` keys used will be the same as the `liquid_clustered_by` keys (same as `partition_by` behavior).

Since liquid_clustered_by specifies column names, it seems slightly more appropriate to also call them keys
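
To ground the wording being discussed, a minimal model-config sketch for the liquid clustering case (hypothetical model and columns; `liquid_clustered_by` is the dbt-databricks config referenced in this thread):

```sql
-- models/orders.sql -- hypothetical; with this config, the replace on keys
-- would be the liquid_clustered_by columns, mirroring partition_by handling
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    liquid_clustered_by = ['order_date']
  )
}}

select * from {{ ref('stg_orders') }}
```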

If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
When using `liquid_clustered_by`, the `replace on` keys used will be equivalent to the `liquid_clustered_by` value (same as `partition_by` behavior).

If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `True` in SQL warehouses or if cluster computes are used, this strategy will run a [partitionOverwriteMode='dynamic' `insert overwrite` statement](https://docs.databricks.com/aws/en/delta/selective-overwrite#dynamic-partition-overwrites-with-partitionoverwritemode-legacyl) instead. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` in SQL warehouses, this strategy will truncate the entire table.


Suggested change
If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `True` in SQL warehouses or if cluster computes are used, this strategy will run a [partitionOverwriteMode='dynamic' `insert overwrite` statement](https://docs.databricks.com/aws/en/delta/selective-overwrite#dynamic-partition-overwrites-with-partitionoverwritemode-legacyl) instead. If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` in SQL warehouses, this strategy will truncate the entire table.
If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `True` in SQL warehouses or if cluster computes are used, this strategy will run a [partitionOverwriteMode='dynamic' `insert overwrite` statement](https://docs.databricks.com/aws/en/delta/selective-overwrite#dynamic-partition-overwrites-with-partitionoverwritemode-legacyl). If [`use_replace_on_for_insert_overwrite`](/reference/global-configs/databricks-changes#use-replace-on-for-insert_overwrite-strategy) is set to `False` in SQL warehouses, this strategy will truncate the entire table.

Nit
