
Commit 4df5318

Merge branch 'duckdb:main' into main

2 parents: 042990f + 31fdb54

16 files changed: +57 -39 lines changed

_posts/2024-09-25-changing-data-with-confidence-and-acid.md

Lines changed: 2 additions & 0 deletions

@@ -12,6 +12,8 @@ The great quote “Everything changes and nothing stays the same” from [Heracl
 
 Static datasets are split-second snapshots of whatever the world looked like at one moment. But very quickly, the world moves on, and the dataset needs to catch up to remain useful. In the world of tables, new rows can be added, old rows may be deleted and sometimes rows have to be changed to reflect a new situation. Often, changes are interconnected. A row in a table that maps orders to customers is not very useful without the corresponding entry in the `orders` table. Most, if not all, datasets eventually get changed. As a data management system, managing change is thus not optional. However, managing changes properly is difficult.
 
+## ACID Guarantees
+
 Early data management systems researchers invented a concept called “transactions”, the notions of which were [first formalized](https://dl.acm.org/doi/abs/10.5555/48751.48761) [in the 1980s](https://dl.acm.org/doi/10.1145/289.291). In essence, transactionality and the well-known ACID principles describe a set of guarantees that a data management system has to provide in order to be considered safe. ACID is an acronym that stands for Atomicity, Consistency, Isolation and Durability.
 
 The ACID principles are not a theoretical exercise. Much like the rules governing airplanes or trains, they have been “written in blood” – they are hard-won lessons from decades of data management practice. It is very hard for an application to reason correctly when dealing with non-ACID systems. The end result of such problems is often corrupted data or data that no longer reflects reality accurately. For example, rows can be duplicated or missing.
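
In DuckDB these guarantees surface through explicit transactions. As a minimal sketch (the `customers` and `orders` tables here are hypothetical, not from the post), atomicity means the two interconnected inserts below become visible together or not at all:

```sql
BEGIN TRANSACTION;
INSERT INTO customers VALUES (42, 'Jane Doe');
INSERT INTO orders VALUES (1001, 42);
-- COMMIT publishes both rows atomically; ROLLBACK would discard both.
COMMIT;
```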

docs/stable/clients/odbc/windows.md

Lines changed: 2 additions & 0 deletions

@@ -8,6 +8,8 @@ redirect_from:
 title: ODBC API on Windows
 ---
 
+## Setup
+
 Using the DuckDB ODBC API on Windows requires the following steps:
 
 1. Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers.

docs/stable/core_extensions/iceberg/iceberg_rest_catalogs.md

Lines changed: 1 addition & 1 deletion

@@ -45,7 +45,7 @@ To see the available tables run
 SHOW ALL TABLES;
 ```
 
-### ATTACH OPTIONS
+## `ATTACH` Options
 
 A REST Catalog with OAuth2 authorization can also be attached with just an `ATTACH` statement. See the complete list of `ATTACH` options for a REST catalog below.
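
For orientation, an OAuth2-based `ATTACH` might look roughly like the sketch below. The option names (`ENDPOINT`, `CLIENT_ID`, `CLIENT_SECRET`, `OAUTH2_SERVER_URI`) and all values are illustrative assumptions; the options table on the page is the authoritative list.

```sql
-- Illustrative sketch only, not the page's exact example.
ATTACH 'my_warehouse' AS iceberg_catalog (
    TYPE iceberg,
    ENDPOINT 'https://rest-catalog.example.com',
    CLIENT_ID 'my_client_id',
    CLIENT_SECRET 'my_client_secret',
    OAUTH2_SERVER_URI 'https://rest-catalog.example.com/v1/oauth/tokens'
);
```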

docs/stable/data/json/format_settings.md

Lines changed: 4 additions & 4 deletions

@@ -16,7 +16,7 @@ SELECT *
 FROM filename.json;
 ```
 
-#### Format: `newline_delimited`
+## Format: `newline_delimited`
 
 With `format = 'newline_delimited'`, newline-delimited JSON can be parsed.
 Each line contains a JSON value.

@@ -42,7 +42,7 @@ FROM read_json('records.json', format = 'newline_delimited');
 | value2 | value2 |
 | value3 | value3 |
 
-#### Format: `array`
+## Format: `array`
 
 If the JSON file contains a JSON array of objects (pretty-printed or not), `format = 'array'` may be used.
 To demonstrate its use, we use the example file [`records-in-array.json`]({% link data/records-in-array.json %}):

@@ -68,7 +68,7 @@ FROM read_json('records-in-array.json', format = 'array');
 | value2 | value2 |
 | value3 | value3 |
 
-#### Format: `unstructured`
+## Format: `unstructured`
 
 If the JSON file contains JSON that is not newline-delimited or an array, `unstructured` may be used.
 To demonstrate its use, we use the example file [`unstructured.json`]({% link data/unstructured.json %}):

@@ -101,7 +101,7 @@ FROM read_json('unstructured.json', format = 'unstructured');
 | value2 | value2 |
 | value3 | value3 |
 
-### Records Settings
+## `records` Options
 
 The JSON extension can attempt to determine whether a JSON file contains records when setting `records = auto`.
 When `records = true`, the JSON extension expects JSON objects, and will unpack the fields of JSON objects into individual columns.
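
To make the `records` distinction concrete, a small sketch (assuming a hypothetical `records.json` with one JSON object per line):

```sql
-- records = true: the fields of each object are unpacked into columns.
SELECT * FROM read_json('records.json', format = 'newline_delimited', records = true);

-- records = false: each object is kept whole in a single JSON-typed column.
SELECT * FROM read_json('records.json', format = 'newline_delimited', records = false);
```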

docs/stable/data/json/sql_to_and_from_json.md

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ If you run the `json_execute_serialized_sql(varchar)` table function inside of a
 
 Note that these functions do not preserve syntactic sugar such as `FROM * SELECT ...`, so a statement round-tripped through `json_deserialize_sql(json_serialize_sql(...))` may not be identical to the original statement, but should always be semantically equivalent and produce the same output.
 
-### Examples
+## Examples
 
 Simple example:
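
A sketch of what such a simple example can look like (the exact statements on the page may differ):

```sql
-- Serialize a query to JSON, then turn that JSON back into SQL text.
SELECT json_serialize_sql('SELECT 1 + 2');
SELECT json_deserialize_sql(json_serialize_sql('SELECT 1 + 2'));
```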

docs/stable/operations_manual/installing_duckdb/install_script.md

Lines changed: 4 additions & 0 deletions

@@ -13,6 +13,8 @@ To use the [DuckDB install script](https://install.duckdb.org) on Linux and macO
 curl https://install.duckdb.org | sh
 ```
 
+<!-- markdownlint-disable MD040 MD046 -->
+
 <details markdown='1'>
 <summary markdown='span'>
 Click to see the output of the install script.

@@ -49,6 +51,8 @@ To launch DuckDB now, type
 ```
 </details>
 
+<!-- markdownlint-enable MD040 MD046 -->
+
 By default, this installs the latest stable version of DuckDB to `~/.duckdb/cli/latest/duckdb`.
 To add the DuckDB binary to your path, append the following line to your shell profile or RC file (e.g., `~/.bashrc`, `~/.zshrc`):

docs/stable/sql/data_types/enum.md

Lines changed: 1 addition & 1 deletion

@@ -242,4 +242,4 @@ DROP TYPE ⟨enum_name⟩;
 
 Currently, it is possible to drop enums that are used in tables without affecting the tables.
 
-> Warning This behavior of the enum removal feature is subject to change. In future releases, it is expected that any dependent columns must be removed before dropping the enum, or the enum must be dropped with the additional `CASCADE` parameter.
+> Warning This behavior of the enum removal feature is subject to change. In future releases, it is expected that any dependent columns must be removed before dropping the enum, or the enum must be dropped with the additional `CASCADE` parameter.
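
To illustrate the behavior described above, a minimal sketch with made-up names:

```sql
CREATE TYPE mood AS ENUM ('happy', 'sad');
CREATE TABLE person (name VARCHAR, current_mood mood);
-- Currently succeeds even though person.current_mood uses the type;
-- per the warning, a future release may require dropping the column first
-- or using CASCADE.
DROP TYPE mood;
```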

docs/stable/sql/data_types/struct.md

Lines changed: 9 additions & 9 deletions

@@ -13,7 +13,7 @@ Conceptually, a `STRUCT` column contains an ordered list of columns called “en
 
 See the [data types overview]({% link docs/stable/sql/data_types/overview.md %}) for a comparison between nested data types.
 
-### Creating Structs
+## Creating Structs
 
 Structs can be created using the [`struct_pack(name := expr, ...)`]({% link docs/stable/sql/functions/struct.md %}) function, the equivalent array notation `{'name': expr, ...}`, using a row variable, or using the `row` function.
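
For reference, the first two creation forms named in that sentence look like this small sketch (keys and values are made up):

```sql
-- struct_pack and the {'key': value} literal notation build the same struct.
SELECT struct_pack(key1 := 'value1', key2 := 42) AS s;
SELECT {'key1': 'value1', 'key2': 42} AS s;
```
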
@@ -63,7 +63,7 @@ SELECT {
 } AS s;
 ```
 
-### Adding or Updating Fields of Structs
+## Adding or Updating Fields of Structs
 
 To add new fields or update existing ones, you can use `struct_update`:

@@ -73,7 +73,7 @@ SELECT struct_update({'a': 1, 'b': 2}, b := 3, c := 4) AS s;
 
 Alternatively, `struct_insert` also allows adding new fields but not updating existing ones.
 
-### Retrieving from Structs
+## Retrieving from Structs
 
 Retrieving a value from a struct can be accomplished using dot notation, bracket notation, or through [struct functions]({% link docs/stable/sql/functions/struct.md %}) like `struct_extract`.
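
A short sketch of `struct_insert` and of the retrieval styles just mentioned (names are illustrative):

```sql
-- struct_insert adds new fields; unlike struct_update it does not overwrite existing ones.
SELECT struct_insert({'a': 1, 'b': 2}, c := 3) AS s;

-- Dot notation, bracket notation, and struct_extract all read a single entry.
SELECT s.a, s['a'], struct_extract(s, 'a')
FROM (SELECT {'a': 1, 'b': 2} AS s);
```
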
@@ -101,7 +101,7 @@ The `struct_extract` function is also equivalent. This returns 1:
 SELECT struct_extract({'x space': 1, 'y': 2, 'z': 3}, 'x space');
 ```
 
-#### `unnest` / `STRUCT.*`
+### `unnest` / `STRUCT.*`
 
 Rather than retrieving a single key from a struct, the `unnest` special function can be used to retrieve all keys from a struct as separate columns.
 This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys:

@@ -128,11 +128,11 @@ FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
 
 > Warning The star notation is currently limited to top-level struct columns and non-aggregate expressions.
 
-### Dot Notation Order of Operations
+## Dot Notation Order of Operations
 
 Referring to structs with dot notation can be ambiguous with references to schemas and tables. In general, DuckDB looks for columns first, then for struct keys within columns. DuckDB resolves references in these orders, using the first match to occur:
 
-#### No Dots
+### No Dots
 
 ```sql
 SELECT part1

@@ -141,7 +141,7 @@ FROM tbl;
 
 1. `part1` is a column
 
-#### One Dot
+### One Dot
 
 ```sql
 SELECT part1.part2

@@ -151,7 +151,7 @@ FROM tbl;
 1. `part1` is a table, `part2` is a column
 2. `part1` is a column, `part2` is a property of that column
 
-#### Two (or More) Dots
+### Two (or More) Dots
 
 ```sql
 SELECT part1.part2.part3

@@ -164,7 +164,7 @@ FROM tbl;
 
 Any extra parts (e.g., `.part4.part5`, etc.) are always treated as properties.
 
-### Creating Structs with the `row` Function
+## Creating Structs with the `row` Function
 
 The `row` function can be used to automatically convert multiple columns to a single struct column.
 When using `row`, the keys will be empty strings, allowing for easy insertion into a table with a struct column.
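
A sketch of the insertion pattern that sentence describes, assuming a hypothetical table `t1` with a single `STRUCT` column:

```sql
CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER));
-- row() builds a struct with empty keys that can be inserted into the struct column.
INSERT INTO t1 VALUES (row('a', 42));
SELECT * FROM t1;
```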

docs/stable/sql/expressions/logical_operators.md

Lines changed: 2 additions & 2 deletions

@@ -10,7 +10,7 @@ title: Logical Operators
 
 The following logical operators are available: `AND`, `OR` and `NOT`. SQL uses a three-valued logic system with `true`, `false` and `NULL`. Note that logical operators involving `NULL` do not always evaluate to `NULL`. For example, `NULL AND false` will evaluate to `false`, and `NULL OR true` will evaluate to `true`. Below are the complete truth tables.
 
-### Binary Operators: `AND` and `OR`
+## Binary Operators: `AND` and `OR`
 
 <div class="monospace_table"></div>

@@ -23,7 +23,7 @@ The following logical operators are available: `AND`, `OR` and `NOT`. SQL uses a
 | false | NULL | false | NULL |
 | NULL  | NULL | NULL  | NULL |
 
-### Unary Operator: NOT
+## Unary Operator: `NOT`
 
 <div class="monospace_table"></div>
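
The truth tables can be spot-checked with one-liners consistent with the text above:

```sql
SELECT NULL AND false; -- false
SELECT NULL AND true;  -- NULL
SELECT NULL OR true;   -- true
SELECT NOT (NULL);     -- NULL
```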

docs/stable/sql/functions/aggregates.md

Lines changed: 2 additions & 0 deletions

@@ -228,12 +228,14 @@ The table below shows the available general aggregate functions.
 
 | **Description** | Returns the bitwise `OR` of all bits in a given expression. |
 | **Example** | `bit_or(A)` |
+
 #### `bit_xor(arg)`
 
 <div class="nostroke_table"></div>
 
 | **Description** | Returns the bitwise `XOR` of all bits in a given expression. |
 | **Example** | `bit_xor(A)` |
+
 #### `bitstring_agg(arg)`
 
 <div class="nostroke_table"></div>
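
A small sketch of the two aggregates on a hypothetical integer column:

```sql
CREATE TABLE flags (v INTEGER);
INSERT INTO flags VALUES (1), (2), (4);
-- Bitwise OR of 1, 2, 4 is 7; bitwise XOR of 1, 2, 4 is also 7.
SELECT bit_or(v), bit_xor(v) FROM flags;
```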
