Description
What happens?
The hive_types parameter documented in the DuckDB documentation on Hive partitioning cannot be set from a query, which makes partitioning by types such as UUID impossible.
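For comparison, this is roughly how the documented option is used in DuckDB itself (a sketch based on my reading of the DuckDB docs; '<path>' and user_id are placeholders from my setup):

-- DuckDB syntax: named parameters use '=' and hive_types takes a STRUCT of column -> type
SELECT *
FROM read_parquet('<path>/*/data_0.parquet',
    hive_partitioning = true,
    hive_types = {'user_id': UUID});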
To Reproduce
Querying pg_proc for the registered definitions of read_parquet returns the rows below:
SELECT
    proname,
    pg_get_function_identity_arguments(oid) AS arguments,
    pg_get_function_result(oid) AS return_type
FROM pg_proc
WHERE proname LIKE '%read_parquet%';
"proname" | "arguments" | "return_type" |
---|---|---|
"read_parquet" | "path text, binary_as_string boolean, filename boolean, file_row_number boolean, hive_partitioning boolean, union_by_name boolean" | "SETOF duckdb.""row""" |
"read_parquet" | "path text[], binary_as_string boolean, filename boolean, file_row_number boolean, hive_partitioning boolean, union_by_name boolean" | "SETOF duckdb.""row""" |
Because of the missing hive_types parameter, partitioning by a UUID type is not possible:
SELECT *
FROM read_parquet('<path>/*/data_0.parquet',
    hive_partitioning := true,
    hive_types := '{"user_id": UUID}'
) WHERE user_id = '00000000-0000-0000-0000-000000000000'::uuid;

ERROR: function read_parquet(unknown, hive_partitioning => boolean, hive_types => unknown) does not exist
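A possible workaround might be to hand the whole query string to DuckDB via pg_duckdb's duckdb.query() helper, so that DuckDB itself parses the hive_types struct. This is only a sketch on my side; I have not verified that the helper accepts this in the nightly build:

-- Hypothetical workaround, assuming duckdb.query(query text) is available in this build
SELECT * FROM duckdb.query($$
    SELECT *
    FROM read_parquet('<path>/*/data_0.parquet',
        hive_partitioning = true,                 -- DuckDB named-parameter syntax
        hive_types = {'user_id': UUID})           -- struct literal as in the DuckDB docs
    WHERE user_id = '00000000-0000-0000-0000-000000000000'
$$);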
I am running this extension through the Docker image pgduckdb/pgduckdb:17-main with the index digest sha256:f618fcf5899a3a6abec20d015216d690756e96be0c3f811381309bc54adc4b0f.
OS:
macOS Sequoia 15.6.1 x86_64
pg_duckdb Version (if built from source use commit hash):
Nightly Docker Build from Sep 8, 2025 at 7:10 am (I have not found a way to get the commit hash of pg_duckdb from the container)
Postgres Version (if built from source use commit hash):
17.6.1
Hardware:
No response
Full Name:
Leo Katzengruber
Affiliation:
Student at FH Salzburg
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?
- Yes, I have