read_parquet() function does not define the hive_types parameter #923

@thekatze

Description

What happens?

The hive_types parameter documented in the DuckDB documentation on partitioning cannot be set from a query, which makes partitioning by types such as UUID impossible.

To Reproduce

Executing this query to list the signatures of read_parquet returns the following rows:

SELECT 
    proname,
    pg_get_function_identity_arguments(oid) as arguments,
    pg_get_function_result(oid) as return_type
FROM pg_proc 
WHERE proname LIKE '%read_parquet%';
"proname" "arguments" "return_type"
"read_parquet" "path text, binary_as_string boolean, filename boolean, file_row_number boolean, hive_partitioning boolean, union_by_name boolean" "SETOF duckdb.""row"""
"read_parquet" "path text[], binary_as_string boolean, filename boolean, file_row_number boolean, hive_partitioning boolean, union_by_name boolean" "SETOF duckdb.""row"""

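For comparison, standalone DuckDB documents hive_types as a struct literal passed to read_parquet. A minimal sketch (path and column name are illustrative, not from my setup):

```sql
-- Plain DuckDB (not pg_duckdb): hive_types is accepted as a struct literal
-- mapping partition column names to types. Path is illustrative.
SELECT *
FROM read_parquet(
    'data/*/data_0.parquet',
    hive_partitioning = true,
    hive_types = {'user_id': UUID}
);
```

This is the capability that is missing from the pg_duckdb function signatures above.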
Because of the missing hive_types parameter, partitioning by a UUID type is not possible:

SELECT *
FROM read_parquet(
    '<path>/*/data_0.parquet',
    hive_partitioning := true,
    hive_types := '{"user_id": UUID}'
)
WHERE user_id = '00000000-0000-0000-0000-000000000000'::uuid;
ERROR:  function read_parquet(unknown, hive_partitioning => boolean, hive_types => unknown) does not exist
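Until hive_types is exposed, a possible workaround is to drop the parameter and cast in the query instead. This is an untested sketch, assuming the partition column comes back with a type that casts to uuid when no hive type is declared:

```sql
-- Untested workaround sketch: read the hive column without a declared
-- type and cast it to uuid in the WHERE clause instead.
SELECT *
FROM read_parquet('<path>/*/data_0.parquet', hive_partitioning := true)
WHERE user_id::uuid = '00000000-0000-0000-0000-000000000000'::uuid;
```

Note that casting in the WHERE clause may defeat partition pruning, so this is at best a stopgap.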

I am running this extension through the Docker image pgduckdb/pgduckdb:17-main with the index digest sha256:f618fcf5899a3a6abec20d015216d690756e96be0c3f811381309bc54adc4b0f.

OS:

macOS Sequoia 15.6.1 (x86_64)

pg_duckdb Version (if built from source use commit hash):

Nightly Docker build from Sep 8, 2025 at 7:10 am (I have not found a way to get the commit hash of pg_duckdb from the container)

Postgres Version (if built from source use commit hash):

17.6.1

Hardware:

No response

Full Name:

Leo Katzengruber

Affiliation:

Student at FH Salzburg

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have

Labels

datalake: Issues related to reading parquet/iceberg/delta
types: Issues related to type conversions
