Skip to content

bug(duckdb): Reading WKB_BLOB column from postgres connection incorrectly interpreted as ibis.dt.Binary() #11585

@NickCrews

Description

@NickCrews

Is your feature request related to a problem?

I have a postgres db, with a table with a geometry column. If I attach to that DB using duckb, and then copy that data over to a local duckdb table, it errors, because ibis interprets the type as ibis.dt.Binary(), so then it CREATE TABLEs with a column with duckdb type of BLOB, and then when actually inserting into that new table, it errors because duckdb can't cast from WKB_BLOB to BLOB. Even if it did, that wouldn't be right, since I'd want the table to be created with type geometry, not blob.

Relevant is #7818, where it looks like we explicitly chose to interpret "WKB_BLOB" as ibis.dt.Binary(). I'm not exactly sure what problem that PR was trying to avoid? If I understand that then I could help come up with a solution that doesn't cause a regression there. I HEREBY SUMMON @gforsyth TO TRY TO REMEMBER THE CONTEXT OF A DECISION FROM 2 YEARS AGO 🤣 (kidding, this is absolutely not your responsibility, but maybe there is some chance you remember)

import os

import duckdb
import ibis

ibis.options.interactive = True

conn = ibis.duckdb.connect()
conn.raw_sql(f"ATTACH '{os.environ['DATABASE_URL']}' AS pg (TYPE postgres)")

print(duckdb.__version__)  # 1.3.2
conn.raw_sql("INSTALL spatial; LOAD spatial;")
print(
    conn.sql(
        "SELECT extension_name, extension_version  FROM duckdb_extensions() where extension_name in ('spatial', 'postgres_scanner')"
    )
)
# ┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
# ┃ extension_name   ┃ extension_version ┃
# ┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
# │ string           │ string            │
# ├──────────────────┼───────────────────┤
# │ postgres_scanner │ c0411b9           │
# │ spatial          │ 662718c           │
# └──────────────────┴───────────────────┘

pg_table = conn.table("precincts", database=("pg", "districts"))
print(pg_table.precinct__geometry.type())  # binary

# This errors:
print(pg_table.to_sql())  # SELECT * FROM "pg"."districts"."precincts"
print(pg_table.limit(1).precinct__geometry.typeof().as_scalar().execute())  # WKB_BLOB
# conn.create_table("duckdb_table", pg_table)
# ConversionException                       Traceback (most recent call last)
# Cell In[14], line 10
#       8 print(pg_table.precinct__geometry.type())  # binary
#       9 print(pg_table.to_sql())
# ---> 10 conn.create_table("duckdb_table", pg_table)

# File ~/code/ibis/ibis/backends/duckdb/__init__.py:230, in Backend.create_table(self, name, obj, schema, database, temp, overwrite)
#     224     columns = [
#     225         sge.to_identifier(col, quoted=quoted) for col in table.columns
#     226     ]
#     227     insert_stmt = sge.insert(
#     228         query, into=initial_table, columns=columns
#     229     ).sql(dialect)
# --> 230     cur.execute(insert_stmt).fetchall()
#     232 if overwrite:
#     233     cur.execute(
#     234         sge.Drop(kind="TABLE", this=final_table, exists=True).sql(dialect)
#     235     )

# ConversionException: Conversion Error: Unimplemented type for cast (WKB_BLOB -> BLOB) when casting from source column precinct__geometry

# It works to explicitly cast to geometry, the created table is actually geometry
fixed = pg_table.mutate(precinct__geometry=pg_table.precinct__geometry.cast("geometry"))
print(fixed.to_sql())
# SELECT
#   *
#   REPLACE (ST_ASWKB("precinct__geometry") AS "precinct__geometry")
# FROM (
#   SELECT
#     "t0"."precinct__id",
#     "t0"."precinct__hd",
#     "t0"."precinct__sd",
#     "t0"."precinct__borough",
#     "t0"."precinct__region",
#     "t0"."precinct__doe_name",
#     "t0"."precinct__description",
#     ST_GEOMFROMWKB("t0"."precinct__geometry") AS "precinct__geometry"
#   FROM "pg"."districts"."precincts" AS "t0"
# )
print(fixed.limit(1).precinct__geometry.typeof().as_scalar().execute())  # GEOMETRY
conn.create_table("duckdb_table", fixed)

What is the motivation behind your request?

nightly syncing from prod db to analytical for analysis.

Describe the solution you'd like

  • The pg_table.precinct__geometry column to be interpreted as geometry (eg displayed in the repr as such), not as binary.
  • The vanilla conn.create_table("duckdb_table", pg_table) to work as I describe

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb and postgres

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIncorrect behavior inside of ibisduckdbThe DuckDB backendfeatureFeatures or general enhancementsgeospatialGeospatial related functionalitypostgresThe PostgreSQL backend

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions