
bug: Polars: Silent integer overflow in multiplication (and probably other arithmetic ops) due to Ibis's aggressive type downcasting #11749

@discreteds

Description


What happened?

Ibis's Polars backend produces silent data corruption in multiplication operations when columns use narrow integer types (int8/int16).

When integer literals are used in a calculation, Ibis aggressively downcasts them to the smallest integer type that fits (int8 for values from -128 to 127), which causes overflow when both operands are narrow types.
By downcasting integer literals this aggressively, Ibis makes Polars' strict typing dangerous for everyday users of Ibis with the Polars backend.
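A minimal pure-Python simulation of this failure mode. It assumes (a) a literal is typed as the narrowest signed integer type that holds it and (b) the int8 product keeps the int8 type and wraps around in two's complement instead of raising. `smallest_int_type` and `wrap` are hypothetical helpers, not Ibis or Polars APIs, and the wrapped numbers are what that assumed wraparound yields, not captured Polars output.

```python
# Simulation only: assumes fixed-width integer ops wrap around in two's
# complement instead of checking for overflow.

INT_RANGES = {
    "int8": (-(2**7), 2**7 - 1),
    "int16": (-(2**15), 2**15 - 1),
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def smallest_int_type(value: int) -> str:
    """Hypothetical helper: the narrowest signed type whose range holds `value`."""
    for name, (lo, hi) in INT_RANGES.items():
        if lo <= value <= hi:
            return name
    raise OverflowError(f"{value} does not fit in int64")


def wrap(value: int, dtype: str) -> int:
    """Reduce `value` into `dtype`'s range with two's-complement wraparound."""
    lo, hi = INT_RANGES[dtype]
    span = hi - lo + 1
    return (value - lo) % span + lo


quantity = 100
prices = [25, 30, 15]

literal_type = smallest_int_type(quantity)
exact = [quantity * p for p in prices]
wrapped = [wrap(x, literal_type) for x in exact]

print(literal_type)  # int8
print(exact)         # [2500, 3000, 1500]
print(wrapped)       # [-60, -72, -36]
```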

The same overflow occurs whenever the storage has been type-optimised, for example when multiplying two Int8 columns together.
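To see why narrow storage types are risky even without literals, this sketch computes the full range of products of two signed fixed-width integers. It is plain arithmetic, independent of Ibis or Polars; the helper names are illustrative only.

```python
from __future__ import annotations

# Plain integer arithmetic: the range of products of two signed
# fixed-width integers, and whether a given width can hold it.


def signed_range(bits: int) -> tuple[int, int]:
    """(min, max) of a signed two's-complement integer with `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def product_range(bits_a: int, bits_b: int) -> tuple[int, int]:
    """(min, max) over all products of a bits_a-bit and a bits_b-bit value."""
    lo_a, hi_a = signed_range(bits_a)
    lo_b, hi_b = signed_range(bits_b)
    # The extremes of a product of two intervals occur at the interval corners.
    corners = [lo_a * lo_b, lo_a * hi_b, hi_a * lo_b, hi_a * hi_b]
    return min(corners), max(corners)


def fits(value_range: tuple[int, int], bits: int) -> bool:
    """True if every value in value_range is representable in `bits` bits."""
    lo, hi = signed_range(bits)
    return lo <= value_range[0] and value_range[1] <= hi


int8_product = product_range(8, 8)
print(int8_product)            # (-16256, 16384)
print(fits(int8_product, 8))   # False: an int8 result can overflow
print(fits(int8_product, 16))  # True: int16 always holds it
```

In general the product of an n-bit and an m-bit signed integer always fits in n + m bits, so int8 × int8 needs int16 and int16 × int16 needs int32.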

Ibis Philosophy in this space

Given that Ibis is intended to be a universal access layer for general use, would it make sense for Ibis to automatically promote narrow integer literals during arithmetic translation in the Polars backend, and possibly in other backends as well?
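One possible reading of this proposal, sketched in plain Python: when translating a multiplication, pick a result type wide enough that the product cannot overflow; the translation step would then cast the operands to that type before building the Polars expression (not shown). `safe_product_type` is a hypothetical helper that works on type names as strings; it is not how Ibis currently types arithmetic.

```python
# Hypothetical result-type rule for multiplication during translation:
# choose the narrowest standard signed type that can hold every product of
# the two operand types, falling back to int64 (the widest one listed).

WIDTHS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}  # ascending order


def safe_product_type(left: str, right: str) -> str:
    """Narrowest listed type with at least WIDTHS[left] + WIDTHS[right] bits."""
    needed = WIDTHS[left] + WIDTHS[right]  # n-bit * m-bit fits in n + m bits
    for name, width in WIDTHS.items():
        if width >= needed:
            return name
    # int64 * anything can still overflow; there is no wider standard type here.
    return "int64"


for left, right in [("int8", "int8"), ("int16", "int8"), ("int32", "int32"), ("int64", "int8")]:
    print(left, "*", right, "->", safe_product_type(left, right))
# int8 * int8 -> int16
# int16 * int8 -> int32
# int32 * int32 -> int64
# int64 * int8 -> int64
```

A simpler alternative is to widen every narrow integer operand straight to int64, which costs memory on type-optimised data but needs no per-operation width logic. Either way, int64 × int64 can still overflow, so this only removes the narrow-type case.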

A direct user of Polars may need to deal with this issue explicitly, but should an Ibis user have to?

Reproduction across backends

  import ibis
  import polars as pl

  # Ibis infers this literal as int8 (the smallest integer type that fits 100)
  quantity = 100

  # Explicit int8 column (common in optimized Parquet files)
  df = pl.DataFrame({
      'price': pl.Series([25, 30, 15], dtype=pl.Int8),
  })
  expected = [2500, 3000, 1500]

  print("=" * 60)
  print("BACKEND COMPARISON")
  print("=" * 60)

  # Polars: silent corruption (no error, wrong values)
  polars_backend = ibis.polars.connect()
  polars_table = polars_backend.create_table('orders', df, overwrite=True)
  polars_result = list((quantity * polars_table.price).execute())
  print(f"\nPolars: {polars_result}")
  print(f"Expected: {expected}")
  print("Status:", "✅ Correct" if polars_result == expected else "❌ SILENT DATA CORRUPTION")

  # SQLite: correct (promotes types)
  sqlite_backend = ibis.sqlite.connect()
  sqlite_table = sqlite_backend.create_table('orders', df, overwrite=True)
  sqlite_result = list((quantity * sqlite_table.price).execute())
  print(f"\nSQLite: {sqlite_result}")
  print("Status:", "✅ Correct" if sqlite_result == expected else "❌ Wrong result")

  # DuckDB: raises an exception (loud failure, detectable)
  duckdb_backend = ibis.duckdb.connect()
  duckdb_table = duckdb_backend.create_table('orders', df, overwrite=True)
  try:
      duckdb_result = list((quantity * duckdb_table.price).execute())
      print(f"\nDuckDB: {duckdb_result}")
  except Exception as e:
      print(f"\nDuckDB: {type(e).__name__}")
      print("Status: ⚠️  Throws error (better than silent corruption!)")

What version of ibis are you using?

11.0.0

What backend(s) are you using, if any?

Affected backend: Polars
Working backend: SQLite (automatically promotes types)
Partially working backend: DuckDB (throws an exception instead of returning wrong results)


Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata


Assignees: No one assigned
Labels: bug (Incorrect behavior inside of ibis)
Type: No type
Projects: Status: backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
