Description
What happened?
Ibis's Polars backend produces silent data corruption in multiplication operations when columns use narrow integer types (int8/int16).
When integer literals are used in a calculation, Ibis aggressively downcasts them to the smallest type that fits (int8 for values ≤127), which causes overflow when both operands are narrow types.
By downcasting integer literals this aggressively, Ibis turns Polars' strict typing into a hazard for everyday users of Ibis with the Polars backend.
The same overflow will occur whenever storage has been type-optimised, for example when multiplying two Int8 columns together.
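To make the failure mode concrete, here is a minimal plain-Python model of the arithmetic described above. It assumes that literals are inferred as the narrowest signed type that fits and that an overflowing int8 product wraps around in two's complement; `smallest_int_type` and `wrap` are hypothetical helpers written for illustration, not Ibis or Polars APIs.

```python
# Illustrative model only: plain Python, no Ibis or Polars involved.
# `smallest_int_type` and `wrap` are hypothetical helpers.

INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def smallest_int_type(value: int) -> str:
    """Return the narrowest signed type whose range contains `value`."""
    for name, bits in INT_BITS.items():
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return name
    raise OverflowError(f"{value} does not fit in int64")


def wrap(value: int, dtype: str) -> int:
    """Reduce `value` into the signed range of `dtype` (two's-complement wraparound)."""
    bits = INT_BITS[dtype]
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


quantity = 100
prices = [25, 30, 15]

# 100 <= 127, so the literal is modelled as int8, the same type as the column.
literal_type = smallest_int_type(quantity)

# With both operands int8, the product is also stored as int8 and wraps.
wrapped = [wrap(quantity * p, literal_type) for p in prices]

print(literal_type, wrapped)  # int8 [-60, -72, -36]
```

Under these assumptions none of the three products (2500, 3000, 1500) survives, because each one lies outside the int8 range of -128 to 127.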
Ibis Philosophy in this space
Given that Ibis is intended to be a universal access layer for general use, would it make sense for Ibis to automatically promote narrow integer literals during arithmetic translation in the Polars backend, and possibly in other backends as well?
A direct user of Polars may need to deal with this issue explicitly, but should an Ibis user?
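One way such a promotion rule could look is sketched below. This is an assumption about a possible design, not Ibis's actual type-inference logic: `promote_for_multiply` is a hypothetical helper that widens the result type so the product of two signed integers cannot overflow, capped at int64.

```python
# Hypothetical promotion rule for integer multiplication; not part of Ibis.
WIDTHS = ["int8", "int16", "int32", "int64"]


def promote_for_multiply(left: str, right: str) -> str:
    """Pick a result type wide enough for the product of two signed integers.

    The product of an m-bit and an n-bit signed integer always fits in
    m + n bits, so return the first type at least that wide, capped at int64.
    """
    needed = int(left[3:]) + int(right[3:])  # "int8" -> 8, "int16" -> 16, ...
    for name in WIDTHS:
        if int(name[3:]) >= needed:
            return name
    return "int64"  # widest type available; overflow is still possible here


print(promote_for_multiply("int8", "int8"))    # int16
print(promote_for_multiply("int16", "int8"))   # int32
print(promote_for_multiply("int32", "int64"))  # int64
```

With a rule like this, the int8 literal times int8 column case would be computed as int16, which holds 2500, 3000, and 1500 comfortably. Until something along these lines exists, an explicit cast on the user side, such as `polars_table.price.cast("int64")` before multiplying, should sidestep the narrow result type.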
Reproduction across backends
import ibis
import polars as pl
# Literal will be inferred as int8 by Ibis
quantity = 100
# Explicit int8 column (common in optimized Parquet files)
df = pl.DataFrame({
    'price': pl.Series([25, 30, 15], dtype=pl.Int8),
})
print("=" * 60)
print("BACKEND COMPARISON")
print("=" * 60)
# Polars - Silent corruption
polars_backend = ibis.polars.connect()
polars_table = polars_backend.create_table('orders', df, overwrite=True)
polars_result = (quantity * polars_table.price).execute()
print(f"\nPolars: {list(polars_result)}")
print(f"Expected: [2500, 3000, 1500]")
print(f"Status: ❌ SILENT DATA CORRUPTION")
# SQLite - Correct (promotes types)
sqlite_backend = ibis.sqlite.connect()
sqlite_table = sqlite_backend.create_table('orders', df, overwrite=True)
sqlite_result = (quantity * sqlite_table.price).execute()
print(f"\nSQLite: {list(sqlite_result)}")
print(f"Status: ✅ Correct")
# DuckDB - Throws exception (loud failure, detectable)
duckdb_backend = ibis.duckdb.connect()
duckdb_table = duckdb_backend.create_table('orders', df, overwrite=True)
try:
    duckdb_result = (quantity * duckdb_table.price).execute()
    print(f"\nDuckDB: {list(duckdb_result)}")
except Exception as e:
    # Report the exception type that was actually raised
    print(f"\nDuckDB: {type(e).__name__}")
    print("Status: ⚠️ Throws error (better than silent corruption!)")
What version of ibis are you using?
11.0.0
What backend(s) are you using, if any?
Affected backend: Polars
Working backend: SQLite (automatically promotes types)
Partially working backend: DuckDB (throws an exception)
Relevant log output
Code of Conduct
- I agree to follow this project's Code of Conduct