@badalprasadsingh badalprasadsingh commented Sep 23, 2025

feat: Arrow-writer for Iceberg

This PR introduces an Apache Arrow-based Iceberg writer that writes both data and delete files to the object store and registers them in the Iceberg table through the Java API by passing their file paths.

It therefore supports both sync modes:

  • full-refresh
  • cdc (equality-deletes)

The current implementation converts Go-typed data into arrow.Record values and uses the pqarrow library to write that Arrow data to Parquet files, flushing each file once it reaches the target file size.

It introduces:

Rolling Writer Support

  • rolling data file writers
  • rolling delete file writers

for both partitioned and unpartitioned data, with:

  • zstd compression at compression level 1, etc.
  • configurable target file sizes
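The rolling behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not the writer added in this PR: `RollingWriter`, `memFile`, `newMemOpener`, and the `-00000.parquet` naming are all hypothetical, and the sketch assumes a file is closed as soon as the bytes written reach or pass the target size.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// RollingWriter is a hypothetical sketch (not OLake's implementation). It
// appends encoded records to the current file and rolls over to a new file
// once the configured target size has been reached.
type RollingWriter struct {
	prefix     string
	targetSize int64                                // roll threshold in bytes
	open       func(string) (io.WriteCloser, error) // opens a new output file
	current    io.WriteCloser
	written    int64    // bytes written to the current file
	Files      []string // names of every file opened so far
}

func NewRollingWriter(prefix string, targetSize int64, open func(string) (io.WriteCloser, error)) *RollingWriter {
	return &RollingWriter{prefix: prefix, targetSize: targetSize, open: open}
}

// Write appends one record, opening a file if none is open, and closes the
// file when it has reached the target size so the next Write starts a new one.
func (w *RollingWriter) Write(record []byte) error {
	if w.current == nil {
		name := fmt.Sprintf("%s-%05d.parquet", w.prefix, len(w.Files))
		f, err := w.open(name)
		if err != nil {
			return err
		}
		w.current, w.written = f, 0
		w.Files = append(w.Files, name)
	}
	n, err := w.current.Write(record)
	w.written += int64(n)
	if err != nil {
		return err
	}
	if w.written >= w.targetSize {
		return w.Close()
	}
	return nil
}

// Close closes the current file, if any. It is safe to call repeatedly.
func (w *RollingWriter) Close() error {
	if w.current == nil {
		return nil
	}
	err := w.current.Close()
	w.current = nil
	return err
}

// memFile is an in-memory stand-in for an object-store file.
type memFile struct{ bytes.Buffer }

func (m *memFile) Close() error { return nil }

// newMemOpener returns an opener that records every file it creates in a map.
func newMemOpener() (func(string) (io.WriteCloser, error), map[string]*memFile) {
	files := make(map[string]*memFile)
	open := func(name string) (io.WriteCloser, error) {
		f := &memFile{}
		files[name] = f
		return f, nil
	}
	return open, files
}
```

With a 10-byte target and five 6-byte records, this sketch produces three files: two that were closed after 12 bytes and a final one holding the remaining 6 bytes. The same wrapper shape could serve data files and delete files alike, since only the record encoding differs.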

Fanout Partitioning Strategy

  • keeping multiple files open at the same time (no clustering or sorting required)
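As a rough illustration of the fanout idea, the sketch below keeps one open "file" per partition value and routes each row to it in arrival order, so the input never has to be sorted or clustered. `FanoutWriter` and `partitionFile` are hypothetical names for this example only; a real writer would hold one rolling Parquet writer per partition instead of an in-memory slice.

```go
package main

import "sort"

// partitionFile is a hypothetical stand-in for one open Parquet file.
type partitionFile struct {
	rows []string
}

// FanoutWriter keeps one open file per partition value seen so far.
type FanoutWriter struct {
	open map[string]*partitionFile
}

func NewFanoutWriter() *FanoutWriter {
	return &FanoutWriter{open: make(map[string]*partitionFile)}
}

// Write routes a row to its partition's file, opening that file on first use.
func (w *FanoutWriter) Write(partition, row string) {
	f, ok := w.open[partition]
	if !ok {
		f = &partitionFile{}
		w.open[partition] = f
	}
	f.rows = append(f.rows, row)
}

// OpenPartitions lists the partitions that currently have an open file, sorted.
func (w *FanoutWriter) OpenPartitions() []string {
	keys := make([]string, 0, len(w.open))
	for k := range w.open {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// RowCount reports how many rows are buffered for a partition (0 if none).
func (w *FanoutWriter) RowCount(partition string) int {
	if f, ok := w.open[partition]; ok {
		return len(f.rows)
	}
	return 0
}
```

The trade-off of this strategy is memory: every distinct partition value in flight holds its own open writer until it is flushed.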

Transforms Logic

  • identity, year, month, week, day, hour, bucket, truncate, void (all Iceberg transforms are supported)
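For readers unfamiliar with these transforms, here is a minimal sketch of a few of them as the Iceberg spec defines them: temporal transforms count whole units since the Unix epoch, and truncate rounds an integer down to a multiple of the width or keeps the first code points of a string. The function names are hypothetical and this is not the code from this PR; timestamps are assumed to be microseconds since the epoch in UTC, and bucket, void, and week are left out of the sketch.

```go
package main

import "time"

const (
	microsPerHour int64 = 3_600_000_000
	microsPerDay  int64 = 24 * microsPerHour
)

// floorDiv divides rounding toward negative infinity, so timestamps before
// 1970 map to negative partition values instead of rounding toward zero.
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

// Year returns whole years since 1970 for a UTC microsecond timestamp.
func Year(micros int64) int {
	return time.UnixMicro(micros).UTC().Year() - 1970
}

// Month returns whole months since 1970-01 for a UTC microsecond timestamp.
func Month(micros int64) int {
	t := time.UnixMicro(micros).UTC()
	return (t.Year()-1970)*12 + int(t.Month()) - 1
}

// Day returns whole days since 1970-01-01.
func Day(micros int64) int64 { return floorDiv(micros, microsPerDay) }

// Hour returns whole hours since 1970-01-01T00:00.
func Hour(micros int64) int64 { return floorDiv(micros, microsPerHour) }

// TruncateInt rounds v down to a multiple of width (width must be positive).
func TruncateInt(v, width int64) int64 {
	return v - (((v % width) + width) % width)
}

// TruncateString keeps at most width Unicode code points of s.
func TruncateString(s string, width int) string {
	runes := []rune(s)
	if len(runes) <= width {
		return s
	}
	return string(runes[:width])
}
```

For example, the timestamp 2025-09-23T14:30:00Z (1758637800000000 microseconds) maps to year 55, month 668, day 20354, and hour 488510, while `TruncateInt(-1, 10)` yields -10.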

How to run it?

In your destination.json (when using the CLI), enable this toggle:

"arrow_writes": true

As your sync starts, you should see something like this in your logs:

INFO >>>> Arrow Writer Enabled >>>> >>>> >>>>

This indicates that OLake is using the Arrow writer successfully.

Currently supports:

  • schema evolution
  • all Iceberg catalogs (Glue, REST, Hadoop, JDBC, etc.)
  • all object stores (S3, ADLS, GCS, S3A, etc.)

@badalprasadsingh badalprasadsingh marked this pull request as ready for review September 29, 2025 03:15
Signed-off-by: badalprasadsingh <[email protected]>
…ed map to json marshalling
