There's An Iceberg In My Lakehouse - What is Apache Iceberg and Why Should You Care?

This is the companion repo to my talk on Apache Iceberg. It contains the code for showing off the various features of Apache Iceberg, as well as utility code to easily switch between running on AWS or locally in Docker.

Data source

The demo uses the data from this Kaggle dataset: https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store. It fetches the expanded dataset directly from the source referenced in the Kaggle dataset documentation. The data is an event stream from a Chinese online retailer, and the full dataset is around 400M rows. To keep the demo fast, we also create a downsampled dataset so that demo time isn't spent uploading all the data.
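To make the downsampling idea concrete, here is a minimal sketch of one way to shrink a large CSV of events. It is not the code this repo uses: the function name `downsample_csv`, the `fraction` and `seed` parameters, and the choice of independent per-row sampling are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def downsample_csv(src: Path, dst: Path, fraction: float, seed: int = 42) -> int:
    """Copy the header plus a random `fraction` of the rows from src to dst.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    kept = 0
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:
            return 0  # empty input, nothing to write
        writer.writerow(header)
        for row in reader:
            # Keep each row independently with probability `fraction`
            if rng.random() < fraction:
                writer.writerow(row)
                kept += 1
    return kept
```

Because the file is streamed row by row, memory use stays flat even for a file with hundreds of millions of rows.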

Install the package

UV (recommended)

uv sync

pip

Make sure to run inside an activated virtual environment!

pip install .

Download the data

As part of the Python package, there is a CLI named demo. Its download command concurrently downloads the data files and gunzips them. You can control concurrency with the --download-concurrency and --extract-concurrency flags.

uv run demo download
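For readers curious what bounded concurrent extraction can look like, below is a rough sketch of the gunzip half using only the standard library. It is an illustration, not the CLI's actual implementation: `gunzip`, `extract_all`, and the `concurrency` parameter are hypothetical names, and a thread pool is just one reasonable choice.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gunzip(path: Path) -> Path:
    """Decompress `path` (ending in .gz) next to itself and return the new path."""
    target = path.with_suffix("")  # e.g. events.csv.gz -> events.csv
    with gzip.open(path, "rb") as fin, target.open("wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, not all at once
    return target


def extract_all(paths: list[Path], concurrency: int = 4) -> list[Path]:
    """Decompress several files with at most `concurrency` running at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order and re-raises any worker exception
        return list(pool.map(gunzip, paths))
```

A download step could follow the same pattern with its own pool, which is presumably why the two concurrency limits are exposed as separate flags.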

Running the demo

The code is set up to work with a local Docker Compose setup - if you'd like to switch to using AWS, set the TUTORIAL_TYPE env variable to aws.

If running on AWS, Docker Compose is not required.
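As a sketch of how a switch like this can be read in Python, the snippet below checks the TUTORIAL_TYPE variable and treats anything other than aws as the local setup. The function name `use_aws` and the case-insensitive comparison are assumptions for illustration; the repo's own utility code may handle this differently.

```python
import os


def use_aws() -> bool:
    """Return True when TUTORIAL_TYPE selects AWS, False for the local default.

    Hypothetical helper: unset, empty, or any other value means "local".
    """
    value = os.environ.get("TUTORIAL_TYPE", "")
    # Assumption: tolerate stray whitespace and different casing
    return value.strip().lower() == "aws"


if __name__ == "__main__":
    target = "AWS" if use_aws() else "local Docker Compose"
    print(f"Running the tutorials against: {target}")
```

Defaulting to local means a forgotten or misspelled variable never sends demo traffic to a cloud account by accident.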

Running locally

Startup

Start the Docker containers

docker compose up -d

Run the bootstrap command to initialize the local setup

uv run demo bootstrap

Proceed to tutorial_01.py

andersbogsnes/aws_iceberg_demo

Folders and files

Latest commit

History

Repository files navigation

There's An Iceberg In My Lakehouse - What is Apache Iceberg and Why Should You Care?

Data source

Install the package

UV (recommended)

pip

Download the data

Running the demo

Running locally

Startup

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages