A modern web application for managing Apache Iceberg tables via a REST Catalog.
Recent updates:

- Multi-Catalog Support: Connect to multiple catalogs simultaneously and switch between them
- Cross-Catalog Joins: Query and join tables from different catalogs in a single SQL statement
- DML Operations: Execute INSERT and DELETE statements directly on Iceberg tables
- File Uploads: Upload CSV, JSON, and Parquet files to append data to your tables
- Enhanced UI: Improved catalog management with dropdown selector and logout functionality
Key features:

- Multi-Catalog Support: Connect to and manage multiple Iceberg catalogs simultaneously
- Table Management: Browse namespaces and tables across all connected catalogs
- SQL Querying: Run SQL queries with Apache DataFusion, including cross-catalog joins
- DML Operations: Execute INSERT and DELETE statements on Iceberg tables
- File Uploads: Upload CSV, JSON, and Parquet files to append data to tables
- Metadata Viewer: View table schema, snapshots, properties, and statistics
- Table Maintenance: Perform snapshot expiration and other maintenance tasks
- Schema Evolution: Add, rename, drop, and update table columns
- Time Travel: Query historical table snapshots
- Modern UI: Built with React and Material UI, featuring Light/Dark modes
- Cross-Catalog Joins: Query and join tables from different catalogs in a single SQL statement
- Query Caching: Automatic caching of query results for improved performance
- Export Results: Export query results to CSV, JSON, or Parquet formats
- Query History: Track and reuse previous queries
- Saved Queries: Save frequently used queries for quick access
Architecture:

- Backend: Python (FastAPI)
  - PyIceberg: Handles interactions with the Iceberg REST Catalog.
  - DataFusion: Provides a high-performance query engine.
- Frontend: React (Vite)
  - Material UI: For a polished, responsive user interface.
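To make the data flow concrete, here is a minimal sketch of how PyIceberg and DataFusion can be combined in the spirit of this stack. It is illustrative, not the app's actual code: it assumes the `pyiceberg` and `datafusion` packages, a catalog named `default` whose connection properties come from PyIceberg's configuration, and a hypothetical `db.orders` table.

```python
from pyiceberg.catalog import load_catalog
from datafusion import SessionContext

# PyIceberg talks to the catalog and reads table data into Arrow.
catalog = load_catalog("default")  # connection details from PyIceberg config
arrow_table = catalog.load_table("db.orders").scan(limit=1_000).to_arrow()

# DataFusion registers the Arrow batches and runs SQL over them.
ctx = SessionContext()
ctx.register_record_batches("orders", [arrow_table.to_batches()])
print(ctx.sql("SELECT COUNT(*) FROM orders").to_pandas())
```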
In the UI:

- Explorer: Browse namespaces and tables. Use the dropdown to switch catalogs.
- Query Editor: Write and execute SQL. Supports multiple tabs.
- Metadata Viewer: Inspect Schema, Snapshots, Files, and Manifests.
- Dark Mode: Toggle the theme using the sun/moon icon in the header.
You can run Iceberg UI using Docker (recommended for quick start) or by setting it up locally (recommended for development).
You can run the application easily using the pre-built Docker image. Run the UI on port 8000:

```bash
docker run -p 8000:8000 alexmerced/iceberg-ui
```

Access the UI at http://localhost:8000.
To spin up a complete testing environment with a Nessie Catalog and MinIO S3 storage, use the provided docker-compose.yml:

```bash
docker-compose up -d
```

Prerequisites for local setup:

- Python 3.9+
- Node.js 16+
- (Optional) An Iceberg catalog server
Backend:

- Navigate to the `backend` directory: `cd backend`
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Start the server: `uvicorn main:app --reload`
Frontend:

- Navigate to the `frontend` directory: `cd frontend`
- Install dependencies: `npm install`
- Start the development server: `npm run dev`

Note: You can configure the port using the `PORT` environment variable: `PORT=3000 npm run dev`
For the backend, you can also set the port: `PORT=8001 python main.py`
You can configure the application using the following environment variables:

Frontend:

- `PORT` or `FRONTEND_PORT`: Port to run the frontend server (default: 5173)
- `VITE_BACKEND_URL`: URL of the backend API (default: `http://localhost:8000`)

Backend:

- `PORT` or `BACKEND_PORT`: Port to run the backend server (default: 8000)
- `FRONTEND_URL`: Comma-separated list of allowed frontend URLs for CORS (default: `*`)
Copy `example.env.json` to `env.json` and update it with your catalog details. The application will automatically connect to this catalog on startup.
```json
{
  "catalogs": {
    "default": {
      "uri": "https://catalog.example.com/api/iceberg",
      "oauth2-server-uri": "https://auth.example.com/oauth/token",
      "token": "your-token-here",
      "warehouse": "s3://your-warehouse",
      "type": "rest"
    }
  }
}
```

Note: `env.json` is gitignored for security. Never commit credentials to version control.
You can also connect to catalogs directly through the UI without pre-configuring `env.json`. This allows you to:
- Connect to multiple catalogs in a single session
- Give each catalog a friendly name
- Switch between catalogs easily
Supported catalog types:

- REST: Iceberg REST Catalog (Dremio, Polaris, Nessie, etc.)
- Hive: Hive Metastore
- Glue: AWS Glue Data Catalog
- DynamoDB: AWS DynamoDB Catalog
- SQL: PostgreSQL, MySQL, SQLite catalogs
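For reference, these map onto PyIceberg catalog types on the backend. A minimal sketch of connecting to two of them with PyIceberg directly (all property values are placeholders; the Glue example assumes the `pyiceberg[glue]` extra and standard AWS credentials):

```python
from pyiceberg.catalog import load_catalog

# REST catalog (Dremio, Polaris, Nessie, ...)
rest_catalog = load_catalog("production", **{
    "type": "rest",
    "uri": "https://catalog.example.com/api/iceberg",
    "token": "your-token-here",
    "warehouse": "s3://your-warehouse",
})

# AWS Glue Data Catalog; credentials come from the usual AWS config chain.
glue_catalog = load_catalog("glue_cat", **{"type": "glue"})
```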
To connect:

- Open your browser to the frontend URL (usually `http://localhost:5173`).
- Click "Connect" and enter your catalog connection details:
  - Catalog Name: A friendly name for this connection (e.g., "production", "staging")
  - Catalog Type: REST, Hive, Glue, etc.
  - URI: The catalog endpoint URL
  - Warehouse: The warehouse location (S3, HDFS, etc.)
  - Authentication: Choose "OAuth2", "Bearer Token", or "None" (for no-auth catalogs)
  - Credentials: Authentication details if required
- You can connect to multiple catalogs and switch between them using the catalog selector.
- Use the sidebar explorer to browse namespaces and tables.
- Click on a table to view its metadata, schema, and snapshots.
- Use the upload button (cloud icon) next to any table to upload data files.
- Use the "Play" button (
▶️ ) next to any table to instantly populate aSELECT *query in the editor.
Execute SQL queries in the Query Editor:
```sql
-- Simple query
SELECT * FROM my_namespace.my_table LIMIT 10;

-- Cross-catalog join
SELECT u.name, o.amount
FROM catalog1.db.users u
JOIN catalog2.db.orders o ON u.id = o.user_id;

-- INSERT data
INSERT INTO my_namespace.my_table VALUES (1, 'Alice'), (2, 'Bob');

-- DELETE data
DELETE FROM my_namespace.my_table WHERE id > 100;

-- Time travel
SELECT * FROM my_namespace.my_table
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';
```

You can query Iceberg metadata tables by appending `$` to the table name:

- `$snapshots`: History of table states
- `$files`: Data files in the current snapshot
- `$manifests`: Manifest files
- `$partitions`: Partition statistics

```sql
SELECT * FROM db.orders$snapshots;
```
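The backend's PyIceberg library exposes the same metadata programmatically, if you ever want it outside the UI. A minimal sketch (illustrative catalog and table names; assumes `pyiceberg` and `pandas` are installed):

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")      # connection details from PyIceberg config
tbl = catalog.load_table("db.orders")  # hypothetical table

# Each inspect method returns a PyArrow table mirroring the SQL $ tables.
print(tbl.inspect.snapshots().to_pandas())   # like db.orders$snapshots
print(tbl.inspect.partitions().to_pandas())  # like db.orders$partitions
```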
To upload files:

- Append Data: Navigate to an existing table and click the upload icon (cloud) next to the table name.
- Create Table: Click the upload icon on a Namespace folder.
- Select a CSV, JSON, or Parquet file.
- If creating a new table, enter a name. The schema will be automatically inferred from the file.
- The data will be uploaded and the table created/updated.
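For context on what an append like this involves, here is a minimal PyIceberg sketch of appending a CSV file to an existing table. It is a sketch of the general approach, not necessarily the app's exact code path; names are illustrative, and the file's columns must be compatible with the table schema:

```python
import pyarrow.csv as pacsv
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")                  # illustrative catalog name
tbl = catalog.load_table("my_namespace.my_table")  # illustrative table

# Read the CSV into an Arrow table, then append it as a new snapshot.
data = pacsv.read_csv("data.csv")
tbl.append(data)
```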
Iceberg supports full schema evolution. You can modify table schemas using SQL commands (if supported by your catalog); a programmatic sketch follows this list:

- Add Column: `ALTER TABLE ... ADD COLUMN`
- Drop Column: `ALTER TABLE ... DROP COLUMN`
- Rename Column: `ALTER TABLE ... RENAME COLUMN`
- Update Type: `ALTER TABLE ... ALTER COLUMN ... TYPE`
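Equivalently, the same operations can be driven through PyIceberg's schema-evolution API rather than SQL. A minimal sketch with illustrative names (the type change assumes a legal promotion, e.g. int to long):

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.types import LongType, StringType

catalog = load_catalog("default")
tbl = catalog.load_table("my_namespace.my_table")

# Changes are committed atomically when the context manager exits.
with tbl.update_schema() as update:
    update.add_column("email", StringType())           # add column
    update.rename_column("name", "full_name")          # rename column
    update.delete_column("legacy_flag")                # drop column
    update.update_column("id", field_type=LongType())  # widen the type
```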
After running a query:
- Click the "Export" button.
- Choose your format (CSV, JSON, or Parquet).
- The file will be downloaded to your browser.
- Switch Catalogs: Use the dropdown in the explorer to switch between connected catalogs.
- Log Out: Click "Log Out" in the header to disconnect from all catalogs.
Performance tips:

- Filter Early: Always use WHERE clauses on partition columns to prune data.
- Limit Results: Use `LIMIT` when exploring data to avoid fetching huge datasets.
- Use Metadata Tables: Check `$files` to see how many files your query might scan (see the sketch below).
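As a way to see file pruning in action, PyIceberg can report how many data files a given filter would scan. A minimal sketch with illustrative names (`db.orders`, `order_date`):

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
tbl = catalog.load_table("db.orders")

# Count the data files each scan would read; a good partition filter
# should plan far fewer files than a full-table scan.
all_files = list(tbl.scan().plan_files())
pruned = list(tbl.scan(row_filter="order_date >= '2024-01-01'").plan_files())
print(f"{len(all_files)} files unfiltered vs {len(pruned)} after pruning")
```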
Maintenance and best practices:

- Compaction: Regularly compact small files to improve read performance.
- Expire Snapshots: Remove old snapshots to free up storage space.
- Environment Separation: Use separate catalogs for Prod, Dev, and Staging.
- Configuration: Use `env.json` to share configuration with your team (but don't commit secrets!).
- Naming: Use descriptive names for your catalogs to avoid confusion in cross-catalog joins.
The project includes an End-to-End (E2E) testing suite using Playwright.
```bash
cd frontend
npx playwright test
```