tap-redshift is a Singer tap for Amazon Redshift, built with the Meltano Tap SDK for Singer Taps.
- Multiple Authentication Methods: Support for username/password and IAM authentication
- Redshift Serverless Support: Full support for Redshift Serverless workgroups
- S3 UNLOAD Optimization: Uses Redshift's UNLOAD command to S3 for better performance on large datasets
- Comprehensive Type Mapping: Complete mapping from Redshift SQL types to JSON Schema types
- SSL Support: Secure connections with configurable SSL modes
Install from GitHub:
pipx install git+https://github.com/tobiascadee/tap-redshift.git@main{
"host": "your-cluster.abc123.us-west-2.redshift.amazonaws.com",
"port": "5439",
"database": "your_database",
"schema": "public",
"user": "your_username",
"password": "your_password",
"use_iam_authentication": false
}{
"host": "your-cluster.abc123.us-west-2.redshift.amazonaws.com",
"port": 5439,
"database": "your_database",
"schema": "public",
"use_iam_authentication": true,
"db_user": "your_db_user",
"cluster_identifier": "your-cluster",
"aws_region": "us-west-2",
"aws_access_key_id": "your_access_key",
"aws_secret_access_key": "your_secret_key"
}{
"host": "your-workgroup.123456789.us-west-2.redshift-serverless.amazonaws.com",
"database": "your_database",
"schema": "public",
"use_iam_authentication": true,
"is_serverless": true,
"serverless_work_group": "your-workgroup",
"aws_region": "us-west-2"
}For improved performance on large datasets, configure S3 UNLOAD:
{
"s3_bucket": "your-unload-bucket",
"s3_key_prefix": "redshift-unload"
}When enabled, the tap will:
- Use Redshift's UNLOAD command to export data to S3
- Download the files locally for processing
- Clean up S3 files after processing
| Setting | Description |
|---|---|
host |
Redshift cluster endpoint |
database |
Database name |
| Setting | Description | Required |
|---|---|---|
user |
Username for database auth | If not using IAM |
password |
Password for database auth | If not using IAM |
use_iam_authentication |
Use IAM instead of username/password | No (default: false) |
db_user |
Database user for IAM auth | If using IAM |
cluster_identifier |
Cluster identifier for IAM auth | If using IAM with provisioned cluster |
serverless_work_group |
Workgroup name | If using Serverless |
| Setting | Description | Required |
|---|---|---|
aws_region |
AWS region | If using IAM or S3 |
aws_access_key_id |
AWS access key | If not using IAM roles/profiles |
aws_secret_access_key |
AWS secret key | If not using IAM roles/profiles |
aws_session_token |
AWS session token | For temporary credentials |
aws_profile |
AWS profile name | If using AWS profiles |
| Setting | Description | Default |
|---|---|---|
port |
Redshift port | 5439 |
schema |
Schema to extract from | public |
ssl |
Use SSL connection | true |
sslmode |
SSL mode | require |
is_serverless |
Using Serverless | false |
s3_key_prefix |
S3 key prefix for UNLOAD | redshift-unload |
dates_as_string |
Convert dates to strings vs date formats | true |
super_as_object |
Treat SUPER columns as objects vs strings | false |
The tap includes a dedicated RedshiftSQLToJSONSchema converter class that extends the Singer SDK's base converter with Redshift-specific functionality:
| Redshift Type | JSON Schema Type | Notes |
|---|---|---|
| Standard SQL Types | Handled by base SQLToJSONSchema class | |
| SMALLINT, INTEGER, BIGINT | integer | With appropriate min/max values |
| DECIMAL, NUMERIC | number | With precision handling via registered method |
| REAL, FLOAT, DOUBLE PRECISION | number | |
| BOOLEAN | boolean | |
| CHAR, VARCHAR, TEXT | string | |
| DATE, TIMESTAMP, TIME | string | Configurable format via registered methods |
| Redshift-Specific Types | Custom handled by RedshiftSQLToJSONSchema | |
| SUPER | string or object | Configurable via super_as_object |
| GEOMETRY, GEOGRAPHY | string | Spatial data |
| HLLSKETCH | string | HyperLogLog sketches |
The converter uses the Singer SDK standard pattern:
- Class attribute:
RedshiftConnector.sql_to_jsonschema_converter = RedshiftSQLToJSONSchema - Base class handles standard SQLAlchemy types (integers, strings, booleans, etc.)
- Registered methods override date/time types for configurable format handling
- String-based dispatch handles Redshift-specific types from database introspection
- Automatic instantiation: The SDK automatically creates the converter instance with tap config
-
dates_as_string(default:true): Whentrue, date/time columns are converted to plain strings. Whenfalse, they include proper JSON Schema format constraints (date,date-time,time). -
super_as_object(default:false): Whentrue, SUPER columns are treated as flexible objects that can contain JSON-like data. Whenfalse, they are treated as strings.
Example configuration for structured data handling:
{
"dates_as_string": false,
"super_as_object": true
}# Test connection
tap-redshift --config config.json --discover > catalog.json
# Extract data
tap-redshift --config config.json --catalog catalog.jsonAdd to your meltano.yml:
extractors:
- name: tap-redshift
namespace: tap_redshift
pip_url: git+https://github.com/tobiascadee/tap-redshift.git@main
settings:
- name: host
- name: database
- name: user
- name: password
kind: password
sensitive: true
# ... other settingsThen run:
meltano install extractor tap-redshift
meltano invoke tap-redshift --discover > catalog.json
meltano run tap-redshift target-jsonl- IAM Authentication: Use IAM authentication for better security
- Appropriate Batch Size: Adjust
batch_sizebased on your data size - Schema Selection: Specify the exact schema to avoid unnecessary discovery
git clone https://github.com/tobiascadee/tap-redshift.git
cd tap-redshift
uv syncuv run pytestuv run tap-redshift --helpsinger-sdk: Meltano Singer SDKredshift-connector: Official AWS Redshift connectorboto3: AWS SDK for S3 operationssqlalchemy: SQL toolkit
Apache 2.0