24 changes: 24 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Logs**
If applicable, add logs to help explain your problem.

**Additional context**
Add any other context about the problem here.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: true
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
41 changes: 36 additions & 5 deletions .github/workflows/dist_pipeline.yml
@@ -20,13 +20,44 @@ jobs:
       ci_tools_version: main
       extension_name: infera
       enable_rust: true
-      exclude_archs: "linux_amd64_musl;windows_amd64_mingw;osx_amd64;wasm_mvp;wasm_eh;wasm_threads"
+      exclude_archs: "windows_amd64_mingw;osx_amd64;wasm_mvp;wasm_eh;wasm_threads"

   duckdb-stable-build:
-    uses: duckdb/extension-ci-tools/.github/workflows/_extension_distribution.yml@v1.4.0
+    uses: duckdb/extension-ci-tools/.github/workflows/_extension_distribution.yml@v1.4.1
     with:
-      duckdb_version: v1.4.0
-      ci_tools_version: v1.4.0
+      duckdb_version: v1.4.1
+      ci_tools_version: v1.4.1
       extension_name: infera
       enable_rust: true
-      exclude_archs: "linux_amd64_musl;windows_amd64_mingw;osx_amd64;wasm_mvp;wasm_eh;wasm_threads"
+      exclude_archs: "windows_amd64_mingw;osx_amd64;wasm_mvp;wasm_eh;wasm_threads"
+
+  create-release-draft:
+    name: Create Draft Release with Built Binaries
+    needs:
+      - duckdb-stable-build
+    if: startsWith(github.ref, 'refs/tags/')
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - name: Download All Build Artifacts
+        uses: actions/download-artifact@v4
+        with:
+          path: dist
+          merge-multiple: true
+      - name: List Artifacts
+        run: |
+          echo "Downloaded artifacts to: $(pwd)/dist"
+          ls -la dist || true
+          find dist -type f -maxdepth 2 -print || true
+      - name: Create Draft Release and Upload Assets
+        uses: softprops/action-gh-release@v2
+        with:
+          draft: true
+          name: Infera ${{ github.ref_name }}
+          tag_name: ${{ github.ref_name }}
+          generate_release_notes: true
+          files: |
+            dist/**
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -1,5 +1,6 @@
default_stages: [ pre-push ]
fail_fast: false
exclude: '(^external/)'

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
1 change: 0 additions & 1 deletion Makefile
@@ -6,7 +6,6 @@ EXT_NAME := infera
RUST_LIB := infera/target/release/$(EXT_NAME).a
DUCKDB_SRCDIR := ./external/duckdb/
EXT_CONFIG := ${PROJ_DIR}extension_config.cmake
TESTS_DIR := tests
EXAMPLES_DIR := docs/examples
SHELL := /bin/bash
PYTHON := python3
1 change: 1 addition & 0 deletions README.md
@@ -61,6 +61,7 @@ See the [ROADMAP.md](ROADMAP.md) for the list of implemented and planned feature
git clone --recursive https://github.com/CogitatorTech/infera.git
cd infera

# This might take a while to run
make release
```

5 changes: 3 additions & 2 deletions ROADMAP.md
@@ -11,13 +11,13 @@ It outlines features to be implemented and their current status.
* **Input Data Types**
* [x] `FLOAT` features from table columns.
* [x] Type casting from `INTEGER`, `BIGINT`, and `DOUBLE` columns.
* [x] Type casting from `DECIMAL` columns.
* [x] `BLOB` input for tensor data.
* [ ] `STRUCT` or `MAP` input for named features.
* **Output Data Types**
* [x] Single `FLOAT` scalar output.
* [x] Multiple `FLOAT` outputs as a `VARCHAR` containing JSON.
* [x] Multiple `FLOAT` outputs as a `LIST[FLOAT]`.
* [ ] Return multiple outputs as a `STRUCT`.
* **Batch Processing**
* [x] Inference on batches for models with dynamic dimensions.
* [ ] Automatic batch splitting for models with a fixed batch size.
@@ -32,7 +32,8 @@ It outlines features to be implemented and their current status.
 * [x] Unload models from memory.
 * [x] List loaded models.
 * [x] Get model metadata as a JSON object.
-* [ ] Cache eviction policies for remote models.
+* [x] Check if a model is currently loaded.
+* [x] Cache eviction policies for remote models.

### 3. Performance and Concurrency

252 changes: 252 additions & 0 deletions docs/CONFIGURATION.md
@@ -0,0 +1,252 @@
## Infera's Configuration Guide

Infera supports configuration via environment variables to customize its behavior without code changes.

### Environment Variables

#### Cache Configuration

##### INFERA_CACHE_DIR

- **Description**: Directory path for caching remote models
- **Type**: String (path)
- **Default**: `$TMPDIR/infera_cache` (system temp directory)
- **Example**:
```bash
export INFERA_CACHE_DIR="/var/cache/infera"
```

##### INFERA_CACHE_SIZE_LIMIT

- **Description**: Maximum cache size in bytes
- **Type**: Integer (bytes)
- **Default**: `1073741824` (1 GiB)
- **Example**:
```bash
## Set to 5 GiB
export INFERA_CACHE_SIZE_LIMIT=5368709120

## Set to 500 MiB
export INFERA_CACHE_SIZE_LIMIT=524288000
```
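
The byte values in the examples above are binary multiples (powers of 1024). As a convenience, a small script can generate the value for a given size; `to_bytes` is a hypothetical helper for illustration, not part of Infera:

```python
# Binary multipliers: the sizes used in this guide are powers of 1024.
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def to_bytes(amount: int, unit: str) -> int:
    """Convert a size such as (5, "GiB") into a byte count."""
    return amount * UNITS[unit]


if __name__ == "__main__":
    # Print ready-to-paste export lines for a few common sizes.
    for amount, unit in [(1, "GiB"), (5, "GiB"), (500, "MiB")]:
        value = to_bytes(amount, unit)
        print(f"export INFERA_CACHE_SIZE_LIMIT={value}  # {amount} {unit}")
```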

##### INFERA_CACHE_EVICTION

- **Description**: Cache eviction strategy to use when cache is full
- **Type**: String (`LRU`; `LFU` and `FIFO` are planned but not yet implemented)
- **Default**: `LRU` (Least Recently Used)
- **Example**:
```bash
export INFERA_CACHE_EVICTION=LRU
## Note: Currently only LRU is implemented, LFU and FIFO are planned
```

#### HTTP Configuration

##### INFERA_HTTP_TIMEOUT

- **Description**: HTTP request timeout in seconds for downloading remote models
- **Type**: Integer (seconds)
- **Default**: `30`
- **Example**:
```bash
export INFERA_HTTP_TIMEOUT=60
```

##### INFERA_HTTP_RETRY_ATTEMPTS

- **Description**: Number of retry attempts for failed downloads
- **Type**: Integer
- **Default**: `3`
- **Example**:
```bash
## Retry up to 5 times on failure
export INFERA_HTTP_RETRY_ATTEMPTS=5
```

##### INFERA_HTTP_RETRY_DELAY

- **Description**: Initial delay between retry attempts in milliseconds (uses exponential backoff)
- **Type**: Integer (milliseconds)
- **Default**: `1000` (1 second)
- **Example**:
```bash
## Wait 2 seconds before the first retry (later retries back off further)
export INFERA_HTTP_RETRY_DELAY=2000
```

#### Logging Configuration

##### INFERA_VERBOSE

- **Description**: Enable verbose logging (deprecated; use `INFERA_LOG_LEVEL` instead)
- **Type**: Boolean (`1`, `true`, or `0`, `false`)
- **Default**: `false`
- **Example**:
```bash
export INFERA_VERBOSE=1
```
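
To illustrate how a boolean variable like this is typically interpreted, here is a minimal sketch. The accepted spellings come from the type description above; case-insensitive matching and the fallback to the default on unrecognized input are assumptions, and `parse_verbose` is a hypothetical name rather than Infera's actual code:

```python
TRUE_VALUES = {"1", "true"}
FALSE_VALUES = {"0", "false"}


def parse_verbose(raw, default=False):
    """Interpret an INFERA_VERBOSE-style value; unknown input yields the default."""
    if raw is None:
        return default
    value = raw.strip().lower()  # assumption: matching ignores case and whitespace
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return default  # assumption: unrecognized values are ignored, not rejected
```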

##### INFERA_LOG_LEVEL

- **Description**: Set logging level for detailed output
- **Type**: String (`ERROR`, `WARN`, `INFO`, `DEBUG`)
- **Default**: `WARN`
- **Example**:
```bash
## Show all messages including debug
export INFERA_LOG_LEVEL=DEBUG

## Show only errors
export INFERA_LOG_LEVEL=ERROR

## Show informational messages and above
export INFERA_LOG_LEVEL=INFO
```

### Usage Examples

#### Example 1: Custom Cache Directory

```bash
## Set custom cache directory
export INFERA_CACHE_DIR="/mnt/fast-ssd/ml-cache"

## Start DuckDB and check the configuration (the SQL is passed on stdin)
./build/release/duckdb <<'SQL'
SELECT infera_get_version();
SELECT infera_get_cache_info();
SQL
```

#### Example 2: Larger Cache for Big Models

```bash
## Set cache to 10 GiB for large models
export INFERA_CACHE_SIZE_LIMIT=10737418240

## Load large models from remote URLs
./build/release/duckdb
```

#### Example 3: Production Configuration

```bash
## Complete production configuration
export INFERA_CACHE_DIR="/var/lib/infera/cache"
export INFERA_CACHE_SIZE_LIMIT=5368709120 ## 5 GiB
export INFERA_HTTP_TIMEOUT=120 ## 2 minutes
export INFERA_HTTP_RETRY_ATTEMPTS=5 ## Retry up to 5 times
export INFERA_HTTP_RETRY_DELAY=2000 ## 2 second initial delay
export INFERA_LOG_LEVEL=WARN ## Production logging
export INFERA_CACHE_EVICTION=LRU ## LRU cache strategy

## Run DuckDB with Infera
./build/release/duckdb
```

#### Example 4: Development/Debug Configuration

```bash
## Development setup with verbose logging
export INFERA_CACHE_DIR="./dev-cache"
export INFERA_LOG_LEVEL=DEBUG ## Detailed debug logs
export INFERA_HTTP_TIMEOUT=10 ## Shorter timeout for dev
export INFERA_HTTP_RETRY_ATTEMPTS=1 ## Fail fast in development

## Run DuckDB
./build/release/duckdb
```

#### Example 5: Slow Network Configuration

```bash
## Configuration for slow or unreliable networks
export INFERA_HTTP_TIMEOUT=300 ## 5 minute timeout
export INFERA_HTTP_RETRY_ATTEMPTS=10 ## Many retries
export INFERA_HTTP_RETRY_DELAY=5000 ## 5 second initial delay
export INFERA_LOG_LEVEL=INFO ## Track download progress

./build/release/duckdb
```

### Configuration Verification

You can verify your configuration at runtime:

```sql
-- Check version and cache directory
SELECT infera_get_version();

-- Check cache statistics
SELECT infera_get_cache_info();
```

Example output:

```json
{
  "cache_dir": "/var/cache/infera",
  "total_size_bytes": 204800,
  "file_count": 3,
  "size_limit_bytes": 5368709120
}
```
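
As a rough sketch of how this output could be consumed, the snippet below parses the example JSON and reports how much of the size limit is in use. It assumes the field names shown in the example output; `cache_usage_percent` is a hypothetical helper, and fetching the JSON from DuckDB is left out:

```python
import json

# The example output from above, used here as sample input.
SAMPLE = """
{
  "cache_dir": "/var/cache/infera",
  "total_size_bytes": 204800,
  "file_count": 3,
  "size_limit_bytes": 5368709120
}
"""


def cache_usage_percent(cache_info_json: str) -> float:
    """Return the share of the cache size limit currently in use, in percent."""
    info = json.loads(cache_info_json)
    return 100.0 * info["total_size_bytes"] / info["size_limit_bytes"]


if __name__ == "__main__":
    print(f"Cache usage: {cache_usage_percent(SAMPLE):.4f}%")
```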

### Retry Policy Details

When downloading remote models, Infera automatically retries failed downloads with exponential backoff:

1. **Attempt 1**: Download immediately
2. **Attempt 2**: Wait `INFERA_HTTP_RETRY_DELAY` milliseconds (e.g., 1 second)
3. **Attempt 3**: Wait `INFERA_HTTP_RETRY_DELAY * 2` milliseconds (e.g., 2 seconds)
4. **Attempt N**: Wait `INFERA_HTTP_RETRY_DELAY * 2^(N-2)` milliseconds (the delay doubles with each further retry)

This helps handle temporary network issues, server rate limiting, and transient failures.
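
The schedule can be sketched as follows, assuming the delay doubles on each retry, which matches the 1 second and 2 second figures given for attempts 2 and 3. Infera's actual schedule (including any cap or jitter) may differ, and `retry_delays_ms` is a hypothetical helper:

```python
def retry_delays_ms(initial_delay_ms: int, attempts: int) -> list:
    """Return the wait before each attempt; the first attempt starts immediately."""
    delays = [0]
    for retry in range(attempts - 1):
        # Assumption: doubling backoff, i.e. initial * 2^retry for each retry.
        delays.append(initial_delay_ms * 2 ** retry)
    return delays[:attempts]


if __name__ == "__main__":
    # Default settings: 3 attempts with a 1000 ms initial delay.
    for attempt, delay in enumerate(retry_delays_ms(1000, 3), start=1):
        print(f"Attempt {attempt}: wait {delay} ms")
```

With larger values of `INFERA_HTTP_RETRY_ATTEMPTS` the total wait grows quickly under this assumption, so it is worth checking the cumulative delay before raising both settings together.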

### Logging Levels

Logging levels control the verbosity of output to stderr:

- **ERROR**: Only critical errors that prevent operations
- **WARN**: Warnings about potential issues (default)
- **INFO**: Informational messages about operations (cache hits/misses, downloads)
- **DEBUG**: Detailed debugging information (retry attempts, file sizes, etc.)

Example log output with `INFERA_LOG_LEVEL=INFO`:

```
[INFO] Cache miss for URL: https://example.com/model.onnx, downloading...
[INFO] Successfully downloaded: https://example.com/model.onnx
[INFO] Cache hit for URL: https://example.com/model.onnx
```

Example log output with `INFERA_LOG_LEVEL=DEBUG`:

```
[DEBUG] Download attempt 1/3 for https://example.com/model.onnx
[INFO] Successfully downloaded: https://example.com/model.onnx
[DEBUG] Downloaded file size: 15728640 bytes
```
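
The level ordering explains why the `DEBUG` output above still contains `[INFO]` lines: each level includes everything from the less verbose levels. A minimal sketch of that filtering rule, using hypothetical names (`LEVELS`, `should_log`) rather than Infera's internals:

```python
# Levels ordered from least to most verbose.
LEVELS = {"ERROR": 0, "WARN": 1, "INFO": 2, "DEBUG": 3}


def should_log(message_level: str, configured_level: str = "WARN") -> bool:
    """A message is emitted when its level is at or below the configured verbosity."""
    return LEVELS[message_level] <= LEVELS[configured_level]


if __name__ == "__main__":
    for level in LEVELS:
        print(f"{level} shown at INFERA_LOG_LEVEL=INFO: {should_log(level, 'INFO')}")
```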

### Cache Eviction Strategies

Currently implemented:

- **LRU (Least Recently Used)**: Evicts files that haven't been accessed in the longest time

Planned for future releases:

- **LFU (Least Frequently Used)**: Evicts files with the lowest access count
- **FIFO (First In First Out)**: Evicts oldest downloaded files first
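
For readers unfamiliar with LRU, the idea can be sketched in a few lines. This is an illustrative in-memory model, not Infera's implementation: `LruFileCache` and its methods are hypothetical, and the real cache tracks files on disk:

```python
from collections import OrderedDict


class LruFileCache:
    """Tracks cached file sizes and evicts the least recently used file first."""

    def __init__(self, size_limit_bytes: int) -> None:
        self.size_limit_bytes = size_limit_bytes
        self.files = OrderedDict()  # name -> size, least recently used first

    def touch(self, name: str) -> None:
        """Record a cache hit so the file becomes the most recently used."""
        self.files.move_to_end(name)

    def add(self, name: str, size: int) -> list:
        """Insert a file, then evict LRU entries until the total fits the limit."""
        if name in self.files:
            del self.files[name]
        self.files[name] = size
        evicted = []
        # Never evict the file that was just added, even if it alone is too large.
        while sum(self.files.values()) > self.size_limit_bytes and len(self.files) > 1:
            victim, _ = self.files.popitem(last=False)
            evicted.append(victim)
        return evicted


if __name__ == "__main__":
    cache = LruFileCache(size_limit_bytes=100)
    cache.add("a.onnx", 40)
    cache.add("b.onnx", 40)
    cache.touch("a.onnx")           # a.onnx is now more recent than b.onnx
    print(cache.add("c.onnx", 40))  # b.onnx is evicted to make room
```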

### Notes

- Environment variables are read once when Infera initializes
- Changes to environment variables require restarting DuckDB
- Invalid values fall back to defaults (no errors thrown)
- Cache directory is created automatically if it doesn't exist
- LRU eviction happens automatically when cache limit is reached
- Logging output goes to stderr and doesn't interfere with SQL query results
- Retry delays use exponential backoff to handle rate limiting gracefully
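
The silent fallback to defaults described in these notes can be sketched as follows. The defaults are the ones documented above; treating negative numbers as invalid is an assumption, and `env_int` is a hypothetical helper rather than Infera's actual parsing code:

```python
import os

DEFAULT_HTTP_TIMEOUT_SECS = 30
DEFAULT_RETRY_ATTEMPTS = 3


def env_int(name: str, default: int, environ=None) -> int:
    """Read a non-negative integer setting, returning the default on bad input."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        return default  # invalid value: fall back silently, no error raised
    # Assumption: negative values are treated as invalid.
    return value if value >= 0 else default


if __name__ == "__main__":
    timeout = env_int("INFERA_HTTP_TIMEOUT", DEFAULT_HTTP_TIMEOUT_SECS)
    attempts = env_int("INFERA_HTTP_RETRY_ATTEMPTS", DEFAULT_RETRY_ATTEMPTS)
    print(f"timeout={timeout}s attempts={attempts}")
```

Because invalid values do not raise errors, a typo in a variable's value is easy to miss; checking `infera_get_cache_info()` or enabling `INFERA_LOG_LEVEL=DEBUG` is a practical way to confirm what took effect.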