This guide will help you deploy your AI models on peer-to-peer infrastructure. Whether you're a developer or an AI enthusiast, you'll learn how to run your models with complete sovereignty, maintaining full control over your model weights through peer-to-peer storage.
- Key Features
- Before You Start
- Getting Started
- CLI Overview
- Running Models
- Using the API
- Multi-Model Support
- Advanced Usage
- Additional Information
- Migration Guide
- Need Help?
TRUE Peer-to-Peer Model Storage
- Unlike Ollama/LMStudio that rely on centralized repositories (Hugging Face, GitHub), Eternal Zoo uses IPFS/Filecoin for permanent, censorship-resistant model distribution
- Models are stored across a distributed network - no single point of failure or control
- Access your models even if traditional platforms go down or restrict access
Ultimate Privacy with Local Execution
- 100% local inference - your data never touches external servers (unlike cloud AI services)
- Zero telemetry - no usage tracking, no model access logs, no data collection
- Air-gapped capability - run models completely offline once downloaded
- Sovereign Weights: Maintain complete ownership and control over your AI models
- Zero Trust Privacy: Your prompts, responses, and model usage remain completely private
- OpenAI Compatibility: Use familiar API endpoints with your existing tools
- Multi-Model Support: Works with both text and vision models
- Model Access: Run any GGUF model directly from Hugging Face
- Parallel Processing: Efficient model compression and upload
- Automatic Retries: Robust error handling for network issues
- Metadata Management: Comprehensive model information tracking
In an era of increasing AI centralization, Eternal Zoo puts you back in control:
- Own Your Models: Models are stored on peer-to-peer infrastructure, not controlled by any single entity
- Private by Design: All inference happens locally on your hardware - no external API calls, no data collection
- Censorship Resistant: peer-to-peer storage ensures your models remain accessible regardless of platform policies
- Vendor Independence: Break free from proprietary AI services and their limitations
- macOS or Linux operating system
- Sufficient RAM for your chosen model (see model specifications below)
- Stable internet connection for model uploads
make install-macos   # macOS
bash ubuntu.sh       # Ubuntu
bash jetson.sh       # NVIDIA Jetson
- Activate the virtual environment:
source activate.sh
Remember: Activate this environment each time you use the Eternal Zoo (`eai`) tools.
- Verify your installation:
eai --version
Eternal Zoo uses a structured command hierarchy for better organization. All model operations are grouped under the `model` subcommand:
# Model operations
eai model run --hash <hash> # Run a model server
eai model run <model-name> # Run a preserved model (e.g., qwen3-1.7b)
eai model run --hf-repo <repo> --hf-file <file> --task <task> # Run any GGUF model from Hugging Face
eai model run --config <config.yaml> # Run multiple models using config file
eai model serve [--main-hash <hash>] # Serve all downloaded models (with optional main model)
eai model stop # Stop the running model server
eai model status # Check which model is running
eai model download --hash <hash> # Download a model from IPFS
eai model preserve --folder-path <path> # Upload/preserve a model to IPFS
# General commands
eai --version # Show version information
# Run a preserved model (user-friendly)
eai model run qwen3-1.7b
# Run any model by hash
eai model run --hash bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y
# Run any GGUF model directly from Hugging Face
eai model run --hf-repo dphn/Dolphin3.0-Llama3.1-8B-GGUF --hf-file Dolphin3.0-Llama3.1-8B-Q4_0.gguf --task chat
# Run multiple models using config file
eai model run --config config.yaml
# Serve all downloaded models with specific main model
eai model serve --main-hash bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y
# Check status
eai model status
# Stop the running model
eai model stop
# Download a model locally
eai model download --hash bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y
# Upload your own model
eai model preserve --folder-path ./my-model-folder --task chat --ram 8.5
Eternal Zoo now supports running any GGUF model directly from Hugging Face without needing to download or preserve it first. This gives you instant access to thousands of models available on Hugging Face.
# Run any GGUF model from Hugging Face
eai model run --hf-repo <repository> --hf-file <filename> --task <task_type>
# Example: Run Dolphin-3.0 model
eai model run --hf-repo dphn/Dolphin3.0-Llama3.1-8B-GGUF --hf-file Dolphin3.0-Llama3.1-8B-Q4_0.gguf --task chat
- `--hf-repo`: Hugging Face repository (e.g., `dphn/Dolphin3.0-Llama3.1-8B-GGUF`)
- `--hf-file`: Specific GGUF file name (e.g., `Dolphin3.0-Llama3.1-8B-Q4_0.gguf`)
- `--task`: Task type (`chat` for text generation, `embed` for embeddings)
- Visit Hugging Face and search for GGUF models
- Navigate to the model repository
- Find the `.gguf` file you want to use
- Use the repository name and file name in your command
# Language Model
eai model run --hf-repo bartowski/phi-4-GGUF --hf-file phi-4-Q4_K_M.gguf --task chat
# Large Model with gguf-split files
eai model run --hf-repo unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF --pattern Q4_K_M
# Vision Language Model
eai model run --hf-repo unsloth/Qwen2.5-VL-7B-Instruct-GGUF --hf-file Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf --mmproj mmproj-F16.gguf --task chat
# Embedding Model
eai model run --hf-repo jinaai/jina-embeddings-v4-text-matching-GGUF --hf-file jina-embeddings-v4-text-matching-Q4_K_M.gguf --task embed
- Instant Access: No need to download or preserve models first
- Huge Selection: Access to thousands of models on Hugging Face
- Latest Models: Try new models as soon as they're uploaded
- Flexible: Works with any GGUF-compatible model
Eternal Zoo supports running multiple models simultaneously using YAML configuration files. This allows you to specify which model should be the main model (loaded immediately) and which models should be loaded on-demand.
Create a YAML file (e.g., `config.yaml`) with the following structure:
port: 8080
host: 0.0.0.0

models:
  # Main model - loaded immediately when server starts
  gpt-oss-20b:
    model: gpt-oss-20b                    # Featured model name from eternal-zoo
    on_demand: False                      # Main model - loaded immediately

  # On-demand models - loaded only when requested
  qwen3-30b-instruct:
    model: qwen3-30b-a3b-instruct-2507    # Featured model name from eternal-zoo
    on_demand: True                       # On-demand model - loaded when requested

  qwen3-30b-thinking:
    model: qwen3-30b-a3b-thinking-2507    # Featured model name from eternal-zoo
    on_demand: True                       # On-demand model - loaded when requested

  qwen3-30b-coder:
    model: qwen3-coder-30b-a3b-instruct
    on_demand: True

  flux-dev-nsfw:
    model: flux-dev-nsfw                  # Featured model name from eternal-zoo
    on_demand: True                       # On-demand model - loaded only when requested

  # Hugging Face model
  medgemma-27b:
    hf_repo: unsloth/medgemma-27b-it-GGUF
    model: medgemma-27b-it-Q8_0.gguf
    mmproj: mmproj-F16.gguf
    task: chat
    on_demand: True                       # On-demand model - loaded when requested

  # Embedding model
  jina-embeddings-v4:
    hf_repo: jinaai/jina-embeddings-v4-text-code-GGUF
    model: jina-embeddings-v4-text-code-F16.gguf
    task: embed
    on_demand: True                       # On-demand model - loaded when requested
| Option | Description | Default |
|---|---|---|
| `model` | Featured model name from eternal-zoo or model name from Hugging Face | - |
| `hf_repo` | Hugging Face repository | - |
| `task` | Model task type (`chat`, `embed`, `image-generation`) | `chat` |
| `mmproj` | Multimodal projector file (for vision models) | - |
| `architecture` | Model architecture (for image-generation models) | - |
| `lora` | LoRA model (for LoRA models) | - |
| `base_model` | Base model name (for LoRA models) | - |
| `on_demand` | `False` for main model, `True` for on-demand models | `True` |
# Run multiple models using config file
eai model run --config config.yaml
- Primary Rule: The first model with `on_demand: False` becomes the main model
- Fallback: If no model has `on_demand: False`, the first model in the config becomes the main model
- Behavior: The main model is loaded immediately when the server starts, while on-demand models are loaded only when first requested
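To make the selection rule above concrete, here is a minimal sketch of the same logic (illustrative only; `pick_main_model` is a hypothetical helper, assuming the config structure shown above and the PyYAML package):

```python
# Sketch: reproduce the main-model selection rule for a config.yaml
import yaml

def pick_main_model(config_path: str) -> str:
    with open(config_path) as f:
        config = yaml.safe_load(f)
    models = config["models"]  # dict preserves the order models are listed in
    # Primary rule: the first model with on_demand: False is the main model
    for name, spec in models.items():
        if spec.get("on_demand", True) is False:
            return name
    # Fallback: otherwise the first model in the config becomes the main model
    return next(iter(models))

print(pick_main_model("config.yaml"))
```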
- Precise Control: Specify exactly which model should be main vs on-demand
- Easy Deployment: Store your model configuration in version control
- Reproducible: Share config files for consistent setups across environments
- Performance: Optimize loading times by choosing the right main model
- Flexible: Mix different model sources (featured, IPFS, Hugging Face)
We've prepared several models for you to test with. Each model is listed with its specifications and command to run.
Model | Size | RAM | Command |
---|---|---|---|
gpt-oss-20b | 63.4 GB | _ | eai model run gpt-oss-20b |
gpt-oss-120b | 120 GB | _ | eai model run gpt-oss-120b |
Model | Size | RAM | Command |
---|---|---|---|
qwen3-embedding-0.6b | 649 MB | 1.16 GB | eai model run qwen3-embedding-0.6b |
qwen3-embedding-4b | 4.28 GB | 5.13 GB | eai model run qwen3-embedding-4b |
qwen3-1.7b | 1.83 GB | 5.71 GB | eai model run qwen3-1.7b |
qwen3-4b | 4.28 GB | 9.5 GB | eai model run qwen3-4b |
qwen3-8b | 6.21 GB | 12 GB | eai model run qwen3-8b |
qwen3-14b | 15.7 GB | 19.5 GB | eai model run qwen3-14b |
qwen3-30b-a3b | 31 GB | 37.35 GB | eai model run qwen3-30b-a3b |
qwen3-32b | 34.8 GB | 45.3 GB | eai model run qwen3-32b |
qwen3-235b-a22b | 132 GB | 155.61 GB | eai model run qwen3-235b-a22b |
qwen3-coder-480b-a35b | 260 GB | 305.66 GB | eai model run qwen3-coder-480b-a35b |
Model | Size | RAM | Command |
---|---|---|---|
dolphin-3.0-llama3.1-8b | 8.54 GB | 13.29 GB | eai model run dolphin-3.0-llama3.1-8b |
Model | Size | RAM | Command |
---|---|---|---|
gemma-3-4b | 3.16 GB | 7.9 GB | eai model run gemma-3-4b |
gemma-3-12b | 8.07 GB | 21.46 GB | eai model run gemma-3-12b |
gemma-3-27b | 17.2 GB | 38.0 GB | eai model run gemma-3-27b |
Model | Size | RAM | Command |
---|---|---|---|
gemma-3n-e4b | 7.35 GB | 10.08 GB | eai model run gemma-3n-e4b |
Learn more about Devstral-Small
Model | Size | RAM | Command |
---|---|---|---|
devstral-small | 25.1 GB | 31.71 GB | eai model run devstral-small |
Model | Size | RAM | Command |
---|---|---|---|
lfm2-1.2b | 1.25 GB | 1.72 GB | eai model run lfm2-1.2b |
Learn more about OpenReasoning-Nemotron
Model | Size | RAM | Command |
---|---|---|---|
openreasoning-nemotron-32b | 34.8 GB | 45.21 GB | eai model run openreasoning-nemotron-32b |
Generate high-quality images from text prompts using Flux models.
Model | Size | RAM | Command |
---|---|---|---|
flux-dev | 32 GB | 20 GB | eai model run flux-dev |
flux-schnell | 32 GB | 20 GB | eai model run flux-schnell |
flux-dev-nsfw | 32 GB | 20.1 GB | eai model run flux-dev-nsfw |
Note: The `flux-dev-nsfw` model includes LoRA safetensor support for enhanced image generation capabilities.
Example usage:
eai model run flux-dev
The API follows the OpenAI-compatible format, making it easy to integrate with existing applications.
If you would rather interact with your models through a user-friendly desktop interface, download the CryptoAgents app from our Agent Store: eternalai.org/agent-store.
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{"role": "user", "content": "Hello! Can you help me write a Python function?"}
],
"temperature": 0.7,
"max_tokens": 4096
}'
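Because the endpoints follow the OpenAI format, the official OpenAI client libraries can also be pointed at your local server. A minimal Python sketch, assuming the `openai` Python package (v1+) is installed and the server is running on the default port 8080 (the API key value is arbitrary since inference is local):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Eternal Zoo server
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # or a specific model ID from /v1/models
    messages=[{"role": "user", "content": "Hello! Can you help me write a Python function?"}],
    temperature=0.7,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```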
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image? Please describe it in detail."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/your-image.jpg"
}
}
]
}
],
"temperature": 0.7,
"max_tokens": 4096
}'
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"input": ["Hello, world!"]
}'
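As an example of putting the embeddings endpoint to work, the sketch below compares two texts by cosine similarity (assuming the `requests` package and that the response follows the OpenAI embeddings format, with vectors in `data[i].embedding`):

```python
import math
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"model": "local-model", "input": ["Hello, world!", "Hi there, world!"]},
)
vectors = [item["embedding"] for item in resp.json()["data"]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(f"similarity: {cosine(vectors[0], vectors[1]):.3f}")
```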
curl -X POST http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"prompt": "A serene landscape with mountains and a lake at sunset",
"n": 1,
"size": "1024x1024"
}'
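If the image endpoint returns base64-encoded data in the usual OpenAI `data[0].b64_json` field (an assumption; check your server's actual response, which may return a URL instead), saving the result could look like this:

```python
import base64
import requests

resp = requests.post(
    "http://localhost:8080/v1/images/generations",
    json={
        "model": "local-model",
        "prompt": "A serene landscape with mountains and a lake at sunset",
        "n": 1,
        "size": "1024x1024",
    },
)
item = resp.json()["data"][0]
# Assumption: base64 payload in b64_json; adjust if your server returns a URL
with open("landscape.png", "wb") as f:
    f.write(base64.b64decode(item["b64_json"]))
```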
Eternal Zoo supports structured outputs using JSON schema, allowing you to get responses in a specific format. This is useful for extracting structured data from text.
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local-model",
"messages": [
{
"role": "system",
"content": "Extract product information from the user input into the specified JSON format."
},
{
"role": "user",
"content": "I found this laptop: Apple MacBook Pro 13-inch with M2 chip, 8GB RAM, 256GB SSD, Space Gray, $1299"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "Product",
"schema": {
"type": "object",
"properties": {
"product": {
"type": "object",
"properties": {
"name": {"type": "string"},
"brand": {"type": "string"},
"model": {"type": "string"},
"specifications": {
"type": "object",
"properties": {
"processor": {"type": "string"},
"memory": {"type": "string"},
"storage": {"type": "string"}
},
"required": ["processor", "memory", "storage"]
},
"color": {"type": "string"},
"price": {
"type": "number",
"description": "Price in USD"
}
},
"required": ["name", "brand", "model", "specifications", "color", "price"]
}
},
"required": ["product"]
}
}
},
"temperature": 0.1,
"max_tokens": 1000
}'
Expected Response:
{
"id": "chatcmpl-ExQkhfuRkFSjWCo2HXZBsO8LOrwpJdew",
"object": "chat.completion",
"created": 1754382667,
"model": "local-model",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "{\n \"product\": {\n \"name\": \"Apple MacBook Pro 13-inch\",\n \"brand\": \"Apple\",\n \"model\": \"MacBook Pro 13-inch\",\n \"specifications\": {\n \"processor\": \"M2 chip\",\n \"memory\": \"8GB RAM\",\n \"storage\": \"256GB SSD\"\n },\n \"color\": \"Space Gray\",\n \"price\": 1299\n }\n}",
"refusal": null,
"role": "assistant",
"function_call": null,
"tool_calls": null
}
}
]
}
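Since the structured output arrives as a JSON string inside `message.content`, clients typically parse it before use. A short sketch with a trimmed-down schema (assuming the `requests` package; the full schema from the example above works the same way):

```python
import json
import requests

schema = {
    "name": "Product",
    "schema": {
        "type": "object",
        "properties": {"product": {"type": "object"}},
        "required": ["product"],
    },
}

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [
            {"role": "system", "content": "Extract product information from the user input into the specified JSON format."},
            {"role": "user", "content": "Apple MacBook Pro 13-inch, M2, 8GB RAM, 256GB SSD, Space Gray, $1299"},
        ],
        "response_format": {"type": "json_schema", "json_schema": schema},
        "temperature": 0.1,
    },
)
# The structured output is a JSON string inside message.content
product = json.loads(resp.json()["choices"][0]["message"]["content"])
print(product["product"])
```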
Eternal Zoo supports running multiple models simultaneously with intelligent request-based switching. This allows you to serve several models from a single server instance with optimized resource usage.
- Single Model: Traditional approach - one model loaded and active
- Multi-Model: Multiple models available, with automatic switching based on requests
- Main Model: The primary model that's immediately loaded and active when the server starts
- On-Demand Models: Additional models that are loaded automatically when first requested
When a client makes a request specifying a model (via the `model` field in the API), the system:
- Checks Current Model: If the requested model is already active → processes immediately
- Smart Switching: If a different model is requested → automatically switches to that model
- Stream Protection: Waits for active streams to complete before switching models
- Automatic Loading: On-demand models are loaded seamlessly when first requested
Incoming Request
       ↓
Check Model Field
       ↓
Model Field Present?
 ├─ NO  → Use Main Model (immediate)
 └─ YES → Validate Model ID
     ├─ Invalid ID/Folder Name → Use Main Model (fallback)
     └─ Valid ID → Check Task Compatibility
         ├─ Task Mismatch → Use First Compatible Model (fallback)
         └─ Task Compatible → Is Requested Model Active?
             ├─ YES → Process Request Immediately
             └─ NO  → Check for Active Streams
                 ├─ Streams Active → Wait for Completion
                 └─ No Streams → Switch Model → Process Request
Key Validation Steps:
- Model ID Format: Only valid model IDs (from the `id` field of `/v1/models`) work for switching
- Task Compatibility: Models must support the requested endpoint type (chat/embed/image-generation)
- Model Availability: Requested model must exist in the available models list
- Resource Validation: Sufficient resources must be available for model switching
Fallback Behavior:
- Silent Fallbacks: Invalid models, task mismatches, and missing model fields default to main model without errors
- No Error Messages: The server gracefully handles invalid requests by using the main model
- Performance Impact: Fallback requests use main model (fastest response, no switching delay)
- Client Responsibility: Applications should validate models using `/v1/models` before making requests
Best Practices for Model Switching:
- Always Use Model IDs: Use the `id` field from `/v1/models` as the `model` field in requests
- Validate Before Requests: Check `/v1/models` to ensure your requested model exists and supports the task (see the sketch after this list)
- Handle Fallbacks Gracefully: Implement client-side validation to detect when fallbacks occur
- Monitor Performance: Track switching delays and fallback rates in your applications
- Group Similar Tasks: Keep models of the same task type together for faster switching
- Choose Main Model Wisely: Select your most frequently used model as the main model for optimal performance
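The following sketch shows one way to apply these practices client-side: fetch `/v1/models`, pick a model ID that supports the task you need, and fail loudly instead of relying on silent fallbacks (assuming the `requests` package; the helper name is illustrative, not part of Eternal Zoo):

```python
import requests

BASE_URL = "http://localhost:8080/v1"

def pick_model_id(task: str) -> str:
    """Return the id of the first available model that supports `task`."""
    models = requests.get(f"{BASE_URL}/models").json()["data"]
    for model in models:
        if model["task"] == task:
            return model["id"]
    raise RuntimeError(f"No available model supports task '{task}'")

chat_model = pick_model_id("chat")
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={"model": chat_model, "messages": [{"role": "user", "content": "Hello!"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```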
Run multiple models with precise control over main vs on-demand models:
eai model run --config config.yaml
Serve all downloaded models with a specific main model:
eai model serve --main-model <main-model-id>
or
eai model serve --main-hash <main-model-hash>
Once running, specify which model to use in your API requests:
First, check what models are available using the `/v1/models` endpoint:
curl http://localhost:8080/v1/models
Example Response:
{
"object": "list",
"data": [
{
"id": "bafkreie4uj3gluik5ob2ib3cm2pt6ww7n4vqpmjnq6pas4gkkor42yuysa",
"object": "model",
"created": 1753757438,
"owned_by": "user",
"active": true,
"task": "chat",
"is_lora": false,
"multimodal": false,
"context_length": 32768,
"lora_config": null
}
]
}
Use the `id` from `/v1/models` as the `model` field in your requests:
# Request using the available chat model
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "bafkreie4uj3gluik5ob2ib3cm2pt6ww7n4vqpmjnq6pas4gkkor42yuysa",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Use embedding model for text similarity
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "bafkreiacd5mwy4a5wkdmvxsk42nsupes5uf4q3dm52k36mvbhgdrez422y",
"input": ["Hello, world!", "How are you?"]
}'
# Generate images using available model
curl -X POST http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "bafkreibks5pmc777snbo7dwk26sympe2o24tpqfedjq6gmgghwwu7iio34",
"prompt": "A serene landscape with mountains and a lake at sunset",
"n": 1,
"steps": 25,
"size": "1024x1024"
}'
# Use vision-capable chat model
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "medgemma-27b-it-Q4_K_M.gguf",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What do you see in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}]
}'
Important: The server uses the following logic for model selection:
- Explicit Model Request: When you specify `"model": "<model_id>"` → switches to that model
- No Model Field: When you omit the `"model"` field → uses the main model
- Invalid Model ID: When you specify a non-existent or invalid model ID → falls back to main model
- Task Mismatch: When you request a model that doesn't support the endpoint task → uses first compatible model
# ✅ Correct: Switch to specific model using model ID
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "bafkreie4uj3gluik5ob2ib3cm2pt6ww7n4vqpmjnq6pas4gkkor42yuysa",
"messages": [{"role": "user", "content": "Use specific model ID"}]
}'
# ⚠️ Fallback: No model specified → uses main model
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Uses main model"}]
}'
# ⚠️ Fallback: Invalid model ID → uses main model
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "invalid-model-id-12345",
"messages": [{"role": "user", "content": "Falls back to main model"}]
}'
Key Points:
- Model ID Required: To switch models, always use valid model identifiers (the `id` field) from `/v1/models`
- Graceful Fallback: Invalid or missing model specifications default to the main model
- No Errors: The server won't throw errors for invalid models; it silently uses the main model
- Performance: Main model requests are always fastest (no switching delay)
You can use the following as the `model` field:
- Model ID (Required for Switching): `"bafkreie4uj3gluik5ob2ib3cm2pt6ww7n4vqpmjnq6pas4gkkor42yuysa"` (from the `id` field in `/v1/models`)
- Generic Fallback: `"local-model"` (uses the currently active model - not recommended)
Important: Only model IDs work for switching between models. Other identifiers will cause the server to fall back to the main model.
Troubleshooting Model Switching:
Common Issues:
- Unexpected Model Used: Check that you're using the correct model ID from `/v1/models`
- Slow Response Times: Verify the requested model is compatible with the endpoint task
- Fallback Behavior: Monitor logs to see when fallbacks occur and why
- Task Mismatch: Ensure chat models are used with `/v1/chat/completions`, embedding models with `/v1/embeddings`, etc.
Debugging Steps:
- Check available models: `curl http://localhost:8080/v1/models`
- Verify the model ID format (it should be a valid identifier from the `id` field)
- Confirm task compatibility between model and endpoint
- Monitor server logs for switching and fallback messages
- Smart Caching: Models stay loaded once activated
- Stream Protection: No interruptions during active conversations
- Resource Sharing: Efficient memory usage across models
- Fast Switching: Sub-second model switching when no streams are active
- First Request: On-demand models have a brief loading delay (2-10 seconds)
- Subsequent Requests: Instant response from cached models
- Memory Usage: Only active models consume GPU/RAM resources
- Concurrent Streams: Multiple simultaneous conversations supported
- Fallback Performance: Missing/invalid model requests use main model (no switching delay)
- Silent Fallback: No error messages for invalid models, seamless fallback to main model
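Streaming is what the stream-protection behaviour above refers to: the server will not switch models while a streamed response is in progress. A minimal streaming request, sketched with the `openai` Python client (assuming the server on port 8080 and standard OpenAI-style streaming support):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# While this stream is active, the server defers any model switching
stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```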
- Start with Main Model: Choose your most frequently used model as main
- Group by Task: Use similar-sized models for consistent performance
- Monitor Resources: Ensure sufficient RAM for your model combination
- Validate Model Names: Always check `/v1/models` before making requests to ensure the model exists
- Explicit Model Specification: Always include the `"model"` field to avoid unexpected fallbacks
- Client-Side Validation: Implement model validation in your application to prevent silent fallbacks
- Task-Specific Optimization:
  - Chat Models: Prioritize context length and response quality
  - Embedding Models: Consider batch processing capabilities
  - Image Generation: Ensure adequate GPU memory for high-resolution outputs
  - Vision Models: Account for image processing overhead
- Mixed-Task Servers: When serving multiple task types, start with the most resource-intensive model as main
- Model Switching: Keep similar task types together for faster switching
- Model ID Validation: Always use model IDs from `/v1/models` instead of folder names
- Fallback Monitoring: Track when fallbacks occur to optimize model selection
- Stream Management: Be aware that active streams prevent model switching
# Check current server status and active model
eai model status
# Shows current active model and total loaded models
# List all available models with detailed information
curl http://localhost:8080/v1/models
# Returns model IDs, task types, and status information
Key Information Returned:
- `id`: Model identifier (unique identifier for API requests)
- `object`: Always "model" for model objects
- `created`: Unix timestamp when the model was created
- `owned_by`: Model owner (typically "user")
- `active`: Whether the model is currently active
- `task`: Model type and compatible endpoints:
  - `"chat"`: Use with `/v1/chat/completions` (includes vision models)
  - `"embed"`: Use with `/v1/embeddings`
  - `"image-generation"`: Use with `/v1/images/generations`
- `is_lora`: Whether the model uses LoRA (Low-Rank Adaptation)
- `multimodal`: Whether the model supports multimodal inputs (images + text)
- `context_length`: Maximum context length supported by the model
- `lora_config`: LoRA configuration (null if not a LoRA model)
Use the `id` value as the `model` field in your API requests to specify which model to use. Match the model's `task` type with the appropriate API endpoint for optimal results.
- A model in `gguf` format (compatible with `llama.cpp`)
- A Lighthouse account and API key
New Feature: You can now run any GGUF model directly from Hugging Face without downloading or preserving it first:
# Run any GGUF model from Hugging Face
eai model run --hf-repo <repository> --hf-file <filename> --task <task_type>
# Example: Run Dolphin-3.0 model
eai model run --hf-repo dphn/Dolphin3.0-Llama3.1-8B-GGUF --hf-file Dolphin3.0-Llama3.1-8B-Q4_0.gguf --task chat
This is the easiest way to try new models - no setup required!
For models that require authentication or to access gated models, you can set up your Hugging Face token:
export HF_TOKEN=your_hugging_face_token_here
To get your Hugging Face token:
- Visit Hugging Face Settings
- Click "New token"
- Give it a name and select appropriate permissions
- Copy the token and export it in your terminal
Note: The token is optional for public models but required for gated/private models.
You can use `eai model preserve` to upload your own `gguf` models downloaded from Hugging Face for deployment to the CryptoAgents platform.
The platform now supports multiple model types through the `--task` parameter:
Use `--task chat` for conversational AI and text generation models.
- Download the model:
  - Go to Hugging Face and download your desired `.gguf` model
  - Example: Download `Qwen3-8B-Q8_0.gguf`
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `qwen3-8b-q8`)
  - Place the downloaded `.gguf` file inside this folder
  - Rename the file to match the folder name, but remove the `.gguf` extension
Example Structure for Chat Models:
qwen3-8b-q8/          # Folder name
└── qwen3-8b-q8       # File name (no .gguf extension)
Use `--task embed` for text embedding and similarity models.
- Download the embedding model:
  - Go to Hugging Face and download your desired embedding model in `.gguf` format
  - Example: Text embedding models like `Qwen3 Embedding 0.6B` or specialized embedding models
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `qwen3-embedding-0.6b-q8`)
  - Place the downloaded `.gguf` file inside this folder
  - Rename the file to match the folder name, but remove the `.gguf` extension
Example Structure for Embedding Models:
qwen3-embedding-0.6b-q8/          # Folder name
└── qwen3-embedding-0.6b-q8       # File name (no .gguf extension)
Use `--task chat` for vision models, as they are conversational models with image understanding capabilities.
- Download the model files:
  - Go to Hugging Face and download both required files:
    - The main model file (e.g., `gemma-3-4b-it-q4_0.gguf`)
    - The projector file (e.g., `mmproj-model-f16-4B.gguf`)
- Prepare the folder structure:
  - Create a new folder with a descriptive name (e.g., `gemma-3-4b-it-q4`)
  - Place both downloaded files inside this folder
  - Rename the files to match the folder name, but remove the `.gguf` extension
  - Add a `-projector` suffix to the projector file
Example Structure for Vision Models:
gemma-3-4b-it-q4/                # Folder name
├── gemma-3-4b-it-q4             # Main model file (no .gguf extension)
└── gemma-3-4b-it-q4-projector   # Projector file (no .gguf extension)
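A small sketch that automates the folder layout described above (the helper and file names are illustrative; it simply copies and renames files the way the preserve workflow expects):

```python
import shutil
from pathlib import Path
from typing import Optional

def prepare_model_folder(gguf_path: str, folder_name: str, projector_path: Optional[str] = None) -> Path:
    """Copy a .gguf model (and optional projector) into the layout expected by `eai model preserve`."""
    folder = Path(folder_name)
    folder.mkdir(parents=True, exist_ok=True)
    # Main model file: same name as the folder, with the .gguf extension removed
    shutil.copy(gguf_path, folder / folder_name)
    if projector_path:
        # Projector file: folder name plus a -projector suffix, no .gguf extension
        shutil.copy(projector_path, folder / f"{folder_name}-projector")
    return folder

# Example with hypothetical local file names:
prepare_model_folder("gemma-3-4b-it-q4_0.gguf", "gemma-3-4b-it-q4", "mmproj-model-f16-4B.gguf")
```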
Use the GGUF parser to estimate RAM usage:
npx @huggingface/gguf qwen3-8b-q8/qwen3-8b-q8 --context 32768
Basic Upload:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve --folder-path qwen3-8b-q8
Advanced Upload with Metadata:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve \
--folder-path qwen3-8b-q8 \
--task chat \
--ram 12 \
--hf-repo Qwen/Qwen3-8B-GGUF \
--hf-file Qwen3-8B-Q8_0.gguf \
--zip-chunk-size 512 \
--threads 16 \
--max-retries 5
Upload for Embedding Models:
export LIGHTHOUSE_API_KEY=your_api_key
eai model preserve \
--folder-path qwen3-embedding-0.6b-q8 \
--task embed \
--ram 1.16 \
--hf-repo Qwen/Qwen3-Embedding-0.6B-GGUF \
--hf-file Qwen3-Embedding-0.6B-Q8_0.gguf
| Option | Description | Default | Required |
|---|---|---|---|
| `--folder-path` | Folder containing the model files | - | Yes |
| `--task` | Task type: `chat` for text generation models, `embed` for embedding models | `chat` | No |
| `--ram` | RAM usage in GB at a 32768 context length | - | No |
| `--hf-repo` | Hugging Face repository (e.g., `Qwen/Qwen3-8B-GGUF`) | - | No |
| `--hf-file` | Original Hugging Face filename | - | No |
| `--zip-chunk-size` | Compression chunk size in MB | 512 | No |
| `--threads` | Number of compression threads | 16 | No |
| `--max-retries` | Maximum upload retry attempts | 5 | No |
The upload process involves several steps (sketched below):
- Compression: The model folder is compressed using `tar` and `pigz` for optimal compression
- Chunking: Large files are split into chunks (default: 512 MB) for reliable uploads
- Parallel Upload: Multiple chunks are uploaded simultaneously for faster transfer
- Retry Logic: Failed uploads are automatically retried (up to the `--max-retries` limit)
- Metadata Generation: A metadata file is created with upload information and model details
- IPFS Storage: All files are stored on IPFS via Lighthouse.storage
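The chunk-and-retry idea can be illustrated with a short sketch (purely illustrative, not Eternal Zoo's actual implementation; `upload_chunk` is a hypothetical placeholder for the real Lighthouse upload call):

```python
import time

CHUNK_SIZE = 512 * 1024 * 1024  # 512 MB, mirroring the default --zip-chunk-size

def upload_chunk(data: bytes, index: int) -> None:
    """Hypothetical placeholder for the real Lighthouse upload call."""
    raise NotImplementedError

def upload_file_in_chunks(path: str, max_retries: int = 5) -> None:
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            for attempt in range(1, max_retries + 1):
                try:
                    upload_chunk(chunk, index)
                    break
                except Exception:
                    if attempt == max_retries:
                        raise
                    time.sleep(2 ** attempt)  # simple exponential backoff between retries
            index += 1
```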
Common Issues:
- Missing API Key: Ensure `LIGHTHOUSE_API_KEY` is set in your environment
- Network Issues: The system will automatically retry failed uploads
- Insufficient RAM: Check the model's RAM requirements before uploading
- Invalid File Format: Ensure the model is in GGUF format
- We support GGUF files compatible with llama.cpp
- Convert other formats using llama.cpp conversion tools
- Choose quantization levels (Q4, Q6, Q8) based on your hardware capabilities
- Higher quantization (Q8) offers better quality but requires more resources
- Lower quantization (Q4) is more efficient but may affect model performance
- Monitor system resources during model operation
- Use appropriate context lengths for your use case
- All models are stored on IPFS with content addressing
- This ensures model integrity and secure distribution
- API keys are stored securely in environment variables
- Models are verified before deployment
- Model Selection
  - Choose models based on your hardware capabilities
  - Consider quantization levels for optimal performance
  - Test models locally before deployment
- Resource Management
  - Monitor RAM usage during model operation
  - Adjust context length based on available memory
  - Use appropriate batch sizes for your use case
- API Usage
  - Implement proper error handling
  - Use appropriate timeouts for requests
  - Cache responses when possible
  - Monitor API usage and performance
- Deployment
  - Test models thoroughly before production use
  - Keep track of model versions and CIDs
  - Document model configurations and requirements
  - Maintain regular backups of model metadata
- Visit our website: eternalai.org
- Join our community: Discord
- Check our documentation for detailed guides and tutorials
- Report issues on our GitHub repository
- Contact support for enterprise assistance