AgenticVisionForge is a tool for refining AI-generated images through an iterative feedback loop. By combining ComfyUI for image generation with flexible AI models from Ollama (local inference) and Gemini (cloud-based API), the tool enables dynamic and customizable workflows for creating high-quality outputs.
- ComfyUI Integration: Generate images using a customizable ComfyUI workflow.
- Vision Model Flexibility: Use any vision model from Ollama or Gemini for image evaluation.
- Text Model Flexibility: Employ thinking models (e.g., DeepSeek R1) or standard text models for prompt refinement.
- Mix-and-Match Models: Combine Ollama and Gemini in any configuration for vision and text tasks.
- Automated Feedback Loop: Generate → Evaluate → Refine → Repeat until the desired quality is reached.
- Advanced `<think>` Tag Handling: Automatically removes `<think>` tags from the outputs of thinking models before sending prompts to ComfyUI (see the sketch below).
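For context, a minimal sketch of how such stripping can work is shown below; the function name and regex-based approach are illustrative, not necessarily the tool's exact implementation.

```python
import re

def strip_think_tags(text: str) -> str:
    """Drop <think>...</think> reasoning blocks emitted by thinking models
    (e.g. DeepSeek R1) so only the final prompt is sent to ComfyUI."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Example: only the refined prompt survives.
print(strip_think_tags("<think>consider lighting...</think>A neon-lit alley at dusk"))
```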
- ComfyUI:
  - Install and run locally. See the ComfyUI GitHub for details.
- Ollama (optional):
  - Download and install from Ollama. Start the server with:
    ```bash
    ollama serve
    ```
  - Download models such as `llama3.2-vision` or `DeepSeek R1`:
    ```bash
    ollama pull llama3.2-vision
    ollama pull deepseek-r1
    ```
- Gemini (optional):
  - Obtain an API key from AI Studio.
- Clone the repository:
  ```bash
  git clone https://github.com/yourname/agentic-vision-forge.git
  cd agentic-vision-forge
  ```
- Set up a Python environment:
  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Start ComfyUI:
  ```bash
  cd /path/to/ComfyUI
  python main.py --port 8188
  ```
- Configure models:
  - Ollama (if using): Ensure `ollama serve` is running and the required models are downloaded.
  - Gemini (if using): Set your API key in the configuration file.
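If you are using Ollama, a quick sanity check confirms the server is reachable and the models are pulled. The snippet below is a hedged example that queries Ollama's `/api/tags` endpoint with the `requests` package (assumed to be installed); adjust the host and port if yours differ.

```python
import requests

# List the models currently available on the local Ollama server
# (assumes the default port 11434).
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([model["name"] for model in resp.json().get("models", [])])
```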
- Open ComfyUI and design your workflow.
- Export the workflow in API format.
- Replace the contents of `comfyui_prompt_template.json` with your exported workflow.
- Ensure your workflow includes placeholders like `"PROMPT_PLACEHOLDER"` for the input prompt.
- Copy `config.example.yaml` to `config.yaml`.
- Customize the following options:
  - ComfyUI Settings:
    ```yaml
    comfyui:
      api_url: "http://localhost:8188"
      output_dir: "comfyui_outputs"
    ```
  - Vision Models:
    ```yaml
    vision:
      provider: "ollama"  # or "gemini"
      ollama:
        model: "llama3.2-vision"
        api_url: "http://localhost:11434/api/generate"
      gemini:
        model: "gemini-2.0-flash-exp"
        api_key: "YOUR_GEMINI_API_KEY"
    ```
  - Text Models:
    ```yaml
    text:
      provider: "ollama"  # or "gemini"
      ollama:
        model: "deepseek-r1"
        strip_think_tags: true
      gemini:
        model: "gemini-2.0-flash-exp"
    ```
  - Iteration Settings:
    ```yaml
    iterations:
      max_iterations: 10
      success_threshold: 90
    ```
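The settings map directly to nested YAML keys, so reading them programmatically looks roughly like this (a minimal sketch assuming PyYAML; the exact keys the tool consumes may differ):

```python
import yaml

# Load config.yaml and read a couple of values.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config["vision"]["provider"])               # "ollama" or "gemini"
print(config["iterations"]["success_threshold"])  # e.g. 90
```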
You can use your own ComfyUI workflows with this tool. Here's how to set up a custom workflow:
- Design Your Workflow in ComfyUI:
  - Build your workflow as normal in the ComfyUI interface
  - Make sure your workflow includes these essential nodes:
    - A `CLIPTextEncode` node for the prompt
    - A `SaveImage` node for output
    - A `RandomNoise` node (or any node with a `noise_seed` input)
- Prepare the Prompt Node:
  - Find your `CLIPTextEncode` node
  - Set its text input to exactly: `PROMPT_PLACEHOLDER`
  - This is where the tool will insert generated prompts
- Export the Workflow:
  - Click the "Save (API Format)" button in ComfyUI
  - This will download a JSON file
  - Copy the contents of this file to `comfyui_prompt_template.json`
- Verification: The tool will automatically find the required nodes in your workflow (see the sketch after this list) by looking for:
  - Any `CLIPTextEncode` node containing `PROMPT_PLACEHOLDER`
  - Any `SaveImage` node for saving the output
  - A random seed node, identified by:
    - Class type `RandomNoise`, or
    - Title containing "Random", or
    - Any node with a `noise_seed` input
- Error Messages: If your workflow is missing any required components, you'll see helpful error messages like:
  - "No CLIPTextEncode node with PROMPT_PLACEHOLDER found..."
  - "No SaveImage node found in workflow"
  - "No random seed node found..."
This flexible setup allows you to use any workflow structure as long as it includes these basic components. The tool will automatically adapt to your workflow's node IDs and configuration.
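To make those lookup rules concrete, here is a rough sketch of how the required nodes can be located in an API-format workflow (a dict mapping node IDs to entries with `class_type`, `inputs`, and an optional `_meta.title`). The function below is illustrative, not the tool's actual code.

```python
def find_required_nodes(workflow: dict):
    """Locate the prompt, save, and seed nodes in a ComfyUI API-format workflow."""
    prompt_id = save_id = seed_id = None
    for node_id, node in workflow.items():
        class_type = node.get("class_type", "")
        inputs = node.get("inputs", {})
        title = node.get("_meta", {}).get("title", "")
        if class_type == "CLIPTextEncode" and inputs.get("text") == "PROMPT_PLACEHOLDER":
            prompt_id = node_id
        elif class_type == "SaveImage":
            save_id = node_id
        elif class_type == "RandomNoise" or "Random" in title or "noise_seed" in inputs:
            seed_id = node_id
    if prompt_id is None:
        raise ValueError("No CLIPTextEncode node with PROMPT_PLACEHOLDER found in workflow")
    if save_id is None:
        raise ValueError("No SaveImage node found in workflow")
    if seed_id is None:
        raise ValueError("No random seed node found in workflow")
    return prompt_id, save_id, seed_id
```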
Run the tool with your desired goal:
```bash
python main.py --goal "A futuristic cityscape with flying cars"
```

Optional arguments:

- `--max_iterations`: Override the maximum iterations in `config.yaml`.
- `--run_name`: Specify a custom name for the run.
- `--output_dir`: Set a custom directory for output images and logs.
- Input the Goal: Provide a description of your desired image.
- Generate an Image: ComfyUI uses the configured workflow to generate an image.
- Evaluate the Image: A vision model analyzes the image and provides feedback.
- Refine the Prompt: A text model refines the prompt based on feedback.
- Repeat: The process continues until the success threshold or iteration limit is reached.
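Put together, the loop looks roughly like the sketch below. It is a simplified illustration, not the tool's actual implementation: `generate_image`, `evaluate_image`, and `refine_prompt` are placeholders for the ComfyUI, vision-model, and text-model calls described above.

```python
def run_forge(goal, generate_image, evaluate_image, refine_prompt,
              max_iterations=10, success_threshold=90):
    """Generate -> evaluate -> refine until the score meets the threshold."""
    prompt = goal
    image_path = None
    for i in range(1, max_iterations + 1):
        image_path = generate_image(prompt)                 # ComfyUI run
        score, feedback = evaluate_image(image_path, goal)  # vision model
        print(f"Iteration {i}: score={score}")
        if score >= success_threshold:
            break
        prompt = refine_prompt(prompt, feedback)            # text model
    return image_path
```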
Vision models:

- Ollama: Any vision model, such as `llama3.2-vision`.
- Gemini: Models like `gemini-2.0-flash-exp`.

Text models:

- Ollama: Use models like `DeepSeek R1` with `<think>` tag support.
- Gemini: Standard text models for prompt refinement.
- Invalid Workflow:
  - Ensure the workflow in `comfyui_prompt_template.json` is exported in API format.
- Connection Issues:
  - Verify that the ComfyUI and Ollama servers are running if configured.
  - Ensure the Gemini API key is set in `config.yaml`.
- No Output:
  - Check that the output directory in `config.yaml` has the correct permissions.