Beta — feedback and bug reports welcome. Open an issue.
Chat with your documents locally using Ollama — or plug into AI agents as a retrieval backend via MCP. Indexes PDFs (including scanned via vision OCR), Office docs, spreadsheets, images, and code with a git-like per-project model. Powered by Kreuzberg for text extraction, Ollama for embeddings and chat, and LanceDB for vector storage.
- Why lilbee
- Demos
- Install
- Quick start · Full usage guide
- Agent integration
- HTTP Server · API reference
- Interactive chat
- Supported formats
lilbee indexes documents and code into a searchable local knowledge base. Use it standalone — search, ask questions, chat — or plug it into AI coding agents as a retrieval backend via MCP.
Most tools like this only handle code. lilbee handles PDFs, Word docs, spreadsheets, images (OCR) — and code too, with AST-aware chunking.
- Standalone knowledge base: add documents, search, ask questions, or chat interactively with model switching and slash commands
- AI agent backend: MCP server and JSON CLI so coding agents can search your indexed docs as context
- Per-project databases: `lilbee init` creates a `.lilbee/` directory (like `.git/`) so each project gets its own isolated index
- Documents and code alike: PDFs, Office docs, spreadsheets, images, ebooks, and 150+ code languages via tree-sitter
- Open-source: runs with Ollama and LanceDB; no cloud APIs or Docker required
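The `.git/` comparison above suggests a familiar lookup rule: start in the current directory and walk upward until a `.lilbee/` directory appears. The function below is a minimal sketch of that rule, assuming lilbee resolves projects the way git finds `.git/`. `find_lilbee_root` is a hypothetical helper, not part of lilbee, and the tool's actual resolution logic may differ.

```bash
# Hypothetical helper (not shipped with lilbee): starting at $1 (default: the
# current directory), walk upward and print the first directory that contains
# a .lilbee/ index. Exits non-zero if none is found.
# The body runs in a subshell ( ... ) so `dir` does not leak to the caller.
find_lilbee_root() (
  dir=$(cd "${1:-.}" && pwd -P) || exit 1   # absolute path, symlinks resolved
  while :; do
    if [ -d "$dir/.lilbee" ]; then
      printf '%s\n' "$dir"
      exit 0
    fi
    [ "$dir" = / ] && exit 1                 # reached the filesystem root
    dir=$(dirname "$dir")
  done
)
```

For example, after running `lilbee init` in a project directory, calling `find_lilbee_root` from any of its subdirectories prints that project directory.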
Add files (`lilbee add`), then search or ask questions. Once documents are indexed, search works without Ollama; agents use their own LLM to reason over the retrieved chunks.
AI agent — lilbee search vs web search (detailed analysis)
opencode + minimax-m2.5-free, single prompt, no follow-ups. The Godot 4.4 XML class reference (917 files) is indexed in lilbee. The baseline uses Exa AI code search instead.
| | API hallucinations | Lines |
|---|---|---|
| With lilbee (code · config) | 0 | 261 |
| Without lilbee (code · config) | 4 (~22% error rate) | 213 |
Without lilbee — 4 hallucinated APIs (details)
If you spot issues with these benchmarks, please open an issue.
Scanned PDF → searchable knowledge base
A scanned 1998 Star Wars: X-Wing Collector's Edition manual indexed with vision OCR (LightOnOCR-2), then queried in lilbee's interactive chat (qwen3-coder:30b, fully local). Three questions about dev team credits, energy management, and starfighter speeds — all answered from the OCR'd content.
See benchmarks, test documents, and sample output for model comparisons.
One-shot question from OCR'd content
The scanned Star Wars: X-Wing Collector's Edition guide, queried with a single lilbee ask command — no interactive chat needed.
Interactive local offline chat
> [!NOTE]
> Entirely local on a 2021 M1 Pro with 32 GB RAM.
Model switching via tab completion, then a Q&A grounded in an indexed PDF.
Code index and search
Add a codebase and search with natural language. Tree-sitter provides AST-aware chunking.
When used standalone, lilbee runs entirely on your machine — chat with your documents privately, no cloud required.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16–32 GB |
| GPU / Accelerator | — | Apple Metal (M-series), NVIDIA GPU (6+ GB VRAM) |
| Disk | 2 GB (models + data) | 10+ GB if using multiple models |
| CPU | Any modern x86_64 / ARM64 | — |
Ollama handles inference and uses Metal on macOS or CUDA on Linux/Windows. Without a GPU, models fall back to CPU — usable for embedding but slow for chat.
- Python 3.11+
- Ollama: the embedding model (`nomic-embed-text`) is pulled automatically on first sync. If no chat model is installed, lilbee prompts you to pick and download one.
- Optional (for scanned PDF and image OCR): Tesseract (`brew install tesseract` / `apt install tesseract-ocr`) or an Ollama vision model (recommended for better quality; see vision OCR)
First-time download: if you're new to Ollama, expect the first run to take a while, because models are large files that must be downloaded once. For example, `qwen3:8b` is ~5 GB and the embedding model `nomic-embed-text` is ~274 MB. After the initial download, models are cached locally and load in seconds. You can check what you have installed with `ollama list`.
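To see which models still need that one-time download, you can compare the names you plan to use against `ollama list`. The function below is a sketch that assumes `ollama list` prints a header row followed by one model per line, with the full `name:tag` in the first column, and that an untagged name refers to Ollama's default `:latest` tag. `missing_models` is a hypothetical helper, not part of lilbee or Ollama.

```bash
# Hypothetical helper: print each requested model that `ollama list` does not show.
# ASSUMPTION: `ollama list` output is a header row, then one model per line with
# the full name:tag in the first column (e.g. nomic-embed-text:latest).
missing_models() (
  installed=$(ollama list | awk 'NR > 1 { print $1 }')
  for model in "$@"; do
    # An untagged name refers to the :latest tag.
    case "$model" in
      *:*) want=$model ;;
      *)   want="$model:latest" ;;
    esac
    # -x: match the whole line; -F: treat the name literally (dots, colons).
    if ! printf '%s\n' "$installed" | grep -qxF "$want"; then
      printf '%s\n' "$model"
    fi
  done
)
```

For example, `for m in $(missing_models nomic-embed-text qwen3:8b); do ollama pull "$m"; done` pulls only the models that are not already cached.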
```bash
pip install lilbee   # or: uv tool install lilbee
```

To run from source:

```bash
git clone https://github.com/tobocop2/lilbee && cd lilbee
uv sync
uv run lilbee
```

See the usage guide.
lilbee can serve as a local retrieval backend for AI coding agents via MCP or JSON CLI. See docs/agent-integration.md for setup and usage.
lilbee includes a REST API server for programmatic access:
```bash
lilbee serve                              # start on localhost:7433
lilbee serve --host 0.0.0.0 --port 8080
```

Endpoints include `/api/search`, `/api/ask`, `/api/chat` (with streaming SSE variants), `/api/sync`, `/api/add`, and `/api/models`. While the server is running, interactive API docs are available at `/schema/redoc`. See the API reference for the full OpenAPI schema.
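As a starting point for scripting against the server, the function below posts a query to `/api/search` with `curl`. It is a sketch, not a documented client: the POST method and the `{"query": ...}` request body are assumptions, so check the live schema at `/schema/redoc` before relying on it. `lilbee_search` and the `LILBEE_URL` variable are hypothetical names; the default URL is the `localhost:7433` address that `lilbee serve` uses by default.

```bash
# Hypothetical helper: send a search query to a running `lilbee serve`.
# ASSUMPTION: /api/search accepts POST with a JSON body {"query": "..."};
# confirm the real request shape at $LILBEE_URL/schema/redoc.
LILBEE_URL=${LILBEE_URL:-http://localhost:7433}

lilbee_search() (
  # Escape backslashes and double quotes so the text is a valid JSON string.
  # (Newlines and other control characters are not handled here.)
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -sS -X POST "$LILBEE_URL/api/search" \
    -H 'Content-Type: application/json' \
    -d "{\"query\": \"$escaped\"}"
)
```

Usage: `lilbee_search "How do I connect a signal?"` prints the server's raw response body.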
Running `lilbee` or `lilbee chat` enters an interactive REPL with conversation history, streaming responses, and slash commands:

| Command | Description |
|---|---|
| `/status` | Show indexed documents and config |
| `/add [path]` | Add a file or directory (tab-completes paths) |
| `/model [name]` | Switch chat model: with no arguments, opens an interactive picker; with a name, switches directly (tab-completes installed models) |
| `/vision [name\|off]` | Switch vision OCR model: with no arguments, opens a picker; `off` disables it (tab-completes catalog models) |
| `/settings` | Show all current configuration values |
| `/set <key> <value>` | Change a setting (e.g. `/set temperature 0.7`) |
| `/version` | Show lilbee version |
| `/reset` | Delete all documents and data (asks for confirmation) |
| `/help` | Show available commands |
| `/quit` | Exit chat |
Slash commands and paths tab-complete. A spinner shows while waiting for the first token from the LLM.
Text extraction powered by Kreuzberg, code chunking by tree-sitter. Structured formats (XML, JSON, CSV) get embedding-friendly preprocessing. This list is not exhaustive — Kreuzberg supports additional formats beyond what's listed here.
| Format | Extensions | Requires |
|---|---|---|
| PDF | `.pdf` | None |
| Scanned PDF | `.pdf` (no extractable text) | Tesseract (auto, plain text) or an Ollama vision model (recommended; preserves tables, headings, and layout as markdown) |
| Office | `.docx`, `.xlsx`, `.pptx` | None |
| eBook | `.epub` | None |
| Images (OCR) | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp` | Tesseract |
| Data | `.csv`, `.tsv` | None |
| Structured | `.xml`, `.json`, `.jsonl`, `.yaml`, `.yml` | None |
| Text | `.md`, `.txt`, `.html`, `.rst` | None |
| Code | `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java`, and 150+ more via tree-sitter (AST-aware chunking) | None |
See the usage guide for OCR setup and model benchmarks.
MIT