CogniDocs is a Model Context Protocol (MCP) server that gives AI assistants the ability to search and query documentation. It now supports flexible backend configurations to meet different privacy and infrastructure requirements.
This version introduces a complete backend abstraction layer that allows you to choose your preferred technology stack:
- ChromaDB - Open-source vector database
- Xenova/Transformers - Local, privacy-focused embeddings
- Transformers.js (@huggingface/transformers) - Official HF JS runtime (WASM by default on server)
We now use a plugin-style provider registry with auto-registration. Configuration no longer references specific providers in the schema; instead you specify:
```bash
# Storage
STORAGE_NAME=chroma
STORAGE_OPTIONS={"url":"http://localhost:8000"}

# Embeddings
EMBEDDINGS_NAME=xenova
EMBEDDINGS_OPTIONS={"model":"Xenova/all-MiniLM-L6-v2","maxBatchSize":50}

# Alternative (Transformers.js)
# EMBEDDINGS_NAME=transformersjs
# EMBEDDINGS_OPTIONS={"model":"Xenova/all-MiniLM-L6-v2","device":"wasm","pooling":"mean","normalize":true,"maxBatchSize":50}
```

Notes:
- Providers self-register via `app/*/providers/index.ts` side-effect imports (e.g., `app/embeddings/providers/index.ts`, `app/storage/providers/index.ts`, `app/chunking/providers/index.ts`).
- Adding a provider is as simple as adding a new file that calls `register*Provider()`; see the sketch after this list.
- Old variables like `STORAGE_PROVIDER`, `EMBEDDING_PROVIDER`, `CHROMA_URL`, and `XENOVA_MODEL` are still parsed for backward compatibility, but are deprecated.
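For illustration, a minimal sketch of such a self-registering provider file, assuming a `registerEmbeddingsProvider()` registry function and an `EmbeddingService` shape along the lines of `app/embeddings/embedding-factory.ts` and `app/embeddings/embedding-interface.ts` (check those files for the real signatures):

```typescript
// app/embeddings/providers/dummy.ts — hypothetical example, not shipped code.
// Names are assumptions; align them with the actual factory and interface.
import { registerEmbeddingsProvider } from "../embedding-factory";

// A toy provider that returns zero vectors; a real provider loads a model.
registerEmbeddingsProvider("dummy", (_options: Record<string, unknown>) => ({
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map(() => new Array(384).fill(0));
  },
}));
```

Importing this file from `app/embeddings/providers/index.ts` (a side-effect import) is what makes `EMBEDDINGS_NAME=dummy` resolvable.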
- The default chunker is LangChain with the recursive strategy.
- Recommended defaults: `chunkSize=3000`, `chunkOverlap=150`.
- Configure via `CHUNKING_NAME=langchain` and `CHUNKING_OPTIONS={"strategy":"recursive","chunkSize":3000,"chunkOverlap":150}`.
- Additional strategies in the LangChain provider:
  - `intelligent`: content-type-aware splitting (adapts separators and sizes for code, Markdown, HTML, etc.).
  - `semantic`: initial split plus adjacent-merge when the cosine similarity of neighboring chunks' embeddings is above a threshold.
- The Chonkie provider normalizes outputs to strings, so `Chunk.text` is always a string.
Agent-guided chunking and annotation can dramatically improve search quality for large, multi-topic docs by aligning chunks to topic boundaries and enriching them with metadata (topic tags, section headings, code language, entities, summaries, and quality scores). This is designed to be an optional, provider-agnostic stage at ingestion time.
Learn more: see `docs/agentic-processing.md`.
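As a sketch of the kind of metadata such a stage could attach to each chunk (field names here are illustrative, not the actual schema; see `docs/agentic-processing.md` for the real design):

```typescript
// Illustrative shape only — the real annotation schema is documented in
// docs/agentic-processing.md.
interface AnnotatedChunk {
  text: string;             // chunk content, aligned to a topic boundary
  topicTags: string[];      // e.g., ["authentication", "rate-limiting"]
  sectionHeading?: string;  // nearest enclosing heading
  codeLanguage?: string;    // set when the chunk is mostly code
  entities: string[];       // API names, config keys, and similar
  summary: string;          // one-sentence summary for reranking
  qualityScore: number;     // 0..1, used to filter low-signal chunks
}
```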
```mermaid
---
config:
  layout: dagre
  theme: redux
  look: neo
---
flowchart LR
subgraph subGraph0["Storage Providers"]
chroma["ChromaDB"]
storage["Storage Layer"]
end
subgraph subGraph1["Embedding Providers"]
xenova["Xenova"]
embeddings["Embedding Layer"]
end
subgraph subGraph2["Chunking Providers"]
langchain["LangChain (default)"]
chunking["Chunking Layer"]
chonkie["Chonkie"]
builtin["Builtin"]
end
client["MCP Client (Claude)"] --- upload["HTTP Upload Server"]
web["Web UI (Optional)"] --- upload
upload --> abstractions["Provider Abstractions\n(Storage / Embeddings / Chunking)"]
abstractions --> storage & embeddings & chunking
storage --> chroma
embeddings --> xenova
chunking --> langchain & chonkie & builtin
```
```bash
# Clone and install
git clone <repository>
cd cogni-docs
bun install

# Configure for local processing
cp .env.example .env

# Edit .env with provider-agnostic config:
STORAGE_NAME=chroma
STORAGE_OPTIONS={"url":"http://localhost:8000"}
EMBEDDINGS_NAME=xenova
EMBEDDINGS_OPTIONS={"model":"Xenova/all-MiniLM-L6-v2","maxBatchSize":50}

# Chunking (default: LangChain recursive)
CHUNKING_NAME=langchain
CHUNKING_OPTIONS={"strategy":"recursive","chunkSize":3000,"chunkOverlap":150}

# Start server (Upload + MCP on the same port)
bun run upload-server:prod  # Default :3001 (set HTTP_PORT). Use :dev for watch mode
```

```bash
# Start ChromaDB
docker run -p 8000:8000 chromadb/chroma
# Configure app
STORAGE_NAME=chroma
STORAGE_OPTIONS={"url":"http://localhost:8000"}
EMBEDDINGS_NAME=xenova
EMBEDDINGS_OPTIONS={"model":"Xenova/all-MiniLM-L6-v2"}
# Start app
bun run upload-server:prod
```

| Variable | Type | Description |
|---|---|---|
| `HTTP_PORT` | number | Upload server port (default: 3001 in examples, config default 8787) |
| `STORAGE_NAME` | string | Storage provider name (e.g., `chroma`) |
| `STORAGE_OPTIONS` | JSON | Provider-specific options as JSON (e.g., `{"url":"http://localhost:8000"}` or `{"projectId":"..."}`) |
| `EMBEDDINGS_NAME` | string | Embeddings provider name (e.g., `xenova`) |
| `EMBEDDINGS_OPTIONS` | JSON | Provider-specific options as JSON (e.g., `{"model":"Xenova/all-MiniLM-L6-v2"}`) |
| `CHUNKING_NAME` | string | Chunking provider name: `langchain` (default), `chonkie`, or `builtin` |
| `CHUNKING_OPTIONS` | JSON | Provider-specific chunking options as JSON (e.g., `{"strategy":"recursive","chunkSize":3000,"chunkOverlap":150}`) |
| `CHUNK_SIZE` | number | Back-compat: target chunk size (default: 3000) |
| `CHUNK_OVERLAP` | number | Back-compat: overlap between chunks (default: 150) |
| `MAX_CHUNK_SIZE` | number | Back-compat: hard cap for chunk size (default: 5000) |
See `.env.example` for complete configuration options.
`recursive` (default):

```bash
CHUNKING_NAME=langchain
CHUNKING_OPTIONS={"strategy":"recursive","chunkSize":3000,"chunkOverlap":150}
```

`intelligent` (content-type aware: code/markdown/html get tuned separators and sizes):

```bash
CHUNKING_NAME=langchain
CHUNKING_OPTIONS={"strategy":"intelligent","chunkSize":3000,"chunkOverlap":150,"contentTypeAware":true}
```

`semantic` (adjacent merge by embedding similarity):

```bash
CHUNKING_NAME=langchain
CHUNKING_OPTIONS={"strategy":"semantic","chunkSize":3000,"chunkOverlap":150,"contentTypeAware":true,"semanticSimilarityThreshold":0.9,"semanticMaxMergeChars":6000,"semanticBatchSize":64}
```

Notes:
- Tweak `semanticSimilarityThreshold` (typically 0.85–0.92) per corpus.
- If embeddings are unavailable, the provider should fall back to the initial split (no merges).
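To make the semantic strategy concrete, here is a rough sketch of the adjacent-merge idea. It is not the provider's actual code: it assumes an `embed` function is available and compares each chunk to its immediate predecessor.

```typescript
// Embed each initial chunk, then merge neighbors whose cosine similarity
// clears the threshold, capped by a maximum merged size.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

async function mergeAdjacent(
  chunks: string[],
  embed: (texts: string[]) => Promise<number[][]>,
  threshold = 0.9,
  maxMergeChars = 6000,
): Promise<string[]> {
  if (chunks.length === 0) return [];
  const vectors = await embed(chunks);
  const merged: string[] = [chunks[0]];
  for (let i = 1; i < chunks.length; i++) {
    const last = merged[merged.length - 1];
    const similar = cosine(vectors[i - 1], vectors[i]) >= threshold;
    if (similar && last.length + chunks[i].length <= maxMergeChars) {
      merged[merged.length - 1] = last + "\n" + chunks[i];
    } else {
      merged.push(chunks[i]);
    }
  }
  return merged;
}
```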
Deprecated (still parsed for backward compatibility): `STORAGE_PROVIDER`, `EMBEDDING_PROVIDER`, `CHROMA_URL`, `XENOVA_MODEL`, `MAX_BATCH_SIZE`, `UPLOAD_SERVER_PORT`, `UPLOAD_SERVER_HOST`.
| Feature | ChromaDB + Xenova |
|---|---|
| Privacy | ✅ Self-hosted |
| Performance | ✅ Good |
| Scalability | ✅ High |
| Setup Complexity | |
| Cost | 💰 Infrastructure |
| Offline Support | ✅ |
✅ ChromaDB + Xenova
- Automatic scaling
- Enterprise security
- Managed infrastructure
✅ ChromaDB + Xenova
- No external cloud dependencies
- Complete data control
- Works in air-gapped environments
✅ ChromaDB + Xenova
- Easy experimentation
- Good performance
- Flexible deployment
```
app/
├── index.ts                   # Starts HTTP Upload + MCP server
├── config/
│   └── app-config.ts          # Zod-validated, provider-agnostic config
├── chunking/                  # Chunking interface, factory, and providers
│   ├── chunker-interface.ts
│   ├── chunking-factory.ts
│   └── providers/             # Providers: langchain (default), chonkie, builtin
├── storage/
│   ├── storage-interface.ts   # Storage interface
│   ├── chroma-storage.ts      # ChromaDB implementation
│   └── storage-factory.ts     # Provider registry + factory
├── embeddings/
│   ├── embedding-interface.ts # Embeddings interface
│   ├── embedding-factory.ts   # Provider registry + factory
│   └── providers/             # Embedding providers (e.g., Xenova)
├── server/
│   └── mcp-server.ts          # MCP tools + SSE transport (/sse, /messages)
├── ingest/
│   └── chunker.ts             # Ingestion entrypoint; uses chunking service
└── parsers/
    ├── pdf.ts                 # PDF parser
    ├── html.ts                # HTML parser
    └── text.ts                # Plain text parser
```
- `GET /health` - Service health check with provider status
- `GET /sets` - List documentation sets
- `POST /sets` - Create documentation set
- `GET /sets/:setId` - Get specific set
- `GET /sets/:setId/documents` - List documents in set
- `POST /sets/:setId/upload` - Upload documents
- `DELETE /sets/:setId/documents/:docId` - Delete document
- Transport: `GET /sse` (event stream), `POST /messages` (JSON messages)
- Tools:
  - `list_documentation_sets` - List available sets
  - `get_documentation_set` - Get details about a specific set
  - `search_documentation` - Vector search within a set
  - `agentic_search` - Extractive, context-grounded answers
```bash
# Install dependencies
bun install

# Development with file watching
bun run upload-server:dev   # Upload+MCP server with hot reload
bun run web:dev             # Web UI development server

# Type checking
bun run typecheck

# Build for production
bun run web:build
```

Check service status:

```bash
curl http://localhost:3001/health
```

Response includes:
- Overall service health
- Storage provider status
- Embedding provider status
- System uptime
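The exact payload is provider-dependent; as an assumed shape only (inspect a live `GET /health` response for the real fields), it might look like:

```typescript
// Assumed response shape — verify against the running server.
interface HealthResponse {
  status: "ok" | "degraded" | "down";                 // overall service health
  storage: { provider: string; healthy: boolean };    // e.g., chroma
  embeddings: { provider: string; healthy: boolean }; // e.g., xenova
  uptimeSeconds: number;
}
```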
The flexible backend architecture makes it easy to add new providers:
- Storage Provider: Implement the `StorageService` interface
- Embedding Provider: Implement the `EmbeddingService` interface
- Update Factories: Add to the respective factory files
- Configuration: Add options to the config schema
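As a hypothetical sketch of those steps for a storage provider (method and registry names are illustrative; the real contract lives in `app/storage/storage-interface.ts` and `app/storage/storage-factory.ts`):

```typescript
// app/storage/providers/memory.ts — hypothetical, for illustration only.
import { registerStorageProvider } from "../storage-factory";

class InMemoryStorage {
  private rows = new Map<string, { embedding: number[]; metadata: unknown }>();

  async upsert(id: string, embedding: number[], metadata: unknown): Promise<void> {
    this.rows.set(id, { embedding, metadata });
  }

  async query(_embedding: number[], limit: number) {
    // A real provider delegates similarity search to its backing store.
    return [...this.rows.entries()].slice(0, limit);
  }
}

// Once registered, STORAGE_NAME=memory would select this provider.
registerStorageProvider("memory", () => new InMemoryStorage());
```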
MIT License - see LICENSE file for details.
A Model Context Protocol (MCP) server that gives AI assistants the ability to search and query documentation using a local-first, provider-agnostic backend.
This project implements a dual-server architecture:
- HTTP Upload Server - For document ingestion and management
- MCP Server - For AI assistants to query documentation
- Multi-format parsing: PDF, HTML, and plain text documents
- Agentic search: Extractive answers grounded in your documentation via MCP tools
- Multi-tenant: Multiple documentation sets with isolated search
- Modern stack: Bun runtime, TypeScript, Elysia framework
- Bun runtime installed
- Docker (optional) for ChromaDB
- Clone and install dependencies:

  ```bash
  bun install
  ```

- Configure environment:

  ```bash
  cp .env.example .env
  # Edit .env with your provider-agnostic settings
  ```

- Start the upload server:

  ```bash
  bun run upload-server
  ```

- In another terminal, start the MCP server:

  ```bash
  bun run mcp-server
  ```

Create a documentation set and upload files:
```bash
# Create a documentation set
curl -X POST http://localhost:3001/sets \
  -H "Content-Type: application/json" \
  -d '{"name": "My API Docs", "description": "REST API documentation"}'

# Upload documents (PDF, HTML, TXT)
curl -X POST http://localhost:3001/sets/{SET_ID}/upload \
  -F "files=@/path/to/document.pdf" \
  -F "files=@/path/to/document.html"
```

The MCP server exposes four tools:
- `list_documentation_sets` - List all available documentation sets
- `get_documentation_set` - Get details about a specific set
- `search_documentation` - Basic vector search within a set
- `agentic_search` - Agentic, context-grounded answers from your docs
```typescript
// In Claude or another MCP-compatible AI assistant
await mcp.callTool("agentic_search", {
  setId: "your-set-id",
  query: "How do I authenticate API requests?",
  limit: 10,
});
```

```bash
# Core
HTTP_PORT=3001
# Provider-agnostic
STORAGE_NAME=chroma
STORAGE_OPTIONS={"url":"http://localhost:8000"}
EMBEDDINGS_NAME=xenova
EMBEDDINGS_OPTIONS={"model":"Xenova/all-MiniLM-L6-v2","maxBatchSize":50}
# Chunking
CHUNKING_NAME=langchain
CHUNKING_OPTIONS={"strategy":"recursive","chunkSize":3000,"chunkOverlap":150}
CHUNK_SIZE=3000
CHUNK_OVERLAP=150
MAX_CHUNK_SIZE=5000
```

```bash
bun run upload-server:dev   # Hot reload Upload+MCP server
bun run upload-server:prod # Production Upload+MCP server
bun run web:dev # Web UI dev
bun run typecheck           # Type checking
```

- Create a parser in `app/parsers/`
- Register/route the MIME type alongside the existing parsers
- Ensure the chunking strategy in `app/ingest/chunker.ts` suits the new type
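For example, a Markdown parser might look like the following sketch (the parser contract is assumed from the siblings in `app/parsers/`; match their actual signatures):

```typescript
// app/parsers/markdown.ts — hypothetical sketch, aligned by assumption with
// the shape of pdf.ts, html.ts, and text.ts.
export interface ParsedDocument {
  text: string;
  metadata: Record<string, unknown>;
}

export async function parseMarkdown(buffer: Buffer): Promise<ParsedDocument> {
  const raw = buffer.toString("utf-8");
  // Drop HTML comments and collapse runs of blank lines; headings and code
  // fences are kept so the chunker can split on them.
  const text = raw.replace(/<!--[\s\S]*?-->/g, "").replace(/\n{3,}/g, "\n\n");
  return { text, metadata: { format: "markdown" } };
}
```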
- Performance: Fast startup and runtime
- TypeScript native: No compilation step needed
- Modern toolchain: Built-in testing, bundling, package management
- Prefer `bun run upload-server:prod` (non-watch) for stability.
- Ensure your MCP client uses `GET /sse` (not POST) and `POST /messages`.
- If the IDE session gets stale, reload the MCP client to re-handshake.
- Verify Chroma is running and `STORAGE_OPTIONS={"url":"http://localhost:8000"}` is set.
- Check `GET /health` for storage status; restart Chroma if it is down.
- Xenova/Transformers.js models download on first run; allow network access once if needed.
- Adjust `EMBEDDINGS_OPTIONS` (e.g., `maxBatchSize`) if you see memory warnings.
- If you change the model or provider, embedding dimensions may differ. Use a fresh collection or reingest to avoid mixing dimensions.
- Support for more document formats (DOCX, Markdown)
- Document metadata search and filtering
- Batch upload improvements
- Vector search optimization
- Authentication for upload server
- Metrics and monitoring
This project follows these coding guidelines:
- TypeScript with proper typing
- Functional programming patterns
- Modular architecture
- Comprehensive error handling
MIT
To install dependencies:

```bash
bun install
```

To run:

```bash
bun run upload-server:prod
```

This project was created using `bun init` in bun v1.2.20. Bun is a fast all-in-one JavaScript runtime.