Merged

V2 #206

57 commits
354156c
Rewrite as single-crate v2 architecture with embedded inference
andthattoo Mar 2, 2026
e2c484e
Extract dkn-protocol crate, add multi-model worker, reconnect, and ch…
andthattoo Mar 2, 2026
68f5b45
Update model registry and add modality (Text/Vision/Audio) validation
andthattoo Mar 2, 2026
54fcc29
updated registry
andthattoo Mar 3, 2026
d8fce10
Add --quant CLI flag to override default GGUF quantization
andthattoo Mar 3, 2026
0491fa4
Add ALPN protocol negotiation for QUIC and suppress clippy too_many_a…
andthattoo Mar 3, 2026
68d1bc7
Graceful shutdown after reconnect
andthattoo Mar 3, 2026
b39c75d
Add SSE streaming support for inference tasks
andthattoo Mar 3, 2026
96159f1
Use GGUF-embedded chat templates instead of hardcoded ones
andthattoo Mar 3, 2026
50ade49
Add multimodal (vision/audio) inference support
andthattoo Mar 3, 2026
ef970bb
Add vision integration test for multimodal inference
andthattoo Mar 3, 2026
2995a97
Mark Qwen 3.5 models as Vision with mmproj files
andthattoo Mar 3, 2026
3f31ea4
Add interactive setup command, Homebrew tap, and v1.0.0-alpha.1 prep
andthattoo Mar 4, 2026
667041e
Remove obsolete Docker image workflows
andthattoo Mar 4, 2026
8fce699
Fix Linux musl builds: use rustls-tls for hf-hub, add cmake
andthattoo Mar 4, 2026
60590ae
Switch Linux targets from musl to gnu (llama.cpp needs C++ compiler)
andthattoo Mar 4, 2026
bc04372
Use native ARM64 runner for Linux arm64 build
andthattoo Mar 4, 2026
91cbd3a
Fix ARM64 Linux build: use cargo directly instead of cross
andthattoo Mar 4, 2026
123d551
Add install script
andthattoo Mar 4, 2026
c47af32
Add Windows install script
andthattoo Mar 4, 2026
9576a1e
Print banner after model load
andthattoo Mar 4, 2026
0af8ef6
Set macOS deployment target to 14.0 for Metal compatibility
andthattoo Mar 4, 2026
e09ed5f
Bump version to 1.0.0-alpha.2
andthattoo Mar 4, 2026
195407f
Update README with install scripts and setup details
andthattoo Mar 5, 2026
db7954c
Version 0.7.1: interactive setup TUI with quant selection
andthattoo Mar 5, 2026
5890067
Fix clippy warnings
andthattoo Mar 5, 2026
4f5ff7f
Bump version to 0.7.2
andthattoo Mar 5, 2026
961e68e
Add workflow_dispatch to releases workflow
andthattoo Mar 5, 2026
005b441
Update Cargo.lock
andthattoo Mar 5, 2026
6b25f64
Remove --locked from release builds (git dep breaks it)
andthattoo Mar 5, 2026
a9274f6
Quote RELEASE_TAG expression to fix YAML syntax
andthattoo Mar 5, 2026
90c5b72
Remove check_release job (fixes YAML validation on push)
andthattoo Mar 5, 2026
29d04a4
Add auto-update mechanism for v2 compute node
andthattoo Mar 8, 2026
d2242e1
Add prefill-only validation and stride-based logprob extraction
andthattoo Mar 8, 2026
94e90b5
Fix validate_prefill: use output index for get_logits_ith
andthattoo Mar 8, 2026
fa3b793
Fix logprob extraction: always use output index 0 in generation loop
andthattoo Mar 8, 2026
9d3a592
Fix get_logits_ith: use batch index in both generation and validation…
andthattoo Mar 8, 2026
42705ae
Switch dkn-protocol from path to git dependency
andthattoo Mar 9, 2026
6064734
Add structured output support (response_format) and fix double-accept…
andthattoo Mar 9, 2026
c181ade
Fix multimodal inference crash: use -1 sentinel for logits index in s…
andthattoo Mar 10, 2026
2c987ee
Add qwen3.5:0.8b, qwen3.5:2b, nemotron:30b-a3b to model registry
andthattoo Mar 12, 2026
48bf710
Add size estimates for new models in setup mode
andthattoo Mar 12, 2026
8cc2ce3
Show multiple capability tags in setup model list
andthattoo Mar 12, 2026
c33aaac
Add CUDA and ROCm GPU build targets to release workflow
andthattoo Mar 12, 2026
bcea3e8
Add TESTER_GUIDE.md to gitignore
andthattoo Mar 12, 2026
7aa7e25
Add dynamic context sizing, error propagation, and pre-flight rejection
andthattoo Mar 12, 2026
455b282
Size KV cache to request needs instead of full model context
andthattoo Mar 12, 2026
36a8579
Default KV cache to Q8_0 and add --kv-quant flag
andthattoo Mar 12, 2026
35a87fa
Fix multi-model loading: init llama backend once via OnceLock
andthattoo Mar 13, 2026
3da746c
Fix clippy: remove needless borrows on static backend ref
andthattoo Mar 13, 2026
ff5eef8
Fix Windows RAM detection and auto-recover corrupt cached models
andthattoo Mar 13, 2026
9349083
Fix quant substitution for dot-separated GGUF filenames
andthattoo Mar 13, 2026
07ba8aa
Fix mmproj cache collision and add LLVM to Windows guide
andthattoo Mar 13, 2026
191c79e
Remove legacy mmproj fallback to prevent cross-model collision
andthattoo Mar 14, 2026
e10f4ba
Bump CI CUDA toolkit to 12.8.0 for Blackwell GPU support
andthattoo Mar 15, 2026
12aef36
Fix crash when prompt exceeds n_batch (2048 tokens)
andthattoo Mar 16, 2026
e440658
Bump version to 0.7.3 and target master branch in CI
andthattoo Mar 17, 2026

12 changes: 0 additions & 12 deletions .dockerignore

This file was deleted.

35 changes: 0 additions & 35 deletions .env.example

This file was deleted.

70 changes: 0 additions & 70 deletions .github/workflows/image-dev.yml

This file was deleted.

62 changes: 0 additions & 62 deletions .github/workflows/image-release.yml

This file was deleted.
