feat(serve): upgrade daily driver to qwen3.6-35b-a3b q6_k_xl

Switch `make serve` default to Qwen3.6 UD Q6_K_XL (32 GB, hybrid
DeltaNet, near-lossless) and register it in the model catalog. Add
--jinja to the llama-server launcher so tool/function calling works —
without it clients silently ignore tool definitions advertised by the
server.
This commit is contained in:
Felipe Cardoso
2026-04-26 20:06:18 +02:00
parent c847991740
commit 751180fdc1
3 changed files with 4 additions and 2 deletions

View File

@@ -106,6 +106,7 @@ SERVER_ARGS=(
-ngl 99 # Full GPU offload
--no-mmap # Direct load, no mmap overhead
-fa on # Flash attention
--jinja # Required for tool calling (clients ignored without it)
-m "$TOOLBOX_MODEL_PATH"
-c "$CTX_SIZE" # Context size
--cache-type-k q4_0 # KV cache quantization (fastest on Vulkan)