feat(serve): upgrade daily driver to qwen3.6-35b-a3b q6_k_xl

Switch `make serve` default to Qwen3.6 UD Q6_K_XL (32 GB, hybrid DeltaNet, near-lossless) and register it in the model catalog. Add --jinja to the llama-server launcher so tool/function calling works — without it clients silently ignore tool definitions advertised by the server.
2026-04-26 20:06:18 +02:00
parent c847991740
commit 751180fdc1
3 changed files with 4 additions and 2 deletions
--- a/scripts/serve/launch.sh
+++ b/scripts/serve/launch.sh
@@ -106,6 +106,7 @@ SERVER_ARGS=(
    -ngl 99                          # Full GPU offload
    --no-mmap                        # Direct load, no mmap overhead
    -fa on                            # Flash attention
+    --jinja                          # Required for tool calling (clients ignored without it)
    -m "$TOOLBOX_MODEL_PATH"
    -c "$CTX_SIZE"                   # Context size
    --cache-type-k q4_0              # KV cache quantization (fastest on Vulkan)