feat(serve): upgrade daily driver to qwen3.6-35b-a3b q6_k_xl
Switch `make serve` default to Qwen3.6 UD Q6_K_XL (32 GB, hybrid DeltaNet, near-lossless) and register it in the model catalog. Add --jinja to the llama-server launcher so tool/function calling works — without it clients silently ignore tool definitions advertised by the server.
This commit is contained in:
@@ -106,6 +106,7 @@ SERVER_ARGS=(
|
||||
-ngl 99 # Full GPU offload
|
||||
--no-mmap # Direct load, no mmap overhead
|
||||
-fa on # Flash attention
|
||||
--jinja # Required for tool calling (clients ignored without it)
|
||||
-m "$TOOLBOX_MODEL_PATH"
|
||||
-c "$CTX_SIZE" # Context size
|
||||
--cache-type-k q4_0 # KV cache quantization (fastest on Vulkan)
|
||||
|
||||
Reference in New Issue
Block a user