miku-discord/docker-compose.yml
koko210Serve 9eb081efb1 llama-swap: use pre-built images (:cuda, :rocm) with GPU-specific flags
- Drop custom Dockerfiles; docker-compose uses ghcr.io pre-built images
  which ship llama-swap + llama-server with no pinned versions (always latest)
- NVIDIA GTX 1660 (6GB): add -fit off --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
  to fix the OOM segfault caused by llama.cpp b9014 now placing the KV cache on the GPU by default
- AMD RX 6800 (16GB): flags unchanged; KV cache stays on GPU for max speed
- Both running llama-swap v211 + llama.cpp b9014 (2026-05-05)
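
As a rough illustration of where the NVIDIA flags listed above might end up, here is a minimal sketch of a llama-swap `config.yaml` model entry. The model name, the model path, and the assumption that `llama-server` is on the container's PATH are all hypothetical; only the four flags are copied verbatim from the commit message, and the actual file in this repository may be laid out differently.

```yaml
# Hypothetical llama-swap config.yaml fragment (not taken from the repository).
# "example-model" and /models/example.gguf are placeholder names.
models:
  "example-model":
    # GTX 1660 (6GB): flags copied from the commit message.
    # --no-kv-offload keeps the KV cache in host memory instead of VRAM;
    # --cache-type-k / --cache-type-v q4_0 store the KV cache in 4-bit quantized form.
    cmd: >
      llama-server
      --port ${PORT}
      -m /models/example.gguf
      -fit off
      --no-kv-offload
      --cache-type-k q4_0
      --cache-type-v q4_0
```

Per the commit message, the AMD RX 6800 (16GB) entry would omit these extra flags so that the KV cache stays on the GPU.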
2026-05-05 16:53:34 +03:00