File changed: miku-discord/llama-swap-config.yaml
koko210Serve · 675bb21653 · Disable model warmup to improve switching speed
- Added --no-warmup flag to both llama3.1 and vision models
- Reduces model switch time by 2-5 seconds per swap
- No impact on response quality; the only cost is slightly higher first-token latency on the first request after a swap
- Better suited to frequent model switching and a tight VRAM budget (see the config sketch below)
2025-12-10 10:09:37 +02:00
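
A minimal sketch of what the change might look like in the config, assuming llama-swap's documented `models`/`cmd` layout with the `${PORT}` macro. The model names match the commit message, but the file paths and context sizes are hypothetical placeholders; only `--no-warmup` (a real llama-server flag that skips the initial warmup inference pass) reflects the actual change:

```yaml
# Hypothetical sketch of miku-discord/llama-swap-config.yaml after this commit.
# Paths and context sizes are illustrative, not the repo's actual values.
models:
  "llama3.1":
    # --no-warmup skips llama-server's warmup pass, saving seconds on each swap
    cmd: >
      llama-server --port ${PORT}
      --model /models/llama-3.1-8b-instruct.Q4_K_M.gguf
      --ctx-size 8192
      --no-warmup

  "vision":
    # Same flag on the vision model; the first token after a swap is slightly slower
    cmd: >
      llama-server --port ${PORT}
      --model /models/vision-model.Q4_K_M.gguf
      --ctx-size 4096
      --no-warmup
```

The trade-off is that the first request after each swap pays the initialization cost that the warmup pass would otherwise have absorbed, which is why the commit describes the added latency as first-token only.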
