File changed: miku-discord/llama-swap-config.yaml
koko210Serve · 675bb21653 · Disable model warmup to improve switching speed
- Added --no-warmup flag to both llama3.1 and vision models
- Reduces model switch time by 2-5 seconds per swap
- No impact on response quality; the only cost is slightly higher first-token latency on the first request after a swap
- Better suited to frequent model switching and a tight VRAM budget (see the config sketch below)
2025-12-10 10:09:37 +02:00
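
A minimal sketch of what the change might look like in the config, assuming llama-swap's documented `models`/`cmd` layout with the `${PORT}` macro. The model names match the commit message, but the file paths and context sizes are hypothetical placeholders; only `--no-warmup` (a real llama-server flag that skips the initial warmup inference pass) reflects the actual change:

```yaml
# Hypothetical sketch of miku-discord/llama-swap-config.yaml after this commit.
# Paths and context sizes are illustrative, not the repo's actual values.
models:
  "llama3.1":
    # --no-warmup skips llama-server's warmup pass, saving seconds on each swap
    cmd: >
      llama-server --port ${PORT}
      --model /models/llama-3.1-8b-instruct.Q4_K_M.gguf
      --ctx-size 8192
      --no-warmup

  "vision":
    # Same flag on the vision model; the first token after a swap is slightly slower
    cmd: >
      llama-server --port ${PORT}
      --model /models/vision-model.Q4_K_M.gguf
      --ctx-size 4096
      --no-warmup
```

The trade-off is that the first request after each swap pays the initialization cost that the warmup pass would otherwise have absorbed, which is why the commit describes the added latency as first-token only.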
