- Added the `--no-warmup` flag to both the llama3.1 and vision models
- Reduces model switch time by 2-5 seconds per swap
- No impact on response quality; the only cost is a minor increase in first-token latency after a swap
- Better suited to the frequent-model-switching use case and the tight VRAM budget
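
For reference, a minimal sketch of where the flag lands, assuming these are llama.cpp `llama-server` launch commands (which support `--no-warmup` to skip the empty warmup run at load time); the model names, file paths, ports, and the `--mmproj` projector file are illustrative assumptions, not the actual entries:

```sh
# Sketch only: hypothetical launch commands. Model files, ports, and other
# flags are assumptions; --no-warmup is the flag this change adds.
llama-server -m llama-3.1-8b-instruct-Q4_K_M.gguf --port 9001 --no-warmup
llama-server -m vision-model-Q4_K_M.gguf --mmproj mmproj.gguf --port 9002 --no-warmup
```

The trade-off is that warmup normally pre-touches weights so the first request is fast; skipping it moves that cost onto the first token of the first request instead of the swap itself, which is the right trade when swaps are frequent.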