2 Commits

Author SHA1 Message Date
675bb21653 Disable model warmup to improve switching speed
- Added --no-warmup flag to both llama3.1 and vision models
- Reduces model switch time by 2-5 seconds per swap
- No impact on response quality; only a minor increase in first-token latency
- Better suited to frequent model switching on a tight VRAM budget
2025-12-10 10:09:37 +02:00
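
As a rough illustration of the change described in the commit above, the sketch below builds launch commands that pass `--no-warmup` for both models. It assumes the models are served with llama.cpp's `llama-server`, which accepts that flag; the commit itself does not name the server or show its configuration. The helper `build_server_command`, the model paths, and the ports are all hypothetical.

```python
import shlex

def build_server_command(model_path, port, warmup=False):
    """Build an argument list for llama-server (assumed to be the backend)."""
    cmd = ["llama-server", "-m", model_path, "--port", str(port)]
    if not warmup:
        # Skip the warmup run so the model becomes available sooner after a swap.
        cmd.append("--no-warmup")
    return cmd

# Hypothetical model files and ports; the real ones are not given in the commit.
MODELS = {
    "llama3.1": ("/models/llama3.1.gguf", 8080),
    "vision": ("/models/vision.gguf", 8081),
}

commands = {
    name: build_server_command(path, port)
    for name, (path, port) in MODELS.items()
}

for name, cmd in commands.items():
    print(name, "->", shlex.join(cmd))
```

The trade-off is the one the commit message states: the first request after a swap pays the cost that the warmup run would otherwise have absorbed.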
8c74ad5260 Initial commit: Miku Discord Bot 2025-12-07 17:15:09 +02:00