# Vision Model Debugging Guide

## Issue Summary

The vision model stops working when the AMD GPU is set as the primary GPU for text inference.

## Root Cause Analysis

The vision model (MiniCPM-V) should **always run on the NVIDIA GPU**, even when AMD is the primary GPU for text models. This is because:

1. **Separate GPU design**: Each GPU has its own llama-swap instance
   - `llama-swap` (NVIDIA) on port 8090 → handles `vision`, `llama3.1`, `darkidol`
   - `llama-swap-amd` (AMD) on port 8091 → handles `llama3.1`, `darkidol` (text models only)

2. **Vision model location**: The vision model is **ONLY configured on NVIDIA**
   - Check: `llama-swap-config.yaml` (has vision model)
   - Check: `llama-swap-rocm-config.yaml` (does NOT have vision model)

## Fixes Applied

### 1. Improved GPU Routing (`bot/utils/llm.py`)

**Function**: `get_vision_gpu_url()`
- Now explicitly returns the NVIDIA URL regardless of the primary text GPU
- Added debug logging when the text GPU is AMD
- Added clear documentation about the routing strategy

**New Function**: `check_vision_endpoint_health()` (see the sketch after this section)
- Pings the NVIDIA vision endpoint before attempting requests
- Provides detailed error messages if the endpoint is unreachable
- Logs health status for troubleshooting

### 2. Enhanced Vision Analysis (`bot/utils/image_handling.py`)

**Function**: `analyze_image_with_vision()`
- Added health check before processing
- Increased timeout to 60 seconds (from the default)
- Logs endpoint URL, model name, and detailed error messages
- Added exception info logging for better debugging

**Function**: `analyze_video_with_vision()`
- Added health check before processing
- Increased timeout to 120 seconds (from the default)
- Logs media type, frame count, and detailed error messages
- Added exception info logging for better debugging
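The routing and health-check behaviour described above can be sketched roughly as follows. This is a minimal sketch, not the actual contents of `bot/utils/llm.py`: the aiohttp usage, the constant name, the parameter names, and the return types are assumptions; only the function names, the always-route-to-NVIDIA rule, and the `/health` ping come from this guide.

```python
import asyncio
import logging

import aiohttp

logger = logging.getLogger(__name__)

# Assumed constant; the real bot presumably reads this from config/env vars.
NVIDIA_LLAMA_SWAP_URL = "http://llama-swap:8080"


def get_vision_gpu_url(current_text_gpu: str) -> str:
    """Vision always routes to the NVIDIA llama-swap instance.

    The AMD instance (llama-swap-rocm-config.yaml) has no vision model,
    so the primary text GPU is deliberately ignored here.
    """
    if current_text_gpu == "amd":
        logger.debug(
            "Text inference is on AMD; vision is still routed to NVIDIA at %s",
            NVIDIA_LLAMA_SWAP_URL,
        )
    return NVIDIA_LLAMA_SWAP_URL


async def check_vision_endpoint_health(timeout_s: float = 5.0) -> tuple[bool, str]:
    """Ping the NVIDIA vision endpoint before attempting a real request."""
    url = f"{NVIDIA_LLAMA_SWAP_URL}/health"
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                url, timeout=aiohttp.ClientTimeout(total=timeout_s)
            ) as resp:
                if resp.status == 200:
                    logger.info("Vision endpoint healthy: %s", url)
                    return True, "ok"
                return False, f"Vision endpoint returned HTTP {resp.status}"
    except asyncio.TimeoutError:
        return False, "Endpoint timeout"
    except aiohttp.ClientError as exc:
        return False, f"Endpoint unreachable: {exc}"
```

In this sketch, `analyze_image_with_vision()` and `analyze_video_with_vision()` would call `check_vision_endpoint_health()` first and surface the returned reason in their error messages, which matches the "Vision service currently unavailable" symptom described under Troubleshooting below.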
## Testing the Fix

### 1. Verify Docker Containers

```bash
# Check both llama-swap services are running
docker compose ps

# Expected output:
# llama-swap      (port 8090)
# llama-swap-amd  (port 8091)
```

### 2. Test NVIDIA Endpoint Health

```bash
# Check if NVIDIA vision endpoint is responsive
curl -f http://llama-swap:8080/health

# Should return 200 OK
```

### 3. Test Vision Request to NVIDIA

```bash
# Send a simple vision request directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }'
```

### 4. Check GPU State File

```bash
# Verify which GPU is primary
cat bot/memory/gpu_state.json

# Should show:
# {"current_gpu": "amd", "reason": "..."}     when AMD is primary
# {"current_gpu": "nvidia", "reason": "..."}  when NVIDIA is primary
```

### 5. Monitor Logs During Vision Request

```bash
# Watch bot logs during image analysis
docker compose logs -f miku-bot 2>&1 | grep -i vision

# Should see:
# "Sending vision request to http://llama-swap:8080"
# "Vision analysis completed successfully"
# OR detailed error messages if something is wrong
```

## Troubleshooting Steps

### Issue: Vision endpoint health check fails

**Symptoms**: "Vision service currently unavailable: Endpoint timeout"

**Solutions**:
1. Verify the NVIDIA container is running: `docker compose ps llama-swap`
2. Check NVIDIA GPU memory: `nvidia-smi` (should have free VRAM)
3. Check if the vision model is loaded: `docker compose logs llama-swap`
4. Increase the timeout if the model is loading slowly

### Issue: Vision requests time out (status 408/504)

**Symptoms**: Requests hang or return timeout errors

**Solutions**:
1. Check the NVIDIA GPU is not overloaded: `nvidia-smi`
2. Check if the vision model is already running: look for MiniCPM processes
3. Restart llama-swap if the model is stuck: `docker compose restart llama-swap`
4. Check available VRAM: MiniCPM-V needs ~4-6 GB

### Issue: Vision model returns "No description"

**Symptoms**: Image analysis returns empty or generic responses

**Solutions**:
1. Check if the vision model loaded correctly: `docker compose logs llama-swap`
2. Verify the model file exists: `/models/MiniCPM-V-4_5-Q3_K_S.gguf`
3. Check if the mmproj loaded: `/models/MiniCPM-V-4_5-mmproj-f16.gguf`
4. Test with a direct curl request to confirm the model works

### Issue: AMD GPU affects vision performance

**Symptoms**: Vision requests are slower when AMD is primary

**Solutions**:
1. This is expected behavior - NVIDIA is still processing vision
2. Could indicate NVIDIA GPU memory pressure
3. Monitor both GPUs: `rocm-smi` (AMD) and `nvidia-smi` (NVIDIA)

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                          Miku Bot                            │
│                                                              │
│             Discord Messages with Images/Videos             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
                ┌──────────────────────────────┐
                │   Vision Analysis Handler    │
                │   (image_handling.py)        │
                │                              │
                │   1. Check NVIDIA health     │
                │   2. Send to NVIDIA vision   │
                └──────────────────────────────┘
                               │
                               ▼
                ┌──────────────────────────────┐
                │   NVIDIA GPU (llama-swap)    │
                │   Port: 8090                 │
                │                              │
                │   Available Models:          │
                │   • vision (MiniCPM-V)       │
                │   • llama3.1                 │
                │   • darkidol                 │
                └──────────────────────────────┘
                               │
                   ┌───────────┴────────────┐
                   │                        │
                   ▼ (Vision only)          ▼ (Text only in dual-GPU mode)
              NVIDIA GPU               AMD GPU (llama-swap-amd)
                                       Port: 8091

                                       Available Models:
                                       • llama3.1
                                       • darkidol
                                       (NO vision model)
```

## Key Files Changed

1. **bot/utils/llm.py**
   - Enhanced `get_vision_gpu_url()` with documentation
   - Added `check_vision_endpoint_health()` function

2. **bot/utils/image_handling.py**
   - `analyze_image_with_vision()` - added health check and logging
   - `analyze_video_with_vision()` - added health check and logging

## Expected Behavior After Fix

### When NVIDIA is Primary (default)

```
Image received
  → Check NVIDIA health: OK
  → Send to NVIDIA vision model
  → Analysis complete

✓ Works as before
```

### When AMD is Primary (voice session active)

```
Image received
  → Check NVIDIA health: OK
  → Send to NVIDIA vision model (even though text uses AMD)
  → Analysis complete

✓ Vision now works correctly!
```

## Next Steps if Issues Persist

1. Enable debug logging: set `AUTONOMOUS_DEBUG=true` in docker-compose
2. Check Docker networking: `docker network inspect miku-discord_default`
3. Verify environment variables: `docker compose exec miku-bot env | grep LLAMA`
4. Check model file integrity: `ls -lah models/MiniCPM*`
5. Review llama-swap logs: `docker compose logs llama-swap -n 100`
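If those steps still do not isolate the problem, reproducing the bot's vision request outside the container can show whether the fault lies in the endpoint itself or in the bot's routing. The script below is a hypothetical, standalone sketch using only the standard library: the host-mapped port (`localhost:8090`), the JPEG test image, and the OpenAI-compatible response shape are assumptions; the payload mirrors the curl example in step 3 of "Testing the Fix".

```python
"""Standalone reproduction of the bot's vision request (debugging helper only)."""
import base64
import json
import sys
import urllib.request

# Assumption: port 8090 is the host-mapped port of the NVIDIA llama-swap service.
VISION_URL = "http://localhost:8090/v1/chat/completions"


def describe_image(path: str) -> str:
    # Assumes a JPEG test image; adjust the data URL MIME type for other formats.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 100,
    }

    req = urllib.request.Request(
        VISION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout, matching the 120 s used for video analysis in the bot.
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)

    # Assumes an OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(describe_image(sys.argv[1]))
```

Run it as `python3 vision_probe.py path/to/test.jpg` (script name is illustrative). If this succeeds while the bot still fails, the endpoint is healthy and the problem is more likely in the bot's routing or Docker networking.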