# Vision Model Debugging Guide

## Issue Summary

The vision model does not work when AMD is set as the primary GPU for text inference.

## Root Cause Analysis

The vision model (MiniCPM-V) should **always run on the NVIDIA GPU**, even when AMD is the primary GPU for text models. This is because:

1. **Separate GPU design**: Each GPU has its own llama-swap instance (see the endpoint sketch below)
   - `llama-swap` (NVIDIA) on port 8090 → handles `vision`, `llama3.1`, `darkidol`
   - `llama-swap-amd` (AMD) on port 8091 → handles `llama3.1`, `darkidol` (text models only)

2. **Vision model location**: The vision model is **ONLY configured on NVIDIA**
   - Check: `llama-swap-config.yaml` (has the vision model)
   - Check: `llama-swap-rocm-config.yaml` (does NOT have the vision model)
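
For reference, a minimal sketch of how the bot might address these two services from inside the Docker network. The NVIDIA URL matches the curl examples later in this guide; the AMD hostname and in-container port are assumptions (this guide only states the published port 8091), and the dictionary name is made up for illustration:

```python
# Sketch only: the two llama-swap endpoints as seen from the bot container.
# "llama-swap:8080" matches the curl examples below; the AMD entry is assumed.
LLAMA_SWAP_ENDPOINTS = {
    "nvidia": "http://llama-swap:8080",    # vision, llama3.1, darkidol
    "amd": "http://llama-swap-amd:8091",   # llama3.1, darkidol (no vision)
}
```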

## Fixes Applied

### 1. Improved GPU Routing (`bot/utils/llm.py`)

**Function**: `get_vision_gpu_url()`
- Now explicitly returns the NVIDIA URL regardless of which GPU is primary for text (see the sketch below)
- Added debug logging when the text GPU is AMD
- Added clear documentation of the routing strategy
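
A minimal sketch of that routing rule. The environment-variable names (`LLAMA_NVIDIA_URL`, `LLAMA_AMD_URL`) and the string argument are hypothetical; only the "vision always goes to NVIDIA" behavior is taken from this guide:

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical env var names and defaults; the real bot may configure these differently.
NVIDIA_URL = os.getenv("LLAMA_NVIDIA_URL", "http://llama-swap:8080")
AMD_URL = os.getenv("LLAMA_AMD_URL", "http://llama-swap-amd:8091")


def get_vision_gpu_url(current_text_gpu: str) -> str:
    """Vision always runs on the NVIDIA instance, even when AMD is primary for text."""
    if current_text_gpu == "amd":
        logger.debug("Text inference is on AMD; routing vision to NVIDIA anyway")
    return NVIDIA_URL
```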

**New Function**: `check_vision_endpoint_health()`
- Pings the NVIDIA vision endpoint before attempting requests
- Provides detailed error messages if the endpoint is unreachable
- Logs health status for troubleshooting (see the sketch below)
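
A sketch of what the health check amounts to, assuming an aiohttp client and the `/health` route used in the testing section below; the real helper's signature and return shape may differ:

```python
import aiohttp


async def check_vision_endpoint_health(base_url: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Ping the NVIDIA llama-swap /health route before sending a vision request."""
    try:
        timeout = aiohttp.ClientTimeout(total=timeout_s)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(f"{base_url}/health") as resp:
                if resp.status == 200:
                    return True, "ok"
                return False, f"Unexpected status {resp.status}"
    except Exception as exc:  # timeouts, DNS failures, connection refused, ...
        return False, f"Endpoint unreachable: {exc}"
```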

### 2. Enhanced Vision Analysis (`bot/utils/image_handling.py`)

**Function**: `analyze_image_with_vision()`
- Added a health check before processing
- Increased the timeout to 60 seconds (from the default)
- Logs the endpoint URL, model name, and detailed error messages
- Added exception info logging for better debugging (see the sketch below)
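
Sketched end to end, using the same OpenAI-compatible `/v1/chat/completions` payload as the curl test in the testing section and the health-check helper sketched above; everything except the `vision` model name and the 60-second timeout is illustrative:

```python
import base64

import aiohttp


async def analyze_image_with_vision(image_bytes: bytes, prompt: str, base_url: str) -> str:
    """Health-check the NVIDIA endpoint, then send a single image with a 60 s timeout."""
    healthy, reason = await check_vision_endpoint_health(base_url)
    if not healthy:
        return f"Vision service currently unavailable: {reason}"

    b64 = base64.b64encode(image_bytes).decode()
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 300,  # arbitrary value for the sketch
    }
    timeout = aiohttp.ClientTimeout(total=60)  # raised from the client default
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(f"{base_url}/v1/chat/completions", json=payload) as resp:
            data = await resp.json()
            return data["choices"][0]["message"]["content"]
```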

**Function**: `analyze_video_with_vision()`
- Added a health check before processing
- Increased the timeout to 120 seconds (from the default)
- Logs the media type, frame count, and detailed error messages
- Added exception info logging for better debugging (see the frame-batching sketch below)
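
Video follows the same shape; per the list above, the differences are the 120-second timeout and that several sampled frames go into one request. A sketch, assuming frames have already been extracted and base64-encoded elsewhere and reusing the health-check helper above:

```python
import aiohttp


async def analyze_video_with_vision(frames_b64: list[str], prompt: str, base_url: str) -> str:
    """Send a batch of base64-encoded frames to the NVIDIA vision model (120 s timeout)."""
    healthy, reason = await check_vision_endpoint_health(base_url)
    if not healthy:
        return f"Vision service currently unavailable: {reason}"

    content = [{"type": "text", "text": prompt}] + [
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
        for frame in frames_b64
    ]
    payload = {"model": "vision", "messages": [{"role": "user", "content": content}]}
    timeout = aiohttp.ClientTimeout(total=120)  # raised from the client default
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(f"{base_url}/v1/chat/completions", json=payload) as resp:
            data = await resp.json()
            return data["choices"][0]["message"]["content"]
```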

## Testing the Fix

### 1. Verify Docker Containers

```bash
# Check both llama-swap services are running
docker compose ps

# Expected output:
# llama-swap      (port 8090)
# llama-swap-amd  (port 8091)
```

### 2. Test NVIDIA Endpoint Health

```bash
# Check if the NVIDIA vision endpoint is responsive
curl -f http://llama-swap:8080/health

# Should return 200 OK
```

### 3. Test Vision Request to NVIDIA

```bash
# Send a simple vision request directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }'
```

### 4. Check GPU State File

```bash
# Verify which GPU is primary
cat bot/memory/gpu_state.json

# Should show:
# {"current_gpu": "amd", "reason": "..."} when AMD is primary
# {"current_gpu": "nvidia", "reason": "..."} when NVIDIA is primary
```
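
To check the same state from code, a minimal sketch mirroring the fields shown above; the path and field names come from this guide, the helper name is made up:

```python
import json
from pathlib import Path

GPU_STATE_FILE = Path("bot/memory/gpu_state.json")


def read_primary_gpu() -> str:
    """Return "amd" or "nvidia" from the state file, defaulting to NVIDIA."""
    try:
        state = json.loads(GPU_STATE_FILE.read_text())
        return state.get("current_gpu", "nvidia")
    except (FileNotFoundError, json.JSONDecodeError):
        return "nvidia"
```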

### 5. Monitor Logs During Vision Request

```bash
# Watch bot logs during image analysis
docker compose logs -f miku-bot 2>&1 | grep -i vision

# Should see:
# "Sending vision request to http://llama-swap:8080"
# "Vision analysis completed successfully"
# OR detailed error messages if something is wrong
```

## Troubleshooting Steps

### Issue: Vision endpoint health check fails

**Symptoms**: "Vision service currently unavailable: Endpoint timeout"

**Solutions**:
1. Verify the NVIDIA container is running: `docker compose ps llama-swap`
2. Check NVIDIA GPU memory: `nvidia-smi` (should show free VRAM)
3. Check whether the vision model is loaded: `docker compose logs llama-swap`
4. Increase the timeout if the model is loading slowly

### Issue: Vision requests time out (status 408/504)

**Symptoms**: Requests hang or return timeout errors

**Solutions**:
1. Check that the NVIDIA GPU is not overloaded: `nvidia-smi`
2. Check whether the vision model is already running: look for MiniCPM processes
3. Restart llama-swap if the model is stuck: `docker compose restart llama-swap`
4. Check available VRAM: MiniCPM-V needs roughly 4-6 GB

### Issue: Vision model returns "No description"

**Symptoms**: Image analysis returns empty or generic responses

**Solutions**:
1. Check whether the vision model loaded correctly: `docker compose logs llama-swap`
2. Verify the model file exists: `/models/MiniCPM-V-4_5-Q3_K_S.gguf`
3. Check that the mmproj file loaded: `/models/MiniCPM-V-4_5-mmproj-f16.gguf`
4. Test with a direct curl request (see step 3 above) to confirm the model itself works

### Issue: AMD GPU affects vision performance

**Symptoms**: Vision requests are slower when AMD is primary

**Solutions**:
1. Some slowdown is expected: NVIDIA is still processing all vision requests
2. It can also indicate NVIDIA GPU memory pressure
3. Monitor both GPUs: `rocm-smi` (AMD) and `nvidia-smi` (NVIDIA)

## Architecture Diagram

```
┌─────────────────────────────────────────────┐
│                  Miku Bot                   │
│                                             │
│    Discord Messages with Images/Videos      │
└─────────────────────────────────────────────┘
                       │
                       ▼
        ┌──────────────────────────────┐
        │  Vision Analysis Handler     │
        │  (image_handling.py)         │
        │                              │
        │  1. Check NVIDIA health      │
        │  2. Send to NVIDIA vision    │
        └──────────────────────────────┘
                       │
                       ▼
        ┌──────────────────────────────┐
        │  NVIDIA GPU (llama-swap)     │
        │  Port: 8090                  │
        │                              │
        │  Available Models:           │
        │  • vision (MiniCPM-V)        │
        │  • llama3.1                  │
        │  • darkidol                  │
        └──────────────────────────────┘
                       │
           ┌───────────┴────────────┐
           │                        │
           ▼ (Vision only)          ▼ (Text only in dual-GPU mode)
        NVIDIA GPU             AMD GPU (llama-swap-amd)
                               Port: 8091

                               Available Models:
                               • llama3.1
                               • darkidol
                               (NO vision model)
```

## Key Files Changed

1. **bot/utils/llm.py**
   - Enhanced `get_vision_gpu_url()` with documentation
   - Added `check_vision_endpoint_health()` function

2. **bot/utils/image_handling.py**
   - `analyze_image_with_vision()` - added health check and logging
   - `analyze_video_with_vision()` - added health check and logging

## Expected Behavior After Fix

### When NVIDIA is Primary (default)
```
Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model
→ Analysis complete
✓ Works as before
```

### When AMD is Primary (voice session active)
```
Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model (even though text uses AMD)
→ Analysis complete
✓ Vision now works correctly!
```

## Next Steps if Issues Persist

1. Enable debug logging: Set `AUTONOMOUS_DEBUG=true` in docker-compose
2. Check Docker networking: `docker network inspect miku-discord_default`
3. Verify environment variables: `docker compose exec miku-bot env | grep LLAMA`
4. Check model file integrity: `ls -lah models/MiniCPM*`
5. Review llama-swap logs: `docker compose logs llama-swap -n 100`