moved AI generated readmes to readme folder (may delete)
readmes/VISION_TROUBLESHOOTING.md (new file, 330 lines)

# Vision Model Troubleshooting Checklist

## Quick Diagnostics

### 1. Verify Both GPU Services Running

```bash
# Check container status
docker compose ps

# Should show both RUNNING:
# llama-swap     (NVIDIA CUDA)
# llama-swap-amd (AMD ROCm)
```

**If llama-swap is not running:**
```bash
docker compose up -d llama-swap
docker compose logs llama-swap
```

**If llama-swap-amd is not running:**
```bash
docker compose up -d llama-swap-amd
docker compose logs llama-swap-amd
```

### 2. Check NVIDIA Vision Endpoint Health

```bash
# Test NVIDIA endpoint directly
curl -v http://llama-swap:8080/health

# Expected: 200 OK

# If timeout (no response for 5+ seconds):
# - NVIDIA GPU might not have enough VRAM
# - Model might be stuck loading
# - Docker network might be misconfigured
```

### 3. Check Current GPU State

```bash
# See which GPU is set as primary
cat bot/memory/gpu_state.json

# Expected output:
# {"current_gpu": "amd", "reason": "voice_session"}
# or
# {"current_gpu": "nvidia", "reason": "auto_switch"}
```
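
If other tooling needs this value, it can be read with a few lines of Python. This is a minimal sketch, not the bot's actual code; it only assumes the file path and the `current_gpu`/`reason` keys shown above, and falls back to NVIDIA if the file is missing or unreadable.

```python
import json
from pathlib import Path

STATE_FILE = Path("bot/memory/gpu_state.json")  # path shown in the command above

def read_gpu_state() -> dict:
    """Return the GPU state dict, defaulting to NVIDIA if the file is absent or corrupt."""
    try:
        return json.loads(STATE_FILE.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Assumed fallback: treat NVIDIA as primary when no valid state file exists
        return {"current_gpu": "nvidia", "reason": "default"}

if __name__ == "__main__":
    state = read_gpu_state()
    print(f"Primary GPU: {state.get('current_gpu')} (reason: {state.get('reason')})")
```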

### 4. Verify Model Files Exist

```bash
# Check vision model files on disk
ls -lh models/MiniCPM*

# Should show both:
# -rw-r--r-- ... MiniCPM-V-4_5-Q3_K_S.gguf      (main model, ~3.3GB)
# -rw-r--r-- ... MiniCPM-V-4_5-mmproj-f16.gguf  (projection, ~500MB)
```

## Scenario-Based Troubleshooting

### Scenario 1: Vision Works When NVIDIA is Primary, Fails When AMD is Primary

**Diagnosis:** The NVIDIA vision model is being unloaded while AMD is primary.

**Root Cause:** llama-swap is configured to unload unused models.

**Solution:**
```yaml
# In llama-swap-config.yaml, increase the TTL for the vision model:
vision:
  ttl: 3600  # Increase from 900 to keep the vision model loaded longer
```

**Or:**
```yaml
# Disable TTL for vision to keep it always loaded:
vision:
  ttl: 0  # 0 means never auto-unload
```

### Scenario 2: "Vision service currently unavailable: Endpoint timeout"

**Diagnosis:** NVIDIA endpoint not responding within 5 seconds

**Causes:**
1. NVIDIA GPU out of memory
2. Vision model stuck loading
3. Network latency

**Solutions:**

```bash
# Check NVIDIA GPU memory
nvidia-smi

# If memory is full, restart NVIDIA container
docker compose restart llama-swap

# Wait for model to load (check logs)
docker compose logs llama-swap -f

# Should see: "model loaded" message
```

**If persistent:** Increase health check timeout in `bot/utils/llm.py`:
```python
# Change from 5 to 10 seconds
async with session.get(f"{vision_url}/health", timeout=aiohttp.ClientTimeout(total=10)) as response:
```
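
If the timeout needs tuning often, wrapping the check in a small helper keeps the value in one place. The sketch below is illustrative only, not the bot's actual code: `VISION_URL`, the function name, and the 10-second default are assumptions; only the `/health` route and the use of `aiohttp` come from the snippet above.

```python
import asyncio

import aiohttp

VISION_URL = "http://llama-swap:8080"  # NVIDIA endpoint used throughout this checklist

async def vision_endpoint_healthy(timeout_s: float = 10.0) -> bool:
    """Return True if the vision endpoint answers GET /health with HTTP 200 within timeout_s."""
    timeout = aiohttp.ClientTimeout(total=timeout_s)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(f"{VISION_URL}/health") as response:
                return response.status == 200
    except (aiohttp.ClientError, asyncio.TimeoutError):
        # Timeouts and connection errors both count as "unhealthy"
        return False

if __name__ == "__main__":
    print(asyncio.run(vision_endpoint_healthy()))
```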

### Scenario 3: Vision Model Returns Empty Description

**Diagnosis:** Model loaded but not processing correctly

**Causes:**
1. Model file corruption
2. Invalid or unvalidated image input reaching the model
3. Model inference error

**Solutions:**

```bash
# Test the vision model directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is this?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJ..."}}
      ]
    }],
    "max_tokens": 100
  }'

# If the response is empty, check the llama-swap logs for errors
docker compose logs llama-swap -n 50
```
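
Typing a full base64 string into curl is awkward, so a short script that builds the same request from a local image file can be handier for testing. This is a test sketch, not part of the bot: the file-path argument, the function name, and using `aiohttp` outside the bot are assumptions; the endpoint, payload shape, and `"model": "vision"` value come from the curl example above.

```python
import asyncio
import base64
import sys

import aiohttp

VISION_URL = "http://llama-swap:8080/v1/chat/completions"  # same endpoint as the curl above

async def describe_image(path: str, prompt: str = "What is this?") -> str:
    """Send a local image to the vision model and return its text reply ('' if empty)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 100,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(VISION_URL, json=payload) as response:
            data = await response.json()
    choices = data.get("choices") or [{}]
    return (choices[0].get("message") or {}).get("content", "") or ""

if __name__ == "__main__":
    print(asyncio.run(describe_image(sys.argv[1])) or "(empty description)")
```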

### Scenario 4: "Error 503 Service Unavailable"

**Diagnosis:** llama-swap process crashed or model failed to load

**Solutions:**

```bash
# Check llama-swap container status
docker compose logs llama-swap -n 100

# Look for error messages, stack traces

# Restart the service
docker compose restart llama-swap

# Monitor startup
docker compose logs llama-swap -f
```
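
A 503 right after a model swap can also just mean the model is still loading. If you would rather have the caller wait briefly than fail immediately, one option is a small retry wrapper like the sketch below. This is not the project's code: the function name, attempt count, and delay are all assumptions.

```python
import asyncio

import aiohttp

async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict,
                          attempts: int = 4, delay_s: float = 5.0) -> dict:
    """POST payload, retrying on HTTP 503 (model likely still loading) with a fixed delay."""
    for attempt in range(1, attempts + 1):
        async with session.post(url, json=payload) as response:
            if response.status == 503 and attempt < attempts:
                await asyncio.sleep(delay_s)  # give llama-swap time to finish loading
                continue
            response.raise_for_status()  # raise on the final 503 or any other error status
            return await response.json()
    raise RuntimeError("unreachable")  # loop always returns or raises before this
```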

### Scenario 5: Slow Vision Analysis When AMD is Primary

**Diagnosis:** Both GPUs are under load, so NVIDIA performance is degraded.

**Expected Behavior:** This is normal; both GPUs are working simultaneously.

**If Unacceptably Slow:**
1. Check whether text requests are blocking vision requests
2. Verify GPU memory allocation
3. Consider processing images sequentially instead of in parallel (see the sketch below)
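
A simple way to get option 3 is to cap concurrent vision calls with a semaphore. This is an illustrative sketch, not the bot's code: `analyze_image` is a stand-in for whatever coroutine actually calls the vision endpoint, and a limit of 1 makes processing fully sequential.

```python
import asyncio

VISION_CONCURRENCY = asyncio.Semaphore(1)  # 1 = fully sequential vision processing

async def analyze_image(image_bytes: bytes) -> str:
    """Placeholder for the real vision call."""
    await asyncio.sleep(0.1)
    return "description"

async def analyze_image_limited(image_bytes: bytes) -> str:
    # Only one vision request runs at a time; others wait here
    async with VISION_CONCURRENCY:
        return await analyze_image(image_bytes)

async def main() -> None:
    images = [b"img1", b"img2", b"img3"]
    print(await asyncio.gather(*(analyze_image_limited(img) for img in images)))

if __name__ == "__main__":
    asyncio.run(main())
```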

## Log Analysis Tips

### Enable Detailed Vision Logging

```bash
# Watch only vision-related logs
docker compose logs miku-bot -f 2>&1 | grep -i vision

# Watch with timestamps, filtered by severity
docker compose logs miku-bot -f -t 2>&1 | grep -i vision | grep -E "ERROR|WARNING|INFO"
```

### Check GPU Health During a Vision Request

In one terminal:
```bash
# Monitor the NVIDIA GPU while processing
watch -n 1 nvidia-smi
```

In another:
```bash
# Send the bot an image that triggers vision analysis,
# then watch GPU usage spike in the first terminal
```

### Monitor Both GPUs Simultaneously

```bash
# Terminal 1: NVIDIA
watch -n 1 nvidia-smi

# Terminal 2: AMD
watch -n 1 rocm-smi

# Terminal 3: Logs
docker compose logs miku-bot -f 2>&1 | grep -E "ERROR|vision"
```

## Emergency Fixes

### If Vision Is Completely Broken

```bash
# Full restart of all GPU services
docker compose down
docker compose up -d llama-swap llama-swap-amd
docker compose up -d miku-bot  # after `down`, the bot must be brought back up too

# Wait for services to start (30-60 seconds)
sleep 30

# Test health
curl http://llama-swap:8080/health
curl http://llama-swap-amd:8080/health
```

### Force NVIDIA GPU Vision

If you want vision requests to be attempted even when the NVIDIA health check fails:

```python
# Comment out the vision health check (see bot/utils/llm.py / image_handling.py)
# Not recommended, but it lets requests go through instead of failing fast
```

### Disable Dual-GPU Mode Temporarily

If the AMD GPU is causing issues:

```bash
# Stop the AMD service and restart the bot
docker compose stop llama-swap-amd
docker compose restart miku-bot

# This reverts to single-GPU mode (everything runs on NVIDIA)
```

## Prevention Measures

### 1. Monitor GPU Memory

```bash
# Set up automated monitoring
watch -n 5 "nvidia-smi --query-gpu=memory.used,memory.free --format=csv,noheader"
watch -n 5 "rocm-smi --showmeminfo vram"
```
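
If you would rather keep a history than watch a live view, a tiny poller can append the same NVIDIA query to a log file. This is a sketch (the log path and interval are hypothetical); it only assumes `nvidia-smi` is on the PATH wherever you run it.

```python
import subprocess
import time
from datetime import datetime

def nvidia_memory_csv() -> str:
    """Return 'used, free' memory using the same nvidia-smi query as the watch command above."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.free", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    with open("gpu_memory.log", "a") as log:
        while True:
            log.write(f"{datetime.now().isoformat()} {nvidia_memory_csv()}\n")
            log.flush()
            time.sleep(5)  # same 5-second cadence as the watch example
```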

### 2. Set Appropriate Model TTLs

In `llama-swap-config.yaml`:
```yaml
vision:
  ttl: 1800  # Keep loaded 30 minutes

llama3.1:
  ttl: 1800  # Keep loaded 30 minutes
```

In `llama-swap-rocm-config.yaml`:
```yaml
llama3.1:
  ttl: 1800  # AMD text model

darkidol:
  ttl: 1800  # AMD evil mode
```

### 3. Monitor Container Logs

```bash
# Periodic log check
docker compose logs llama-swap | tail -20
docker compose logs llama-swap-amd | tail -20
docker compose logs miku-bot | grep vision | tail -20
```

### 4. Regular Health Checks

```bash
#!/bin/bash
# Script to check both GPU endpoints

echo "NVIDIA Health:"
curl -sf -o /dev/null http://llama-swap:8080/health && echo "✓ OK" || echo "✗ FAILED"

echo "AMD Health:"
curl -sf -o /dev/null http://llama-swap-amd:8080/health && echo "✓ OK" || echo "✗ FAILED"
```

## Performance Optimization

If vision requests are too slow:

1. **Reduce image quality** before sending to the model (see the sketch below)
2. **Use smaller frames** for video analysis
3. **Batch process** multiple images
4. **Allocate more VRAM** to NVIDIA if available
5. **Reduce concurrent requests** to NVIDIA during peak load
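
For item 1, downscaling and re-encoding an image before base64-encoding it shrinks both the upload and the model's prompt-processing time. A minimal sketch, assuming Pillow is available; the size and quality values are illustrative, not the bot's settings.

```python
import base64
import io

from PIL import Image  # assumes Pillow is installed

def shrink_for_vision(image_bytes: bytes, max_side: int = 1024, quality: int = 85) -> str:
    """Downscale so the longest side is max_side, re-encode as JPEG, and return a data URL."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    img.thumbnail((max_side, max_side))  # preserves aspect ratio and only ever shrinks
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    b64 = base64.b64encode(buf.getvalue()).decode()
    return f"data:image/jpeg;base64,{b64}"
```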

## Success Indicators

After applying the fix, you should see:

- ✅ Images analyzed within 5-10 seconds (first load: 20-30 seconds)
- ✅ No "Vision service unavailable" errors
- ✅ Log shows `Vision analysis completed successfully`
- ✅ Works correctly whether AMD or NVIDIA is primary GPU
- ✅ No GPU memory errors in nvidia-smi/rocm-smi

## Contact Points for Further Issues

1. Check NVIDIA llama.cpp/llama-swap logs
2. Check AMD ROCm compatibility for your GPU
3. Verify Docker networking (if using custom networks)
4. Check system VRAM (needs ~10GB+ for both models)