moved AI generated readmes to readme folder (may delete)

readmes/VISION_FIX_SUMMARY.md (new file, 150 lines)

# Vision Model Dual-GPU Fix - Summary

## Problem

The vision model (MiniCPM-V) stopped working whenever the AMD GPU was set as the primary GPU for text inference.

## Root Cause

While `get_vision_gpu_url()` was correctly hardcoded to always use NVIDIA, the surrounding code had several gaps:

1. No health checking before attempting requests
2. No detailed error logging to understand failures
3. No timeout specification (requests could hang indefinitely)
4. No verification that the NVIDIA GPU was actually responsive

When AMD became primary and the NVIDIA GPU had issues, vision requests failed with little useful error reporting.

## Solution Implemented

### 1. Enhanced GPU Routing (`bot/utils/llm.py`)

```python
def get_vision_gpu_url():
    """Always use NVIDIA for vision, even when AMD is primary for text"""
    # Added clear documentation
    # Added debug logging when switching occurs
    # Returns NVIDIA URL unconditionally
```

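For context, a minimal sketch of what this routing could look like. `NVIDIA_URL` and `GPU_STATE_FILE` are hypothetical stand-ins for whatever names `bot/utils/llm.py` actually uses; the URL and state-file path are taken from the log and testing sections below.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical names; the real constants live in bot/utils/llm.py.
NVIDIA_URL = "http://llama-swap:8080"
GPU_STATE_FILE = "bot/memory/gpu_state.json"

def get_vision_gpu_url() -> str:
    """Always use NVIDIA for vision, even when AMD is primary for text."""
    with open(GPU_STATE_FILE) as f:
        current_gpu = json.load(f).get("current_gpu", "nvidia")
    if current_gpu == "amd":
        # Log the switch so the dual-GPU routing is visible in the logs
        logger.debug("Primary GPU is AMD for text, but using NVIDIA for vision model")
    return NVIDIA_URL  # returned unconditionally, regardless of text routing
```
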
### 2. Added Health Check (`bot/utils/llm.py`)

```python
async def check_vision_endpoint_health():
    """Verify NVIDIA vision endpoint is responsive before use"""
    # Pings http://llama-swap:8080/health
    # Returns (is_healthy: bool, error_message: Optional[str])
    # Logs status for debugging
```

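A sketch of how such a check might be implemented with `aiohttp`. The tuple shape and error strings mirror the expected log output documented below; the 5-second probe timeout is an assumption, and the actual code in `llm.py` may differ:

```python
import asyncio
from typing import Optional, Tuple

import aiohttp

VISION_HEALTH_URL = "http://llama-swap:8080/health"

async def check_vision_endpoint_health() -> Tuple[bool, Optional[str]]:
    """Verify the NVIDIA vision endpoint is responsive before use."""
    try:
        # Short timeout is an assumption; a health probe should fail fast
        timeout = aiohttp.ClientTimeout(total=5)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(VISION_HEALTH_URL) as resp:
                if resp.status == 200:
                    return True, None
                return False, f"Status {resp.status}"
    except asyncio.TimeoutError:
        return False, "Endpoint timeout"
    except aiohttp.ClientError as exc:
        return False, str(exc)
```
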
### 3. Improved Image Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Detailed logging of endpoint, model, and image size

**During request:**
- 60-second timeout (previously unlimited)
- Endpoint URL in error messages

**After error:**
- Full exception traceback in logs
- Endpoint information in error response

The image and video paths share the same guarded request pattern, sketched below.

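A consolidated sketch of that pattern: health check, bounded timeout, endpoint-aware error reporting. It assumes an OpenAI-compatible `/v1/chat/completions` route behind llama-swap and reuses the two helpers sketched above; `post_vision_request` is a hypothetical name, not the actual function in `image_handling.py`:

```python
import logging
import traceback

import aiohttp

logger = logging.getLogger(__name__)

async def post_vision_request(payload: dict, timeout_s: int) -> dict:
    """One guarded request to the NVIDIA vision endpoint."""
    url = get_vision_gpu_url()  # always NVIDIA, per section 1
    healthy, err = await check_vision_endpoint_health()
    if not healthy:
        logger.warning("Vision endpoint unhealthy: %s", err)
        raise RuntimeError(f"Vision service currently unavailable: {err}")
    logger.info("Sending vision request to %s", url)
    try:
        timeout = aiohttp.ClientTimeout(total=timeout_s)  # was unlimited before the fix
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(f"{url}/v1/chat/completions", json=payload) as resp:
                resp.raise_for_status()  # surface HTTP errors with their status codes
                return await resp.json()
    except Exception:
        # Full traceback plus the endpoint URL, so failures are no longer silent
        logger.error("Vision request to %s failed:\n%s", url, traceback.format_exc())
        raise
```
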
### 4. Improved Video Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Logging of media type and frame count

**During request:**
- 120-second timeout (longer, since multiple frames are sent)
- Endpoint URL in error messages

**After error:**
- Full exception traceback in logs
- Endpoint information in error response

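With the hypothetical helper above, the two call sites would differ only in payload and timeout; these names are illustrative:

```python
# Hypothetical call sites mirroring the timeouts described above
image_result = await post_vision_request(image_payload, timeout_s=60)    # single image
video_result = await post_vision_request(frames_payload, timeout_s=120)  # multiple frames need longer
```
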
## Key Changes

| File | Function | Changes |
|------|----------|---------|
| `bot/utils/llm.py` | `get_vision_gpu_url()` | Added documentation, debug logging |
| `bot/utils/llm.py` | `check_vision_endpoint_health()` | NEW: Health check function |
| `bot/utils/image_handling.py` | `analyze_image_with_vision()` | Added health check, timeout, detailed logging |
| `bot/utils/image_handling.py` | `analyze_video_with_vision()` | Added health check, timeout, detailed logging |

## Testing

Quick test to verify the vision model works when AMD is primary:

```bash
# 1. Check that the GPU state is AMD
cat bot/memory/gpu_state.json
# Should show: {"current_gpu": "amd", ...}

# 2. Send an image to Discord
# (the bot should analyze it with the vision model)

# 3. Check the logs for success
docker compose logs miku-bot 2>&1 | grep -i "vision"
# Should see: "Vision analysis completed successfully"
```

## Expected Log Output

### When Working Correctly
```
[INFO] Primary GPU is AMD for text, but using NVIDIA for vision model
[INFO] Vision endpoint (http://llama-swap:8080) health check: OK
[INFO] Sending vision request to http://llama-swap:8080 using model: vision
[INFO] Vision analysis completed successfully
```

### If NVIDIA Vision Endpoint Down
```
[WARNING] Vision endpoint (http://llama-swap:8080) health check failed: status 503
[WARNING] Vision endpoint unhealthy: Status 503
[ERROR] Vision service currently unavailable: Status 503
```

### If Network Timeout
```
[ERROR] Vision endpoint (http://llama-swap:8080) health check: timeout
[WARNING] Vision endpoint unhealthy: Endpoint timeout
[ERROR] Vision service currently unavailable: Endpoint timeout
```

## Architecture Reminder

- **NVIDIA GPU** (port 8090): Vision + text models
- **AMD GPU** (port 8091): Text models ONLY
- When AMD is primary: Text goes to AMD, vision goes to NVIDIA
- When NVIDIA is primary: Everything goes to NVIDIA

## Files Modified

1. `/home/koko210Serve/docker/miku-discord/bot/utils/llm.py`
2. `/home/koko210Serve/docker/miku-discord/bot/utils/image_handling.py`

## Files Created

1. `/home/koko210Serve/docker/miku-discord/VISION_MODEL_DEBUG.md` - Complete debugging guide

## Deployment Notes

No changes are needed to:
- Docker containers
- Environment variables
- Configuration files
- Database or state files

Just update the code and restart the bot:

```bash
docker compose restart miku-bot
```

## Success Criteria

✅ Images are analyzed when the AMD GPU is primary
✅ Detailed error messages when the vision endpoint fails
✅ Health check prevents hanging requests
✅ Logs show NVIDIA is correctly used for vision
✅ No performance degradation compared to before