moved AI generated readmes to readme folder (may delete)

readmes/VISION_FIX_SUMMARY.md (new file, 150 lines)

# Vision Model Dual-GPU Fix - Summary

## Problem

The vision model (MiniCPM-V) stopped working whenever the AMD GPU was set as the primary GPU for text inference.

## Root Cause

While `get_vision_gpu_url()` was correctly hardcoded to always use NVIDIA, the surrounding code had several gaps:

1. No health checking before attempting requests
2. No detailed error logging to understand failures
3. No timeout specification (requests could hang indefinitely)
4. No verification that the NVIDIA GPU was actually responsive

When AMD became primary and the NVIDIA GPU had issues, vision requests failed with little useful error reporting.

## Solution Implemented

### 1. Enhanced GPU Routing (`bot/utils/llm.py`)

```python
def get_vision_gpu_url():
    """Always use NVIDIA for vision, even when AMD is primary for text"""
    # Added clear documentation
    # Added debug logging when switching occurs
    # Returns NVIDIA URL unconditionally
```

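For context, a minimal sketch of what this routing could look like. `NVIDIA_URL` and `GPU_STATE_FILE` are hypothetical stand-ins for whatever names `bot/utils/llm.py` actually uses; the URL and state-file path are taken from the log and testing sections below.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical names; the real constants live in bot/utils/llm.py.
NVIDIA_URL = "http://llama-swap:8080"
GPU_STATE_FILE = "bot/memory/gpu_state.json"

def get_vision_gpu_url() -> str:
    """Always use NVIDIA for vision, even when AMD is primary for text."""
    with open(GPU_STATE_FILE) as f:
        current_gpu = json.load(f).get("current_gpu", "nvidia")
    if current_gpu == "amd":
        # Log the switch so the dual-GPU routing is visible in the logs
        logger.debug("Primary GPU is AMD for text, but using NVIDIA for vision model")
    return NVIDIA_URL  # returned unconditionally, regardless of text routing
```
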
### 2. Added Health Check (`bot/utils/llm.py`)

```python
async def check_vision_endpoint_health():
    """Verify NVIDIA vision endpoint is responsive before use"""
    # Pings http://llama-swap:8080/health
    # Returns (is_healthy: bool, error_message: Optional[str])
    # Logs status for debugging
```

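A sketch of how such a check might be implemented with `aiohttp`. The tuple shape and error strings mirror the expected log output documented below; the 5-second probe timeout is an assumption, and the actual code in `llm.py` may differ:

```python
import asyncio
from typing import Optional, Tuple

import aiohttp

VISION_HEALTH_URL = "http://llama-swap:8080/health"

async def check_vision_endpoint_health() -> Tuple[bool, Optional[str]]:
    """Verify the NVIDIA vision endpoint is responsive before use."""
    try:
        # Short timeout is an assumption; a health probe should fail fast
        timeout = aiohttp.ClientTimeout(total=5)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(VISION_HEALTH_URL) as resp:
                if resp.status == 200:
                    return True, None
                return False, f"Status {resp.status}"
    except asyncio.TimeoutError:
        return False, "Endpoint timeout"
    except aiohttp.ClientError as exc:
        return False, str(exc)
```
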
### 3. Improved Image Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Detailed logging of endpoint, model, and image size

**During request:**
- 60-second timeout (previously unlimited)
- Endpoint URL in error messages

**After error:**
- Full exception traceback in logs
- Endpoint information in error response

The image and video paths share the same guarded request pattern, sketched below.

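A consolidated sketch of that pattern: health check, bounded timeout, endpoint-aware error reporting. It assumes an OpenAI-compatible `/v1/chat/completions` route behind llama-swap and reuses the two helpers sketched above; `post_vision_request` is a hypothetical name, not the actual function in `image_handling.py`:

```python
import logging
import traceback

import aiohttp

logger = logging.getLogger(__name__)

async def post_vision_request(payload: dict, timeout_s: int) -> dict:
    """One guarded request to the NVIDIA vision endpoint."""
    url = get_vision_gpu_url()  # always NVIDIA, per section 1
    healthy, err = await check_vision_endpoint_health()
    if not healthy:
        logger.warning("Vision endpoint unhealthy: %s", err)
        raise RuntimeError(f"Vision service currently unavailable: {err}")
    logger.info("Sending vision request to %s", url)
    try:
        timeout = aiohttp.ClientTimeout(total=timeout_s)  # was unlimited before the fix
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(f"{url}/v1/chat/completions", json=payload) as resp:
                resp.raise_for_status()  # surface HTTP errors with their status codes
                return await resp.json()
    except Exception:
        # Full traceback plus the endpoint URL, so failures are no longer silent
        logger.error("Vision request to %s failed:\n%s", url, traceback.format_exc())
        raise
```
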
### 4. Improved Video Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Logging of media type and frame count

**During request:**
- 120-second timeout (longer, since multiple frames are sent)
- Endpoint URL in error messages

**After error:**
- Full exception traceback in logs
- Endpoint information in error response

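With the hypothetical helper above, the two call sites would differ only in payload and timeout; these names are illustrative:

```python
# Hypothetical call sites mirroring the timeouts described above
image_result = await post_vision_request(image_payload, timeout_s=60)    # single image
video_result = await post_vision_request(frames_payload, timeout_s=120)  # multiple frames need longer
```
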
## Key Changes

| File | Function | Changes |
|------|----------|---------|
| `bot/utils/llm.py` | `get_vision_gpu_url()` | Added documentation, debug logging |
| `bot/utils/llm.py` | `check_vision_endpoint_health()` | NEW: Health check function |
| `bot/utils/image_handling.py` | `analyze_image_with_vision()` | Added health check, timeout, detailed logging |
| `bot/utils/image_handling.py` | `analyze_video_with_vision()` | Added health check, timeout, detailed logging |

## Testing

Quick test to verify the vision model works when AMD is primary:

```bash
# 1. Check that the GPU state is AMD
cat bot/memory/gpu_state.json
# Should show: {"current_gpu": "amd", ...}

# 2. Send an image to Discord
# (the bot should analyze it with the vision model)

# 3. Check the logs for success
docker compose logs miku-bot 2>&1 | grep -i "vision"
# Should see: "Vision analysis completed successfully"
```

## Expected Log Output

### When Working Correctly
```
[INFO] Primary GPU is AMD for text, but using NVIDIA for vision model
[INFO] Vision endpoint (http://llama-swap:8080) health check: OK
[INFO] Sending vision request to http://llama-swap:8080 using model: vision
[INFO] Vision analysis completed successfully
```

### If NVIDIA Vision Endpoint Down
```
[WARNING] Vision endpoint (http://llama-swap:8080) health check failed: status 503
[WARNING] Vision endpoint unhealthy: Status 503
[ERROR] Vision service currently unavailable: Status 503
```

### If Network Timeout
```
[ERROR] Vision endpoint (http://llama-swap:8080) health check: timeout
[WARNING] Vision endpoint unhealthy: Endpoint timeout
[ERROR] Vision service currently unavailable: Endpoint timeout
```

## Architecture Reminder

- **NVIDIA GPU** (port 8090): Vision + text models
- **AMD GPU** (port 8091): Text models ONLY
- When AMD is primary: Text goes to AMD, vision goes to NVIDIA
- When NVIDIA is primary: Everything goes to NVIDIA

## Files Modified

1. `/home/koko210Serve/docker/miku-discord/bot/utils/llm.py`
2. `/home/koko210Serve/docker/miku-discord/bot/utils/image_handling.py`

## Files Created

1. `/home/koko210Serve/docker/miku-discord/VISION_MODEL_DEBUG.md` - Complete debugging guide

## Deployment Notes

No changes are needed to:
- Docker containers
- Environment variables
- Configuration files
- Database or state files

Just update the code and restart the bot:

```bash
docker compose restart miku-bot
```

## Success Criteria

✅ Images are analyzed when the AMD GPU is primary
✅ Detailed error messages when the vision endpoint fails
✅ Health check prevents hanging requests
✅ Logs show NVIDIA is correctly used for vision
✅ No performance degradation compared to before