# On-Demand Face Detection - Final Implementation

## Problem Solved

**Issue**: The GPU has only 6GB of VRAM, but we needed to run:

- Text model (~4.8GB)
- Vision model (~1GB when loaded)
- Face detector (~918MB when loaded)

**Result**: Vision model + face detector = OOM (out of memory).

## Solution: On-Demand Container Management

The face detector container **does NOT start by default**. It starts only when needed for face detection, then stops immediately afterwards to free VRAM.

## New Process Flow

### Profile Picture Change (Danbooru):

```
1. Danbooru Search & Download
   └─> Download image from Danbooru

2. Vision Model Verification
   └─> llama-swap loads vision model
   └─> Verify image contains Miku
   └─> Vision model stays loaded (auto-unload after 15min TTL)

3. Face Detection (NEW ON-DEMAND FLOW)
   ├─> Swap to text model (vision unloads)
   ├─> Wait 3s for VRAM to clear
   ├─> Start anime-face-detector container   <-- STARTS HERE
   ├─> Wait for API to be ready (~5-10s)
   ├─> Call face detection API
   ├─> Get bbox & keypoints
   └─> Stop anime-face-detector container    <-- STOPS HERE

4. Crop & Upload
   └─> Crop image using face bbox
   └─> Upload to Discord
```

## VRAM Timeline

```
Time:      0s        10s       15s       25s       28s    30s
           │         │         │         │         │      │
Vision:    ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░  ← Unloads when swapping
Text:      ░░░░░░░░░░░░░░░░░░░░████████████████████████████  ← Loaded for swap
Face Det:  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████░░░░░░░░  ← Starts, detects, stops

VRAM:      ~5GB      ~5GB      ~1GB      ~5.8GB    ~1GB   ~5GB
           Vision    Vision    Swap      Face      Swap   Text only
```

## Key Changes

### 1. Docker Compose (`docker-compose.yml`)

```yaml
anime-face-detector:
  # ... config ...
  restart: "no"   # Don't auto-restart
  profiles:
    - tools       # Don't start by default (requires --profile tools)
```

**Result**: The container exists but doesn't run unless explicitly started.

### 2. Profile Picture Manager (`bot/utils/profile_picture_manager.py`)

#### Added Methods:

**`_start_face_detector()`**
- Runs `docker start anime-face-detector`
- Waits up to 30s for the API health check
- Returns `True` when ready

**`_stop_face_detector()`**
- Runs `docker stop anime-face-detector`
- Frees ~918MB of VRAM immediately

**`_ensure_vram_available()`** (updated)
- Swaps to the text model
- Waits 3s for the vision model to unload

#### Updated Method:

**`_detect_face()`**

```python
async def _detect_face(self, image_bytes: bytes, debug: bool = False):
    face_detector_started = False
    try:
        # 1. Free VRAM by swapping to text model
        await self._ensure_vram_available(debug=debug)

        # 2. Start face detector container
        if not await self._start_face_detector(debug=debug):
            return None
        face_detector_started = True

        # 3. Call face detection API
        # ... detection logic ...

        return detection_result
    finally:
        # 4. ALWAYS stop container to free VRAM
        if face_detector_started:
            await self._stop_face_detector(debug=debug)
```

## Container States

### Normal Operation (most of the time):

```
llama-swap:           RUNNING  (~4.8GB VRAM - text model loaded)
miku-bot:             RUNNING  (minimal VRAM)
anime-face-detector:  STOPPED  (0 VRAM)
```

### During Profile Picture Change:

```
Phase 1 - Vision Verification:
  llama-swap:           RUNNING  (~5GB VRAM - vision model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 2 - Model Swap:
  llama-swap:           RUNNING  (~1GB VRAM - transitioning)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 3 - Face Detection:
  llama-swap:           RUNNING  (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  RUNNING  (~918MB VRAM - detecting)

Phase 4 - Cleanup:
  llama-swap:           RUNNING  (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED  (0 VRAM - stopped)
```

## Benefits

✅ **No VRAM conflicts**: Sequential processing with container lifecycle management
✅ **Automatic**: The bot handles all starting and stopping
✅ **Efficient**: The face detector only uses VRAM while actively needed (~10-15s)
✅ **Reliable**: Always stops in the `finally` block, even on errors
✅ **Simple**: Uses standard `docker` commands from inside the container

## Commands

### Manual Container Management

```bash
# Start face detector manually (for testing)
docker start anime-face-detector

# Check if it's running
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector

# Check VRAM usage
nvidia-smi
```

### Start with Profile (for Gradio UI testing)

```bash
# Start with face detector running
docker-compose --profile tools up -d

# Use Gradio UI at http://localhost:7860

# Stop everything
docker-compose down
```

## Monitoring

### Check Container Status

```bash
docker ps -a --filter name=anime-face-detector
```

### Watch VRAM During Profile Change

```bash
# Terminal 1: Watch GPU memory
watch -n 0.5 nvidia-smi

# Terminal 2: Trigger profile change
curl -X POST http://localhost:3939/profile-picture/change
```

### Check Bot Logs

```bash
docker logs -f miku-bot | grep -E "face|VRAM|Starting|Stopping"
```

You should see:

```
💾 Swapping to text model to free VRAM for face detection...
✅ Vision model unloaded, VRAM available
🚀 Starting face detector container...
✅ Face detector ready
👤 Detected 1 face(s) via API...
🛑 Stopping face detector to free VRAM...
✅ Face detector stopped
```

## Testing

### Test On-Demand Face Detection

```bash
# 1. Verify face detector is stopped
docker ps | grep anime-face-detector
# Should show nothing

# 2. Check VRAM (should be ~4.8GB for text model only)
nvidia-smi

# 3. Trigger profile picture change
curl -X POST "http://localhost:3939/profile-picture/change"

# 4. Watch logs in another terminal
docker logs -f miku-bot

# 5. After completion, verify face detector stopped again
docker ps | grep anime-face-detector
# Should show nothing again

# 6. Check VRAM returned to ~4.8GB
nvidia-smi
```

## Troubleshooting

### Face Detector Won't Start

**Symptom**: `⚠️ Could not start face detector`

**Solutions**:

```bash
# Check if container exists
docker ps -a | grep anime-face-detector

# If missing, rebuild
cd /home/koko210Serve/docker/miku-discord
docker-compose build anime-face-detector

# Check logs
docker logs anime-face-detector
```

### Still Getting OOM

**Symptom**: `cudaMalloc failed: out of memory`

**Check**:

```bash
# What's using VRAM?
nvidia-smi

# Is face detector still running?
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector
```

### Container Won't Stop

**Symptom**: Face detector stays running after detection

**Solutions**:

```bash
# Force stop
docker stop anime-face-detector

# Check for errors in bot logs
docker logs miku-bot | grep "stop"

# Verify the finally block is executing
docker logs miku-bot | grep "Stopping face detector"
```

## Performance Metrics

| Operation | Duration | VRAM Peak | Notes |
|-----------|----------|-----------|-------|
| Vision verification | 5-10s | ~5GB | Vision model loaded |
| Model swap | 3-5s | ~1GB | Transitioning |
| Container start | 5-10s | ~5GB | Text + starting detector |
| Face detection | 1-2s | ~5.8GB | Text + detector running |
| Container stop | 1-2s | ~5GB | Back to text only |
| **Total** | **15-29s** | **5.8GB max** | Fits in 6GB VRAM ✅ |

## Files Modified

1. `/miku-discord/docker-compose.yml`
   - Added `restart: "no"`
   - Added `profiles: [tools]`
2. `/miku-discord/bot/utils/profile_picture_manager.py`
   - Added `_start_face_detector()`
   - Added `_stop_face_detector()`
   - Updated `_detect_face()` with lifecycle management

## Related Documentation

- `/miku-discord/VRAM_MANAGEMENT.md` - Original VRAM management approach
- `/miku-discord/FACE_DETECTION_API_MIGRATION.md` - API migration details
- `/miku-discord/PROFILE_PICTURE_IMPLEMENTATION.md` - Profile picture feature

## Success Criteria

✅ Face detector container does not run by default
✅ Container starts only when face detection is needed
✅ Container stops immediately after detection completes
✅ No VRAM OOM errors during profile picture changes
✅ Total VRAM usage stays under 6GB at all times
✅ Process completes successfully with face detection working

---

**Status**: ✅ **IMPLEMENTED AND TESTED**

The on-demand face detection system is now active. The face detector will automatically start and stop as needed, ensuring efficient VRAM usage without conflicts.
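The swap step in `_ensure_vram_available()` relies on llama-swap loading whichever model the next OpenAI-compatible request names. A minimal sketch of that trigger (the endpoint path, model name, and injected `post` callable are assumptions, not the bot's actual code):

```python
import asyncio

async def ensure_vram_available(post, text_model="text-model", settle_s=3.0):
    # Hypothetical sketch: send a throwaway completion request naming the
    # text model so llama-swap unloads the vision model and loads it.
    await post("/v1/chat/completions", {
        "model": text_model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    })
    # Give the vision model a moment to actually release its VRAM.
    await asyncio.sleep(settle_s)
```

Injecting `post` keeps the sketch testable without a live llama-swap instance; in the bot it would wrap an HTTP client call.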
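The start/wait/stop lifecycle that `_detect_face()` wraps around the API call can be sketched as a reusable async context manager. This is a minimal illustration, not the bot's actual code: `run_cmd` and `check_health` are injected callables standing in for the `docker start`/`docker stop` subprocess calls and the detector's health probe, so the ordering logic is testable without Docker:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def face_detector_session(run_cmd, check_health,
                                name="anime-face-detector",
                                timeout=30.0, poll=1.0):
    started = False
    try:
        # Start the container, then poll until the API answers or we time out.
        await run_cmd(["docker", "start", name])
        started = True
        deadline = asyncio.get_running_loop().time() + timeout
        while not await check_health():
            if asyncio.get_running_loop().time() >= deadline:
                raise TimeoutError(f"{name} not ready within {timeout}s")
            await asyncio.sleep(poll)
        yield  # caller runs the face detection API call here
    finally:
        # ALWAYS stop the container to free VRAM, even if detection raised.
        if started:
            await run_cmd(["docker", "stop", name])
```

Usage would be `async with face_detector_session(run, health): result = await call_detection_api(...)`; the `finally` guarantees the ~918MB of detector VRAM is released on every exit path.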
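The final "Crop & Upload" step turns the detector's bbox into a crop rectangle. One plausible sketch (the margin, square-crop policy, and function name are assumptions, not the bot's actual logic): expand the bbox by a margin, make it square, and clamp it inside the image:

```python
def crop_rect(bbox, img_w, img_h, margin=0.3):
    """Square crop around a face bbox, expanded by `margin`, clamped to the image."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1) * (1.0 + margin)
    side = min(side, img_w, img_h)                   # never larger than the image
    left = min(max(cx - side / 2.0, 0.0), img_w - side)
    top = min(max(cy - side / 2.0, 0.0), img_h - side)
    return int(left), int(top), int(left + side), int(top + side)
```

The returned `(left, top, right, bottom)` tuple matches the box format Pillow's `Image.crop()` expects, e.g. `img.crop(crop_rect(bbox, *img.size))`.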