# On-Demand Face Detection - Final Implementation

## Problem Solved

**Issue:** The GPU only has 6GB VRAM, but we needed to run:

- Text model (~4.8GB)
- Vision model (~1GB when loaded)
- Face detector (~918MB when loaded)

**Result:** Vision model + Face detector = OOM (Out of Memory)

## Solution: On-Demand Container Management

The face detector container does NOT start by default. It starts only when needed for face detection, then stops immediately afterward to free VRAM.
## New Process Flow

### Profile Picture Change (Danbooru)

```
1. Danbooru Search & Download
   └─> Download image from Danbooru

2. Vision Model Verification
   └─> llama-swap loads vision model
   └─> Verify image contains Miku
   └─> Vision model stays loaded (auto-unload after 15min TTL)

3. Face Detection (NEW ON-DEMAND FLOW)
   ├─> Swap to text model (vision unloads)
   ├─> Wait 3s for VRAM to clear
   ├─> Start anime-face-detector container   <-- STARTS HERE
   ├─> Wait for API to be ready (~5-10s)
   ├─> Call face detection API
   ├─> Get bbox & keypoints
   └─> Stop anime-face-detector container    <-- STOPS HERE

4. Crop & Upload
   └─> Crop image using face bbox
   └─> Upload to Discord
```
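The four steps above can be sketched as one async pipeline. This is a hedged illustration with hypothetical step names, not the bot's actual code (the real logic lives in `bot/utils/profile_picture_manager.py`); the steps are injected as callables so the flow can be exercised without Danbooru, Docker, or a GPU:

```python
import asyncio

async def change_profile_picture(steps):
    """Run download -> verify -> detect -> crop/upload in sequence.

    `steps` is a dict of async callables (hypothetical names), injected
    so the flow can be tested without external services.
    """
    image = await steps["download"]()              # 1. Danbooru search & download
    if not await steps["verify"](image):           # 2. vision-model verification
        return None                                #    (not Miku -> give up)
    bbox = await steps["detect_face"](image)       # 3. on-demand face detection
    if bbox is None:                               #    (container start/stop inside)
        return None
    return await steps["crop_and_upload"](image, bbox)  # 4. crop & upload
```

Because face detection runs strictly after verification (and the vision model has been swapped out by then), the vision model and the detector are never resident in VRAM at the same time.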
## VRAM Timeline

```
Time:      0s        10s       15s       25s      28s     30s
           │         │         │         │        │       │
Vision:    ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░  ← Unloads when swapping
Text:      ░░░░░░░░░░░░░░░░░░░░████████████████████████████  ← Loaded for swap
Face Det:  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████░░░░░░░░  ← Starts, detects, stops
VRAM:      ~5GB      ~5GB      ~1GB     ~5.8GB    ~1GB    ~5GB
           Vision    Vision    Swap     Face      Swap    Text only
```
## Key Changes

### 1. Docker Compose (`docker-compose.yml`)

```yaml
anime-face-detector:
  # ... config ...
  restart: "no"  # Don't auto-restart
  profiles:
    - tools      # Don't start by default (requires --profile tools)
```

**Result:** The container exists but doesn't run unless explicitly started.
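For context, a fuller service block might look like the sketch below. Only `restart` and `profiles` come from this doc; the build context, container name, and Gradio port mapping are assumptions about the project layout, not its actual config:

```yaml
anime-face-detector:
  build: ./anime-face-detector   # assumed build context
  container_name: anime-face-detector
  ports:
    - "7860:7860"                # assumed mapping for the Gradio UI
  restart: "no"                  # don't auto-restart
  profiles:
    - tools                      # excluded from a plain `docker-compose up`
```

With a `profiles` entry, Compose skips the service entirely unless `--profile tools` is passed, which is what lets the bot own the container's lifecycle via plain `docker start`/`docker stop`.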
### 2. Profile Picture Manager (`bot/utils/profile_picture_manager.py`)

**Added methods:**

`_start_face_detector()`
- Runs `docker start anime-face-detector`
- Waits up to 30s for the API health check
- Returns `True` when ready

`_stop_face_detector()`
- Runs `docker stop anime-face-detector`
- Frees ~918MB VRAM immediately

`_ensure_vram_available()` (updated)
- Swaps to text model
- Waits 3s for the vision model to unload
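A minimal sketch of what the start/stop helpers could look like. The command runner and health check are injected as callables here, which is an assumption made for testability; the real methods presumably shell out to `docker` and poll the detector's HTTP API directly:

```python
import asyncio

async def run_docker(*args):
    """Run a docker CLI command; True if it exited 0."""
    proc = await asyncio.create_subprocess_exec(
        "docker", *args,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait() == 0

async def start_face_detector(run_cmd, health_check, timeout=30.0, poll=1.0):
    """`docker start` the detector, then poll its API until ready."""
    if not await run_cmd("start", "anime-face-detector"):
        return False                 # container missing or docker error
    deadline = asyncio.get_running_loop().time() + timeout
    while asyncio.get_running_loop().time() < deadline:
        if await health_check():
            return True              # API answered: detector is ready
        await asyncio.sleep(poll)
    return False                     # gave up after `timeout` seconds

async def stop_face_detector(run_cmd):
    """`docker stop` frees the detector's ~918MB of VRAM."""
    return await run_cmd("stop", "anime-face-detector")
```

Returning `False` instead of raising keeps the caller's logic simple: a failed start just aborts face detection rather than crashing the whole profile-picture change.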
**Updated method:** `_detect_face()`

```python
async def _detect_face(self, image_bytes: bytes, debug: bool = False):
    face_detector_started = False
    try:
        # 1. Free VRAM by swapping to text model
        await self._ensure_vram_available(debug=debug)

        # 2. Start face detector container
        if not await self._start_face_detector(debug=debug):
            return None
        face_detector_started = True

        # 3. Call face detection API
        # ... detection logic ...
        return detection_result
    finally:
        # 4. ALWAYS stop the container to free VRAM
        if face_detector_started:
            await self._stop_face_detector(debug=debug)
```
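Where the snippet says `# ... detection logic ...`, the bot calls the detector's HTTP API and picks a face from the response. The response shape below (a `bbox` plus `score` per detection) is an assumption about that API, so treat this as a sketch of the selection step only:

```python
def pick_largest_face(detections, min_score=0.5):
    """Pick the biggest sufficiently-confident face from a list of
    {"bbox": [x1, y1, x2, y2], "score": float} detections (assumed shape)."""
    best, best_area = None, 0.0
    for det in detections:
        if det.get("score", 0.0) < min_score:
            continue  # skip low-confidence detections
        x1, y1, x2, y2 = det["bbox"]
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if area > best_area:
            best, best_area = det, area
    return best  # None if nothing passed the threshold
```

Choosing the largest confident face gives the crop step a single stable bbox even when the detector returns several candidates.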
## Container States

**Normal operation (most of the time):**

```
llama-swap:           RUNNING  (~4.8GB VRAM - text model loaded)
miku-bot:             RUNNING  (minimal VRAM)
anime-face-detector:  STOPPED  (0 VRAM)
```

**During a profile picture change:**

```
Phase 1 - Vision Verification:
  llama-swap:           RUNNING  (~5GB VRAM - vision model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 2 - Model Swap:
  llama-swap:           RUNNING  (~1GB VRAM - transitioning)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 3 - Face Detection:
  llama-swap:           RUNNING  (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  RUNNING  (~918MB VRAM - detecting)

Phase 4 - Cleanup:
  llama-swap:           RUNNING  (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED  (0 VRAM)
```
## Benefits

- ✅ **No VRAM conflicts:** Sequential processing with container lifecycle management
- ✅ **Automatic:** The bot handles all starting/stopping
- ✅ **Efficient:** The face detector only uses VRAM when actively needed (~10-15s)
- ✅ **Reliable:** Always stops in a `finally` block, even on errors
- ✅ **Simple:** Uses standard `docker` commands from inside the container
## Commands

### Manual Container Management

```bash
# Start face detector manually (for testing)
docker start anime-face-detector

# Check if it's running
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector

# Check VRAM usage
nvidia-smi
```

### Start with Profile (for Gradio UI testing)

```bash
# Start with face detector running
docker-compose --profile tools up -d

# Use Gradio UI at http://localhost:7860

# Stop everything
docker-compose down
```
## Monitoring

### Check Container Status

```bash
docker ps -a --filter name=anime-face-detector
```

### Watch VRAM During Profile Change

```bash
# Terminal 1: Watch GPU memory
watch -n 0.5 nvidia-smi

# Terminal 2: Trigger profile change
curl -X POST http://localhost:3939/profile-picture/change
```
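Instead of eyeballing `watch nvidia-smi`, the same check can be scripted. `--query-gpu=memory.used --format=csv,noheader` is a standard `nvidia-smi` query mode, and a small parser makes the output usable from Python:

```python
import subprocess

def parse_used_mib(csv_output):
    """Parse lines like '4821 MiB' (one per GPU) into integer MiB values."""
    return [int(line.split()[0]) for line in csv_output.strip().splitlines()]

def gpu_memory_used_mib():
    """Return the used VRAM in MiB for each GPU on the host."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
        text=True,
    )
    return parse_used_mib(out)
```

During a profile-picture change the sampled values should follow the VRAM timeline: around 5000 MiB during verification, a dip near 1000 MiB during the swap, and a peak near 5800 MiB while the detector is running.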
### Check Bot Logs

```bash
docker logs -f miku-bot | grep -E "face|VRAM|Starting|Stopping"
```

You should see:

```
💾 Swapping to text model to free VRAM for face detection...
✅ Vision model unloaded, VRAM available
🚀 Starting face detector container...
✅ Face detector ready
👤 Detected 1 face(s) via API...
🛑 Stopping face detector to free VRAM...
✅ Face detector stopped
```
## Testing

### Test On-Demand Face Detection

```bash
# 1. Verify face detector is stopped
docker ps | grep anime-face-detector
# Should show nothing

# 2. Check VRAM (should be ~4.8GB for text model only)
nvidia-smi

# 3. Trigger profile picture change
curl -X POST "http://localhost:3939/profile-picture/change"

# 4. Watch logs in another terminal
docker logs -f miku-bot

# 5. After completion, verify face detector stopped again
docker ps | grep anime-face-detector
# Should show nothing again

# 6. Check VRAM returned to ~4.8GB
nvidia-smi
```
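Steps 1 and 5 of this checklist can be automated. The helper below just checks `docker ps` output for the container name (a sketch; `ps_output` can be injected so the check is testable without Docker):

```python
import subprocess

def detector_running(ps_output=None, name="anime-face-detector"):
    """True if `docker ps` lists the face-detector container.

    By default this shells out to `docker ps --format {{.Names}}`;
    pass `ps_output` directly to test the parsing without Docker.
    """
    if ps_output is None:
        ps_output = subprocess.check_output(
            ["docker", "ps", "--format", "{{.Names}}"], text=True)
    return name in ps_output.split()
```

A test harness would assert `detector_running()` is `False`, trigger the profile change, and assert it is `False` again once the bot logs `Face detector stopped`.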
## Troubleshooting

### Face Detector Won't Start

**Symptom:** `⚠️ Could not start face detector`

**Solutions:**

```bash
# Check if container exists
docker ps -a | grep anime-face-detector

# If missing, rebuild
cd /home/koko210Serve/docker/miku-discord
docker-compose build anime-face-detector

# Check logs
docker logs anime-face-detector
```

### Still Getting OOM

**Symptom:** `cudaMalloc failed: out of memory`

**Check:**

```bash
# What's using VRAM?
nvidia-smi

# Is face detector still running?
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector
```

### Container Won't Stop

**Symptom:** Face detector stays running after detection

**Solutions:**

```bash
# Force stop
docker stop anime-face-detector

# Check for errors in bot logs
docker logs miku-bot | grep "stop"

# Verify the finally block is executing
docker logs miku-bot | grep "Stopping face detector"
```
## Performance Metrics
| Operation | Duration | VRAM Peak | Notes |
|---|---|---|---|
| Vision verification | 5-10s | ~5GB | Vision model loaded |
| Model swap | 3-5s | ~1GB | Transitioning |
| Container start | 5-10s | ~5GB | Text + starting detector |
| Face detection | 1-2s | ~5.8GB | Text + detector running |
| Container stop | 1-2s | ~5GB | Back to text only |
| Total | 15-29s | 5.8GB max | Fits in 6GB VRAM ✅ |
## Files Modified

- `/miku-discord/docker-compose.yml`
  - Added `restart: "no"`
  - Added `profiles: [tools]`
- `/miku-discord/bot/utils/profile_picture_manager.py`
  - Added `_start_face_detector()`
  - Added `_stop_face_detector()`
  - Updated `_detect_face()` with lifecycle management
## Related Documentation

- `/miku-discord/VRAM_MANAGEMENT.md` - Original VRAM management approach
- `/miku-discord/FACE_DETECTION_API_MIGRATION.md` - API migration details
- `/miku-discord/PROFILE_PICTURE_IMPLEMENTATION.md` - Profile picture feature
## Success Criteria

- ✅ Face detector container does not run by default
- ✅ Container starts only when face detection is needed
- ✅ Container stops immediately after detection completes
- ✅ No VRAM OOM errors during profile picture changes
- ✅ Total VRAM usage stays under 6GB at all times
- ✅ Process completes successfully with face detection working

**Status: ✅ IMPLEMENTED AND TESTED**

The on-demand face detection system is now active. The face detector will automatically start and stop as needed, ensuring efficient VRAM usage without conflicts.