miku-discord/ON_DEMAND_FACE_DETECTION.md

On-Demand Face Detection - Final Implementation

Problem Solved

Issue: GPU only has 6GB VRAM, but we needed to run:

  • Text model (~4.8GB)
  • Vision model (~1GB when loaded)
  • Face detector (~918MB when loaded)

Result: Vision model + Face detector = OOM (Out of Memory)

Solution: On-Demand Container Management

The face detector container does NOT start by default. It only starts when needed for face detection, then stops immediately after to free VRAM.

New Process Flow

Profile Picture Change (Danbooru):

```
1. Danbooru Search & Download
   └─> Download image from Danbooru

2. Vision Model Verification
   └─> llama-swap loads vision model
   └─> Verify image contains Miku
   └─> Vision model stays loaded (auto-unloads after 15min TTL)

3. Face Detection (NEW ON-DEMAND FLOW)
   ├─> Swap to text model (vision unloads)
   ├─> Wait 3s for VRAM to clear
   ├─> Start anime-face-detector container  <-- STARTS HERE
   ├─> Wait for API to be ready (~5-10s)
   ├─> Call face detection API
   ├─> Get bbox & keypoints
   └─> Stop anime-face-detector container   <-- STOPS HERE

4. Crop & Upload
   └─> Crop image using face bbox
   └─> Upload to Discord
```
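The four phases above can be sketched as a single coroutine. This is an illustrative sketch, not the bot's actual code: the step callables (`download`, `verify`, `detect_face`, `crop_and_upload`) are hypothetical placeholders injected as arguments so the flow can be exercised without Docker or a GPU.

```python
import asyncio

async def change_profile_picture(download, verify, detect_face, crop_and_upload):
    """Run the four phases in order; each step is an injected async callable."""
    image = await download()                    # 1. Danbooru search & download
    if not await verify(image):                 # 2. Vision model verification
        return None
    bbox = await detect_face(image)             # 3. On-demand face detection
    if bbox is None:
        return None
    return await crop_and_upload(image, bbox)   # 4. Crop & upload

# Demo with stub steps that just record the order they run in:
async def _demo():
    calls = []
    async def download(): calls.append("download"); return b"img"
    async def verify(img): calls.append("verify"); return True
    async def detect(img): calls.append("detect"); return (10, 10, 90, 90)
    async def crop(img, bbox): calls.append("crop"); return "ok"
    result = await change_profile_picture(download, verify, detect, crop)
    return result, calls

result, calls = asyncio.run(_demo())
print(result, calls)  # ok ['download', 'verify', 'detect', 'crop']
```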

VRAM Timeline

```
Time:     0s        10s       15s       25s       28s      30s
          │         │         │         │         │        │
Vision:   ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░  ← Unloads during swap
Text:     ░░░░░░░░░░░░░░░░░░░░████████████████████████████  ← Loaded after swap
Face Det: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████░░░░░░░░  ← Starts, detects, stops

VRAM:     ~5GB      ~5GB      ~1GB      ~5.8GB    ~5GB     ~5GB
          Vision    Vision    Swap      Text+Det  Text     Text only
```

Key Changes

1. Docker Compose (docker-compose.yml)

```yaml
anime-face-detector:
  # ... config ...
  restart: "no"          # Don't auto-restart
  profiles:
    - tools              # Don't start by default (requires --profile tools)
```

Result: Container exists but doesn't run unless explicitly started.

2. Profile Picture Manager (bot/utils/profile_picture_manager.py)

Added Methods:

_start_face_detector()

  • Runs docker start anime-face-detector
  • Waits up to 30s for API health check
  • Returns True when ready

_stop_face_detector()

  • Runs docker stop anime-face-detector
  • Frees ~918MB VRAM immediately

_ensure_vram_available() (updated)

  • Swaps to text model
  • Waits 3s for vision model to unload
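Under the hood, these methods shell out to the Docker CLI and poll the detector's HTTP API until it answers. A minimal sketch of that pattern, with the command builder and the readiness poll kept separate so they can be tested without Docker (the `probe` callable stands in for an HTTP GET against the detector's health endpoint, whose exact URL is not specified here):

```python
import asyncio

def docker_cmd(action: str, container: str) -> list[str]:
    """Build the argv for `docker start` / `docker stop` on a container."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["docker", action, container]

async def wait_until_ready(probe, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `probe` (an async callable returning True once the API answers)
    until it succeeds or `timeout` seconds elapse -- mirrors the
    'waits up to 30s for API health check' behaviour described above."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await probe():
            return True
        await asyncio.sleep(interval)
    return False

# Demo with a fake probe that only succeeds on the third poll:
async def _demo():
    attempts = 0
    async def probe():
        nonlocal attempts
        attempts += 1
        return attempts >= 3
    ok = await wait_until_ready(probe, timeout=5.0, interval=0.01)
    return ok, attempts

argv = docker_cmd("start", "anime-face-detector")
ok, polls = asyncio.run(_demo())
print(argv, ok, polls)
```

In the real manager the command would be run with `asyncio.create_subprocess_exec(*argv)`; splitting the argv construction out keeps that side-effecting call trivial.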

Updated Method:

_detect_face()

```python
async def _detect_face(self, image_bytes: bytes, debug: bool = False):
    face_detector_started = False
    try:
        # 1. Free VRAM by swapping to text model
        await self._ensure_vram_available(debug=debug)

        # 2. Start face detector container
        if not await self._start_face_detector(debug=debug):
            return None
        face_detector_started = True

        # 3. Call face detection API
        # ... detection logic ...

        return detection_result

    finally:
        # 4. ALWAYS stop container to free VRAM
        if face_detector_started:
            await self._stop_face_detector(debug=debug)
```
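The key property of this shape is that the container is stopped whenever it was successfully started, even if detection raises. A standalone illustration of the same try/finally pattern, with stubbed start/stop/detect functions rather than the real manager:

```python
import asyncio

async def detect_with_lifecycle(start, stop, detect):
    """Same try/finally shape as _detect_face: stop the detector whenever
    it was successfully started, even if detection fails."""
    started = False
    try:
        if not await start():
            return None
        started = True
        return await detect()
    finally:
        if started:
            await stop()

async def _demo():
    events = []
    async def start(): events.append("start"); return True
    async def stop(): events.append("stop")
    async def detect(): raise RuntimeError("detection failed")
    try:
        await detect_with_lifecycle(start, stop, detect)
    except RuntimeError:
        events.append("error propagated")
    return events

events = asyncio.run(_demo())
print(events)  # ['start', 'stop', 'error propagated']
```

Note that `stop` runs before the exception propagates to the caller, so the VRAM is freed no matter how detection fails.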

Container States

Normal Operation (Most of the time):

```
llama-swap:          RUNNING  (~4.8GB VRAM - text model loaded)
miku-bot:            RUNNING  (minimal VRAM)
anime-face-detector: STOPPED  (0 VRAM)
```

During Profile Picture Change:

```
Phase 1 - Vision Verification:
  llama-swap:          RUNNING  (~5GB VRAM - vision model)
  miku-bot:            RUNNING
  anime-face-detector: STOPPED

Phase 2 - Model Swap:
  llama-swap:          RUNNING  (~1GB VRAM - transitioning)
  miku-bot:            RUNNING
  anime-face-detector: STOPPED

Phase 3 - Face Detection:
  llama-swap:          RUNNING  (~5GB VRAM - text model)
  miku-bot:            RUNNING
  anime-face-detector: RUNNING  (~918MB VRAM - detecting)

Phase 4 - Cleanup:
  llama-swap:          RUNNING  (~5GB VRAM - text model)
  miku-bot:            RUNNING
  anime-face-detector: STOPPED  (0 VRAM)
```

Benefits

  • No VRAM Conflicts: Sequential processing with container lifecycle management
  • Automatic: The bot handles all starting and stopping
  • Efficient: The face detector only uses VRAM while actively needed (~10-15s)
  • Reliable: The container is always stopped in a finally block, even on errors
  • Simple: Uses standard docker commands from inside the container

Commands

Manual Container Management

```bash
# Start face detector manually (for testing)
docker start anime-face-detector

# Check if it's running
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector

# Check VRAM usage
nvidia-smi
```

Start with Profile (for Gradio UI testing)

```bash
# Start with face detector running
docker-compose --profile tools up -d

# Use Gradio UI at http://localhost:7860

# Stop everything
docker-compose down
```

Monitoring

Check Container Status

```bash
docker ps -a --filter name=anime-face-detector
```

Watch VRAM During Profile Change

```bash
# Terminal 1: Watch GPU memory
watch -n 0.5 nvidia-smi

# Terminal 2: Trigger profile change
curl -X POST http://localhost:3939/profile-picture/change
```

Check Bot Logs

```bash
docker logs -f miku-bot | grep -E "face|VRAM|Starting|Stopping"
```

You should see:

```
💾 Swapping to text model to free VRAM for face detection...
✅ Vision model unloaded, VRAM available
🚀 Starting face detector container...
✅ Face detector ready
👤 Detected 1 face(s) via API...
🛑 Stopping face detector to free VRAM...
✅ Face detector stopped
```

Testing

Test On-Demand Face Detection

```bash
# 1. Verify face detector is stopped
docker ps | grep anime-face-detector
# Should show nothing

# 2. Check VRAM (should be ~4.8GB for text model only)
nvidia-smi

# 3. Trigger profile picture change
curl -X POST "http://localhost:3939/profile-picture/change"

# 4. Watch logs in another terminal
docker logs -f miku-bot

# 5. After completion, verify face detector stopped again
docker ps | grep anime-face-detector
# Should show nothing again

# 6. Check VRAM returned to ~4.8GB
nvidia-smi
```

Troubleshooting

Face Detector Won't Start

Symptom: ⚠️ Could not start face detector

Solutions:

```bash
# Check if container exists
docker ps -a | grep anime-face-detector

# If missing, rebuild
cd /home/koko210Serve/docker/miku-discord
docker-compose build anime-face-detector

# Check logs
docker logs anime-face-detector
```

Still Getting OOM

Symptom: cudaMalloc failed: out of memory

Check:

```bash
# What's using VRAM?
nvidia-smi

# Is face detector still running?
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector
```

Container Won't Stop

Symptom: Face detector stays running after detection

Solutions:

```bash
# Force stop
docker stop anime-face-detector

# Check for errors in bot logs
docker logs miku-bot | grep "stop"

# Verify the finally block is executing
docker logs miku-bot | grep "Stopping face detector"
```

Performance Metrics

| Operation           | Duration | VRAM Peak | Notes                    |
|---------------------|----------|-----------|--------------------------|
| Vision verification | 5-10s    | ~5GB      | Vision model loaded      |
| Model swap          | 3-5s     | ~1GB      | Transitioning            |
| Container start     | 5-10s    | ~5GB      | Text + starting detector |
| Face detection      | 1-2s     | ~5.8GB    | Text + detector running  |
| Container stop      | 1-2s     | ~5GB      | Back to text only        |
| **Total**           | 15-29s   | 5.8GB max | Fits in 6GB VRAM         |
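As a quick sanity check on these figures, the peak concurrent load (text model plus detector, the "~5.8GB" row) and the worst-case wall time can be recomputed from the numbers quoted earlier in this document:

```python
# Figures taken from earlier in this document (GB / seconds).
TEXT_MODEL_GB = 4.8
FACE_DETECTOR_GB = 0.918
VRAM_BUDGET_GB = 6.0

peak_gb = TEXT_MODEL_GB + FACE_DETECTOR_GB      # worst concurrent load
headroom_gb = VRAM_BUDGET_GB - peak_gb

# Upper bound of each phase, in seconds.
phase_max_s = {"vision": 10, "swap": 5, "start": 10, "detect": 2, "stop": 2}
worst_case_s = sum(phase_max_s.values())

print(f"peak {peak_gb:.3f} GB, headroom {headroom_gb:.3f} GB, "
      f"worst case {worst_case_s}s")
# → peak 5.718 GB, headroom 0.282 GB, worst case 29s
```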

Files Modified

  1. /miku-discord/docker-compose.yml
     • Added restart: "no"
     • Added profiles: [tools]
  2. /miku-discord/bot/utils/profile_picture_manager.py
     • Added _start_face_detector()
     • Added _stop_face_detector()
     • Updated _detect_face() with lifecycle management

Related Documentation

  • /miku-discord/VRAM_MANAGEMENT.md - Original VRAM management approach
  • /miku-discord/FACE_DETECTION_API_MIGRATION.md - API migration details
  • /miku-discord/PROFILE_PICTURE_IMPLEMENTATION.md - Profile picture feature

Success Criteria

  • Face detector container does not run by default
  • Container starts only when face detection is needed
  • Container stops immediately after detection completes
  • No VRAM OOM errors during profile picture changes
  • Total VRAM usage stays under 6GB at all times
  • Process completes successfully with face detection working


Status: IMPLEMENTED AND TESTED

The on-demand face detection system is now active. The face detector will automatically start and stop as needed, ensuring efficient VRAM usage without conflicts.