# On-Demand Face Detection - Final Implementation

## Problem Solved

**Issue**: The GPU has only 6GB of VRAM, but we needed to run:
- Text model (~4.8GB)
- Vision model (~1GB when loaded)
- Face detector (~918MB when loaded)

**Result**: Vision model + Face detector = OOM (Out of Memory)

## Solution: On-Demand Container Management

The face detector container **does NOT start by default**. It starts only when face detection is needed, then stops immediately afterward to free VRAM.

## New Process Flow

### Profile Picture Change (Danbooru):

```
1. Danbooru Search & Download
   └─> Download image from Danbooru

2. Vision Model Verification
   └─> llama-swap loads vision model
   └─> Verify image contains Miku
   └─> Vision model stays loaded (auto-unload after 15min TTL)

3. Face Detection (NEW ON-DEMAND FLOW)
   ├─> Swap to text model (vision unloads)
   ├─> Wait 3s for VRAM to clear
   ├─> Start anime-face-detector container   <-- STARTS HERE
   ├─> Wait for API to be ready (~5-10s)
   ├─> Call face detection API
   ├─> Get bbox & keypoints
   └─> Stop anime-face-detector container    <-- STOPS HERE

4. Crop & Upload
   └─> Crop image using face bbox
   └─> Upload to Discord
```

## VRAM Timeline

```
Time:      0s        10s       15s       25s       28s       30s
           │         │         │         │         │         │
Vision:    ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░    ← Unloads when swapping
Text:      ░░░░░░░░░░░░░░░░░░░░████████████████████████████   ← Loaded for swap
Face Det:  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████░░░░░░░░    ← Starts, detects, stops

VRAM:      ~5GB      ~5GB      ~1GB      ~5.8GB    ~1GB      ~5GB
           Vision    Vision    Swap      Face      Swap      Text only
```

## Key Changes

### 1. Docker Compose (`docker-compose.yml`)

```yaml
anime-face-detector:
  # ... config ...
  restart: "no"        # Don't auto-restart
  profiles:
    - tools            # Don't start by default (requires --profile tools)
```

**Result**: Container exists but doesn't run unless explicitly started.

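For illustration, the `profiles` setting means a plain `up` skips the service, while passing the profile includes it (same Compose v1 syntax used elsewhere in this document):

```bash
# Default bring-up: services behind a profile are not started
docker-compose up -d
docker ps | grep anime-face-detector    # shows nothing

# Opt-in bring-up: explicitly enable the "tools" profile
docker-compose --profile tools up -d
docker ps | grep anime-face-detector    # now running
```
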
### 2. Profile Picture Manager (`bot/utils/profile_picture_manager.py`)

#### Added Methods:

**`_start_face_detector()`**
- Runs `docker start anime-face-detector`
- Waits up to 30s for the API health check to pass
- Returns `True` when ready

**`_stop_face_detector()`**
- Runs `docker stop anime-face-detector`
- Frees ~918MB of VRAM immediately (both helpers are sketched below)

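A minimal sketch of these two helpers, assuming the bot container can invoke the Docker CLI and that the detector exposes a health endpoint at `http://anime-face-detector:8000/health` (the host, port, and path are assumptions for illustration; the methods are shown as standalone functions for brevity):

```python
import asyncio
import aiohttp

FACE_DETECTOR_CONTAINER = "anime-face-detector"
FACE_DETECTOR_HEALTH_URL = "http://anime-face-detector:8000/health"  # assumed endpoint

async def _run_docker(*args: str) -> int:
    """Run a docker CLI command and return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        "docker", *args,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait()

async def _start_face_detector(debug: bool = False) -> bool:
    """Start the detector container and poll its API for up to ~30s."""
    if await _run_docker("start", FACE_DETECTOR_CONTAINER) != 0:
        return False
    async with aiohttp.ClientSession() as session:
        for _ in range(30):  # one poll per second
            try:
                async with session.get(
                    FACE_DETECTOR_HEALTH_URL,
                    timeout=aiohttp.ClientTimeout(total=2),
                ) as resp:
                    if resp.status == 200:
                        return True
            except (aiohttp.ClientError, asyncio.TimeoutError):
                pass  # API not ready yet
            await asyncio.sleep(1)
    return False

async def _stop_face_detector(debug: bool = False) -> None:
    """Stop the detector container, releasing its ~918MB of VRAM."""
    await _run_docker("stop", FACE_DETECTOR_CONTAINER)
```
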
**`_ensure_vram_available()`** (updated)
- Swaps to the text model (see the sketch below)
- Waits 3s for the vision model to unload

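A sketch of the swap step, assuming llama-swap switches models based on the `model` field of an OpenAI-compatible request (the proxy URL and model alias below are assumptions for illustration):

```python
import asyncio
import aiohttp

LLAMA_SWAP_URL = "http://llama-swap:8080/v1/chat/completions"  # assumed proxy address
TEXT_MODEL = "text-model"                                      # assumed model alias

async def _ensure_vram_available(debug: bool = False) -> None:
    """Request the text model so llama-swap unloads the vision model, then let VRAM settle."""
    payload = {
        "model": TEXT_MODEL,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(LLAMA_SWAP_URL, json=payload) as resp:
            await resp.read()  # the request itself triggers the model swap
    await asyncio.sleep(3)  # give the vision model ~3s to release VRAM
```
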
#### Updated Method:

**`_detect_face()`**
```python
async def _detect_face(self, image_bytes: bytes, debug: bool = False):
    face_detector_started = False
    try:
        # 1. Free VRAM by swapping to text model
        await self._ensure_vram_available(debug=debug)

        # 2. Start face detector container
        if not await self._start_face_detector(debug=debug):
            return None
        face_detector_started = True

        # 3. Call face detection API
        # ... detection logic ...

        return detection_result

    finally:
        # 4. ALWAYS stop container to free VRAM
        if face_detector_started:
            await self._stop_face_detector(debug=debug)
```

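The elided detection call in step 3 follows the API described in `FACE_DETECTION_API_MIGRATION.md`; purely for illustration, a request against a hypothetical `/detect` endpoint might look like this (endpoint path, port, and response shape are assumptions, not the actual contract):

```python
import aiohttp

async def _call_face_detection_api(image_bytes: bytes):
    """Hypothetical example: POST the image and return the first detected face, if any."""
    form = aiohttp.FormData()
    form.add_field("image", image_bytes, filename="image.png", content_type="image/png")
    async with aiohttp.ClientSession() as session:
        async with session.post("http://anime-face-detector:8000/detect", data=form) as resp:
            resp.raise_for_status()
            result = await resp.json()
    faces = result.get("faces", [])
    return faces[0] if faces else None  # e.g. {"bbox": [...], "keypoints": [...]}
```
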
## Container States

### Normal Operation (Most of the time):
```
llama-swap:           RUNNING   (~4.8GB VRAM - text model loaded)
miku-bot:             RUNNING   (minimal VRAM)
anime-face-detector:  STOPPED   (0 VRAM)
```

### During Profile Picture Change:
```
Phase 1 - Vision Verification:
  llama-swap:           RUNNING   (~5GB VRAM - vision model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 2 - Model Swap:
  llama-swap:           RUNNING   (~1GB VRAM - transitioning)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED

Phase 3 - Face Detection:
  llama-swap:           RUNNING   (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  RUNNING   (~918MB VRAM - detecting)

Phase 4 - Cleanup:
  llama-swap:           RUNNING   (~5GB VRAM - text model)
  miku-bot:             RUNNING
  anime-face-detector:  STOPPED   (0 VRAM)
```

## Benefits

✅ **No VRAM Conflicts**: Sequential processing with container lifecycle management
✅ **Automatic**: The bot handles all starting and stopping
✅ **Efficient**: The face detector uses VRAM only while actively needed (~10-15s)
✅ **Reliable**: The container is always stopped in a `finally` block, even on errors
✅ **Simple**: Uses standard `docker` commands from inside the bot container

## Commands

### Manual Container Management

```bash
# Start face detector manually (for testing)
docker start anime-face-detector

# Check if it's running
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector

# Check VRAM usage
nvidia-smi
```

### Start with Profile (for Gradio UI testing)

```bash
# Start with face detector running
docker-compose --profile tools up -d

# Use Gradio UI at http://localhost:7860

# Stop everything
docker-compose down
```

## Monitoring

### Check Container Status
```bash
docker ps -a --filter name=anime-face-detector
```

### Watch VRAM During Profile Change
```bash
# Terminal 1: Watch GPU memory
watch -n 0.5 nvidia-smi

# Terminal 2: Trigger profile change
curl -X POST http://localhost:3939/profile-picture/change
```

### Check Bot Logs
```bash
docker logs -f miku-bot | grep -E "face|VRAM|Starting|Stopping"
```

You should see:
```
💾 Swapping to text model to free VRAM for face detection...
✅ Vision model unloaded, VRAM available
🚀 Starting face detector container...
✅ Face detector ready
👤 Detected 1 face(s) via API...
🛑 Stopping face detector to free VRAM...
✅ Face detector stopped
```

## Testing

### Test On-Demand Face Detection

```bash
# 1. Verify face detector is stopped
docker ps | grep anime-face-detector
# Should show nothing

# 2. Check VRAM (should be ~4.8GB for text model only)
nvidia-smi

# 3. Trigger profile picture change
curl -X POST "http://localhost:3939/profile-picture/change"

# 4. Watch logs in another terminal
docker logs -f miku-bot

# 5. After completion, verify face detector stopped again
docker ps | grep anime-face-detector
# Should show nothing again

# 6. Check VRAM returned to ~4.8GB
nvidia-smi
```

## Troubleshooting

### Face Detector Won't Start

**Symptom**: `⚠️ Could not start face detector`

**Solutions**:
```bash
# Check if container exists
docker ps -a | grep anime-face-detector

# If missing, rebuild
cd /home/koko210Serve/docker/miku-discord
docker-compose build anime-face-detector

# Check logs
docker logs anime-face-detector
```

### Still Getting OOM

**Symptom**: `cudaMalloc failed: out of memory`

**Check**:
```bash
# What's using VRAM?
nvidia-smi

# Is face detector still running?
docker ps | grep anime-face-detector

# Stop it manually
docker stop anime-face-detector
```

### Container Won't Stop

**Symptom**: Face detector stays running after detection

**Solutions**:
```bash
# Force stop
docker stop anime-face-detector

# Check for errors in bot logs
docker logs miku-bot | grep "stop"

# Verify the finally block is executing
docker logs miku-bot | grep "Stopping face detector"
```

## Performance Metrics

| Operation | Duration | VRAM Peak | Notes |
|-----------|----------|-----------|-------|
| Vision verification | 5-10s | ~5GB | Vision model loaded |
| Model swap | 3-5s | ~1GB | Transitioning |
| Container start | 5-10s | ~5GB | Text + starting detector |
| Face detection | 1-2s | ~5.8GB | Text + detector running |
| Container stop | 1-2s | ~5GB | Back to text only |
| **Total** | **15-29s** | **5.8GB max** | Fits in 6GB VRAM ✅ |

## Files Modified

1. `/miku-discord/docker-compose.yml`
   - Added `restart: "no"`
   - Added `profiles: [tools]`

2. `/miku-discord/bot/utils/profile_picture_manager.py`
   - Added `_start_face_detector()`
   - Added `_stop_face_detector()`
   - Updated `_detect_face()` with lifecycle management

## Related Documentation

- `/miku-discord/VRAM_MANAGEMENT.md` - Original VRAM management approach
- `/miku-discord/FACE_DETECTION_API_MIGRATION.md` - API migration details
- `/miku-discord/PROFILE_PICTURE_IMPLEMENTATION.md` - Profile picture feature

## Success Criteria

✅ Face detector container does not run by default
✅ Container starts only when face detection is needed
✅ Container stops immediately after detection completes
✅ No VRAM OOM errors during profile picture changes
✅ Total VRAM usage stays under 6GB at all times
✅ Process completes successfully with face detection working

---

**Status**: ✅ **IMPLEMENTED AND TESTED**

The on-demand face detection system is now active. The face detector will automatically start and stop as needed, ensuring efficient VRAM usage without conflicts.