# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual, command-based voice chat flow: a single API call starts the containers, joins the voice channel, invites the user, and cleans up afterwards.

## Features

### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)
```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

#### What Happens:
1. **Container Startup**: Starts `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors containers until fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates voice session with full resource locking
4. **Send DM**: Generates a personalized invitation with the LLM and sends it together with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when user joins
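
The ordering of these steps, and which failures abort the call, can be sketched roughly as follows. This is not the code in `bot/api.py`: `run_call_flow` and its four callables are hypothetical stand-ins, and the sketch assumes that a container failure aborts the call while a DM failure does not (as the testing checklist suggests). The returned dict is a subset of the documented response.

```python
import asyncio
from typing import Awaitable, Callable


async def run_call_flow(
    start_containers: Callable[[], Awaitable[bool]],
    join_channel: Callable[[], Awaitable[str]],
    send_dm: Callable[[str], Awaitable[None]],
    stop_containers: Callable[[], Awaitable[None]],
) -> dict:
    """Hypothetical orchestration of steps 1-4; all callables are stand-ins."""
    # Steps 1-2: start containers and wait for warmup (True means ready).
    if not await start_containers():
        await stop_containers()  # assumption: clean up partial startup
        return {"success": False, "error": "Failed to start voice containers"}

    # Step 3: join the voice channel; assumed here to return the invite URL.
    try:
        invite_url = await join_channel()
    except Exception as exc:
        await stop_containers()  # do not leave containers running
        return {"success": False, "error": str(exc)}

    # Step 4: the DM is best effort; a blocked DM does not abort the call.
    try:
        await send_dm(invite_url)
    except Exception:
        pass

    return {"success": True, "invite_url": invite_url}
```

Passing the steps in as callables keeps the ordering logic testable without Docker or Discord.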

#### User Join Detection:
- Monitors `on_voice_state_update` events
- When target user joins:
  - Marks `user_has_joined = True`
  - Cancels the 30-minute join timeout
  - Auto-starts STT for that user
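
One way to decide whether a voice state change counts as the called user joining or leaving is sketched below. In discord.py the handler receives `(member, before, after)`, and `before.channel` / `after.channel` may be `None`. The helper `classify_voice_event` is hypothetical and works on plain IDs so it can be tested without a Discord connection. How the bot actually makes this decision is not shown in this document.

```python
from typing import Optional


def classify_voice_event(
    member_id: int,
    before_channel_id: Optional[int],
    after_channel_id: Optional[int],
    call_user_id: Optional[int],
    session_channel_id: int,
) -> Optional[str]:
    """Return "join", "leave", or None for a voice state change.

    Hypothetical helper: only the called user and only transitions into or
    out of the session's channel are considered relevant.
    """
    if call_user_id is None or member_id != call_user_id:
        return None  # someone else, or no call in progress

    was_in = before_channel_id == session_channel_id
    is_in = after_channel_id == session_channel_id

    if is_in and not was_in:
        return "join"
    if was_in and not is_in:
        return "leave"
    return None  # mute/deafen changes, or moves between unrelated channels
```

A handler would then call `on_user_join(user_id)` for `"join"` and `on_user_leave(user_id)` for `"leave"`.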

#### Auto-Leave After User Disconnect:
- **45 second timer** starts when user leaves voice channel
- If user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If user rejoins before 45s, timer is cancelled
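
A cancellable timer of this kind is commonly built on an `asyncio.Task`. The sketch below shows the pattern only. `AutoLeaveTimer` and the `on_expire` callback are hypothetical names, not the implementation in `voice_manager.py`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Runs `on_expire` after `delay` seconds unless cancelled first."""

    def __init__(self, delay: float, on_expire: Callable[[], Awaitable[None]]) -> None:
        self._delay = delay
        self._on_expire = on_expire
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Start (or restart) the countdown; call when the user leaves."""
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        """Stop the countdown; call when the user rejoins."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        await self._on_expire()  # e.g. end session, stop containers
```

With `delay=45`, `start()` corresponds to the user leaving and `cancel()` to the user rejoining within the window.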

#### 30-Minute Join Timeout:
- If user never joins within 30 minutes:
  - Ends voice session
  - Stops containers
  - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
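
The join timeout can be pictured as waiting on a "user joined" signal with a deadline. The sketch below uses an `asyncio.Event` with `asyncio.wait_for`, which is an assumption made for illustration. The session itself tracks a `call_timeout_task`, and `wait_for_join` and `on_timeout` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

TIMEOUT_DM = "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"


async def wait_for_join(
    joined: asyncio.Event,
    timeout: float,
    on_timeout: Callable[[str], Awaitable[None]],
) -> bool:
    """Return True if `joined` is set within `timeout` seconds.

    On timeout, `on_timeout` receives the DM text; a real callback would also
    end the voice session and stop the containers.
    """
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        await on_timeout(TIMEOUT_DM)
        return False
```

For the behaviour described above, `timeout` would be `30 * 60` seconds and the join handler would call `joined.set()`.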

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check

#### Warmup Detection:
```text
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```
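
Both warmup checks follow the same shape: probe repeatedly until the service reports ready or a deadline passes. A minimal sketch of that loop is shown below. `wait_until_ready`, `tts_is_warm`, and the `probe` callable are hypothetical, and the real `_wait_for_stt_warmup()` / `_wait_for_tts_warmup()` may differ. The probe is passed in so the loop does not depend on a particular HTTP or WebSocket library.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    probe: Callable[[], Awaitable[bool]],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if await probe():
                return True
        except Exception:
            # Connection errors are expected while the container is booting.
            pass
        await asyncio.sleep(interval)
    return False


def tts_is_warm(health: dict) -> bool:
    """Interpret the TTS health payload shown above."""
    return health.get("warmed_up") is True
```

The STT probe would attempt a WebSocket connection (30s timeout) and the TTS probe would fetch the health endpoint and pass the decoded JSON to `tts_is_warm` (60s timeout).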

### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:
```python
call_user_id: Optional[int]  # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool  # Track if user joined
auto_leave_task: Optional[asyncio.Task]  # 45s auto-leave
user_leave_time: Optional[float]  # When user left
```

#### Methods:
- `on_user_join(user_id)`: Handle user joining voice channel
- `on_user_leave(user_id)`: Start 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute auto-leave

### 5. LLM Context Update

Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)

#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity

## API Endpoint

### POST `/api/voice/call`

**Request Body**:
```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

**Success Response**:
```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:
```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```
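
A client for this endpoint might look like the sketch below, using only the Python standard library. `BASE_URL` is an assumption (the host and port of the bot's API are not given here), and the generous default timeout assumes the endpoint does not respond until the containers have warmed up.

```python
import json
import urllib.request

# Assumption: where the bot's API is reachable; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_call_request(user_id: int, voice_channel_id: int) -> urllib.request.Request:
    """Build the POST /api/voice/call request with the documented body."""
    body = json.dumps({"user_id": user_id, "voice_channel_id": voice_channel_id})
    return urllib.request.Request(
        f"{BASE_URL}/api/voice/call",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_call_response(raw: bytes) -> str:
    """Return the invite URL on success; raise RuntimeError otherwise."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "unknown error"))
    return payload["invite_url"]


def call_user(user_id: int, voice_channel_id: int, timeout: float = 120.0) -> str:
    """Send the request and return the invite URL."""
    request = build_call_request(user_id, voice_channel_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_call_response(response.read())
```

If the server signals errors with a non-2xx status, `urlopen` raises `urllib.error.HTTPError` before `parse_call_response` runs, so callers may want to handle both.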

## File Changes

### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:
1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:
- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:
- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)

### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:
```bash
# Set to true for debugging
VOICE_DEBUG_MODE=false
```
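
Reading the flag in Python could look like the sketch below. `read_debug_flag` is a hypothetical helper and the set of accepted truthy spellings is an assumption. The actual parsing in `bot/globals.py` is not shown in this document.

```python
import os
from typing import Mapping, Optional


def read_debug_flag(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if VOICE_DEBUG_MODE is set to a truthy value.

    Defaults to False when the variable is missing, matching the documented
    default. Accepting "1", "yes", and "on" is an assumption of this sketch.
    """
    source = os.environ if env is None else env
    value = source.get("VOICE_DEBUG_MODE", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

Any unrecognized value falls back to silent mode, which is the safer default for deployment.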

## Next Steps

1. **Web UI**: Create voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management

2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times

3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready if the two warmups run one after the other; less if they overlap

### Resource Management:
- Voice sessions use `VoiceSessionManager` singleton
- Only one voice session active at a time
- Full resource locking during voice:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused

### Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents indefinite container running
- All cleanup paths stop containers
- Voice session end releases all resources

## Troubleshooting

### Containers won't start:
- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:
- STT: Check WebSocket is accepting connections on port 8765
- TTS: Check health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)

### User never joins:
- Verify invite URL is valid
- Check user has permission to join voice channel
- Verify DM was delivered (may be blocked)

### Auto-leave not triggering:
- Check `on_voice_state_update` events are firing
- Verify user ID matches `call_user_id`
- Check logs for timer creation/cancellation

### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`