VOICE_CALL_AUTOMATION.md

# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual command-based voice chat flow: a single API call starts the voice containers, joins the channel, invites the user by DM, and begins listening when they join.

## Features

### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications
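
A minimal sketch of how such a flag might be read, assuming it is parsed from the environment as a case-insensitive boolean string. The helper name `read_voice_debug_mode` and the set of accepted truthy values are illustrative assumptions, not taken from `bot/globals.py`.

```python
import os
from typing import Mapping, Optional


def read_voice_debug_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when VOICE_DEBUG_MODE is set to a truthy string."""
    # Hypothetical helper: falls back to the real environment when no mapping is given.
    env = os.environ if env is None else env
    raw = env.get("VOICE_DEBUG_MODE", "false")  # default matches the documented `false`
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the process environment.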

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)
```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

#### What Happens:
1. **Container Startup**: Starts `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors containers until fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates voice session with full resource locking
4. **Send DM**: Generates a personalized invitation with the LLM and sends it together with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when user joins
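
The ordering of these steps can be sketched roughly as follows. This is an illustration under assumptions, not the code in `bot/api.py`: the function name `run_call_flow` and the three injected callables are hypothetical stand-ins for the real container, voice, and DM logic.

```python
import asyncio
from typing import Awaitable, Callable


async def run_call_flow(
    start_containers: Callable[[], Awaitable[bool]],
    join_channel: Callable[[], Awaitable[None]],
    send_invite_dm: Callable[[], Awaitable[None]],
) -> dict:
    """Run the startup steps in order; stop early if the containers fail."""
    # Steps 1-2: start containers and wait for warmup (True means ready).
    if not await start_containers():
        return {"success": False, "error": "Failed to start voice containers"}
    # Step 3: join the voice channel.
    await join_channel()
    # Step 4: a DM failure should not abort the call.
    try:
        await send_invite_dm()
    except Exception:
        pass
    # Step 5 (auto-listen) happens later, when the user actually joins.
    return {"success": True}
```

Injecting the steps as callables keeps the ordering logic testable without Docker or Discord.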

#### User Join Detection:
- Monitors `on_voice_state_update` events
- When target user joins:
  - Marks `user_has_joined = True`
  - Cancels 30min timeout
  - Auto-starts STT for that user

#### Auto-Leave After User Disconnect:
- **45 second timer** starts when user leaves voice channel
- If user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If user rejoins before 45s, timer is cancelled
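
One way to express the leave/rejoin rule is a cancellable `asyncio` task. The class below is a hedged sketch: `AutoLeaveTimer` and its `on_expire` callback are hypothetical names, and the real logic lives in `on_user_leave()` and `_auto_leave_after_user_disconnect()`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Runs `on_expire` unless the user rejoins before `delay` seconds pass."""

    def __init__(self, on_expire: Callable[[], Awaitable[None]], delay: float = 45.0):
        self._on_expire = on_expire
        self._delay = delay
        self._task: Optional[asyncio.Task] = None

    def on_user_leave(self) -> None:
        # Restart the countdown each time the user leaves.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def on_user_join(self) -> None:
        # Rejoining before the deadline cancels the pending auto-leave.
        self.cancel()

    def cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        await self._on_expire()
```

Both handlers must be called from within a running event loop, since `asyncio.create_task` requires one.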

#### 30-Minute Join Timeout:
- If user never joins within 30 minutes:
  - Ends voice session
  - Stops containers
  - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
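
The join timeout can be illustrated with an `asyncio.Event` that is set when the target user joins. Note that this is a different mechanism from the cancellable `call_timeout_task` described in this document; it is only a compact way to show the rule, and `wait_for_join` is a hypothetical name.

```python
import asyncio

JOIN_TIMEOUT_SECONDS = 30 * 60  # 30 minutes, per the description above


async def wait_for_join(joined: asyncio.Event, timeout: float = JOIN_TIMEOUT_SECONDS) -> bool:
    """Return True if the user joined in time, False if the timeout elapsed."""
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Caller would end the session, stop containers, and send the timeout DM.
        return False
```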

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check

#### Warmup Detection:
```
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```
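
Both warmup checks amount to polling until a readiness probe succeeds or a deadline passes. The sketch below shows that pattern in isolation; `wait_until_ready` and `is_tts_warmed_up` are hypothetical helpers, and the actual probes in `container_manager.py` may be structured differently.

```python
import asyncio
import time
from typing import Awaitable, Callable


def is_tts_warmed_up(health: dict) -> bool:
    """True only when the health payload reports warmed_up as true."""
    return health.get("warmed_up") is True


async def wait_until_ready(
    check: Callable[[], Awaitable[bool]],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if await check():
                return True
        except Exception:
            # A refused connection or similar just means "not ready yet".
            pass
        await asyncio.sleep(interval)
    return False
```

For the documented limits, the STT probe would be awaited with `timeout=30` and the TTS probe with `timeout=60`.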

### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:
```python
call_user_id: Optional[int]  # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool  # Track if user joined
auto_leave_task: Optional[asyncio.Task]  # 45s auto-leave
user_leave_time: Optional[float]  # When user left
```

#### Methods:
- `on_user_join(user_id)`: Handle user joining voice channel
- `on_user_leave(user_id)`: Start 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute auto-leave
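
As a rough illustration of how these fields and handlers fit together, the dataclass below groups the fields and updates the simple ones. The class name `CallTracking`, the default values, and the `is_target` helper are assumptions; the timer tasks are left unset here and are not scheduled by this sketch.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallTracking:
    """Hypothetical container for the call-tracking fields listed above."""

    call_user_id: Optional[int] = None
    call_timeout_task: Optional[asyncio.Task] = None
    user_has_joined: bool = False
    auto_leave_task: Optional[asyncio.Task] = None
    user_leave_time: Optional[float] = None

    def is_target(self, user_id: int) -> bool:
        # Only the called user should trigger join/leave handling.
        return self.call_user_id is not None and self.call_user_id == user_id

    def on_user_join(self, user_id: int) -> None:
        if self.is_target(user_id):
            self.user_has_joined = True
            self.user_leave_time = None

    def on_user_leave(self, user_id: int, now: Optional[float] = None) -> None:
        if self.is_target(user_id):
            self.user_leave_time = time.time() if now is None else now
```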

### 5. LLM Context Update

Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)

#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity

## API Endpoint

### POST `/api/voice/call`

**Request Body**:
```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

**Success Response**:
```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:
```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```
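
A caller such as the web UI backend might build the request and interpret the response along these lines. This is a sketch using only the fields shown above; the helper names are hypothetical, and the HTTP transport is left out on purpose.

```python
import json


def build_call_request(user_id: int, voice_channel_id: int) -> str:
    """Serialize the request body for POST /api/voice/call."""
    return json.dumps({"user_id": user_id, "voice_channel_id": voice_channel_id})


def parse_call_response(body: str) -> str:
    """Return the invite URL on success; raise RuntimeError with the API's error otherwise."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError(data.get("error", "unknown error"))
    return data["invite_url"]
```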

## File Changes

### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:
1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:
- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:
- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)

### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:
```bash
VOICE_DEBUG_MODE=false  # Set to true for debugging
```

## Next Steps

1. **Web UI**: Create voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management

2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times

3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready

### Resource Management:
- Voice sessions use `VoiceSessionManager` singleton
- Only one voice session active at a time
- Full resource locking during voice:
  - AMD GPU reserved for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused
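
The "one session at a time" rule can be pictured as a simple guard. The class below is a hypothetical simplification, not the `VoiceSessionManager` itself, and it ignores the resource locks listed above.

```python
from typing import Optional


class SingleSessionGuard:
    """Hypothetical guard allowing one active voice session at a time."""

    def __init__(self) -> None:
        self._active_channel_id: Optional[int] = None

    def acquire(self, channel_id: int) -> bool:
        # Refuse a second session while one is still active.
        if self._active_channel_id is not None:
            return False
        self._active_channel_id = channel_id
        return True

    def release(self) -> None:
        # Called when the session ends so a new call can start.
        self._active_channel_id = None
```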

### Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents containers from running indefinitely
- All cleanup paths stop containers
- Voice session end releases all resources

## Troubleshooting

### Containers won't start:
- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:
- STT: Check WebSocket is accepting connections on port 8765
- TTS: Check health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)

### User never joins:
- Verify invite URL is valid
- Check user has permission to join voice channel
- Verify DM was delivered (may be blocked)

### Auto-leave not triggering:
- Check `on_voice_state_update` events are firing
- Verify user ID matches `call_user_id`
- Check logs for timer creation/cancellation

### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`

Implemented experimental real production ready voice chat, relegated old flow to voice debug mode. New Web UI panel for Voice Chat. 2026-01-20 23:06:17 +02:00