# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual, command-based voice chat flow with a seamless, immersive experience.

## Features

### 1. Voice Debug Mode Toggle

- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, and transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)

```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

#### What Happens:

1. **Container Startup**: Starts the `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors the containers until they are fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates a voice session with full resource locking
4. **Send DM**: Generates a personalized invitation with the LLM and sends it together with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when the user joins

#### User Join Detection:

- Monitors `on_voice_state_update` events
- When the target user joins:
  - Sets `user_has_joined = True`
  - Cancels the 30-minute join timeout
  - Auto-starts STT for that user

#### Auto-Leave After User Disconnect:

- A **45-second timer** starts when the user leaves the voice channel
- If the user doesn't rejoin within 45s:
  - Ends the voice session
  - Stops the STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If the user rejoins before the 45s elapse, the timer is cancelled

#### 30-Minute Join Timeout:

- If the user never joins within 30 minutes:
  - Ends the voice session
  - Stops the containers
  - Sends a timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:

- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check

#### Warmup Detection:

```
# STT warmup: try a WebSocket connection
ws://miku-stt:8765

# TTS warmup: check the health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```

### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:

```python
import asyncio
from typing import Optional

# New fields on VoiceSession (excerpt; other fields omitted)
call_user_id: Optional[int]                # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min join timeout
user_has_joined: bool                      # Whether the called user has joined
auto_leave_task: Optional[asyncio.Task]    # 45s auto-leave timer
user_leave_time: Optional[float]           # When the user left
```

#### Methods:

- `on_user_join(user_id)`: Handles the user joining the voice channel
- `on_user_leave(user_id)`: Starts the 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Executes the auto-leave

### 5. LLM Context Update

Miku's voice chat prompt now includes:

```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel, so you can mention this if asked about leaving
```

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:

- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)
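The gating described in this section can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the code in `bot/globals.py` or `voice_manager.py`. The names `read_voice_debug_mode`, `debug_notify`, `TextTarget` and the `voice` logger are all hypothetical. Accepting `1`, `yes` and `on` as true is also an assumption, since the document only specifies `true`/`false`. Every event is logged, but it is posted to text chat only in debug mode.

```python
import asyncio
import logging
import os
from typing import Mapping, Optional, Protocol

logger = logging.getLogger("voice")  # hypothetical logger name

# Assumption: these spellings count as "true"; the doc only specifies true/false.
_TRUTHY = {"1", "true", "yes", "on"}


class TextTarget(Protocol):
    """Anything with an awaitable send(), e.g. a discord.py text channel."""

    async def send(self, content: str) -> object: ...


def read_voice_debug_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse VOICE_DEBUG_MODE; unset means False (field deployment)."""
    source = os.environ if env is None else env
    return source.get("VOICE_DEBUG_MODE", "false").strip().lower() in _TRUTHY


async def debug_notify(target: TextTarget, text: str, debug_mode: bool) -> bool:
    """Always log the event; post it to text chat only in debug mode.

    Returns True if a chat message was sent.
    """
    logger.info(text)  # field deployments still get log-file visibility
    if not debug_mode:
        return False
    await target.send(text)
    return True


class PrintChannel:
    """Stand-in for a real channel so the sketch runs without Discord."""

    async def send(self, content: str) -> None:
        print(content)


if __name__ == "__main__":
    asyncio.run(debug_notify(PrintChannel(), "🎤 User said: hello", read_voice_debug_mode()))
```

Passing `debug_mode` explicitly rather than reading a global inside `debug_notify` keeps the helper easy to test in both modes.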
#### With `VOICE_DEBUG_MODE=false` (field deployment):

- No text notifications
- No command outputs
- Silent operation
- Only log files show activity

## API Endpoint

### POST `/api/voice/call`

**Request Body**:

```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

**Success Response**:

```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:

```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```

## File Changes

### New Files:

1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:

1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to `VoiceSession`
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord voice activity detection (VAD) events (relies on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:

- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:

- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (call should continue anyway)

### Debug Mode:

- [ ] Test with `VOICE_DEBUG_MODE=true` (all notifications should appear)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:

```bash
# Set to true for debugging
VOICE_DEBUG_MODE=false
```

## Next Steps

1. **Web UI**: Create a voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetched from Discord)
   - "Call User" button
   - Status display
   - Active call management
2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times
3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:

- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready

### Resource Management:

- Voice sessions use the `VoiceSessionManager` singleton
- Only one voice session can be active at a time
- Full resource locking during voice:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused

### Cleanup Guarantees:

- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents containers from running indefinitely
- All cleanup paths stop the containers
- Ending a voice session releases all resources

## Troubleshooting

### Containers won't start:

- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:

- STT: Check that the WebSocket accepts connections on port 8765
- TTS: Check that the health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)

### User never joins:

- Verify the invite URL is valid
- Check that the user has permission to join the voice channel
- Verify the DM was delivered (it may be blocked)

### Auto-leave not triggering:

- Check that `on_voice_state_update` events are firing
- Verify the user ID matches `call_user_id`
- Check logs for timer creation/cancellation

### Containers not stopping:

- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`
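When chasing a warmup timeout, it can help to run the two warmup checks by hand from a container on the same Docker network. The sketch below approximates `_wait_for_stt_warmup()` and `_wait_for_tts_warmup()` under assumptions; it is not their actual code. The STT check is only a TCP connect, because the Python standard library has no WebSocket client. The names `wait_until`, `stt_port_open`, `tts_is_warm`, `fetch_tts_health` and `probe_voice_containers` are hypothetical. The hosts, ports and timeouts come from the Warmup Detection and warmup-step sections.

```python
import json
import socket
import time
import urllib.request
from typing import Callable

# Endpoints taken from the Warmup Detection section; adjust if your compose file differs.
STT_HOST, STT_PORT = "miku-stt", 8765
TTS_HEALTH_URL = "http://miku-rvc-api:8765/health"


def stt_port_open(host: str = STT_HOST, port: int = STT_PORT, timeout: float = 2.0) -> bool:
    """Simplification: a plain TCP connect, not a full WebSocket handshake."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tts_is_warm(payload: dict) -> bool:
    """True only when the health payload reports "warmed_up": true."""
    return payload.get("warmed_up") is True


def fetch_tts_health(url: str = TTS_HEALTH_URL, timeout: float = 2.0) -> dict:
    """GET the TTS health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if check():
                return True
        except (OSError, ValueError):
            pass  # refused connection, HTTP error, or bad JSON: treat as "not ready yet"
        if clock() >= deadline:
            return False
        sleep(interval_s)


def probe_voice_containers() -> dict:
    """Run both checks with the documented timeouts (30s STT, 60s TTS)."""
    return {
        "stt_ready": wait_until(stt_port_open, timeout_s=30),
        "tts_ready": wait_until(lambda: tts_is_warm(fetch_tts_health()), timeout_s=60),
    }
```

Calling `probe_voice_containers()` reports which container never became ready, so you know which one to inspect with `docker logs`. The injectable `clock` and `sleep` parameters let the polling loop be exercised without real waiting.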