# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience.

## Features

### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications
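
This is a minimal sketch of reading the flag into a boolean. It assumes, without confirmation from this document, that `true`, `1` and `yes` in any case count as enabled and everything else counts as disabled. The helper `env_flag` is hypothetical, and the actual parsing in `bot/globals.py` may differ.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable (hypothetical helper).

    Unset variables fall back to `default`; otherwise only
    "true", "1", and "yes" (any case) count as True.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes"}


# Defaults to False, matching the documented default for field deployment.
VOICE_DEBUG_MODE = env_flag("VOICE_DEBUG_MODE", default=False)
```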

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)
```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

#### What Happens:
1. **Container Startup**: Starts the `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors the containers until they are fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates a voice session with full resource locking
4. **Send DM**: Generates a personalized invitation with the LLM and sends it along with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when the user joins
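
The five steps above can be sketched as one coroutine. This is a hypothetical outline, not the code in `bot/api.py`. Each step is an injected async callable, and a container or join failure aborts the call. Only the first error string comes from the Error Response example; the second is made up. A DM failure is logged without stopping the call, which matches the "DM failure (should continue anyway)" item in the testing checklist.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("voice_call")

AsyncStep = Callable[[], Awaitable[bool]]


async def run_call_flow(
    start_containers: AsyncStep,
    join_channel: AsyncStep,
    send_dm: AsyncStep,
    enable_auto_listen: Callable[[], None],
) -> dict:
    """Run the call steps in order (hypothetical orchestration sketch)."""
    # Steps 1-2: start miku-stt / miku-rvc-api and wait for warmup.
    if not await start_containers():
        return {"success": False, "error": "Failed to start voice containers"}

    # Step 3: join the voice channel (resource locking would happen here).
    if not await join_channel():
        # Hypothetical error text; not taken from the API docs.
        return {"success": False, "error": "Failed to join voice channel"}

    # Step 4: a DM failure (e.g. DMs blocked) is logged but not fatal.
    try:
        dm_sent = await send_dm()
    except Exception:
        log.exception("Invitation DM failed; continuing without it")
        dm_sent = False

    # Step 5: arm auto-listen so STT starts when the target user joins.
    enable_auto_listen()
    return {"success": True, "dm_sent": dm_sent}


if __name__ == "__main__":
    async def ok() -> bool:
        return True

    print(asyncio.run(run_call_flow(ok, ok, ok, lambda: None)))
```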

#### User Join Detection:
- Monitors `on_voice_state_update` events
- When target user joins:
  - Marks `user_has_joined = True`
  - Cancels 30min timeout
  - Auto-starts STT for that user
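
discord.py calls `on_voice_state_update(member, before, after)`. In that call, `before.channel` and `after.channel` are the member's old and new voice channels, or `None`. Below is a minimal sketch of the join/leave decision. It assumes the handler passes plain IDs, and `classify_voice_event` is a hypothetical helper.

```python
from typing import Optional


def classify_voice_event(
    member_id: int,
    call_user_id: Optional[int],
    call_channel_id: int,
    before_channel_id: Optional[int],
    after_channel_id: Optional[int],
) -> Optional[str]:
    """Return "join", "leave", or None for an irrelevant update."""
    # Ignore everyone except the user who was called.
    if call_user_id is None or member_id != call_user_id:
        return None
    was_in = before_channel_id == call_channel_id
    is_in = after_channel_id == call_channel_id
    if is_in and not was_in:
        return "join"   # -> set user_has_joined, cancel timeout, start STT
    if was_in and not is_in:
        return "leave"  # -> start the 45 s auto-leave timer
    return None         # e.g. mute/deafen toggles within the same channel
```

In a handler, this might be called with `getattr(before.channel, "id", None)` and `getattr(after.channel, "id", None)`. That way both a disconnect (channel `None`) and a move to another channel count as leaving.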

#### Auto-Leave After User Disconnect:
- **45-second timer** starts when the user leaves the voice channel
- If the user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If the user rejoins before 45s, the timer is cancelled
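
The grace period maps naturally onto a cancellable `asyncio` task. This minimal sketch assumes a session-scoped object owns the timer. `AutoLeaveTimer` and its callback are hypothetical names, and `on_expire` stands in for whatever ends the session and stops the containers.

```python
import asyncio
from typing import Awaitable, Callable, Optional

AUTO_LEAVE_SECONDS = 45.0


class AutoLeaveTimer:
    """Cancellable grace period after the called user leaves (sketch)."""

    def __init__(self, on_expire: Callable[[], Awaitable[None]],
                 delay: float = AUTO_LEAVE_SECONDS) -> None:
        self._on_expire = on_expire
        self._delay = delay
        self._task: Optional[asyncio.Task] = None

    def on_user_leave(self) -> None:
        # Restart the countdown on every leave event.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def on_user_join(self) -> None:
        # Rejoining inside the grace period cancels the pending auto-leave.
        self.cancel()

    def cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        self._task = None  # Detach first so on_expire may safely call cancel().
        # Grace period elapsed: end session, stop containers, release resources.
        await self._on_expire()
```

Restarting the task on every leave gives a user who bounces in and out a fresh 45 s each time. That is one reasonable reading of the behavior described above.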

#### 30-Minute Join Timeout:
- If the user does not join within 30 minutes:
  - Ends voice session
  - Stops containers
  - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
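
Here is one way to express the join timeout. This sketch assumes the join handler sets an `asyncio.Event`, which replaces the explicit task cancellation described above. `asyncio.wait_for` either sees the event set or raises `TimeoutError`, and only the timeout path cleans up. `wait_for_join`, `end_session` and `send_dm` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

JOIN_TIMEOUT_SECONDS = 30 * 60
TIMEOUT_DM = ("Aww, I guess you couldn't make it to voice chat... "
              "Maybe next time! 💙")


async def wait_for_join(
    joined: asyncio.Event,
    end_session: Callable[[], Awaitable[None]],
    send_dm: Callable[[str], Awaitable[None]],
    timeout: float = JOIN_TIMEOUT_SECONDS,
) -> bool:
    """Return True if the user joined in time; otherwise clean up and DM."""
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Ends the session and stops the containers, then notifies the user.
        await end_session()
        await send_dm(TIMEOUT_DM)
        return False
```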

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check
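
The start/stop half of the container manager could shell out to Docker Compose. This sketch assumes the bot can reach the Docker daemon and that both services are defined in its Compose project. The function names are hypothetical, and warmup waiting is left out. The stop command is the same one listed under Troubleshooting. Starting uses `docker compose up -d` because it also creates containers that do not exist yet, whereas `docker compose start` only starts existing ones.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

VOICE_SERVICES = ("miku-stt", "miku-rvc-api")

# A runner takes argv and returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], Awaitable[Tuple[int, str]]]


async def run_cmd(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command without a shell and capture its output."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")


async def compose_up_voice(run: Runner = run_cmd) -> bool:
    # Creates (if needed) and starts both voice services in the background.
    code, _ = await run(["docker", "compose", "up", "-d", *VOICE_SERVICES])
    return code == 0


async def compose_stop_voice(run: Runner = run_cmd) -> bool:
    # Same command as the manual fallback in Troubleshooting.
    code, _ = await run(["docker", "compose", "stop", *VOICE_SERVICES])
    return code == 0
```

Injecting the runner keeps the command construction testable without a Docker daemon.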

#### Warmup Detection:
```text
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```
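
The two checks can share one polling loop. This sketch rests on a few assumptions. `wait_until`, `stt_port_open`, `tts_is_warm` and `tts_health_ok` are hypothetical names. The STT probe only confirms that the port accepts TCP connections, which is weaker than the WebSocket connection attempt described above. `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
import json
import time
import urllib.request
from typing import Awaitable, Callable


async def wait_until(check: Callable[[], Awaitable[bool]],
                     timeout: float, interval: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if await check():
                return True
        except Exception:
            pass  # Not up yet (connection refused, bad JSON, ...): retry.
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def stt_port_open(host: str = "miku-stt", port: int = 8765) -> bool:
    # TCP-level probe only; a full check would complete a WebSocket handshake.
    _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), 5)
    writer.close()
    await writer.wait_closed()
    return True


def tts_is_warm(payload: dict) -> bool:
    # Ready only once the health endpoint reports warmed_up: true.
    return payload.get("warmed_up") is True


async def tts_health_ok(url: str = "http://miku-rvc-api:8765/health") -> bool:
    def fetch() -> dict:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    # urllib is blocking, so run it off the event loop thread.
    return tts_is_warm(await asyncio.to_thread(fetch))
```

With the documented limits, this would be `await wait_until(stt_port_open, timeout=30)` followed by `await wait_until(tts_health_ok, timeout=60)`.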

### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:
```python
import asyncio
from typing import Optional

class VoiceSession:
    # ... existing fields omitted; only the new call-tracking fields shown ...
    call_user_id: Optional[int]                # User ID that was called
    call_timeout_task: Optional[asyncio.Task]  # 30min timeout
    user_has_joined: bool                      # Track if user joined
    auto_leave_task: Optional[asyncio.Task]    # 45s auto-leave
    user_leave_time: Optional[float]           # When user left
```

#### Methods:
- `on_user_join(user_id)`: Handle user joining voice channel
- `on_user_leave(user_id)`: Start 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute auto-leave

### 5. LLM Context Update

Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```
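
Building the prompt text from the same constant the auto-leave timer uses keeps the two from drifting apart. This is a minimal sketch. `AUTO_LEAVE_SECONDS` and `auto_leave_note` are hypothetical names. The wording is copied from the note above, with `{user.name}` supplied as an argument.

```python
AUTO_LEAVE_SECONDS = 45


def auto_leave_note(user_name: str, seconds: int = AUTO_LEAVE_SECONDS) -> str:
    """Build the prompt note from the same constant the timer uses."""
    return (
        f"NOTE: You will automatically disconnect {seconds} seconds after "
        f"{user_name} leaves the voice channel, "
        "so you can mention this if asked about leaving"
    )
```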

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)

#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity
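
The split between the two modes can be sketched as a single notification helper. Every event is logged, and only debug mode echoes it to text chat. `voice_notify` and the injected `send` coroutine are hypothetical. In the bot, `send` would presumably be a text channel's `send` method.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("voice")


async def voice_notify(
    send: Callable[[str], Awaitable[None]],
    text: str,
    debug_mode: bool,
) -> None:
    """Log every voice event; echo it to text chat only in debug mode."""
    log.info(text)  # Log files always show activity.
    if debug_mode:
        await send(text)


if __name__ == "__main__":
    async def demo() -> None:
        async def send(msg: str) -> None:
            print(msg)
        await voice_notify(send, "🎤 User said: hello", debug_mode=True)

    asyncio.run(demo())
```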

## API Endpoint

### POST `/api/voice/call`

**Request Body**:
```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

**Success Response**:
```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:
```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```
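
Here is a client-side sketch that uses only the standard library. `build_call_request` and `parse_call_response` are hypothetical helpers, and `BASE_URL` is a placeholder for wherever `bot/api.py` is served. The success response includes the invite URL, so it likely arrives only after warmup and join. That is roughly 35-75 seconds per Technical Notes, so the example uses a generous client timeout. If the server pairs an error body with a non-2xx status, `urlopen` raises `urllib.error.HTTPError`, and its `read()` still returns that body.

```python
import json
import urllib.request
from typing import Any, Dict


def build_call_request(base_url: str, user_id: int,
                       voice_channel_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST /api/voice/call request."""
    body = json.dumps({"user_id": user_id,
                       "voice_channel_id": voice_channel_id}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/voice/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_call_response(raw: bytes) -> Dict[str, Any]:
    """Return the payload on success; raise with the server's error otherwise."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "unknown error"))
    return payload


# Usage (BASE_URL is a placeholder for wherever bot/api.py is served):
# req = build_call_request(BASE_URL, 123456789, 987654321)
# with urllib.request.urlopen(req, timeout=120) as resp:
#     info = parse_call_response(resp.read())
```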

## File Changes

### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:
1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:
- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:
- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)

### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:
```bash
VOICE_DEBUG_MODE=false  # Set to true for debugging
```

## Next Steps

1. **Web UI**: Create voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management

2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times

3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready

### Resource Management:
- Voice sessions are managed by the `VoiceSessionManager` singleton
- Only one voice session can be active at a time
- Full resource locking during a voice session:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused

### Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents containers from running indefinitely
- All cleanup paths stop containers
- Ending a voice session releases all resources

## Troubleshooting

### Containers won't start:
- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:
- STT: Check that the WebSocket is accepting connections on port 8765
- TTS: Check that the health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)

### User never joins:
- Verify the invite URL is valid
- Check that the user has permission to join the voice channel
- Verify the DM was delivered (DMs may be blocked)

### Auto-leave not triggering:
- Check that `on_voice_state_update` events are firing
- Verify the user ID matches `call_user_id`
- Check logs for timer creation/cancellation

### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`