Voice Call Automation System
Overview
Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience.
Features
1. Voice Debug Mode Toggle
- Environment Variable: VOICE_DEBUG_MODE (default: false)
  - When true: Shows manual commands, text notifications, and transcripts in chat
  - When false (field deployment): Silent operation, no command notifications
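A minimal sketch of how the flag could be read, assuming it comes from the process environment and that common truthy spellings (`true`, `1`, `yes`, `on`) should count as enabled. The helper name `env_flag` is hypothetical; the actual parsing in bot/globals.py may differ.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Accept common truthy spellings; any other value counts as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Field deployments leave the variable unset (or "false") for silent operation.
VOICE_DEBUG_MODE = env_flag("VOICE_DEBUG_MODE", default=False)
```

Treating unknown values as false keeps a typo in the environment from accidentally turning on chat notifications in the field.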
2. Automated Voice Call Flow
Initiation (Web UI → API)
POST /api/voice/call
{
"user_id": 123456789,
"voice_channel_id": 987654321
}
What Happens:
- Container Startup: Starts the miku-stt and miku-rvc-api containers
- Warmup Wait: Monitors the containers until they are fully warmed up
  - STT: WebSocket connection check (30s timeout)
  - TTS: Health endpoint check for warmed_up: true (60s timeout)
- Join Voice Channel: Creates a voice session with full resource locking
- Send DM: Generates a personalized LLM invitation and sends it with a voice channel invite link
- Auto-Listen: Automatically starts listening when the user joins
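The steps above can be sketched as one coroutine. This is an illustration under assumptions, not the bot's code: `start_containers`, `join_channel`, and `send_invite_dm` are hypothetical stand-ins for the real container, voice-manager, and DM calls. It shows the one ordering rule the testing checklist implies: a failed container start aborts the call, while a failed DM is logged and the call continues.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("voice_call")


async def run_voice_call(
    start_containers: Callable[[], Awaitable[bool]],
    join_channel: Callable[[], Awaitable[None]],
    send_invite_dm: Callable[[], Awaitable[None]],
) -> dict:
    """Hypothetical orchestration of the call steps (not the bot's actual code)."""
    # Steps 1-2: start miku-stt / miku-rvc-api and wait for both warmups.
    if not await start_containers():
        return {"success": False, "error": "Failed to start voice containers"}
    # Step 3: join the voice channel (resource locking happens inside).
    await join_channel()
    # Step 4: the DM is best effort; a blocked DM must not abort the call.
    try:
        await send_invite_dm()
    except Exception:
        log.exception("Invite DM failed; continuing without it")
    # Step 5: auto-listen is triggered later by the user-join handler.
    return {"success": True}
```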
User Join Detection:
- Monitors on_voice_state_update events
- When the target user joins:
  - Marks user_has_joined = True
  - Cancels the 30min timeout
  - Auto-starts STT for that user
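discord.py passes `on_voice_state_update(member, before, after)`, where `before.channel` and `after.channel` may be `None`. A minimal sketch of the join/leave decision, written as a pure function over channel IDs so it can be tested without Discord; the name `classify_transition` is hypothetical. A caller would pass `member.id`, `before.channel.id if before.channel else None`, and the same for `after`.

```python
from typing import Optional


def classify_transition(
    member_id: int,
    before_channel_id: Optional[int],
    after_channel_id: Optional[int],
    call_user_id: int,
    session_channel_id: int,
) -> Optional[str]:
    """Return "join", "leave", or None for one voice-state update (hypothetical helper)."""
    if member_id != call_user_id or before_channel_id == after_channel_id:
        # Other users, or mute/deafen changes that keep the same channel.
        return None
    if after_channel_id == session_channel_id:
        return "join"
    if before_channel_id == session_channel_id:
        # Covers both disconnecting and moving to a different channel.
        return "leave"
    return None
```

Filtering on an unchanged channel matters because Discord also fires this event for mute, deafen, and streaming changes.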
Auto-Leave After User Disconnect:
- A 45-second timer starts when the user leaves the voice channel
- If the user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If the user rejoins before 45s, the timer is cancelled
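A minimal sketch of the 45-second auto-leave timer as a cancellable `asyncio.Task`, matching the auto_leave_task field described in section 4. The class name `AutoLeaveTimer` and its methods are hypothetical; `on_expire` stands in for whatever ends the session and stops the containers.

```python
import asyncio
from typing import Awaitable, Callable, Optional

AUTO_LEAVE_SECONDS = 45.0


class AutoLeaveTimer:
    """Hypothetical auto-leave timer: start on leave, cancel on rejoin."""

    def __init__(self, on_expire: Callable[[], Awaitable[None]],
                 delay: float = AUTO_LEAVE_SECONDS) -> None:
        self._on_expire = on_expire
        self._delay = delay
        self._task: Optional[asyncio.Task] = None

    def user_left(self) -> None:
        # Restart the countdown if one was already running.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def user_joined(self) -> None:
        # Rejoining before the deadline cancels the pending auto-leave.
        self.cancel()

    def cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        # Only reached if nobody cancelled the task during the sleep.
        await self._on_expire()
```

Cancelling the task interrupts the `asyncio.sleep`, so a rejoin needs no extra flag to suppress the cleanup.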
30-Minute Join Timeout:
- If the user never joins within 30 minutes:
- Ends voice session
- Stops containers
- Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
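The document tracks this timeout as a cancellable call_timeout_task. The sketch below swaps in a different but equivalent pattern: an `asyncio.Event` set by the join handler and awaited with `asyncio.wait_for`. The function name `wait_for_join` and the `end_session`/`send_dm` callables are hypothetical; only the DM text comes from the section above.

```python
import asyncio
from typing import Awaitable, Callable

JOIN_TIMEOUT_SECONDS = 30 * 60
TIMEOUT_DM = "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"


async def wait_for_join(
    joined: asyncio.Event,
    end_session: Callable[[], Awaitable[None]],
    send_dm: Callable[[str], Awaitable[None]],
    timeout: float = JOIN_TIMEOUT_SECONDS,
) -> bool:
    """Return True if the user joined in time; otherwise clean up and DM them."""
    try:
        # The user-join handler calls joined.set().
        await asyncio.wait_for(joined.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        await end_session()  # ends the voice session and stops containers
        await send_dm(TIMEOUT_DM)
        return False
```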
3. Container Management
File: bot/utils/container_manager.py
Methods:
- start_voice_containers(): Starts STT & TTS, waits for warmup
- stop_voice_containers(): Stops both containers
- are_containers_running(): Checks container status
- _wait_for_stt_warmup(): WebSocket connection check
- _wait_for_tts_warmup(): Health endpoint check
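One possible shape for the start/stop methods, assuming container_manager.py shells out to the Docker Compose CLI (the troubleshooting section uses `docker compose stop miku-stt miku-rvc-api`); the real module may use the Docker SDK instead. `compose_command` and `run_command` are hypothetical helpers. Note that `docker compose start` only starts containers that already exist; `docker compose up -d` would be needed if they have never been created.

```python
import asyncio
from typing import List, Sequence

VOICE_SERVICES = ("miku-stt", "miku-rvc-api")


def compose_command(action: str, services: Sequence[str] = VOICE_SERVICES) -> List[str]:
    """Build a `docker compose <action> <services...>` argv list (hypothetical helper)."""
    if action not in {"start", "stop"}:
        raise ValueError(f"unsupported action: {action}")
    return ["docker", "compose", action, *services]


async def run_command(argv: Sequence[str]) -> int:
    """Run argv without a shell, print stderr on failure, and return the exit code."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    _, stderr = await proc.communicate()
    if proc.returncode != 0:
        print(f"{' '.join(argv)} failed: {stderr.decode(errors='replace').strip()}")
    return proc.returncode
```

Usage would be `await run_command(compose_command("start"))` followed by the warmup checks. Using `create_subprocess_exec` keeps the event loop responsive while Docker works, which matters because the bot keeps handling Discord events during the 35-75 second startup.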
Warmup Detection:
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765
# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
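A sketch of the two warmup checks under stated assumptions. `tts_is_warm` interprets the health response shown above. `port_accepts_connections` is a simplification that only confirms a TCP connection to the STT port; a faithful check would complete the WebSocket handshake, which this sketch does not do. `poll_until` is a hypothetical retry loop that would be called with the 30s (STT) and 60s (TTS) timeouts; fetching the HTTP body is left out.

```python
import asyncio
import json
import time
from typing import Awaitable, Callable


def tts_is_warm(body: bytes) -> bool:
    """Interpret the /health JSON; malformed bodies count as not ready."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("warmed_up") is True


async def port_accepts_connections(host: str, port: int) -> bool:
    """Simplified STT probe: succeeds once a TCP connection can be opened."""
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout=2.0)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def poll_until(check: Callable[[], Awaitable[bool]],
                     timeout: float, interval: float = 1.0) -> bool:
    """Re-run check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await check():
            return True
        await asyncio.sleep(interval)
    return False
```

For example, `await poll_until(lambda: port_accepts_connections("miku-stt", 8765), timeout=30)` would wait for the STT port.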
4. Voice Session Tracking
File: bot/utils/voice_manager.py
New VoiceSession Fields:
call_user_id: Optional[int] # User ID that was called
call_timeout_task: Optional[asyncio.Task] # 30min timeout
user_has_joined: bool # Track if user joined
auto_leave_task: Optional[asyncio.Task] # 45s auto-leave
user_leave_time: Optional[float] # When user left
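The new fields, written as a dataclass for brevity. This assumes VoiceSession could be expressed this way; the real class has more fields and may not be a dataclass. The helper `is_call_target` is hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSession:
    """Only the call-tracking fields listed above; the real class has more."""
    call_user_id: Optional[int] = None                # user ID that was called
    call_timeout_task: Optional[asyncio.Task] = None  # 30min join timeout
    user_has_joined: bool = False                     # set on the first join
    auto_leave_task: Optional[asyncio.Task] = None    # 45s auto-leave
    user_leave_time: Optional[float] = None           # timestamp of the last leave

    def is_call_target(self, user_id: int) -> bool:
        """True if a voice-state event concerns the called user."""
        return self.call_user_id is not None and user_id == self.call_user_id
```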
Methods:
- on_user_join(user_id): Handles the user joining the voice channel
- on_user_leave(user_id): Starts the 45s auto-leave timer
- _auto_leave_after_user_disconnect(): Executes the auto-leave
5. LLM Context Update
Miku's voice chat prompt now includes:
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
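A small sketch of building that note from the same constant the timer uses, so the prompt cannot drift from the actual delay. The function name `auto_disconnect_note` is hypothetical.

```python
def auto_disconnect_note(user_name: str, delay_seconds: int = 45) -> str:
    """Build the auto-disconnect note for the voice chat prompt (hypothetical helper)."""
    return (
        f"NOTE: You will automatically disconnect {delay_seconds} seconds after "
        f"{user_name} leaves the voice channel, so you can mention this if asked about leaving"
    )
```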
6. Debug Mode Integration
With VOICE_DEBUG_MODE=true:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (!miku join, !miku listen, etc.)
With VOICE_DEBUG_MODE=false (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity
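A minimal sketch of the gating described above for text notifications: every message is logged, and it is posted to Discord only in debug mode. `debug_notify` is a hypothetical helper; `send` would be something like a text channel's `send` coroutine. Gating of the manual commands is not covered here.

```python
import logging
from typing import Awaitable, Callable, Optional

log = logging.getLogger("voice_debug")

VOICE_DEBUG_MODE = False  # assumption: set from the environment at startup


async def debug_notify(send: Callable[[str], Awaitable[object]], text: str,
                       debug_mode: Optional[bool] = None) -> bool:
    """Always log; post to text chat only when debug mode is on."""
    log.info("voice: %s", text)  # log files show activity in both modes
    enabled = VOICE_DEBUG_MODE if debug_mode is None else debug_mode
    if not enabled:
        return False  # field deployment: nothing appears in Discord
    await send(text)
    return True
```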
API Endpoint
POST /api/voice/call
Request Body:
{
"user_id": 123456789,
"voice_channel_id": 987654321
}
Success Response:
{
"success": true,
"user_id": 123456789,
"channel_id": 987654321,
"invite_url": "https://discord.gg/abc123"
}
Error Response:
{
"success": false,
"error": "Failed to start voice containers"
}
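A framework-agnostic sketch of validating the request body and building the two response shapes above; the web framework used by bot/api.py is not stated, so the function names here are hypothetical. One practical detail: real Discord IDs (snowflakes) exceed 2^53, the largest integer a JavaScript number represents exactly, so a web UI may need to send them as strings. The sketch therefore accepts either integers or decimal strings.

```python
from typing import Any, Dict, Tuple


def _as_snowflake(value: Any, key: str) -> int:
    """Accept a positive int or a decimal string; reject everything else."""
    if isinstance(value, str) and value.isdecimal():
        value = int(value)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"'{key}' must be a positive integer ID")
    return value


def parse_call_request(body: Dict[str, Any]) -> Tuple[int, int]:
    """Validate the POST /api/voice/call body; raise ValueError on bad input."""
    return (_as_snowflake(body.get("user_id"), "user_id"),
            _as_snowflake(body.get("voice_channel_id"), "voice_channel_id"))


def success_response(user_id: int, channel_id: int, invite_url: str) -> Dict[str, Any]:
    return {"success": True, "user_id": user_id,
            "channel_id": channel_id, "invite_url": invite_url}


def error_response(message: str) -> Dict[str, Any]:
    return {"success": False, "error": message}
```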
File Changes
New Files:
- bot/utils/container_manager.py - Docker container management
- VOICE_CALL_AUTOMATION.md - This documentation
Modified Files:
- bot/globals.py - Added VOICE_DEBUG_MODE flag
- bot/api.py - Added /api/voice/call endpoint and timeout handler
- bot/bot.py - Added on_voice_state_update event handler
- bot/utils/voice_manager.py:
  - Added call tracking fields to VoiceSession
  - Added on_user_join() and on_user_leave() methods
  - Added _auto_leave_after_user_disconnect() method
  - Updated LLM prompt with auto-disconnect context
  - Gated debug messages behind VOICE_DEBUG_MODE
- bot/utils/voice_receiver.py - Removed Discord VAD events (relies on RealtimeSTT only)
Testing Checklist
Web UI Integration:
- Create voice call trigger UI with user ID and channel ID inputs
- Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- Show timeout countdown
- Handle errors gracefully
Flow Testing:
- Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- Test 30min timeout (user never joins)
- Test user rejoin within 45s (cancels auto-leave)
- Test container failure handling
- Test warmup timeout handling
- Test DM failure (should continue anyway)
Debug Mode:
- Test with VOICE_DEBUG_MODE=true (should see all notifications)
- Test with VOICE_DEBUG_MODE=false (should be silent)
Environment Variables
Add to .env or docker-compose.yml:
VOICE_DEBUG_MODE=false # Set to true for debugging
Next Steps
- Web UI: Create voice call interface with:
  - User ID input
  - Voice channel ID dropdown (fetch from Discord)
  - "Call User" button
  - Status display
  - Active call management
- Monitoring: Add voice call metrics:
  - Call duration
  - User join time
  - Auto-leave triggers
  - Container startup times
- Enhancements:
  - Multiple simultaneous calls (different channels)
  - Call history logging
  - User preferences (auto-answer, DND mode)
  - Scheduled voice calls
Technical Notes
Container Warmup Times:
- STT (miku-stt): ~5-15 seconds (model loading)
- TTS (miku-rvc-api): ~30-60 seconds (RVC model loading, synthesis warmup)
- Total: ~35-75 seconds from API call to ready
Resource Management:
- Voice sessions use the VoiceSessionManager singleton
- Only one voice session is active at a time
- Full resource locking during voice:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused
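A sketch of the "one session at a time" rule as a singleton guarded by an `asyncio.Lock`. This is an assumption about how VoiceSessionManager could enforce it, not its actual code; `try_start` and `end` are hypothetical method names, and the resource locks listed above appear only as comments.

```python
import asyncio
from typing import Optional


class VoiceSessionManager:
    """Hypothetical singleton that allows at most one active voice session."""

    _instance: Optional["VoiceSessionManager"] = None
    _lock: asyncio.Lock
    active_channel_id: Optional[int]

    def __new__(cls) -> "VoiceSessionManager":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._lock = asyncio.Lock()
            cls._instance.active_channel_id = None
        return cls._instance

    async def try_start(self, channel_id: int) -> bool:
        """Claim the voice slot; return False if a session is already active."""
        async with self._lock:
            if self.active_channel_id is not None:
                return False
            self.active_channel_id = channel_id
            # Real code would also lock the GPU, block vision, pause autonomy, etc.
            return True

    async def end(self) -> None:
        """Release the voice slot (and, in real code, every claimed resource)."""
        async with self._lock:
            self.active_channel_id = None
```

Checking and claiming the slot under the same lock prevents two concurrent API calls from both seeing "no active session" and starting containers twice.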
Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents indefinite container running
- All cleanup paths stop containers
- Voice session end releases all resources
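One way to back these guarantees, sketched under assumptions: every exit path (auto-leave, 30-minute timeout, manual end) calls a single cleanup routine that runs each step even if an earlier one fails, so a disconnect error cannot skip stopping the containers. `end_voice_session` and the step functions in the test are hypothetical.

```python
import logging
from typing import Awaitable, Callable, Sequence

log = logging.getLogger("voice_cleanup")


async def end_voice_session(steps: Sequence[Callable[[], Awaitable[None]]]) -> int:
    """Run every cleanup step even if earlier ones fail; return the failure count."""
    failures = 0
    for step in steps:
        try:
            await step()
        except Exception:
            failures += 1
            log.exception("cleanup step %s failed", getattr(step, "__name__", step))
    return failures
```

A typical order would be: leave the voice channel, stop STT/TTS, stop the containers, then release the resource locks last so nothing else starts while cleanup is in progress.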
Troubleshooting
Containers won't start:
- Check Docker daemon status
- Check docker compose ps for existing containers
- Check logs: docker logs miku-stt / docker logs miku-rvc-api
Warmup timeout:
- STT: Check that the WebSocket is accepting connections on port 8765
- TTS: Check that the health endpoint returns {"warmed_up": true}
- Increase timeout values if needed (slow hardware)
User never joins:
- Verify invite URL is valid
- Check user has permission to join voice channel
- Verify DM was delivered (may be blocked)
Auto-leave not triggering:
- Check that on_voice_state_update events are firing
- Verify that the user ID matches call_user_id
- Check logs for timer creation/cancellation
Containers not stopping:
- Manual stop: docker compose stop miku-stt miku-rvc-api
- Check for orphaned containers: docker ps
- Force remove: docker rm -f miku-stt miku-rvc-api