koko210Serve 2934efba22 Implemented experimental real production ready voice chat, relegated old flow to voice debug mode. New Web UI panel for Voice Chat.

2026-01-20 23:06:17 +02:00

Voice Call Automation System

Overview

Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual command-based voice chat flow, which remains available in voice debug mode.

Features

1. Voice Debug Mode Toggle

Environment Variable: VOICE_DEBUG_MODE (default: false)
When true: Shows manual commands, text notifications, transcripts in chat
When false (field deployment): Silent operation, no command notifications

2. Automated Voice Call Flow

Initiation (Web UI → API)

POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}

What Happens:

Container Startup: Starts miku-stt and miku-rvc-api containers
Warmup Wait: Monitors containers until fully warmed up
- STT: WebSocket connection check (30s timeout)
- TTS: Health endpoint check for warmed_up: true (60s timeout)
Join Voice Channel: Creates voice session with full resource locking
Send DM: Generates a personalized invitation with the LLM and sends it together with a voice channel invite link
Auto-Listen: Automatically starts listening when user joins

User Join Detection:

Monitors on_voice_state_update events
When target user joins:
- Marks user_has_joined = True
- Cancels 30min timeout
- Auto-starts STT for that user
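
The join detection above can be sketched as a pure function over the channel IDs seen before and after a voice state change. This is an illustrative sketch, not the code in bot/bot.py: `classify_voice_event` and its parameters are hypothetical names. In discord.py the two channel IDs would come from `before.channel` and `after.channel` inside `on_voice_state_update`.

```python
from typing import Optional


def classify_voice_event(
    member_id: int,
    before_channel_id: Optional[int],
    after_channel_id: Optional[int],
    call_user_id: Optional[int],
    session_channel_id: int,
) -> str:
    """Return "join", "leave", or "ignore" for one voice state change."""
    # Only the user who was called matters for call tracking.
    if call_user_id is None or member_id != call_user_id:
        return "ignore"
    was_in = before_channel_id == session_channel_id
    is_in = after_channel_id == session_channel_id
    if is_in and not was_in:
        return "join"
    if was_in and not is_in:
        return "leave"
    # Mute/deafen changes and moves between unrelated channels land here.
    return "ignore"
```

Keeping the decision separate from the Discord event objects makes it easy to unit test without a running bot.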

Auto-Leave After User Disconnect:

A 45-second timer starts when the user leaves the voice channel
If the user doesn't rejoin within 45 seconds:
- Ends voice session
- Stops STT and TTS containers
- Releases all resources
- Returns to normal operation
If the user rejoins before the 45 seconds elapse, the timer is cancelled
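
A cancellable countdown like the one described above might look as follows with asyncio. This is a minimal sketch under assumptions: `AutoLeaveTimer` and `on_expire` are hypothetical names, and the real logic lives in `on_user_leave()` and `_auto_leave_after_user_disconnect()`, whose bodies are not shown in this document.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Runs `on_expire` unless cancelled within `delay` seconds."""

    def __init__(self, delay: float, on_expire: Callable[[], Awaitable[None]]):
        self.delay = delay
        self.on_expire = on_expire
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Restart the countdown if the user leaves again.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        # Called when the user rejoins before the delay elapses.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self.delay)
        await self.on_expire()
```

With `delay=45`, `start()` would be called when the user leaves and `cancel()` when the user rejoins. `start()` must be called from inside a running event loop because it uses `asyncio.create_task`.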

30-Minute Join Timeout:

If user never joins within 30 minutes:
- Ends voice session
- Stops containers
- Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
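
The document tracks this timeout as a cancellable `call_timeout_task`. An equivalent way to express the same behaviour, shown here only as a sketch, is to wait on an event with a deadline. `wait_for_join` and the `joined` event are hypothetical names.

```python
import asyncio


async def wait_for_join(joined: asyncio.Event, timeout: float) -> bool:
    """Return True if `joined` is set within `timeout` seconds, else False."""
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Caller would end the session, stop containers, and send the timeout DM.
        return False
```

For the 30-minute window the caller would pass `timeout=30 * 60` and set the event from the join handler.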

3. Container Management

File: bot/utils/container_manager.py

Methods:

start_voice_containers(): Starts STT & TTS, waits for warmup
stop_voice_containers(): Stops both containers
are_containers_running(): Check container status
_wait_for_stt_warmup(): WebSocket connection check
_wait_for_tts_warmup(): Health endpoint check

Warmup Detection:

# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
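
The warmup checks above amount to polling until a condition holds or a deadline passes. The sketch below shows that pattern with a generic poller and a parser for the TTS health response. It is an assumption-laden illustration, not the contents of container_manager.py: `wait_until` and `is_tts_warmed_up` are hypothetical helpers, and the 1-second default interval is a guess.

```python
import json
import time
from typing import Callable


def is_tts_warmed_up(body: str) -> bool:
    """True if a health response body reports warmed_up: true."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("warmed_up") is True


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            # Connection refused or reset while the container is still booting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For TTS, `check` could fetch the health endpoint with `urllib.request.urlopen` and pass the decoded body to `is_tts_warmed_up`, with `timeout=60`. For STT, `check` could attempt a WebSocket connection with whatever client library the bot already uses, with `timeout=30`.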

4. Voice Session Tracking

File: bot/utils/voice_manager.py

New VoiceSession Fields:

call_user_id: Optional[int]  # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool  # Track if user joined
auto_leave_task: Optional[asyncio.Task]  # 45s auto-leave
user_leave_time: Optional[float]  # When user left

Methods:

on_user_join(user_id): Handle user joining voice channel
on_user_leave(user_id): Start 45s auto-leave timer
_auto_leave_after_user_disconnect(): Execute auto-leave
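
For orientation, the call-tracking fields could be grouped as in the sketch below. The field names and types come from the list above; the default values, the `CallTracking` class name, and the `mark_joined` / `mark_left` helpers are assumptions made for illustration and are not taken from voice_manager.py.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallTracking:
    """Call-tracking fields from the list above; defaults are assumptions."""

    call_user_id: Optional[int] = None
    call_timeout_task: Optional[asyncio.Task] = None
    user_has_joined: bool = False
    auto_leave_task: Optional[asyncio.Task] = None
    user_leave_time: Optional[float] = None

    def mark_joined(self) -> None:
        # Hypothetical helper: record the join and clear any leave timestamp.
        self.user_has_joined = True
        self.user_leave_time = None

    def mark_left(self, now: Optional[float] = None) -> None:
        # Hypothetical helper: remember when the user left.
        self.user_leave_time = time.time() if now is None else now
```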

5. LLM Context Update

Miku's voice chat prompt now includes:

NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving

6. Debug Mode Integration

With `VOICE_DEBUG_MODE=true`:

Shows "🎤 User said: ..." in text chat
Shows "💬 Miku: ..." responses
Shows interruption messages
Manual commands work (!miku join, !miku listen, etc.)

With `VOICE_DEBUG_MODE=false` (field deployment):

No text notifications
No command outputs
Silent operation
Only log files show activity

API Endpoint

POST `/api/voice/call`

Request Body:

{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}

Success Response:

{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}

Error Response:

{
  "success": false,
  "error": "Failed to start voice containers"
}
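
A client for this endpoint can be sketched with the Python standard library. The request and response shapes follow the examples above; the helper names and the base URL `http://localhost:8000` used in the usage note are assumptions, since the document does not state where the API is served.

```python
import json
import urllib.request
from typing import Any, Dict


def build_call_request(base_url: str, user_id: int, voice_channel_id: int) -> urllib.request.Request:
    """Build the POST /api/voice/call request without sending it."""
    payload = json.dumps(
        {"user_id": user_id, "voice_channel_id": voice_channel_id}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/voice/call",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_call_response(body: str) -> Dict[str, Any]:
    """Return the parsed response, raising if "success" is false."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError(data.get("error", "unknown error"))
    return data
```

To send it, pass the request to `urllib.request.urlopen(build_call_request("http://localhost:8000", 123456789, 987654321))` and feed the decoded body to `parse_call_response`. Because container warmup can take over a minute, a generous client timeout is advisable. Whether the error response arrives with a non-2xx status is not stated here; if it does, `urlopen` raises `urllib.error.HTTPError` instead of returning a body.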

File Changes

New Files:

bot/utils/container_manager.py - Docker container management
VOICE_CALL_AUTOMATION.md - This documentation

Modified Files:

bot/globals.py - Added VOICE_DEBUG_MODE flag
bot/api.py - Added /api/voice/call endpoint and timeout handler
bot/bot.py - Added on_voice_state_update event handler
bot/utils/voice_manager.py:
- Added call tracking fields to VoiceSession
- Added on_user_join() and on_user_leave() methods
- Added _auto_leave_after_user_disconnect() method
- Updated LLM prompt with auto-disconnect context
- Gated debug messages behind VOICE_DEBUG_MODE
bot/utils/voice_receiver.py - Removed Discord VAD events (rely on RealtimeSTT only)

Testing Checklist

Web UI Integration:

Create voice call trigger UI with user ID and channel ID inputs
Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
Show timeout countdown
Handle errors gracefully

Flow Testing:

Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
Test 30min timeout (user never joins)
Test user rejoin within 45s (cancels auto-leave)
Test container failure handling
Test warmup timeout handling
Test DM failure (should continue anyway)

Debug Mode:

Test with VOICE_DEBUG_MODE=true (should see all notifications)
Test with VOICE_DEBUG_MODE=false (should be silent)

Environment Variables

Add to .env or docker-compose.yml:

VOICE_DEBUG_MODE=false  # Set to true for debugging
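
Environment variables arrive as strings, so the flag has to be parsed into a boolean somewhere. The sketch below shows one way to do that; `read_voice_debug_mode` and the set of accepted truthy spellings are assumptions, and the actual parsing in bot/globals.py may differ.

```python
import os
from typing import Mapping, Optional

# Assumed truthy spellings; anything else, including unset, means False.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def read_voice_debug_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse VOICE_DEBUG_MODE from `env` (defaults to os.environ)."""
    source = os.environ if env is None else env
    return source.get("VOICE_DEBUG_MODE", "false").strip().lower() in _TRUE_VALUES
```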

Next Steps

Web UI: Create voice call interface with:
- User ID input
- Voice channel ID dropdown (fetch from Discord)
- "Call User" button
- Status display
- Active call management
Monitoring: Add voice call metrics:
- Call duration
- User join time
- Auto-leave triggers
- Container startup times
Enhancements:
- Multiple simultaneous calls (different channels)
- Call history logging
- User preferences (auto-answer, DND mode)
- Scheduled voice calls

Technical Notes

Container Warmup Times:

STT (miku-stt): ~5-15 seconds (model loading)
TTS (miku-rvc-api): ~30-60 seconds (RVC model loading, synthesis warmup)
Total: ~35-75 seconds from API call to ready

Resource Management:

Voice sessions use VoiceSessionManager singleton
Only one voice session active at a time
Full resource locking during voice:
- AMD GPU for text inference
- Vision model blocked
- Image generation disabled
- Bipolar mode disabled
- Autonomous engine paused

Cleanup Guarantees:

45s auto-leave ensures no orphaned sessions
30min timeout prevents indefinite container running
All cleanup paths stop containers
Voice session end releases all resources

Troubleshooting

Containers won't start:

Check Docker daemon status
Check docker compose ps for existing containers
Check logs: docker logs miku-stt / docker logs miku-rvc-api

Warmup timeout:

STT: Check WebSocket is accepting connections on port 8765
TTS: Check health endpoint returns {"warmed_up": true}
Increase timeout values if needed (slow hardware)

User never joins:

Verify invite URL is valid
Check user has permission to join voice channel
Verify DM was delivered (may be blocked)

Auto-leave not triggering:

Check on_voice_state_update events are firing
Verify user ID matches call_user_id
Check logs for timer creation/cancellation

Containers not stopping:

Manual stop: docker compose stop miku-stt miku-rvc-api
Check for orphaned containers: docker ps
Force remove: docker rm -f miku-stt miku-rvc-api
