Voice Call Automation System

Overview

Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual command-based voice chat flow: a single API call starts the containers, joins the voice channel, invites the user by DM, and begins listening once they join.

Features

1. Voice Debug Mode Toggle

  • Environment Variable: VOICE_DEBUG_MODE (default: false)
  • When true: Shows manual commands, text notifications, transcripts in chat
  • When false (field deployment): Silent operation, no command notifications

2. Automated Voice Call Flow

Initiation (Web UI → API)

POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}

What Happens:

  1. Container Startup: Starts miku-stt and miku-rvc-api containers
  2. Warmup Wait: Monitors containers until fully warmed up
    • STT: WebSocket connection check (30s timeout)
    • TTS: Health endpoint check for warmed_up: true (60s timeout)
  3. Join Voice Channel: Creates voice session with full resource locking
  4. Send DM: Generates a personalized LLM invitation and sends it with a voice channel invite link
  5. Auto-Listen: Automatically starts listening when the user joins

User Join Detection:

  • Monitors on_voice_state_update events
  • When target user joins:
    • Marks user_has_joined = True
    • Cancels 30min timeout
    • Auto-starts STT for that user
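
A minimal sketch of the join/leave decision, assuming the handler compares channel IDs before and after each event. In discord.py, on_voice_state_update receives (member, before, after), and before.channel / after.channel are None when the member is not in voice. The function name classify_transition and its string results are hypothetical and not taken from bot/bot.py.

```python
from typing import Optional


def classify_transition(
    before_channel_id: Optional[int],
    after_channel_id: Optional[int],
    call_channel_id: int,
) -> Optional[str]:
    """Return "joined", "left", or None for one voice state update.

    Channel IDs are None when the member is not connected to voice.
    """
    was_in_call = before_channel_id == call_channel_id
    is_in_call = after_channel_id == call_channel_id
    if is_in_call and not was_in_call:
        return "joined"  # would lead to on_user_join(user_id)
    if was_in_call and not is_in_call:
        return "left"  # would lead to on_user_leave(user_id)
    return None  # mute/deafen changes or moves between unrelated channels
```

A handler would first check that the member's ID matches call_user_id, then pass the channel IDs from before and after to this function.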

Auto-Leave After User Disconnect:

  • 45-second timer starts when user leaves voice channel
  • If user doesn't rejoin within 45s:
    • Ends voice session
    • Stops STT and TTS containers
    • Releases all resources
    • Returns to normal operation
  • If user rejoins before 45s, timer is cancelled
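
The start-and-cancel behaviour of the timer can be sketched with a cancellable asyncio task. This is an illustration under assumptions, not the code in voice_manager.py: the class name AutoLeaveTimer and the on_expire callback are hypothetical stand-ins for whatever ends the session and stops the containers.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Cancellable countdown that runs a cleanup coroutine when it expires."""

    def __init__(self, delay: float, on_expire: Callable[[], Awaitable[None]]):
        self.delay = delay
        self.on_expire = on_expire
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Call when the user leaves; restarts the countdown if one is running."""
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        """Call when the user rejoins before the delay has elapsed."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self.delay)
        await self.on_expire()  # e.g. end the session and stop the containers
```

With delay=45, start() would be called from the leave path and cancel() from the join path. Both must run inside the bot's event loop, since asyncio.create_task requires a running loop.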

30-Minute Join Timeout:

  • If user never joins within 30 minutes:
    • Ends voice session
    • Stops containers
    • Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
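
One way to express the join timeout is to wait on an event with a deadline. The documented implementation keeps a call_timeout_task and cancels it when the user joins; the sketch below swaps in an asyncio.Event to show the same wait in a few lines. The name wait_for_join is hypothetical.

```python
import asyncio


async def wait_for_join(joined: asyncio.Event, timeout: float) -> bool:
    """Return True if the user joined in time, False if the timeout elapsed."""
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

The join path would call joined.set(). A False result with timeout=30 * 60 corresponds to the cleanup listed above: end the session, stop the containers, and send the timeout DM.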

3. Container Management

File: bot/utils/container_manager.py

Methods:

  • start_voice_containers(): Starts STT & TTS, waits for warmup
  • stop_voice_containers(): Stops both containers
  • are_containers_running(): Checks container status
  • _wait_for_stt_warmup(): WebSocket connection check
  • _wait_for_tts_warmup(): Health endpoint check

Warmup Detection:

# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
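
The warmup checks amount to polling until a condition holds or a deadline passes. The sketch below shows that pattern under assumptions: is_warmed_up and wait_until are hypothetical names, and the actual HTTP or WebSocket call is left to the check coroutine passed in, since the source does not show which client library container_manager.py uses.

```python
import asyncio
import json
import time
from typing import Awaitable, Callable


def is_warmed_up(body: str) -> bool:
    """True when a /health response body reports warmed_up: true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("warmed_up") is True


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll check() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if await check():
                return True
        except OSError:
            pass  # connection refused while the container is still booting
        await asyncio.sleep(interval)
    return False
```

For TTS, check would fetch the health endpoint and pass the body to is_warmed_up with timeout=60. For STT, check would attempt a WebSocket connection with timeout=30.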

4. Voice Session Tracking

File: bot/utils/voice_manager.py

New VoiceSession Fields:

call_user_id: Optional[int]  # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool  # Track if user joined
auto_leave_task: Optional[asyncio.Task]  # 45s auto-leave
user_leave_time: Optional[float]  # When user left
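
As an illustration only, the fields above could be grouped in a dataclass with defaults for "no call in progress". The class name CallTracking and the helper waiting_for_user are hypothetical; the real fields live on VoiceSession, whose other attributes are not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallTracking:
    """Hypothetical container for the call-tracking fields listed above."""

    call_user_id: Optional[int] = None  # user ID that was called
    call_timeout_task: Optional[asyncio.Task] = None  # 30-minute join timeout
    user_has_joined: bool = False  # set once the target user joins
    auto_leave_task: Optional[asyncio.Task] = None  # 45-second auto-leave
    user_leave_time: Optional[float] = None  # when the user left

    def waiting_for_user(self) -> bool:
        """True while a call is pending and the target user has not joined."""
        return self.call_user_id is not None and not self.user_has_joined
```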

Methods:

  • on_user_join(user_id): Handle user joining voice channel
  • on_user_leave(user_id): Start 45s auto-leave timer
  • _auto_leave_after_user_disconnect(): Execute auto-leave

5. LLM Context Update

Miku's voice chat prompt now includes:

NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving

6. Debug Mode Integration

With VOICE_DEBUG_MODE=true:

  • Shows "🎤 User said: ..." in text chat
  • Shows "💬 Miku: ..." responses
  • Shows interruption messages
  • Manual commands work (!miku join, !miku listen, etc.)

With VOICE_DEBUG_MODE=false (field deployment):

  • No text notifications
  • No command outputs
  • Silent operation
  • Only log files show activity
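
A minimal sketch of how the flag might gate chat notifications. How bot/globals.py actually parses VOICE_DEBUG_MODE is not shown in this document, so the parsing rule below (only the string "true", case-insensitive, enables debug mode) and both function names are assumptions.

```python
import os
from typing import Callable, Mapping


def voice_debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Read VOICE_DEBUG_MODE; anything other than "true" counts as off."""
    return env.get("VOICE_DEBUG_MODE", "false").strip().lower() == "true"


def notify_if_debug(send: Callable[[str], None], message: str, debug: bool) -> bool:
    """Call send(message) only in debug mode; return whether it was sent."""
    if debug:
        send(message)
    return debug
```

Routing every transcript and status message through one helper like this keeps the silent mode from depending on scattered if-checks.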

API Endpoint

POST /api/voice/call

Request Body:

{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}

Success Response:

{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}

Error Response:

{
  "success": false,
  "error": "Failed to start voice containers"
}
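
A hedged client-side sketch using only the Python standard library. The base URL is an assumption (this document does not state the host or port the API listens on), and build_call_request and describe_result are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict


def build_call_request(
    base_url: str, user_id: int, voice_channel_id: int
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/voice/call request."""
    body = json.dumps({"user_id": user_id, "voice_channel_id": voice_channel_id})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/voice/call",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_result(payload: Dict[str, Any]) -> str:
    """Turn a response payload into a one-line status message."""
    if payload.get("success"):
        return f"Call started, invite: {payload.get('invite_url')}"
    return f"Call failed: {payload.get('error', 'unknown error')}"
```

The request can be sent with urllib.request.urlopen(req, timeout=...). If the endpoint blocks until warmup finishes, which the source does not state, the timeout should comfortably exceed the container warmup time.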

File Changes

New Files:

  1. bot/utils/container_manager.py - Docker container management
  2. VOICE_CALL_AUTOMATION.md - This documentation

Modified Files:

  1. bot/globals.py - Added VOICE_DEBUG_MODE flag
  2. bot/api.py - Added /api/voice/call endpoint and timeout handler
  3. bot/bot.py - Added on_voice_state_update event handler
  4. bot/utils/voice_manager.py:
    • Added call tracking fields to VoiceSession
    • Added on_user_join() and on_user_leave() methods
    • Added _auto_leave_after_user_disconnect() method
    • Updated LLM prompt with auto-disconnect context
    • Gated debug messages behind VOICE_DEBUG_MODE
  5. bot/utils/voice_receiver.py - Removed Discord VAD events (rely on RealtimeSTT only)

Testing Checklist

Web UI Integration:

  • Create voice call trigger UI with user ID and channel ID inputs
  • Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
  • Show timeout countdown
  • Handle errors gracefully

Flow Testing:

  • Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
  • Test 30min timeout (user never joins)
  • Test user rejoin within 45s (cancels auto-leave)
  • Test container failure handling
  • Test warmup timeout handling
  • Test DM failure (should continue anyway)

Debug Mode:

  • Test with VOICE_DEBUG_MODE=true (should see all notifications)
  • Test with VOICE_DEBUG_MODE=false (should be silent)

Environment Variables

Add to .env or docker-compose.yml:

VOICE_DEBUG_MODE=false  # Set to true for debugging

Next Steps

  1. Web UI: Create voice call interface with:

    • User ID input
    • Voice channel ID dropdown (fetch from Discord)
    • "Call User" button
    • Status display
    • Active call management
  2. Monitoring: Add voice call metrics:

    • Call duration
    • User join time
    • Auto-leave triggers
    • Container startup times
  3. Enhancements:

    • Multiple simultaneous calls (different channels)
    • Call history logging
    • User preferences (auto-answer, DND mode)
    • Scheduled voice calls

Technical Notes

Container Warmup Times:

  • STT (miku-stt): ~5-15 seconds (model loading)
  • TTS (miku-rvc-api): ~30-60 seconds (RVC model loading, synthesis warmup)
  • Total: ~35-75 seconds from API call to ready (the sum of the two warmups, i.e. if they run back to back)

Resource Management:

  • Voice sessions use VoiceSessionManager singleton
  • Only one voice session active at a time
  • Full resource locking during voice:
    • AMD GPU for text inference
    • Vision model blocked
    • Image generation disabled
    • Bipolar mode disabled
    • Autonomous engine paused

Cleanup Guarantees:

  • 45s auto-leave ensures no orphaned sessions
  • 30min timeout prevents indefinite container running
  • All cleanup paths stop containers
  • Voice session end releases all resources

Troubleshooting

Containers won't start:

  • Check Docker daemon status
  • Check docker compose ps for existing containers
  • Check logs: docker logs miku-stt and docker logs miku-rvc-api

Warmup timeout:

  • STT: Check WebSocket is accepting connections on port 8765
  • TTS: Check health endpoint returns {"warmed_up": true}
  • Increase timeout values if needed (slow hardware)

User never joins:

  • Verify invite URL is valid
  • Check user has permission to join voice channel
  • Verify DM was delivered (may be blocked)

Auto-leave not triggering:

  • Check on_voice_state_update events are firing
  • Verify user ID matches call_user_id
  • Check logs for timer creation/cancellation

Containers not stopping:

  • Manual stop: docker compose stop miku-stt miku-rvc-api
  • Check for orphaned containers: docker ps
  • Force remove: docker rm -f miku-stt miku-rvc-api