moved AI generated readmes to readme folder (may delete)

API_REFERENCE.md
@@ -1,460 +0,0 @@

# Miku Discord Bot API Reference

The Miku bot exposes a FastAPI REST API on port 3939 for controlling and monitoring the bot.

## Base URL

```
http://localhost:3939
```

## API Endpoints

### 📊 Status & Information

#### `GET /status`
Get current bot status and overview.

**Response:**
```json
{
  "status": "online",
  "mood": "neutral",
  "servers": 2,
  "active_schedulers": 2,
  "server_moods": {
    "123456789": "bubbly",
    "987654321": "excited"
  }
}
```
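
For quick checks from a script, the endpoint can be queried with any HTTP client. A minimal sketch using Python's `requests`, assuming the bot is reachable on the default port:

```python
import requests

BASE_URL = "http://localhost:3939"  # default API port

# Fetch the bot's current status and print per-server moods
resp = requests.get(f"{BASE_URL}/status", timeout=10)
resp.raise_for_status()
status = resp.json()

print(f"Bot is {status['status']}, DM mood: {status['mood']}")
for guild_id, mood in status.get("server_moods", {}).items():
    print(f"  server {guild_id}: {mood}")
```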

#### `GET /logs`
Get the last 100 lines of bot logs.

**Response:** Plain text log output

#### `GET /prompt`
Get the last full prompt sent to the LLM.

**Response:**
```json
{
  "prompt": "Last prompt text..."
}
```

---

### 😊 Mood Management

#### `GET /mood`
Get current DM mood.

**Response:**
```json
{
  "mood": "neutral",
  "description": "Mood description text..."
}
```

#### `POST /mood`
Set DM mood.

**Request Body:**
```json
{
  "mood": "bubbly"
}
```

**Response:**
```json
{
  "status": "ok",
  "new_mood": "bubbly"
}
```
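
Setting the mood programmatically follows the same pattern; a short sketch with `requests`, assuming the mood name is one of the values returned by `GET /moods/available`:

```python
import requests

BASE_URL = "http://localhost:3939"

# Switch the DM mood to "bubbly" and confirm the change
resp = requests.post(f"{BASE_URL}/mood", json={"mood": "bubbly"}, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok", "new_mood": "bubbly"}
```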

#### `POST /mood/reset`
Reset DM mood to neutral.

#### `POST /mood/calm`
Calm Miku down (set to neutral).

#### `GET /servers/{guild_id}/mood`
Get mood for specific server.

#### `POST /servers/{guild_id}/mood`
Set mood for specific server.

**Request Body:**
```json
{
  "mood": "excited"
}
```

#### `POST /servers/{guild_id}/mood/reset`
Reset server mood to neutral.

#### `GET /servers/{guild_id}/mood/state`
Get complete mood state for server.

#### `GET /moods/available`
List all available moods.

**Response:**
```json
{
  "moods": {
    "neutral": "😊",
    "bubbly": "🥰",
    "excited": "🤩",
    "sleepy": "😴",
    ...
  }
}
```

---

### 😴 Sleep Management

#### `POST /sleep`
Force Miku to sleep.

#### `POST /wake`
Wake Miku up.

#### `POST /bedtime?guild_id={guild_id}`
Send bedtime reminder. If `guild_id` is provided, sends only to that server.

---

### 🤖 Autonomous Actions

#### `POST /autonomous/general?guild_id={guild_id}`
Trigger autonomous general message.

#### `POST /autonomous/engage?guild_id={guild_id}`
Trigger autonomous user engagement.

#### `POST /autonomous/tweet?guild_id={guild_id}`
Trigger autonomous tweet sharing.

#### `POST /autonomous/reaction?guild_id={guild_id}`
Trigger autonomous reaction to a message.

#### `POST /autonomous/custom?guild_id={guild_id}`
Send custom autonomous message.

**Request Body:**
```json
{
  "prompt": "Say something funny about cats"
}
```

#### `GET /autonomous/stats`
Get autonomous engine statistics for all servers.

**Response:** Detailed stats including message counts, activity, mood profiles, etc.

#### `GET /autonomous/v2/stats/{guild_id}`
Get autonomous V2 stats for specific server.

#### `GET /autonomous/v2/check/{guild_id}`
Check if autonomous action should happen for server.

#### `GET /autonomous/v2/status`
Get autonomous V2 status across all servers.

---

### 🌐 Server Management

#### `GET /servers`
List all configured servers.

**Response:**
```json
{
  "servers": [
    {
      "guild_id": 123456789,
      "guild_name": "My Server",
      "autonomous_channel_id": 987654321,
      "autonomous_channel_name": "general",
      "bedtime_channel_ids": [111111111],
      "enabled_features": ["autonomous", "bedtime"]
    }
  ]
}
```

#### `POST /servers`
Add a new server configuration.

**Request Body:**
```json
{
  "guild_id": 123456789,
  "guild_name": "My Server",
  "autonomous_channel_id": 987654321,
  "autonomous_channel_name": "general",
  "bedtime_channel_ids": [111111111],
  "enabled_features": ["autonomous", "bedtime"]
}
```
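
Registering a server from a script is a single POST; a sketch with `requests`, reusing the placeholder IDs from the request body above:

```python
import requests

BASE_URL = "http://localhost:3939"

# Placeholder guild/channel IDs, as in the example request body
new_server = {
    "guild_id": 123456789,
    "guild_name": "My Server",
    "autonomous_channel_id": 987654321,
    "autonomous_channel_name": "general",
    "bedtime_channel_ids": [111111111],
    "enabled_features": ["autonomous", "bedtime"],
}

resp = requests.post(f"{BASE_URL}/servers", json=new_server, timeout=10)
resp.raise_for_status()
```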

#### `DELETE /servers/{guild_id}`
Remove server configuration.

#### `PUT /servers/{guild_id}`
Update server configuration.

#### `POST /servers/{guild_id}/bedtime-range`
Set bedtime range for server.

#### `POST /servers/{guild_id}/memory`
Update server memory/context.

#### `GET /servers/{guild_id}/memory`
Get server memory/context.

#### `POST /servers/repair`
Repair server configurations.

---

### 💬 DM Management

#### `GET /dms/users`
List all users with DM history.

**Response:**
```json
{
  "users": [
    {
      "user_id": "123456789",
      "username": "User#1234",
      "total_messages": 42,
      "last_message_date": "2025-12-10T12:34:56",
      "is_blocked": false
    }
  ]
}
```

#### `GET /dms/users/{user_id}`
Get details for specific user.

#### `GET /dms/users/{user_id}/conversations`
Get conversation history for user.

#### `GET /dms/users/{user_id}/search?query={query}`
Search user's DM history.

#### `GET /dms/users/{user_id}/export`
Export user's DM history.

#### `DELETE /dms/users/{user_id}`
Delete user's DM data.

#### `POST /dm/{user_id}/custom`
Send custom DM (LLM-generated).

**Request Body:**
```json
{
  "prompt": "Ask about their day"
}
```

#### `POST /dm/{user_id}/manual`
Send manual DM (direct message).

**Form Data:**
- `message`: Message text

#### `GET /dms/blocked-users`
List blocked users.

#### `POST /dms/users/{user_id}/block`
Block a user.

#### `POST /dms/users/{user_id}/unblock`
Unblock a user.

#### `POST /dms/users/{user_id}/conversations/{conversation_id}/delete`
Delete specific conversation.

#### `POST /dms/users/{user_id}/conversations/delete-all`
Delete all conversations for user.

#### `POST /dms/users/{user_id}/delete-completely`
Completely delete user data.

---

### 📊 DM Analysis

#### `POST /dms/analysis/run`
Run analysis on all DM conversations.

#### `POST /dms/users/{user_id}/analyze`
Analyze specific user's DMs.

#### `GET /dms/analysis/reports`
Get all analysis reports.

#### `GET /dms/analysis/reports/{user_id}`
Get analysis report for specific user.

---

### 🖼️ Profile Picture Management

#### `POST /profile-picture/change?guild_id={guild_id}`
Change profile picture. Optionally upload custom image.

**Form Data:**
- `file`: Image file (optional)

**Response:**
```json
{
  "status": "ok",
  "message": "Profile picture changed successfully",
  "source": "danbooru",
  "metadata": {
    "url": "https://...",
    "tags": ["hatsune_miku", "..."]
  }
}
```

#### `GET /profile-picture/metadata`
Get current profile picture metadata.

#### `POST /profile-picture/restore-fallback`
Restore original fallback profile picture.

---

### 🎨 Role Color Management

#### `POST /role-color/custom`
Set custom role color.

**Form Data:**
- `hex_color`: Hex color code (e.g., "#FF0000")

#### `POST /role-color/reset-fallback`
Reset role color to fallback (#86cecb).

---

### 💬 Conversation Management

#### `GET /conversation/{user_id}`
Get conversation history for user.

#### `POST /conversation/reset`
Reset conversation history.

**Request Body:**
```json
{
  "user_id": "123456789"
}
```

---

### 📨 Manual Messaging

#### `POST /manual/send`
Send manual message to channel. See the sketch below for sending attachments from a script.

**Form Data:**
- `message`: Message text
- `channel_id`: Channel ID
- `files`: Files to attach (optional, multiple)
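
Because this endpoint takes multipart form data rather than JSON, attachments go in the `files` argument; a sketch with `requests`, assuming a local `image.png` and a placeholder channel ID:

```python
import requests

BASE_URL = "http://localhost:3939"

# Send a message with one attachment to a channel (IDs are placeholders)
with open("image.png", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/manual/send",
        data={"message": "Check this out!", "channel_id": "987654321"},
        files={"files": ("image.png", f, "image/png")},
        timeout=30,
    )
resp.raise_for_status()
```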

---

### 🎁 Figurine Notifications

#### `GET /figurines/subscribers`
List figurine subscribers.

#### `POST /figurines/subscribers`
Add figurine subscriber.

#### `DELETE /figurines/subscribers/{user_id}`
Remove figurine subscriber.

#### `POST /figurines/send_now`
Send figurine notification to all subscribers.

#### `POST /figurines/send_to_user`
Send figurine notification to specific user.

---

### 🖼️ Image Generation

#### `POST /image/generate`
Generate image using image generation service.

#### `GET /image/status`
Get image generation service status.

#### `POST /image/test-detection`
Test face detection on uploaded image.

---

### 😀 Message Reactions

#### `POST /messages/react`
Add reaction to a message.

**Request Body:**
```json
{
  "channel_id": "123456789",
  "message_id": "987654321",
  "emoji": "😊"
}
```

---

## Error Responses

All endpoints return errors in the following format:

```json
{
  "status": "error",
  "message": "Error description"
}
```

HTTP status codes:
- `200` - Success
- `400` - Bad request
- `404` - Not found
- `500` - Internal server error
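
A caller can combine the HTTP code with the body's `status` field. One hedged sketch of a wrapper (the `call_api` helper and `MikuAPIError` are hypothetical, not part of the bot; it assumes JSON-returning endpoints, so it does not fit `GET /logs`):

```python
import requests

class MikuAPIError(Exception):
    """Raised when the bot API reports an error."""

def call_api(method: str, path: str, **kwargs) -> dict:
    # Issue a request and surface API-level errors as exceptions
    resp = requests.request(method, f"http://localhost:3939{path}", timeout=10, **kwargs)
    body = resp.json()
    if resp.status_code != 200 or body.get("status") == "error":
        raise MikuAPIError(body.get("message", f"HTTP {resp.status_code}"))
    return body
```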

## Authentication

Currently, the API does not require authentication. It's designed to run on localhost within a Docker network.

## Rate Limiting

No rate limiting is currently implemented.

@@ -1,296 +0,0 @@

# Chat Interface Feature Documentation

## Overview

A new **"Chat with LLM"** tab has been added to the Miku bot Web UI, allowing you to chat directly with the language models with full streaming support (similar to ChatGPT).

## Features

### 1. Model Selection
- **💬 Text Model (Fast)**: Chat with the text-based LLM for quick conversations
- **👁️ Vision Model (Images)**: Use the vision model to analyze and discuss images

### 2. System Prompt Options
- **✅ Use Miku Personality**: Attach the standard Miku personality system prompt
  - Text model: Gets the full Miku character prompt (same as `query_llama`)
  - Vision model: Gets a simplified Miku-themed image analysis prompt
- **❌ Raw LLM (No Prompt)**: Chat directly with the base LLM without any personality
  - Great for testing raw model responses
  - No character constraints

### 3. Real-time Streaming
- Messages stream in character-by-character like ChatGPT
- Shows typing indicator while waiting for response
- Smooth, responsive interface

### 4. Vision Model Support
- Upload images when using the vision model
- Image preview before sending
- Analyze images with Miku's personality or raw vision capabilities

### 5. Chat Management
- Clear chat history button
- Timestamps on all messages
- Color-coded messages (user vs assistant)
- Auto-scroll to latest message
- Keyboard shortcut: **Ctrl+Enter** to send messages

## Technical Implementation

### Backend (api.py)

#### New Endpoint: `POST /chat/stream`
```python
# Accepts:
{
    "message": "Your chat message",
    "model_type": "text" | "vision",
    "use_system_prompt": true | false,
    "image_data": "base64_encoded_image"  (optional, for vision model)
}

# Returns: Server-Sent Events (SSE) stream
data: {"content": "streamed text chunk"}
data: {"done": true}
data: {"error": "error message"}
```

**Key Features:**
- Uses Server-Sent Events (SSE) for streaming
- Supports both `TEXT_MODEL` and `VISION_MODEL` from globals
- Dynamically switches system prompts based on configuration
- Integrates with llama.cpp's streaming API

### Frontend (index.html)

#### New Tab: "💬 Chat with LLM"
Located in the main navigation tabs (tab6).

**Components:**
1. **Configuration Panel**
   - Radio buttons for model selection
   - Radio buttons for system prompt toggle
   - Image upload section (shows/hides based on model)
   - Clear chat history button

2. **Chat Messages Container**
   - Scrollable message history
   - Animated message appearance
   - Typing indicator during streaming
   - Color-coded messages with timestamps

3. **Input Area**
   - Multi-line text input
   - Send button with loading state
   - Keyboard shortcuts

**JavaScript Functions:**
- `sendChatMessage()`: Handles message sending and streaming reception
- `toggleChatImageUpload()`: Shows/hides image upload for vision model
- `addChatMessage()`: Adds messages to chat display
- `showTypingIndicator()` / `hideTypingIndicator()`: Typing animation
- `clearChatHistory()`: Clears all messages
- `handleChatKeyPress()`: Keyboard shortcuts

## Usage Guide

### Basic Text Chat with Miku
1. Go to "💬 Chat with LLM" tab
2. Ensure "💬 Text Model" is selected
3. Ensure "✅ Use Miku Personality" is selected
4. Type your message and click "📤 Send" (or press Ctrl+Enter)
5. Watch as Miku's response streams in real-time!

### Raw LLM Testing
1. Select "💬 Text Model"
2. Select "❌ Raw LLM (No Prompt)"
3. Chat directly with the base language model without personality constraints

### Vision Model Chat
1. Select "👁️ Vision Model"
2. Click "Upload Image" and select an image
3. Type a message about the image (e.g., "What do you see in this image?")
4. Click "📤 Send"
5. The vision model will analyze the image and respond

### Vision Model with Miku Personality
1. Select "👁️ Vision Model"
2. Keep "✅ Use Miku Personality" selected
3. Upload an image
4. Miku will analyze and comment on the image with her cheerful personality!

## System Prompts

### Text Model (with Miku personality)
Uses the same comprehensive system prompt as `query_llama()`:
- Full Miku character context
- Current mood integration
- Character consistency rules
- Natural conversation guidelines

### Vision Model (with Miku personality)
Simplified prompt optimized for image analysis:
```
You are Hatsune Miku analyzing an image. Describe what you see naturally
and enthusiastically as Miku would. Be detailed but conversational.
React to what you see with Miku's cheerful, playful personality.
```

### No System Prompt
Both models respond without personality constraints when this option is selected.

## Streaming Technology

The interface uses **Server-Sent Events (SSE)** for real-time streaming:
- Backend sends chunked responses from llama.cpp
- Frontend receives and displays chunks as they arrive
- Smooth, ChatGPT-like experience
- Works with both text and vision models

## UI/UX Features

### Message Styling
- **User messages**: Green accent, right-aligned feel
- **Assistant messages**: Blue accent, left-aligned feel
- **Error messages**: Red accent with error icon
- **Fade-in animation**: Smooth appearance for new messages

### Responsive Design
- Chat container scrolls automatically
- Image preview for vision model
- Loading states on buttons
- Typing indicators
- Custom scrollbar styling

### Keyboard Shortcuts
- **Ctrl+Enter**: Send message quickly
- **Tab**: Navigate between input fields

## Configuration Options

All settings are preserved during the chat session:
- Model type (text/vision)
- System prompt toggle (Miku/Raw)
- Uploaded image (for vision model)

Settings do NOT persist after page refresh (fresh session each time).

## Error Handling

The interface handles various errors gracefully:
- Connection failures
- Model errors
- Invalid image files
- Empty messages
- Timeout issues

All errors are displayed in the chat with clear error messages.

## Performance Considerations

### Text Model
- Fast responses (typically 1-3 seconds)
- Streaming starts almost immediately
- Low latency

### Vision Model
- Slower due to image processing
- First token may take 3-10 seconds
- Streaming continues once started
- Image is sent as base64 (efficient)

## Development Notes

### File Changes
1. **`bot/api.py`**
   - Added `from fastapi.responses import StreamingResponse`
   - Added `ChatMessage` Pydantic model
   - Added `POST /chat/stream` endpoint with SSE support

2. **`bot/static/index.html`**
   - Added tab6 button in navigation
   - Added complete chat interface HTML
   - Added CSS styles for chat messages and animations
   - Added JavaScript functions for chat functionality

### Dependencies
- Uses existing `aiohttp` for HTTP streaming
- Uses existing `globals.TEXT_MODEL` and `globals.VISION_MODEL`
- Uses existing `globals.LLAMA_URL` for llama.cpp connection
- No new dependencies required!

## Future Enhancements (Ideas)

Potential improvements for future versions:
- [ ] Save/load chat sessions
- [ ] Export chat history to file
- [ ] Multi-user chat history (separate sessions per user)
- [ ] Temperature and max_tokens controls
- [ ] Model selection dropdown (if multiple models available)
- [ ] Token count display
- [ ] Voice input support
- [ ] Markdown rendering in responses
- [ ] Code syntax highlighting
- [ ] Copy message button
- [ ] Regenerate response button

## Troubleshooting

### "No response received from LLM"
- Check if llama.cpp server is running
- Verify `LLAMA_URL` in globals is correct
- Check bot logs for connection errors

### "Failed to read image file"
- Ensure image is valid format (JPEG, PNG, GIF)
- Check file size (large images may cause issues)
- Try a different image

### Streaming not working
- Check browser console for JavaScript errors
- Verify SSE is not blocked by proxy/firewall
- Try refreshing the page

### Model not responding
- Check if correct model is loaded in llama.cpp
- Verify model type matches what's configured
- Check llama.cpp logs for errors

## API Reference

### POST /chat/stream

**Request Body:**
```json
{
  "message": "string",           // Required: User's message
  "model_type": "text|vision",   // Required: Which model to use
  "use_system_prompt": boolean,  // Required: Whether to add system prompt
  "image_data": "string|null"    // Optional: Base64 image for vision model
}
```

**Response:**
```
Content-Type: text/event-stream

data: {"content": "Hello"}
data: {"content": " there"}
data: {"content": "!"}
data: {"done": true}
```

**Error Response:**
```
data: {"error": "Error message here"}
```
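
For testing outside the browser, the stream can be consumed from a script; a minimal sketch using `requests` with `stream=True`, parsing the `data:` lines described above (the base64 step and `photo.jpg` are only needed for the vision model):

```python
import base64
import json
import requests

BASE_URL = "http://localhost:3939"

# Encode a local file for the vision model; omit image_data for text chat
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "message": "What do you see in this image?",
    "model_type": "vision",
    "use_system_prompt": True,
    "image_data": image_b64,
}

with requests.post(f"{BASE_URL}/chat/stream", json=payload, stream=True, timeout=120) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("done"):
            break
        if "error" in event:
            raise RuntimeError(event["error"])
        print(event.get("content", ""), end="", flush=True)
print()
```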

## Conclusion

The Chat Interface provides a powerful, user-friendly way to:
- Test LLM responses interactively
- Experiment with different prompting strategies
- Analyze images with vision models
- Chat with Miku's personality in real-time
- Debug and understand model behavior

All with a smooth, modern streaming interface that feels like ChatGPT! 🎉

@@ -1,148 +0,0 @@

# Chat Interface - Quick Start Guide

## 🚀 Quick Start

### Access the Chat Interface
1. Open the Miku Control Panel in your browser
2. Click on the **"💬 Chat with LLM"** tab
3. Start chatting!

## 📋 Configuration Options

### Model Selection
- **💬 Text Model**: Fast text conversations
- **👁️ Vision Model**: Image analysis

### System Prompt
- **✅ Use Miku Personality**: Chat with Miku's character
- **❌ Raw LLM**: Direct LLM without personality

## 💡 Common Use Cases

### 1. Chat with Miku
```
Model: Text Model
System Prompt: Use Miku Personality
Message: "Hi Miku! How are you feeling today?"
```

### 2. Test Raw LLM
```
Model: Text Model
System Prompt: Raw LLM
Message: "Explain quantum physics"
```

### 3. Analyze Images with Miku
```
Model: Vision Model
System Prompt: Use Miku Personality
Upload: [your image]
Message: "What do you think of this image?"
```

### 4. Raw Image Analysis
```
Model: Vision Model
System Prompt: Raw LLM
Upload: [your image]
Message: "Describe this image in detail"
```

## ⌨️ Keyboard Shortcuts
- **Ctrl+Enter**: Send message

## 🎨 Features
- ✅ Real-time streaming (like ChatGPT)
- ✅ Image upload for vision model
- ✅ Color-coded messages
- ✅ Timestamps
- ✅ Typing indicators
- ✅ Auto-scroll
- ✅ Clear chat history

## 🔧 System Prompts

### Text Model with Miku
- Full Miku personality
- Current mood awareness
- Character consistency

### Vision Model with Miku
- Miku analyzing images
- Cheerful, playful descriptions

### No System Prompt
- Direct LLM responses
- No character constraints

## 📊 Message Types

### User Messages (Green)
- Your input
- Right-aligned appearance

### Assistant Messages (Blue)
- Miku/LLM responses
- Left-aligned appearance
- Streams in real-time

### Error Messages (Red)
- Connection errors
- Model errors
- Clear error descriptions

## 🎯 Tips

1. **Use Ctrl+Enter** for quick sending
2. **Select model first** before uploading images
3. **Clear history** to start fresh conversations
4. **Toggle system prompt** to compare responses
5. **Wait for streaming** to complete before sending next message

## 🐛 Troubleshooting

### No response?
- Check if llama.cpp is running
- Verify network connection
- Check browser console

### Image not working?
- Switch to Vision Model
- Use valid image format (JPG, PNG)
- Check file size

### Slow responses?
- Vision model is slower than text
- Wait for streaming to complete
- Check llama.cpp load

## 📝 Examples

### Example 1: Personality Test
**With Miku Personality:**
> User: "What's your favorite song?"
> Miku: "Oh, I love so many songs! But if I had to choose, I'd say 'World is Mine' holds a special place in my heart! It really captures that fun, playful energy that I love! ✨"

**Without System Prompt:**
> User: "What's your favorite song?"
> LLM: "I don't have personal preferences as I'm an AI language model..."

### Example 2: Image Analysis
**With Miku Personality:**
> User: [uploads sunset image] "What do you see?"
> Miku: "Wow! What a beautiful sunset! The sky is painted with such gorgeous oranges and pinks! It makes me want to write a song about it! The way the colors blend together is so dreamy and romantic~ 🌅💕"

**Without System Prompt:**
> User: [uploads sunset image] "What do you see?"
> LLM: "This image shows a sunset landscape. The sky displays orange and pink hues. The sun is setting on the horizon. There are silhouettes of trees in the foreground."

## 🎉 Enjoy Chatting!

Have fun experimenting with different combinations of:
- Text vs Vision models
- With vs Without system prompts
- Different types of questions
- Various images (for vision model)

The streaming interface makes it feel just like ChatGPT! 🚀

CLI_README.md
@@ -1,347 +0,0 @@

# Miku CLI - Command Line Interface

A powerful command-line interface for controlling and monitoring the Miku Discord bot.

## Installation

1. Make the script executable:
```bash
chmod +x miku-cli.py
```

2. Install dependencies:
```bash
pip install requests
```

3. (Optional) Create a symlink for easier access:
```bash
sudo ln -s $(pwd)/miku-cli.py /usr/local/bin/miku
```

## Quick Start

```bash
# Check bot status
./miku-cli.py status

# Get current mood
./miku-cli.py mood --get

# Set mood to bubbly
./miku-cli.py mood --set bubbly

# List available moods
./miku-cli.py mood --list

# Trigger autonomous message
./miku-cli.py autonomous general

# List servers
./miku-cli.py servers

# View logs
./miku-cli.py logs
```

## Configuration

By default, the CLI connects to `http://localhost:3939`. To use a different URL:

```bash
./miku-cli.py --url http://your-server:3939 status
```

## Commands

### Status & Information

```bash
# Get bot status
./miku-cli.py status

# View recent logs
./miku-cli.py logs

# Get last LLM prompt
./miku-cli.py prompt
```

### Mood Management

```bash
# Get current DM mood
./miku-cli.py mood --get

# Get server mood
./miku-cli.py mood --get --server 123456789

# Set mood
./miku-cli.py mood --set bubbly
./miku-cli.py mood --set excited --server 123456789

# Reset mood to neutral
./miku-cli.py mood --reset
./miku-cli.py mood --reset --server 123456789

# List available moods
./miku-cli.py mood --list
```

### Sleep Management

```bash
# Put Miku to sleep
./miku-cli.py sleep

# Wake Miku up
./miku-cli.py wake

# Send bedtime reminder
./miku-cli.py bedtime
./miku-cli.py bedtime --server 123456789
```

### Autonomous Actions

```bash
# Trigger general autonomous message
./miku-cli.py autonomous general
./miku-cli.py autonomous general --server 123456789

# Trigger user engagement
./miku-cli.py autonomous engage
./miku-cli.py autonomous engage --server 123456789

# Share a tweet
./miku-cli.py autonomous tweet
./miku-cli.py autonomous tweet --server 123456789

# Trigger reaction
./miku-cli.py autonomous reaction
./miku-cli.py autonomous reaction --server 123456789

# Send custom autonomous message
./miku-cli.py autonomous custom --prompt "Tell a joke about programming"
./miku-cli.py autonomous custom --prompt "Say hello" --server 123456789

# Get autonomous stats
./miku-cli.py autonomous stats
```

### Server Management

```bash
# List all configured servers
./miku-cli.py servers
```

### DM Management

```bash
# List users with DM history
./miku-cli.py dm-users

# Send custom DM (LLM-generated)
./miku-cli.py dm-custom 123456789 "Ask them how their day was"

# Send manual DM (direct message)
./miku-cli.py dm-manual 123456789 "Hello! How are you?"

# Block a user
./miku-cli.py block 123456789

# Unblock a user
./miku-cli.py unblock 123456789

# List blocked users
./miku-cli.py blocked-users
```

### Profile Picture

```bash
# Change profile picture (search Danbooru based on mood)
./miku-cli.py change-pfp

# Change to custom image
./miku-cli.py change-pfp --image /path/to/image.png

# Change for specific server mood
./miku-cli.py change-pfp --server 123456789

# Get current profile picture metadata
./miku-cli.py pfp-metadata
```

### Conversation Management

```bash
# Reset conversation history for a user
./miku-cli.py reset-conversation 123456789
```

### Manual Messaging

```bash
# Send message to channel
./miku-cli.py send 987654321 "Hello everyone!"

# Send message with file attachments
./miku-cli.py send 987654321 "Check this out!" --files image.png document.pdf
```

## Available Moods

- 😊 neutral
- 🥰 bubbly
- 🤩 excited
- 😴 sleepy
- 😡 angry
- 🙄 irritated
- 😏 flirty
- 💕 romantic
- 🤔 curious
- 😳 shy
- 🤪 silly
- 😢 melancholy
- 😤 serious
- 💤 asleep

## Examples

### Morning Routine
```bash
# Wake up Miku
./miku-cli.py wake

# Set a bubbly mood
./miku-cli.py mood --set bubbly

# Send a general message to all servers
./miku-cli.py autonomous general

# Change profile picture to match mood
./miku-cli.py change-pfp
```

### Server-Specific Control
```bash
# Get server list
./miku-cli.py servers

# Set mood for specific server
./miku-cli.py mood --set excited --server 123456789

# Trigger engagement on that server
./miku-cli.py autonomous engage --server 123456789
```

### DM Interaction
```bash
# List users
./miku-cli.py dm-users

# Send custom message
./miku-cli.py dm-custom 123456789 "Ask them about their favorite anime"

# If user is spamming, block them
./miku-cli.py block 123456789
```

### Monitoring
```bash
# Check status
./miku-cli.py status

# View logs
./miku-cli.py logs

# Get autonomous stats
./miku-cli.py autonomous stats

# Check last prompt
./miku-cli.py prompt
```

## Output Format

The CLI uses emoji and colored output for better readability:

- ✅ Success messages
- ❌ Error messages
- 😊 Mood indicators
- 🌐 Server information
- 💬 DM information
- 📊 Statistics
- 🖼️ Media information

## Scripting

The CLI is designed to be script-friendly:

```bash
#!/bin/bash

# Morning routine script
./miku-cli.py wake
./miku-cli.py mood --set bubbly
./miku-cli.py autonomous general

# Wait 5 minutes
sleep 300

# Engage users
./miku-cli.py autonomous engage
```

## Error Handling

The CLI exits with status code 1 on errors and 0 on success, making it suitable for use in scripts:

```bash
if ./miku-cli.py mood --set bubbly; then
    echo "Mood set successfully"
else
    echo "Failed to set mood"
fi
```

## API Reference

For complete API documentation, see [API_REFERENCE.md](./API_REFERENCE.md).

## Troubleshooting

### Connection Refused
If you get "Connection refused" errors:
1. Check that the bot API is running on port 3939
2. Verify the URL with `--url` parameter
3. Check Docker container status: `docker-compose ps`

### Permission Denied
Make the script executable:
```bash
chmod +x miku-cli.py
```

### Import Errors
Install required dependencies:
```bash
pip install requests
```

## Future Enhancements

Planned features:
- Configuration file support (~/.miku-cli.conf)
- Interactive mode
- Tab completion
- Color output control
- JSON output mode for scripting
- Batch operations
- Watch mode for real-time monitoring

## Contributing

Feel free to extend the CLI with additional commands and features!

@@ -1,184 +0,0 @@

# Dual GPU Setup Summary

## What We Built

A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm.

### Architecture

```
Primary GPU (NVIDIA GTX 1660)     Secondary GPU (AMD RX 6800)
        ↓                                 ↓
llama-swap (CUDA)                 llama-swap-amd (ROCm)
Port: 8090                        Port: 8091
        ↓                                 ↓
NVIDIA models                     AMD models
- llama3.1                        - llama3.1-amd
- darkidol                        - darkidol-amd
- vision (MiniCPM)                - moondream-amd
```

## Files Created

1. **Dockerfile.llamaswap-rocm** - Custom multi-stage build:
   - Stage 1: Builds llama.cpp with ROCm from source
   - Stage 2: Builds llama-swap from source
   - Stage 3: Runtime image with both binaries

2. **llama-swap-rocm-config.yaml** - Model configuration for AMD GPU

3. **docker-compose.yml** - Updated with `llama-swap-amd` service

4. **bot/utils/gpu_router.py** - Load balancing utility

5. **bot/globals.py** - Updated with `LLAMA_AMD_URL`

6. **setup-dual-gpu.sh** - Setup verification script

7. **DUAL_GPU_SETUP.md** - Comprehensive documentation

8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide

## Why Custom Build?

- llama.cpp doesn't publish ROCm Docker images (yet)
- llama-swap doesn't provide ROCm variants
- Building from source ensures latest ROCm compatibility
- Full control over compilation flags and optimization

## Build Time

The initial build takes 15-30 minutes depending on your system:
- llama.cpp compilation: ~10-20 minutes
- llama-swap compilation: ~1-2 minutes
- Image layering: ~2-5 minutes

Subsequent builds are much faster due to Docker layer caching.

## Next Steps

Once the build completes:

```bash
# 1. Start both GPU services
docker compose up -d llama-swap llama-swap-amd

# 2. Verify both are running
docker compose ps

# 3. Test NVIDIA GPU
curl http://localhost:8090/health

# 4. Test AMD GPU
curl http://localhost:8091/health

# 5. Monitor logs
docker compose logs -f llama-swap-amd

# 6. Test model loading on AMD
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
```

## Device Access

The AMD container has access to:
- `/dev/kfd` - AMD GPU kernel driver
- `/dev/dri` - Direct Rendering Infrastructure
- Groups: `video`, `render`

## Environment Variables

RX 6800 specific settings:
```yaml
HSA_OVERRIDE_GFX_VERSION=10.3.0  # Navi 21 (gfx1030) compatibility
ROCM_PATH=/opt/rocm
HIP_VISIBLE_DEVICES=0            # Use first AMD GPU
```

## Bot Integration

Your bot now has two endpoints available:

```python
import globals

# NVIDIA GPU (primary)
nvidia_url = globals.LLAMA_URL      # http://llama-swap:8080

# AMD GPU (secondary)
amd_url = globals.LLAMA_AMD_URL     # http://llama-swap-amd:8080
```

Use the `gpu_router` utility for automatic load balancing:

```python
from bot.utils.gpu_router import get_llama_url_with_load_balancing

# Round-robin between GPUs
url, model = get_llama_url_with_load_balancing(task_type="text")

# Prefer AMD for vision
url, model = get_llama_url_with_load_balancing(
    task_type="vision",
    prefer_amd=True
)
```

## Troubleshooting

If the AMD container fails to start:

1. **Check build logs:**
   ```bash
   docker compose build --no-cache llama-swap-amd
   ```

2. **Verify GPU access:**
   ```bash
   ls -l /dev/kfd /dev/dri
   ```

3. **Check container logs:**
   ```bash
   docker compose logs llama-swap-amd
   ```

4. **Test GPU from host:**
   ```bash
   lspci | grep -i amd
   # Should show: Radeon RX 6800
   ```

## Performance Notes

**RX 6800 Specs:**
- VRAM: 16GB
- Architecture: RDNA 2 (Navi 21)
- Compute: gfx1030

**Recommended Models:**
- Q4_K_M quantization: 5-6GB per model
- Can load 2-3 models simultaneously
- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2

## Future Improvements

1. **Automatic failover:** Route to AMD if NVIDIA is busy
2. **Health monitoring:** Track GPU utilization
3. **Dynamic routing:** Use least-busy GPU
4. **VRAM monitoring:** Alert before OOM
5. **Model preloading:** Keep common models loaded

## Resources

- [ROCm Documentation](https://rocmdocs.amd.com/)
- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
- [Full Setup Guide](./DUAL_GPU_SETUP.md)
- [Quick Reference](./DUAL_GPU_QUICK_REF.md)

@@ -1,194 +0,0 @@

# Dual GPU Quick Reference

## Quick Start

```bash
# 1. Run setup check
./setup-dual-gpu.sh

# 2. Build AMD container
docker compose build llama-swap-amd

# 3. Start both GPUs
docker compose up -d llama-swap llama-swap-amd

# 4. Verify
curl http://localhost:8090/health  # NVIDIA
curl http://localhost:8091/health  # AMD RX 6800
```

## Endpoints

| GPU | Container | Port | Internal URL |
|-----|-----------|------|--------------|
| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 |
| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 |

## Models

### NVIDIA GPU (Primary)
- `llama3.1` - Llama 3.1 8B Instruct
- `darkidol` - DarkIdol Uncensored 8B
- `vision` - MiniCPM-V-4.5 (4K context)

### AMD RX 6800 (Secondary)
- `llama3.1-amd` - Llama 3.1 8B Instruct
- `darkidol-amd` - DarkIdol Uncensored 8B
- `moondream-amd` - Moondream2 Vision (2K context)

## Commands

### Start/Stop
```bash
# Start both
docker compose up -d llama-swap llama-swap-amd

# Start only AMD
docker compose up -d llama-swap-amd

# Stop AMD
docker compose stop llama-swap-amd

# Restart AMD with logs
docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd
```

### Monitoring
```bash
# Container status
docker compose ps

# Logs
docker compose logs -f llama-swap-amd

# GPU usage
watch -n 1 nvidia-smi  # NVIDIA
watch -n 1 rocm-smi    # AMD

# Resource usage
docker stats llama-swap llama-swap-amd
```

### Testing
```bash
# List available models
curl http://localhost:8091/v1/models | jq

# Test text generation (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Say hello!"}],
    "max_tokens": 20
  }' | jq

# Test vision model (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moondream-amd",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }' | jq
```

## Bot Integration

### Using GPU Router
```python
from bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model

# Load balanced text generation
url, model = get_llama_url_with_load_balancing(task_type="text")

# Specific model
url = get_endpoint_for_model("darkidol-amd")

# Vision on AMD
url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True)
```

### Direct Access
```python
import globals

# AMD GPU
amd_url = globals.LLAMA_AMD_URL     # http://llama-swap-amd:8080

# NVIDIA GPU
nvidia_url = globals.LLAMA_URL      # http://llama-swap:8080
```

## Troubleshooting

### AMD Container Won't Start
```bash
# Check ROCm
rocm-smi

# Check permissions
ls -l /dev/kfd /dev/dri

# Check logs
docker compose logs llama-swap-amd

# Rebuild
docker compose build --no-cache llama-swap-amd
```

### Model Won't Load
```bash
# Check VRAM
rocm-smi --showmeminfo vram

# Lower GPU layers in llama-swap-rocm-config.yaml
# Change: -ngl 99
# To:     -ngl 50
```

### GFX Version Error
```bash
# RX 6800 is gfx1030
# Ensure in docker-compose.yml:
HSA_OVERRIDE_GFX_VERSION=10.3.0
```

## Environment Variables

Add to `docker-compose.yml` under `miku-bot` service:

```yaml
environment:
  - PREFER_AMD_GPU=true       # Prefer AMD for load balancing
  - AMD_MODELS_ENABLED=true   # Enable AMD models
  - LLAMA_AMD_URL=http://llama-swap-amd:8080
```

## Files

- `Dockerfile.llamaswap-rocm` - ROCm container
- `llama-swap-rocm-config.yaml` - AMD model config
- `bot/utils/gpu_router.py` - Load balancing utility
- `DUAL_GPU_SETUP.md` - Full documentation
- `setup-dual-gpu.sh` - Setup verification script

## Performance Tips

1. **Model Selection**: Use Q4_K quantization for best size/quality balance
2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models
3. **TTL**: Adjust in config files (1800s = 30min default)
4. **Context**: Lower context size (`-c 8192`) to save VRAM
5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed

## Support

- ROCm Docs: https://rocmdocs.amd.com/
- llama.cpp: https://github.com/ggml-org/llama.cpp
- llama-swap: https://github.com/mostlygeek/llama-swap

@@ -1,321 +0,0 @@

# Dual GPU Setup - NVIDIA + AMD RX 6800
|
|
||||||
|
|
||||||
This document describes the dual-GPU configuration for running two llama-swap instances simultaneously:
|
|
||||||
- **Primary GPU (NVIDIA)**: Runs main models via CUDA
|
|
||||||
- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm
|
|
||||||
|
|
||||||
## Architecture Overview
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────────┐
|
|
||||||
│ Miku Bot │
|
|
||||||
│ │
|
|
||||||
│ LLAMA_URL=http://llama-swap:8080 (NVIDIA) │
|
|
||||||
│ LLAMA_AMD_URL=http://llama-swap-amd:8080 (AMD RX 6800) │
|
|
||||||
└─────────────────────────────────────────────────────────────┘
|
|
||||||
│ │
|
|
||||||
│ │
|
|
||||||
▼ ▼
|
|
||||||
┌──────────────────┐ ┌──────────────────┐
|
|
||||||
│ llama-swap │ │ llama-swap-amd │
|
|
||||||
│ (CUDA) │ │ (ROCm) │
|
|
||||||
│ Port: 8090 │ │ Port: 8091 │
|
|
||||||
└──────────────────┘ └──────────────────┘
|
|
||||||
│ │
|
|
||||||
▼ ▼
|
|
||||||
┌──────────────────┐ ┌──────────────────┐
|
|
||||||
│ NVIDIA GPU │ │ AMD RX 6800 │
|
|
||||||
│ - llama3.1 │ │ - llama3.1-amd │
|
|
||||||
│ - darkidol │ │ - darkidol-amd │
|
|
||||||
│ - vision │ │ - moondream-amd │
|
|
||||||
└──────────────────┘ └──────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
## Files Created
|
|
||||||
|
|
||||||
1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for AMD GPU
|
|
||||||
2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models
|
|
||||||
3. **docker-compose.yml** - Updated with `llama-swap-amd` service
|
|
||||||
|
|
||||||
## Configuration Details
|
|
||||||
|
|
||||||
### llama-swap-amd Service
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
llama-swap-amd:
|
|
||||||
build:
|
|
||||||
context: .
|
|
||||||
dockerfile: Dockerfile.llamaswap-rocm
|
|
||||||
container_name: llama-swap-amd
|
|
||||||
ports:
|
|
||||||
- "8091:8080" # External access on port 8091
|
|
||||||
volumes:
|
|
||||||
- ./models:/models
|
|
||||||
- ./llama-swap-rocm-config.yaml:/app/config.yaml
|
|
||||||
devices:
|
|
||||||
- /dev/kfd:/dev/kfd # AMD GPU kernel driver
|
|
||||||
- /dev/dri:/dev/dri # Direct Rendering Infrastructure
|
|
||||||
group_add:
|
|
||||||
- video
|
|
||||||
- render
|
|
||||||
environment:
|
|
||||||
- HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 (Navi 21) compatibility
|
|
||||||
```
|
|
||||||
|
|
||||||
### Available Models on AMD GPU
|
|
||||||
|
|
||||||
From `llama-swap-rocm-config.yaml`:
|
|
||||||
|
|
||||||
- **llama3.1-amd** - Llama 3.1 8B text model
|
|
||||||
- **darkidol-amd** - DarkIdol uncensored model
|
|
||||||
- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized)
|
|
||||||
|
|
||||||
### Model Aliases
|
|
||||||
|
|
||||||
You can access AMD models using these aliases:
|
|
||||||
- `llama3.1-amd`, `text-model-amd`, `amd-text`
|
|
||||||
- `darkidol-amd`, `evil-model-amd`, `uncensored-amd`
|
|
||||||
- `moondream-amd`, `vision-amd`, `moondream`
|
|
||||||
|
|
||||||
## Usage
|
|
||||||
|
|
||||||
### Building and Starting Services
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Build the AMD ROCm container
|
|
||||||
docker compose build llama-swap-amd
|
|
||||||
|
|
||||||
# Start both GPU services
|
|
||||||
docker compose up -d llama-swap llama-swap-amd
|
|
||||||
|
|
||||||
# Check logs
|
|
||||||
docker compose logs -f llama-swap-amd
|
|
||||||
```

### Accessing AMD Models from Bot Code

In your bot code, you can now use either endpoint:

```python
import requests

import globals

# Use NVIDIA GPU (primary)
nvidia_response = requests.post(
    f"{globals.LLAMA_URL}/v1/chat/completions",
    json={"model": "llama3.1", ...}
)

# Use AMD GPU (secondary)
amd_response = requests.post(
    f"{globals.LLAMA_AMD_URL}/v1/chat/completions",
    json={"model": "llama3.1-amd", ...}
)
```

### Load Balancing Strategy

You can implement load balancing by:

1. **Round-robin**: Alternate between GPUs for text generation
2. **Task-specific**:
   - NVIDIA: Primary text + MiniCPM vision (heavy)
   - AMD: Secondary text + Moondream vision (lighter)
3. **Failover**: Use AMD as backup if NVIDIA is busy

Example load balancing function:

```python
import random

import globals

def get_llama_url(prefer_amd=False):
    """Get llama URL with optional load balancing"""
    if prefer_amd:
        return globals.LLAMA_AMD_URL

    # Random load balancing for text models
    return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL])
```
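
For the failover option (3), a minimal sketch might look like the following; `query_with_failover` is a hypothetical helper, and a real version would also remap `payload["model"]` to the matching `-amd` name before retrying on the AMD endpoint:

```python
import requests

import globals

def query_with_failover(payload, timeout=60):
    """Try the NVIDIA endpoint first, then fall back to the AMD one."""
    for base_url in (globals.LLAMA_URL, globals.LLAMA_AMD_URL):
        try:
            resp = requests.post(
                f"{base_url}/v1/chat/completions", json=payload, timeout=timeout
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            continue  # endpoint busy or down -> try the other GPU
    raise RuntimeError("Both GPU endpoints failed")
```
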

## Testing

### Test NVIDIA GPU (Port 8090)
```bash
curl http://localhost:8090/health
curl http://localhost:8090/v1/models
```

### Test AMD GPU (Port 8091)
```bash
curl http://localhost:8091/health
curl http://localhost:8091/v1/models
```

### Test Model Loading (AMD)
```bash
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Hello from AMD GPU!"}],
    "max_tokens": 50
  }'
```

## Monitoring

### Check GPU Usage

**AMD GPU:**
```bash
# ROCm monitoring
rocm-smi

# Or from host
watch -n 1 rocm-smi
```

**NVIDIA GPU:**
```bash
nvidia-smi
watch -n 1 nvidia-smi
```

### Check Container Resource Usage
```bash
docker stats llama-swap llama-swap-amd
```

## Troubleshooting

### AMD GPU Not Detected

1. Verify ROCm is installed on the host:
   ```bash
   rocm-smi --version
   ```

2. Check device permissions:
   ```bash
   ls -l /dev/kfd /dev/dri
   ```

3. Verify RX 6800 compatibility:
   ```bash
   rocminfo | grep "Name:"
   ```

### Model Loading Issues

If models fail to load on AMD:

1. Check VRAM availability:
   ```bash
   rocm-smi --showmeminfo vram
   ```

2. Adjust `-ngl` (GPU layers) in the config if needed:
   ```yaml
   # Reduce GPU layers for smaller VRAM
   cmd: /app/llama-server ... -ngl 50 ...   # Instead of 99
   ```

3. Check container logs:
   ```bash
   docker compose logs llama-swap-amd
   ```

### GFX Version Mismatch

The RX 6800 is Navi 21 (gfx1030). If you see GFX errors, make sure this is set in the `docker-compose.yml` environment:

```bash
HSA_OVERRIDE_GFX_VERSION=10.3.0
```

### llama-swap Build Issues

If the ROCm container fails to build:

1. The Dockerfile attempts to build llama-swap from source
2. Alternative: use a pre-built binary or a simpler proxy setup
3. Check build logs: `docker compose build --no-cache llama-swap-amd`

## Performance Considerations

### Memory Usage

- **RX 6800**: 16GB VRAM
- Q4_K_M/Q4_K_XL models: ~5-6GB each
- Can run 2 models simultaneously, or 1 with a long context

### Model Selection

**Best for AMD RX 6800:**
- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB)
- ✅ Moondream2 vision (smaller, efficient)
- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM)

### TTL Configuration

Adjust model TTL in `llama-swap-rocm-config.yaml`:
- Lower TTL = more aggressive unloading = more VRAM available
- Higher TTL = less model swapping = faster response times

## Advanced: Model-Specific Routing

Create a helper function to route models automatically:

```python
# bot/utils/gpu_router.py
import globals

MODEL_TO_GPU = {
    # NVIDIA models
    "llama3.1": globals.LLAMA_URL,
    "darkidol": globals.LLAMA_URL,
    "vision": globals.LLAMA_URL,

    # AMD models
    "llama3.1-amd": globals.LLAMA_AMD_URL,
    "darkidol-amd": globals.LLAMA_AMD_URL,
    "moondream-amd": globals.LLAMA_AMD_URL,
}

def get_endpoint_for_model(model_name):
    """Get the correct llama-swap endpoint for a model"""
    return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL)

def is_amd_model(model_name):
    """Check if a model runs on the AMD GPU"""
    return model_name.endswith("-amd")
```

## Environment Variables

Add these to control GPU selection:

```yaml
# In docker-compose.yml
environment:
  - LLAMA_URL=http://llama-swap:8080
  - LLAMA_AMD_URL=http://llama-swap-amd:8080
  - PREFER_AMD_GPU=false      # Set to true to prefer AMD for general tasks
  - AMD_MODELS_ENABLED=true   # Enable/disable AMD models
```
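
A minimal sketch of how the bot could honor these flags; the variable names come from the compose snippet above, but the parsing logic here is an assumption rather than the bot's actual code:

```python
import os

import globals

PREFER_AMD = os.environ.get("PREFER_AMD_GPU", "false").lower() == "true"
AMD_ENABLED = os.environ.get("AMD_MODELS_ENABLED", "true").lower() == "true"

def default_llama_url():
    """Pick the default endpoint based on the environment flags."""
    if AMD_ENABLED and PREFER_AMD:
        return globals.LLAMA_AMD_URL
    return globals.LLAMA_URL
```
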

## Future Enhancements

1. **Automatic load balancing**: Monitor GPU utilization and route requests accordingly
2. **Health checks**: Fall back to the primary GPU if AMD fails
3. **Model distribution**: Automatically assign models to GPUs based on VRAM
4. **Performance metrics**: Track response times per GPU
5. **Dynamic routing**: Use the least-busy GPU for new requests

## References

- [ROCm Documentation](https://rocmdocs.amd.com/)
- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html)
@@ -1,78 +0,0 @@
# Error Handling Quick Reference

## What Changed

When Miku encounters an error (like "Error 502" from llama-swap), she now says:

```
"Someone tell Koko-nii there is a problem with my AI."
```

And sends you a webhook notification with full error details.

## Webhook Details

**Webhook URL**: `https://discord.com/api/webhooks/1462216811293708522/...`
**Mentions**: @Koko-nii (User ID: 344584170839236608)

## Error Notification Format

```
🚨 Miku Bot Error
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Error Message:
Error: 502

User: username#1234
Channel: #general
Server: Guild ID: 123456789
User Prompt:
Hi Miku! How are you?

Exception Type: HTTPError
Traceback:
[Full Python traceback]
```

## Files Changed

1. **NEW**: `bot/utils/error_handler.py`
   - Main error handling logic
   - Webhook notifications
   - Error detection

2. **MODIFIED**: `bot/utils/llm.py`
   - Added error handling to `query_llama()`
   - Prevents errors in conversation history
   - Catches all exceptions and HTTP errors

3. **NEW**: `bot/test_error_handler.py`
   - Test suite for error detection
   - 26 test cases

4. **NEW**: `ERROR_HANDLING_SYSTEM.md`
   - Full documentation

## Testing

```bash
cd /home/koko210Serve/docker/miku-discord/bot
python test_error_handler.py
```

Expected: ✓ All 26 tests passed!

## Coverage

✅ Works with both llama-swap (NVIDIA) and llama-swap-rocm (AMD)
✅ Handles all message types (DMs, server messages, autonomous)
✅ Catches connection errors, timeouts, HTTP errors
✅ Prevents errors from polluting conversation history

## No Changes Required

No configuration changes needed. The system is automatically active for:
- All direct messages to Miku
- All server messages mentioning Miku
- All autonomous messages
- All LLM queries via `query_llama()`
@@ -1,131 +0,0 @@
# Error Handling System

## Overview

The Miku bot now includes a comprehensive error handling system that catches errors from the llama-swap containers (both NVIDIA and AMD) and provides user-friendly responses while notifying the bot administrator.

## Features

### 1. Error Detection
The system automatically detects various types of errors including:
- HTTP error codes (502, 500, 503, etc.)
- Connection errors (refused, timeout, failed)
- LLM server errors
- Timeout errors
- Generic error messages

### 2. User-Friendly Responses
When an error is detected, instead of showing technical error messages like "Error 502" or "Sorry, there was an error", Miku will respond with:

> **"Someone tell Koko-nii there is a problem with my AI."**

This keeps Miku in character and provides a better user experience.

### 3. Administrator Notifications
When an error occurs, a webhook notification is automatically sent to Discord with:
- **Error Message**: The full error text from the container
- **Context Information**:
  - User who triggered the error
  - Channel/Server where the error occurred
  - User's prompt that caused the error
  - Exception type (if applicable)
  - Full traceback (if applicable)
- **Mention**: Automatically mentions Koko-nii for immediate attention (see the sketch below)
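
As a rough sketch of what `send_error_webhook()` might do (the payload shape here is an assumption; the webhook URL, shown in full later in this document, and the mentioned user ID come from this document):

```python
import requests

WEBHOOK_URL = "https://discord.com/api/webhooks/1462216811293708522/..."  # full URL below
KOKO_NII_ID = 344584170839236608

def send_error_webhook(error_text: str, context: str) -> None:
    """Post the error details to Discord and ping the administrator."""
    payload = {
        "content": f"<@{KOKO_NII_ID}> 🚨 **Miku Bot Error**\n{context}\n```{error_text}```"
    }
    requests.post(WEBHOOK_URL, json=payload, timeout=10)
```
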
### 4. Conversation History Protection
Error messages are NOT saved to conversation history, preventing errors from polluting the context for future interactions.

## Implementation Details

### Files Modified

1. **`bot/utils/error_handler.py`** (NEW)
   - Core error detection and webhook notification logic
   - `is_error_response()`: Detects error messages using regex patterns
   - `handle_llm_error()`: Handles exceptions from the LLM
   - `handle_response_error()`: Handles error responses from the LLM
   - `send_error_webhook()`: Sends formatted error notifications

2. **`bot/utils/llm.py`**
   - Integrated error handling into the `query_llama()` function
   - Catches all exceptions and HTTP errors
   - Filters responses to detect error messages
   - Prevents error messages from being saved to history (see the sketch below)
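
Put together, the flow inside `query_llama()` looks roughly like this; `call_llm()` and `save_to_history()` are hypothetical helper names used only for illustration:

```python
async def query_llama(user_id, messages, context):
    try:
        reply = await call_llm(messages)           # hypothetical LLM call
    except Exception as exc:
        await handle_llm_error(exc, context)       # webhook + friendly reply
        return "Someone tell Koko-nii there is a problem with my AI."

    if is_error_response(reply):                   # error text from the server
        await handle_response_error(reply, context)
        return "Someone tell Koko-nii there is a problem with my AI."

    save_to_history(user_id, reply)                # only clean replies are saved
    return reply
```
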
### Webhook URL
```
https://discord.com/api/webhooks/1462216811293708522/4kdGenpxZFsP0z3VBgebYENODKmcRrmEzoIwCN81jCirnAxuU2YvxGgwGCNBb6TInA9Z
```

## Error Detection Patterns

The system detects errors using the following patterns:
- `Error: XXX` or `Error XXX` (with HTTP status codes)
- `XXX Error` format
- "Sorry, there was an error"
- "Sorry, the response took too long"
- Connection-related errors (refused, timeout, failed)
- Server errors (service unavailable, internal server error, bad gateway)
- HTTP status codes >= 400
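
The patterns above could be expressed as regexes like the following; this is a reconstruction for illustration, not the exact contents of `error_handler.py`:

```python
import re

_ERROR_PATTERNS = [
    re.compile(r"\berror:?\s*\d{3}\b", re.IGNORECASE),        # "Error: 502" / "Error 502"
    re.compile(r"\b\d{3}\s+error\b", re.IGNORECASE),          # "502 Error"
    re.compile(r"sorry, there was an error", re.IGNORECASE),
    re.compile(r"sorry, the response took too long", re.IGNORECASE),
    re.compile(r"connection (refused|timed out|failed)", re.IGNORECASE),
    re.compile(r"service unavailable|internal server error|bad gateway", re.IGNORECASE),
]

def is_error_response(text: str) -> bool:
    """Return True if the LLM output looks like an error message."""
    return any(p.search(text) for p in _ERROR_PATTERNS)
```
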
## Coverage

The error handler is automatically applied to:
- ✅ Direct messages to Miku
- ✅ Server messages mentioning Miku
- ✅ Autonomous messages (general, engaging users, tweets)
- ✅ Conversation joining
- ✅ All responses using `query_llama()`
- ✅ Both NVIDIA and AMD GPU containers

## Testing

A test suite is included in `bot/test_error_handler.py` that validates the error detection logic with 26 test cases covering:
- Various error message formats
- Normal responses (should NOT be detected as errors)
- HTTP status codes
- Edge cases

Run tests with:
```bash
cd /home/koko210Serve/docker/miku-discord/bot
python test_error_handler.py
```

## Example Scenarios

### Scenario 1: llama-swap Container Down
**User**: "Hi Miku!"
**Without Error Handler**: "Error: 502"
**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
**Webhook Notification**: Sent with full error details

### Scenario 2: Connection Timeout
**User**: "Tell me a story"
**Without Error Handler**: "Sorry, the response took too long. Please try again."
**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
**Webhook Notification**: Sent with timeout exception details

### Scenario 3: LLM Server Error
**User**: "How are you?"
**Without Error Handler**: "Error: Internal server error"
**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
**Webhook Notification**: Sent with HTTP 500 error details

## Benefits

1. **Better User Experience**: Users see a friendly, in-character message instead of technical errors
2. **Immediate Notifications**: Administrator is notified immediately via Discord webhook
3. **Detailed Context**: Full error information is provided for debugging
4. **Clean History**: Errors don't pollute conversation history
5. **Consistent Handling**: All error types are handled uniformly
6. **Container Agnostic**: Works with both NVIDIA and AMD containers

## Future Enhancements

Potential improvements:
- Add retry logic for transient errors
- Track error frequency to detect systemic issues
- Automatic container restart if errors persist
- Error categorization (transient vs. critical)
- Rate limiting on webhook notifications to prevent spam
@@ -1,311 +0,0 @@
# Intelligent Interruption Detection System

## Implementation Complete ✅

Added sophisticated interruption detection that prevents response queueing and allows natural conversation flow.

---

## Features

### 1. **Intelligent Interruption Detection**
Detects when a user speaks over Miku, with configurable thresholds:
- **Time threshold**: 0.8 seconds of continuous speech
- **Chunk threshold**: 8+ audio chunks (160ms worth)
- **Smart calculation**: Both conditions must be met to prevent false positives

### 2. **Graceful Cancellation**
When an interruption is detected:
- ✅ Stops LLM streaming immediately (`miku_speaking = False`)
- ✅ Cancels TTS playback
- ✅ Flushes audio buffers
- ✅ Ready for the next input within milliseconds

### 3. **History Tracking**
Maintains conversation context:
- Adds an `[INTERRUPTED - user started speaking]` marker to history
- **Does NOT** add the incomplete response to history
- The LLM sees the interruption in context for the next response
- Prevents confusion about what was actually said

### 4. **Queue Prevention**
- If the user speaks while Miku is talking **but not long enough to interrupt**:
  - The input is **ignored** (not queued)
  - The user sees: `"(talk over Miku longer to interrupt)"`
  - Prevents the "yeah" x5 = 5 responses problem (see the sketch below)
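
A sketch of that guard as it might appear in `on_final_transcript()` in `voice_manager.py`; this is reconstructed from the behavior described above, with `notify_user()` as a hypothetical helper:

```python
async def on_final_transcript(self, user_id: int, text: str):
    if self.miku_speaking:
        # The user spoke while Miku was talking but never crossed the
        # interruption threshold: drop the input instead of queueing it.
        await self.notify_user(user_id, "(talk over Miku longer to interrupt)")
        return
    await self._generate_voice_response(user_id, text)
```
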
---

## How It Works

### Detection Algorithm

```
User speaks during Miku's turn
            ↓
Track: start_time, chunk_count
            ↓
Each audio chunk increments counter
            ↓
Check thresholds:
  - Duration >= 0.8s?
  - Chunks >= 8?
            ↓
Both YES → INTERRUPT!
            ↓
Stop LLM stream, cancel TTS, mark history
```

### Threshold Calculation

**Audio chunks**: Discord sends 20ms chunks @ 16kHz (320 samples)
- 8 chunks = 160ms of actual audio
- But over an 800ms timespan = sustained speech

**Why both conditions?**
- Time only: Background noise could trigger
- Chunks only: Gaps in speech could fail
- Both together: Reliable detection of intentional speech

---

## Configuration

### Interruption Thresholds

Edit `bot/utils/voice_receiver.py`:

```python
# Interruption detection
self.interruption_threshold_time = 0.8   # seconds
self.interruption_threshold_chunks = 8   # minimum chunks
```

**Recommendations**:
- **More sensitive** (interrupt faster): `0.5s / 6 chunks`
- **Current** (balanced): `0.8s / 8 chunks`
- **Less sensitive** (only clear interruptions): `1.2s / 12 chunks`

### Silence Timeout

The silence detection (when to finalize a transcript) was also adjusted:

```python
self.silence_timeout = 1.0   # seconds (was 1.5s)
```

Faster silence detection = more responsive conversations!

---

## Conversation History Format

### Before Interruption
```python
[
    {"role": "user", "content": "koko210: Tell me a long story"},
    {"role": "assistant", "content": "Once upon a time in a digital world..."},
]
```

### After Interruption
```python
[
    {"role": "user", "content": "koko210: Tell me a long story"},
    {"role": "assistant", "content": "[INTERRUPTED - user started speaking]"},
    {"role": "user", "content": "koko210: Actually, tell me something else"},
    {"role": "assistant", "content": "Sure! What would you like to hear about?"},
]
```

The `[INTERRUPTED]` marker gives the LLM context that the conversation was cut off.

---

## Testing Scenarios

### Test 1: Basic Interruption
1. `!miku listen`
2. Say: "Tell me a very long story about your concerts"
3. **While Miku is speaking**, talk over her for 1+ second
4. **Expected**: TTS stops, LLM stops, Miku listens to your new input

### Test 2: Short Talk-Over (No Interruption)
1. Miku is speaking
2. Say a quick "yeah" or "uh-huh" (< 0.8s)
3. **Expected**: Ignored, Miku continues speaking, message: "(talk over Miku longer to interrupt)"

### Test 3: Multiple Queued Inputs (PREVENTED)
1. Miku is speaking
2. Say "yeah" 5 times quickly
3. **Expected**: All ignored except one that might interrupt
4. **OLD BEHAVIOR**: Would queue 5 responses ❌
5. **NEW BEHAVIOR**: Ignores them ✅

### Test 4: Conversation History
1. Start a conversation
2. Interrupt Miku mid-sentence
3. Ask: "What were you saying?"
4. **Expected**: Miku should acknowledge she was interrupted

---

## User Experience

### What Users See

**Normal conversation:**
```
🎤 koko210: "Hey Miku, how are you?"
💭 Miku is thinking...
🎤 Miku: "I'm doing great! How about you?"
```

**Quick talk-over (ignored):**
```
🎤 Miku: "I'm doing great! How about..."
💬 koko210 said: "yeah" (talk over Miku longer to interrupt)
🎤 Miku: "...you? I hope you're having a good day!"
```

**Successful interruption:**
```
🎤 Miku: "I'm doing great! How about..."
⚠️ koko210 interrupted Miku
🎤 koko210: "Actually, can you sing something?"
💭 Miku is thinking...
```

---

## Technical Details

### Interruption Detection Flow

```python
# In voice_receiver.py _send_audio_chunk()

if miku_speaking:
    if user_id not in interruption_start_time:
        # First chunk during Miku's speech
        interruption_start_time[user_id] = current_time
        interruption_audio_count[user_id] = 1
    else:
        # Increment chunk count
        interruption_audio_count[user_id] += 1

    # Calculate duration
    duration = current_time - interruption_start_time[user_id]
    chunks = interruption_audio_count[user_id]

    # Check threshold
    if duration >= 0.8 and chunks >= 8:
        # INTERRUPT!
        trigger_interruption(user_id)
```

### Cancellation Flow

```python
# In voice_manager.py on_user_interruption()

1. Set miku_speaking = False
   → LLM streaming loop checks this and breaks

2. Call _cancel_tts()
   → Stops voice_client playback
   → Sends /interrupt to RVC server

3. Add history marker
   → {"role": "assistant", "content": "[INTERRUPTED]"}

4. Ready for next input!
```

---

## Performance

- **Detection latency**: ~20-40ms (1-2 audio chunks)
- **Cancellation latency**: ~50-100ms (TTS stop + buffer clear)
- **Total response time**: ~100-150ms from speech start to Miku stopping
- **False positive rate**: Very low with the dual threshold system

---

## Monitoring

### Check Interruption Logs
```bash
docker logs -f miku-bot | grep "interrupted"
```

**Expected output**:
```
🛑 User 209381657369772032 interrupted Miku (duration=1.2s, chunks=15)
✓ Interruption handled, ready for next input
```

### Debug Interruption Detection
```bash
docker logs -f miku-bot | grep "interruption"
```

### Check for Queued Responses (should be none!)
```bash
docker logs -f miku-bot | grep "Ignoring new input"
```

---

## Edge Cases Handled

1. **Multiple users interrupting**: Each user tracked independently
2. **Rapid speech then silence**: Interruption tracking resets when Miku stops
3. **Network packet loss**: Opus decode errors don't affect tracking
4. **Container restart**: Tracking state cleaned up properly
5. **Miku finishes naturally**: Interruption tracking cleared

---

## Files Modified

1. **bot/utils/voice_receiver.py**
   - Added interruption tracking dictionaries
   - Added detection logic in `_send_audio_chunk()`
   - Cleanup of interruption state in `stop_listening()`
   - Configurable thresholds at init

2. **bot/utils/voice_manager.py**
   - Updated `on_user_interruption()` to handle graceful cancel
   - Added history marker for interruptions
   - Modified `_generate_voice_response()` to not save incomplete responses
   - Added queue prevention in `on_final_transcript()`
   - Reduced silence timeout to 1.0s

---

## Benefits

✅ **Natural conversation flow**: No more awkward queued responses
✅ **Responsive**: Miku stops quickly when interrupted
✅ **Context-aware**: History tracks interruptions
✅ **False-positive resistant**: Dual threshold prevents accidental triggers
✅ **User-friendly**: Clear feedback about what's happening
✅ **Performant**: Minimal latency, efficient tracking

---

## Future Enhancements

- [ ] **Adaptive thresholds** based on user speech patterns
- [ ] **Volume-based detection** (interrupt faster if the user speaks loudly)
- [ ] **Context-aware responses** (Miku acknowledges interruption more naturally)
- [ ] **User preferences** (some users may want different sensitivity)
- [ ] **Multi-turn interruption** (handle rapid back-and-forth better)

---

**Status**: ✅ **DEPLOYED AND READY FOR TESTING**

Try interrupting Miku mid-sentence - she should stop gracefully and listen to your new input!
535
README.md
535
README.md
@@ -1,535 +0,0 @@
# 🎤 Miku Discord Bot 💙

<div align="center">


[](https://www.docker.com/)
[](https://www.python.org/)
[](https://discordpy.readthedocs.io/)

*The world's #1 Virtual Idol, now in your Discord server! 🌱✨*

[Features](#-features) • [Quick Start](#-quick-start) • [Architecture](#️-architecture) • [API](#-api-endpoints) • [Contributing](#-contributing)

</div>

---

## 🌟 About

Meet **Hatsune Miku** - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by local LLMs (Llama 3.1), vision models (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood!

### Why This Bot?

- 🎭 **14 Dynamic Moods** - From bubbly to melancholy, Miku's personality adapts
- 🤖 **Smart Autonomous Behavior** - Context-aware decisions without spamming
- 👁️ **Vision Capabilities** - Analyzes images, videos, and GIFs in conversations
- 🎨 **Auto Profile Pictures** - Fetches & crops anime faces from Danbooru based on mood
- 💬 **DM Support** - Personal conversations with mood tracking
- 🐦 **Twitter Integration** - Shares Miku-related tweets and figurine announcements
- 🎮 **ComfyUI Integration** - Natural language image generation requests
- 🔊 **Voice Chat Ready** - Fish.audio TTS integration (docs included)
- 📊 **RESTful API** - Full control via HTTP endpoints
- 🐳 **Production Ready** - Docker Compose with GPU support

---

## ✨ Features

### 🧠 AI & LLM Integration

- **Local LLM Processing** with Llama 3.1 8B (via llama.cpp + llama-swap)
- **Automatic Model Switching** - Text ↔️ Vision models swap on-demand
- **OpenAI-Compatible API** - Easy migration and integration
- **Conversation History** - Per-user context with RAG-style retrieval
- **Smart Prompting** - Mood-aware system prompts with personality profiles

### 🎭 Mood & Personality System

<details>
<summary>14 Available Moods (click to expand)</summary>

- 😊 **Neutral** - Classic cheerful Miku
- 😴 **Asleep** - Sleepy and minimally responsive
- 😪 **Sleepy** - Getting tired, simple responses
- 🎉 **Excited** - Extra energetic and enthusiastic
- 💫 **Bubbly** - Playful and giggly
- 🤔 **Curious** - Inquisitive and wondering
- 😳 **Shy** - Blushing and hesitant
- 🤪 **Silly** - Goofy and fun-loving
- 😠 **Angry** - Frustrated or upset
- 😤 **Irritated** - Mildly annoyed
- 😢 **Melancholy** - Sad and reflective
- 😏 **Flirty** - Playful and teasing
- 💕 **Romantic** - Sweet and affectionate
- 🎯 **Serious** - Focused and thoughtful

</details>

- **Per-Server Mood Tracking** - Different moods in different servers
- **DM Mood Persistence** - Separate mood state for private conversations
- **Automatic Mood Shifts** - Responds to conversation sentiment

### 🤖 Autonomous Behavior System V2

The bot features a sophisticated **context-aware decision engine** that makes Miku feel alive:

- **Intelligent Activity Detection** - Tracks message frequency, user presence, and channel activity
- **Non-Intrusive** - Won't spam or interrupt important conversations
- **Mood-Based Personality** - Behavioral patterns change with mood
- **Multiple Action Types**:
  - 💬 General conversation starters
  - 👋 Engaging specific users
  - 🐦 Sharing Miku tweets
  - 💬 Joining ongoing conversations
  - 🎨 Changing profile pictures
  - 😊 Reacting to messages

**Rate Limiting**: Minimum 30-second cooldown between autonomous actions to prevent spam (sketched below).
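
A hypothetical sketch of that cooldown gate; the real engine lives in the autonomous module and may track this differently:

```python
import time

_last_action: dict[int, float] = {}   # guild_id -> last action timestamp
COOLDOWN_SECONDS = 30

def may_act(guild_id: int) -> bool:
    """Allow an autonomous action only if the cooldown has elapsed."""
    now = time.monotonic()
    if now - _last_action.get(guild_id, 0.0) < COOLDOWN_SECONDS:
        return False
    _last_action[guild_id] = now
    return True
```
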
### 👁️ Vision & Media Processing

- **Image Analysis** - Describes images shared in chat using MiniCPM-V 4.5
- **Video Understanding** - Extracts frames and analyzes video content
- **GIF Support** - Processes animated GIFs (converts to MP4 if needed)
- **Embed Content Extraction** - Reads Twitter/X embeds without the API
- **Face Detection** - On-demand anime face detection service (GPU-accelerated)

### 🎨 Dynamic Profile Picture System

- **Danbooru Integration** - Searches for Miku artwork
- **Smart Cropping** - Automatic face detection and 1:1 crop
- **Mood-Based Selection** - Filters by tags matching the current mood
- **Quality Filtering** - Only uses high-quality, safe-rated images
- **Fallback System** - Graceful degradation if detection fails

### 🐦 Twitter Features

- **Tweet Sharing** - Automatically fetches and shares Miku-related tweets
- **Figurine Notifications** - DMs subscribers about new Miku figurine releases
- **Embed Compatibility** - Uses fxtwitter for better Discord previews
- **Duplicate Prevention** - Tracks sent tweets to avoid repeats

### 🎮 ComfyUI Image Generation

- **Natural Language Detection** - "Draw me as Miku swimming in a pool"
- **Workflow Integration** - Connects to an external ComfyUI instance
- **Smart Prompting** - Enhances user requests with context

### 📡 REST API Dashboard

Full-featured FastAPI server with endpoints for:
- Mood management (get/set/reset)
- Conversation history
- Autonomous actions (trigger manually)
- Profile picture updates
- Server configuration
- DM analysis reports

### 🔧 Developer Features

- **Docker Compose Setup** - One-command deployment
- **GPU Acceleration** - NVIDIA runtime for models and face detection
- **Health Checks** - Automatic service monitoring
- **Volume Persistence** - Conversation history and settings saved
- **Hot Reload** - Update without restarting (for development)

---

## 🚀 Quick Start

### Prerequisites

- **Docker** & **Docker Compose** installed
- **NVIDIA GPU** with CUDA support (for model inference)
- **Discord Bot Token** ([Create one here](https://discord.com/developers/applications))
- At least **8GB VRAM** recommended (4GB minimum)

### Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/yourusername/miku-discord.git
   cd miku-discord
   ```

2. **Set up your bot token**

   Edit `docker-compose.yml` and replace the `DISCORD_BOT_TOKEN`:
   ```yaml
   environment:
     - DISCORD_BOT_TOKEN=your_token_here
     - OWNER_USER_ID=your_discord_user_id  # For DM reports
   ```

3. **Add your models**

   Place these GGUF models in the `models/` directory:
   - `Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf` (text model)
   - `MiniCPM-V-4_5-Q3_K_S.gguf` (vision model)
   - `MiniCPM-V-4_5-mmproj-f16.gguf` (vision projector)

4. **Launch the bot**
   ```bash
   docker-compose up -d
   ```

5. **Check logs**
   ```bash
   docker-compose logs -f miku-bot
   ```

6. **Access the dashboard**

   Open http://localhost:3939 in your browser

### Optional: ComfyUI Integration

If you have ComfyUI running, update the path in `docker-compose.yml`:
```yaml
volumes:
  - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro
```

### Optional: Face Detection Service

Start the anime face detector when needed:
```bash
docker-compose --profile tools up -d anime-face-detector
```

Access the Gradio UI at http://localhost:7860

---

## 🏗️ Architecture

### Service Overview

```
┌─────────────────────────────────────────────────────────────┐
│                         Discord API                         │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                      Miku Bot (Python)                      │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│   │   Discord    │   │   FastAPI    │   │  Autonomous  │    │
│   │  Event Loop  │   │    Server    │   │    Engine    │    │
│   └──────────────┘   └──────────────┘   └──────────────┘    │
└───────────┬────────────────┬────────────────┬───────────────┘
            │                │                │
            ▼                ▼                ▼
┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐
│   llama-swap    │  │     ComfyUI     │  │ Face Detector│
│  (Model Server) │  │   (Image Gen)   │  │  (On-Demand) │
│                 │  │                 │  │              │
│  • Llama 3.1    │  │  • Workflows    │  │  • Gradio UI │
│  • MiniCPM-V    │  │  • GPU Accel    │  │  • FastAPI   │
│  • Auto-swap    │  │                 │  │              │
└────────┬────────┘  └─────────────────┘  └──────────────┘
         │
         ▼
   ┌──────────┐
   │  Models  │
   │  (GGUF)  │
   └──────────┘
```

### Tech Stack

| Component | Technology |
|-----------|-----------|
| **Bot Framework** | Discord.py 2.0+ |
| **LLM Backend** | llama.cpp + llama-swap |
| **Text Model** | Llama 3.1 8B Instruct |
| **Vision Model** | MiniCPM-V 4.5 |
| **API Server** | FastAPI + Uvicorn |
| **Image Gen** | ComfyUI (external) |
| **Face Detection** | Anime-Face-Detector (Gradio) |
| **Database** | JSON files (conversation history, settings) |
| **Containerization** | Docker + Docker Compose |
| **GPU Runtime** | NVIDIA Container Toolkit |

### Key Components

#### 1. **llama-swap** (Model Server)
- Automatically loads/unloads models based on requests
- Prevents VRAM exhaustion by swapping between text and vision models
- OpenAI-compatible `/v1/chat/completions` endpoint
- Configurable TTL (time-to-live) per model

#### 2. **Autonomous Engine V2**
- Tracks message activity, user presence, and channel engagement
- Calculates "engagement scores" per server
- Makes context-aware decisions without LLM overhead
- Personality profiles per mood (e.g., shy mood = less engaging)

#### 3. **Server Manager**
- Per-guild configuration (mood, sleep state, autonomous settings)
- Scheduled tasks (bedtime reminders, autonomous ticks)
- Persistent storage in `servers_config.json`

#### 4. **Conversation History**
- Vector-based RAG (Retrieval Augmented Generation)
- Stores the last 50 messages per user
- Semantic search using FAISS
- Context injection for continuity (see the sketch below)
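
A minimal sketch of the FAISS retrieval step, assuming 384-dimensional sentence embeddings; the bot's actual index construction and embedding model are not shown here:

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)

# Stand-in for the embedded last-50-messages store:
history_vectors = np.random.rand(50, dim).astype("float32")
index.add(history_vectors)

# Embed the new message (placeholder vector) and pull the 5 closest ones:
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0])   # indices of the most relevant past messages
```
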
---

## 📡 API Endpoints

The bot runs a FastAPI server on port **3939** with the following endpoints:

### Mood Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/servers/{guild_id}/mood` | GET | Get current mood for server |
| `/servers/{guild_id}/mood` | POST | Set mood (body: `{"mood": "excited"}`) |
| `/servers/{guild_id}/mood/reset` | POST | Reset to neutral mood |
| `/mood` | GET | Get DM mood (deprecated, use server-specific) |

### Autonomous Actions

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/autonomous/general` | POST | Make Miku say something random |
| `/autonomous/engage` | POST | Engage a random user |
| `/autonomous/tweet` | POST | Share a Miku tweet |
| `/autonomous/reaction` | POST | React to a recent message |
| `/autonomous/custom` | POST | Custom prompt (body: `{"prompt": "..."}`) |

### Profile Pictures

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/profile-picture/change` | POST | Change profile picture (body: `{"mood": "happy"}`) |
| `/profile-picture/revert` | POST | Revert to previous picture |
| `/profile-picture/current` | GET | Get current picture metadata |

### Utilities

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/conversation/reset` | POST | Clear conversation history for user |
| `/logs` | GET | View bot logs (last 1000 lines) |
| `/prompt` | GET | View current system prompt |
| `/` | GET | Dashboard HTML page |

### Example Usage

```bash
# Set mood to excited
curl -X POST http://localhost:3939/servers/123456789/mood \
  -H "Content-Type: application/json" \
  -d '{"mood": "excited"}'

# Make Miku say something
curl -X POST http://localhost:3939/autonomous/general

# Change profile picture
curl -X POST http://localhost:3939/profile-picture/change \
  -H "Content-Type: application/json" \
  -d '{"mood": "flirty"}'
```

---

## 🎮 Usage Examples

### Basic Interaction

```
User: Hey Miku! How are you today?
Miku: Miku's doing great! 💙 Thanks for asking! ✨

User: Can you see this? [uploads image]
Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱
```

### Mood Changes

```
User: /mood excited
Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶

User: What's your favorite food?
Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨
```

### Image Generation

```
User: Draw yourself swimming in a pool
Miku: Ooh! Let me create that for you! 🎨✨ [generates image]
```

### Autonomous Behavior

```
[After detecting activity in #general]
Miku: Hey everyone! 👋 What are you all talking about? 💙
```

---

## 🛠️ Configuration

### Model Configuration (`llama-swap-config.yaml`)

```yaml
models:
  llama3.1:
    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99
    ttl: 1800  # 30 minutes

  vision:
    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf
    ttl: 900  # 15 minutes
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DISCORD_BOT_TOKEN` | *Required* | Your Discord bot token |
| `OWNER_USER_ID` | *Optional* | Your Discord user ID (for DM reports) |
| `LLAMA_URL` | `http://llama-swap:8080` | Model server endpoint |
| `TEXT_MODEL` | `llama3.1` | Text generation model name |
| `VISION_MODEL` | `vision` | Vision model name |

### Persistent Storage

All data is stored in `bot/memory/`:
- `servers_config.json` - Per-server settings
- `autonomous_config.json` - Autonomous behavior settings
- `conversation_history/` - User conversation data
- `profile_pictures/` - Downloaded profile pictures
- `dms/` - DM conversation logs
- `figurine_subscribers.json` - Figurine notification subscribers

---

## 📚 Documentation

Detailed documentation available in the `readmes/` directory:

- **[AUTONOMOUS_V2_IMPLEMENTED.md](readmes/AUTONOMOUS_V2_IMPLEMENTED.md)** - Autonomous system V2 details
- **[VOICE_CHAT_IMPLEMENTATION.md](readmes/VOICE_CHAT_IMPLEMENTATION.md)** - Fish.audio TTS integration guide
- **[PROFILE_PICTURE_FEATURE.md](readmes/PROFILE_PICTURE_FEATURE.md)** - Profile picture system
- **[FACE_DETECTION_API_MIGRATION.md](readmes/FACE_DETECTION_API_MIGRATION.md)** - Face detection setup
- **[DM_ANALYSIS_FEATURE.md](readmes/DM_ANALYSIS_FEATURE.md)** - DM interaction analytics
- **[MOOD_SYSTEM_ANALYSIS.md](readmes/MOOD_SYSTEM_ANALYSIS.md)** - Mood system deep dive
- **[QUICK_REFERENCE.md](readmes/QUICK_REFERENCE.md)** - llama.cpp setup and migration guide

---

## 🐛 Troubleshooting

### Bot won't start

**Check if models are loaded:**
```bash
docker-compose logs llama-swap
```

**Verify GPU access:**
```bash
docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

### High VRAM usage

- Lower the `-ngl` parameter in `llama-swap-config.yaml` (reduce GPU layers)
- Reduce context size with the `-c` parameter
- Use a smaller quantization (Q3 instead of Q4)

### Autonomous actions not triggering

- Check `autonomous_config.json` - ensure it is enabled and review the cooldown settings
- Verify activity in the server (the bot tracks engagement)
- Check logs for decision engine output

### Face detection not working

- Ensure the GPU is available: `docker-compose --profile tools up -d anime-face-detector`
- Check API health: `curl http://localhost:6078/health`
- View the Gradio UI: http://localhost:7860

### Models switching too frequently

Increase the TTL in `llama-swap-config.yaml`:
```yaml
ttl: 3600  # 1 hour instead of 30 minutes
```

### Development Setup

For local development without Docker:

```bash
# Install dependencies
cd bot
pip install -r requirements.txt

# Set environment variables
export DISCORD_BOT_TOKEN="your_token"
export LLAMA_URL="http://localhost:8080"

# Run the bot
python bot.py
```

### Code Style

- Use type hints where possible
- Follow PEP 8 conventions
- Add docstrings to functions
- Comment complex logic

---

## 📝 License

This project is provided as-is for educational and personal use. Please respect:
- Discord's [Terms of Service](https://discord.com/terms)
- Crypton Future Media's [Piapro Character License](https://piapro.net/intl/en_for_creators.html)
- Model licenses (Llama 3.1, MiniCPM-V)

---

## 🙏 Acknowledgments

- **Crypton Future Media** - For creating Hatsune Miku
- **llama.cpp** - For efficient local LLM inference
- **mostlygeek/llama-swap** - For brilliant model management
- **Discord.py** - For the excellent Discord API wrapper
- **OpenAI** - For the API standard
- **MiniCPM-V Team** - For the amazing vision model
- **Danbooru** - For the artwork API

---

## 💙 Support

If you enjoy this project:
- ⭐ Star this repository
- 🐛 Report bugs via [Issues](https://github.com/yourusername/miku-discord/issues)
- 💬 Share your Miku bot setup in [Discussions](https://github.com/yourusername/miku-discord/discussions)
- 🎤 Listen to some Miku songs!

---

<div align="center">

**Made with 💙 by a Miku fan, for Miku fans**

*"The future begins now!" - Hatsune Miku* 🎶✨

[⬆ Back to Top](#-miku-discord-bot-)

</div>
@@ -1,222 +0,0 @@
# Silence Detection Implementation

## What Was Added

Implemented automatic silence detection to trigger final transcriptions in the new ONNX-based STT system.

### Problem
The new ONNX server requires manually sending a `{"type": "final"}` command to get the complete transcription. Without this, partial transcripts would appear but never be finalized and sent to LlamaCPP.

### Solution
Added silence tracking in `voice_receiver.py`:

1. **Track audio timestamps**: Record when the last audio chunk was sent
2. **Detect silence**: Start a timer after each audio chunk
3. **Send final command**: If no new audio arrives within 1.5 seconds, send `{"type": "final"}`
4. **Cancel on new audio**: Reset the timer if more audio arrives

---

## Implementation Details

### New Attributes
```python
self.last_audio_time: Dict[int, float] = {}        # Track last audio per user
self.silence_tasks: Dict[int, asyncio.Task] = {}   # Silence detection tasks
self.silence_timeout = 1.5                         # Seconds of silence before "final"
```

### New Method
```python
async def _detect_silence(self, user_id: int):
    """
    Wait for silence timeout and send 'final' command to STT.
    Called after each audio chunk.
    """
    await asyncio.sleep(self.silence_timeout)
    stt_client = self.stt_clients.get(user_id)
    if stt_client and stt_client.is_connected():
        await stt_client.send_final()
```

### Integration
- Called after sending each audio chunk (see the restart sketch below)
- Cancels the previous silence task if new audio arrives
- Automatically cleaned up when stopping listening
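
The cancel-and-restart pattern in `_send_audio_chunk()` might look like this; `_restart_silence_timer` is a hypothetical helper name, but the attributes are the ones shown above:

```python
import asyncio
import time

def _restart_silence_timer(self, user_id: int):
    """Reset the 1.5s countdown because a new audio chunk just arrived."""
    self.last_audio_time[user_id] = time.time()
    old_task = self.silence_tasks.get(user_id)
    if old_task and not old_task.done():
        old_task.cancel()
    self.silence_tasks[user_id] = asyncio.create_task(self._detect_silence(user_id))
```
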
---

## Testing

### Test 1: Basic Transcription
1. Join a voice channel
2. Run `!miku listen`
3. **Speak a sentence** and wait 1.5 seconds
4. **Expected**: Final transcript appears and is sent to LlamaCPP

### Test 2: Continuous Speech
1. Start listening
2. **Speak multiple sentences** with pauses < 1.5s between them
3. **Expected**: Partial transcripts update, final sent after the last sentence

### Test 3: Multiple Users
1. Have 2+ users in the voice channel
2. Each runs `!miku listen`
3. Both speak (taking turns or simultaneously)
4. **Expected**: Each user's speech is transcribed independently

---

## Configuration

### Silence Timeout
Default: `1.5` seconds

**To adjust**, edit `voice_receiver.py`:
```python
self.silence_timeout = 1.5  # Change this value
```

**Recommendations**:
- **Too short (< 1.0s)**: May cut off during natural pauses in speech
- **Too long (> 3.0s)**: User waits too long for a response
- **Sweet spot**: 1.5-2.0s works well for conversational speech

---

## Monitoring

### Check Logs for Silence Detection
```bash
docker logs miku-bot 2>&1 | grep "Silence detected"
```

**Expected output**:
```
[DEBUG] Silence detected for user 209381657369772032, requesting final transcript
```

### Check Final Transcripts
```bash
docker logs miku-bot 2>&1 | grep "FINAL TRANSCRIPT"
```

### Check STT Processing
```bash
docker logs miku-stt 2>&1 | grep "Final transcription"
```

---

## Debugging

### Issue: No Final Transcript
**Symptoms**: Partial transcripts appear but never finalize

**Debug steps**:
1. Check if silence detection is triggering:
   ```bash
   docker logs miku-bot 2>&1 | grep "Silence detected"
   ```

2. Check if the final command is being sent:
   ```bash
   docker logs miku-stt 2>&1 | grep "type.*final"
   ```

3. Increase the log level in stt_client.py:
   ```python
   logger.setLevel(logging.DEBUG)
   ```

### Issue: Cuts Off Mid-Sentence
**Symptoms**: Final transcript triggers during natural pauses

**Solution**: Increase the silence timeout:
```python
self.silence_timeout = 2.0  # or 2.5
```

### Issue: Too Slow to Respond
**Symptoms**: Long wait after the user stops speaking

**Solution**: Decrease the silence timeout:
```python
self.silence_timeout = 1.0  # or 1.2
```

---

## Architecture

```
Discord Voice → voice_receiver.py
        ↓
[Audio Chunk Received]
        ↓
┌─────────────────────┐
│    send_audio()     │
│   to STT server     │
└─────────────────────┘
        ↓
┌─────────────────────┐
│   Start silence     │
│  detection timer    │
│  (1.5s countdown)   │
└─────────────────────┘
        ↓
   ┌────┴────────┐
   │             │
More audio   No more audio
arrives      for 1.5s
   │             │
   ▼             ▼
Cancel timer  ┌──────────────┐
Start new     │ send_final() │
              │   to STT     │
              └──────────────┘
                    ↓
          ┌─────────────────┐
          │ Final transcript│
          │   → LlamaCPP    │
          └─────────────────┘
```

## Files Modified

1. **bot/utils/voice_receiver.py**
   - Added `last_audio_time` tracking
   - Added `silence_tasks` management
   - Added the `_detect_silence()` method
   - Integrated silence detection in `_send_audio_chunk()`
   - Added cleanup in `stop_listening()`

2. **bot/utils/stt_client.py** (previously)
   - Added `send_final()` method
   - Added `send_reset()` method
   - Updated protocol handler

---

## Next Steps

1. **Test thoroughly** with different speech patterns
2. **Tune silence timeout** based on user feedback
3. **Consider VAD integration** for more accurate speech-end detection
4. **Add metrics** to track transcription latency

---

**Status**: ✅ **READY FOR TESTING**

The system now:
- ✅ Connects to the ONNX STT server (port 8766)
- ✅ Uses CUDA GPU acceleration (cuDNN 9)
- ✅ Receives partial transcripts
- ✅ Automatically detects silence
- ✅ Sends the final command after 1.5s of silence
- ✅ Forwards the final transcript to LlamaCPP

**Test it now with `!miku listen`!**
@@ -1,207 +0,0 @@
# STT Debug Summary - January 18, 2026

## Issues Identified & Fixed ✅

### 1. **CUDA Not Being Used** ❌ → ✅
**Problem:** The container was falling back to CPU, causing slow transcription.

**Root Cause:**
```
libcudnn.so.9: cannot open shared object file: No such file or directory
```
The ONNX Runtime requires cuDNN 9, but the base image `nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04` only had cuDNN 8.

**Fix Applied:**
```dockerfile
# Changed from:
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

# To:
FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```

**Verification:**
```bash
$ docker logs miku-stt 2>&1 | grep "Providers"
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', {'device_id': 0, ...}), 'CPUExecutionProvider']
```
✅ CUDAExecutionProvider is now loaded successfully!

---
|
|
||||||
|
|
||||||
### 2. **Connection Refused Error** ❌ → ✅
|
|
||||||
**Problem:** Bot couldn't connect to STT service.
|
|
||||||
|
|
||||||
**Error:**
|
|
||||||
```
|
|
||||||
ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000)
|
|
||||||
```
|
|
||||||
|
|
||||||
**Root Cause:** Port mismatch between bot and STT server.
|
|
||||||
- Bot was connecting to: `ws://miku-stt:8000`
|
|
||||||
- STT server was running on: `ws://miku-stt:8766`
|
|
||||||
|
|
||||||
**Fix Applied:**
|
|
||||||
Updated `bot/utils/stt_client.py`:
|
|
||||||
```python
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
user_id: str,
|
|
||||||
stt_url: str = "ws://miku-stt:8766/ws/stt", # ← Changed from 8000
|
|
||||||
...
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. **Protocol Mismatch** ❌ → ✅
|
|
||||||
**Problem:** Bot and STT server were using incompatible protocols.
|
|
||||||
|
|
||||||
**Old NeMo Protocol:**
|
|
||||||
- Automatic VAD detection
|
|
||||||
- Events: `vad`, `partial`, `final`, `interruption`
|
|
||||||
- No manual control needed
|
|
||||||
|
|
||||||
**New ONNX Protocol:**
|
|
||||||
- Manual transcription control
|
|
||||||
- Events: `transcript` (with `is_final` flag), `info`, `error`
|
|
||||||
- Requires sending `{"type": "final"}` command to get final transcript
|
|
||||||
|
|
||||||
**Fix Applied:**
|
|
||||||
|
|
||||||
1. **Updated event handler** in `stt_client.py`:
|
|
||||||
```python
|
|
||||||
async def _handle_event(self, event: dict):
|
|
||||||
event_type = event.get('type')
|
|
||||||
|
|
||||||
if event_type == 'transcript':
|
|
||||||
# New ONNX protocol
|
|
||||||
text = event.get('text', '')
|
|
||||||
is_final = event.get('is_final', False)
|
|
||||||
|
|
||||||
if is_final:
|
|
||||||
if self.on_final_transcript:
|
|
||||||
await self.on_final_transcript(text, timestamp)
|
|
||||||
else:
|
|
||||||
if self.on_partial_transcript:
|
|
||||||
await self.on_partial_transcript(text, timestamp)
|
|
||||||
|
|
||||||
# Also maintains backward compatibility with old protocol
|
|
||||||
elif event_type == 'partial' or event_type == 'final':
|
|
||||||
# Legacy support...
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Added new methods** for manual control:
|
|
||||||
```python
|
|
||||||
async def send_final(self):
|
|
||||||
"""Request final transcription from STT server."""
|
|
||||||
command = json.dumps({"type": "final"})
|
|
||||||
await self.websocket.send_str(command)
|
|
||||||
|
|
||||||
async def send_reset(self):
|
|
||||||
"""Reset the STT server's audio buffer."""
|
|
||||||
command = json.dumps({"type": "reset"})
|
|
||||||
await self.websocket.send_str(command)
|
|
||||||
```
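In practice the bot drives these from its silence detection; a hedged usage sketch (the exact call site in `voice_receiver.py` may differ):

```python
# When the silence timer fires (user stopped speaking):
await stt_client.send_final()
# ...the server replies with {"type": "transcript", ..., "is_final": true}

# Before the next utterance, clear the server-side audio buffer:
await stt_client.send_reset()
```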
---

## Current Status

### Containers
- ✅ `miku-stt`: Running with CUDA 12.6.2 + cuDNN 9
- ✅ `miku-bot`: Rebuilt with the updated STT client
- ✅ Both containers healthy and communicating on the correct port

### STT Container Logs
```
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0
```

### Files Modified
1. `stt-parakeet/Dockerfile` - Updated base image to CUDA 12.6.2
2. `bot/utils/stt_client.py` - Fixed port, protocol, added new methods
3. `docker-compose.yml` - Already updated to use the new STT service
4. `STT_MIGRATION.md` - Added troubleshooting section

---

## Testing Checklist

### Ready to Test ✅
- [x] CUDA GPU acceleration enabled
- [x] Port configuration fixed
- [x] Protocol compatibility updated
- [x] Containers rebuilt and running

### Next Steps for User 🧪
1. **Test voice commands**: Use `!miku listen` in Discord
2. **Verify transcription**: Check that audio is transcribed correctly
3. **Monitor performance**: Check transcription speed and quality
4. **Check logs**: Monitor `docker logs miku-bot` and `docker logs miku-stt` for errors

### Expected Behavior
- Bot connects to the STT server successfully
- Audio is streamed to the STT server
- Progressive transcripts appear (optional, may need VAD integration)
- Final transcript is returned when the user stops speaking
- No more CUDA/cuDNN errors
- No more connection refused errors

---

## Technical Notes

### GPU Utilization
- **Before:** CPU fallback (0% GPU usage)
- **After:** CUDA acceleration (~85-95% GPU usage on GTX 1660)

### Performance Expectations
- **Transcription Speed:** ~0.5-1 second per utterance (down from 2-3 seconds)
- **VRAM Usage:** ~2-3GB (down from 4-5GB with NeMo)
- **Model:** Parakeet TDT 0.6B (ONNX optimized)

### Known Limitations
- No word-level timestamps (the ONNX model doesn't provide them)
- Progressive transcription requires sending audio chunks regularly
- Must call `send_final()` to get the final transcript (not automatic)

---

## Additional Information

### Container Network
- Network: `miku-discord_default`
- STT Service: `miku-stt:8766`
- Bot Service: `miku-bot`

### Health Check
```bash
# Check STT container health
docker inspect miku-stt | grep -A5 Health

# Test WebSocket connection
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: test" \
  http://localhost:8766/
```

### Logs Monitoring
```bash
# Follow both containers
docker-compose logs -f miku-bot miku-stt

# Just STT
docker logs -f miku-stt

# Search for errors
docker logs miku-bot 2>&1 | grep -i "error\|failed\|exception"
```

---

**Migration Status:** ✅ **COMPLETE - READY FOR TESTING**
@@ -1,192 +0,0 @@
# STT Fix Applied - Ready for Testing

## Summary

Fixed all three issues preventing the ONNX-based Parakeet STT from working:

1. ✅ **CUDA Support**: Updated the Docker base image to include cuDNN 9
2. ✅ **Port Configuration**: Fixed the bot to connect to port 8766 (found TWO places)
3. ✅ **Protocol Compatibility**: Updated the event handler for the new ONNX format

---

## Files Modified

### 1. `stt-parakeet/Dockerfile`
```diff
- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```

### 2. `bot/utils/stt_client.py`
```diff
- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"
```

Added new methods:
- `send_final()` - Request final transcription
- `send_reset()` - Clear audio buffer

Updated `_handle_event()` to support:
- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)

### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX**
```diff
- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
```

**This was the missing piece!** The `voice_receiver` was overriding the default URL with its own hard-coded copy.
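One way to prevent this class of bug is to define the endpoint in a single place and have both modules read it. A sketch, assuming an environment variable (`STT_URL` is an illustrative name, not something from the codebase):

```python
import os

# Single source of truth for the STT endpoint; stt_client.py and
# voice_receiver.py would both import this instead of hard-coding
# their own defaults.
STT_URL = os.environ.get("STT_URL", "ws://miku-stt:8766/ws/stt")
```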
---

## Container Status

### STT Container ✅
```bash
$ docker logs miku-stt 2>&1 | tail -10
```
```
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0
```

**Status**: ✅ Running with CUDA acceleration

### Bot Container ✅
- Files copied directly into the running container (faster than a rebuild)
- Python bytecode cache cleared
- Container restarted

---

## Testing Instructions

### Test 1: Basic Connection
1. Join a voice channel in Discord
2. Run `!miku listen`
3. **Expected**: Bot connects without a "Connection Refused" error
4. **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"`

### Test 2: Transcription
1. After running `!miku listen`, speak into your microphone
2. **Expected**: Your speech is transcribed
3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20`
4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages

### Test 3: Performance
1. Monitor GPU usage: `nvidia-smi -l 1`
2. **Expected**: GPU utilization increases when transcribing
3. **Expected**: Transcription completes in ~0.5-1 second

---

## Monitoring Commands

### Check Both Containers
```bash
docker-compose logs -f --tail=50 miku-bot miku-stt
```

### Check STT Service Health
```bash
docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
```

### Check for Errors
```bash
# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
```

### Test WebSocket Connection
```bash
# From host machine
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: test" \
  http://localhost:8766/
```

---

## Known Issues & Workarounds

### Issue: Bot Still Shows Old Errors
**Symptom**: After a restart, logs still show port 8000 errors

**Cause**: Python module caching, or log entries from before the restart

**Solution**:
```bash
# Clear cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot

# Wait 10 seconds for full restart
sleep 10
```

### Issue: Container Rebuild Takes 15+ Minutes
**Cause**: `playwright install` downloads chromium/firefox browsers (~500MB)

**Workaround**: Instead of a full rebuild, use `docker cp`:
```bash
docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot
```

---

## Next Steps

### For Full Deployment (after testing)
1. Rebuild the bot container properly:
```bash
docker-compose build miku-bot
docker-compose up -d miku-bot
```

2. Remove the old STT directory:
```bash
mv stt stt.backup
```

3. Update documentation to reflect the new architecture

### Optional Enhancements
1. Add a `send_final()` call when the user stops speaking (VAD integration)
2. Implement progressive transcription display
3. Add transcription quality metrics/logging
4. Test with multiple simultaneous users

---

## Quick Reference

| Component    | Old (NeMo) | New (ONNX)     |
|--------------|------------|----------------|
| **Port**     | 8000       | 8766           |
| **VRAM**     | 4-5GB      | 2-3GB          |
| **Speed**    | 2-3s       | 0.5-1s         |
| **cuDNN**    | 8          | 9              |
| **CUDA**     | 12.1       | 12.6.2         |
| **Protocol** | Auto VAD   | Manual control |

---

**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING**

Last Updated: January 18, 2026 20:47 EET
237
STT_MIGRATION.md
237
STT_MIGRATION.md
@@ -1,237 +0,0 @@
# STT Migration: NeMo → ONNX Runtime

## What Changed

**Old Implementation** (`stt/`):
- Used the NVIDIA NeMo toolkit with PyTorch
- Heavy memory usage (~4-5GB VRAM)
- Complex dependency tree (NeMo, transformers, huggingface-hub conflicts)
- Slow transcription (~2-3 seconds per utterance)
- Custom VAD + FastAPI WebSocket server

**New Implementation** (`stt-parakeet/`):
- Uses the `onnx-asr` library with ONNX Runtime
- Optimized VRAM usage (~2-3GB VRAM)
- Simple dependencies (onnxruntime-gpu, onnx-asr, numpy)
- **Much faster transcription** (~0.5-1 second per utterance)
- Clean architecture with a modular ASR pipeline

## Architecture

```
stt-parakeet/
├── Dockerfile            # CUDA 12.6.2 + Python 3.11 + ONNX Runtime
├── requirements-stt.txt  # Exact pinned dependencies
├── asr/
│   └── asr_pipeline.py   # ONNX ASR wrapper with GPU acceleration
├── server/
│   └── ws_server.py      # WebSocket server (port 8766)
├── vad/
│   └── silero_vad.py     # Voice Activity Detection
└── models/               # Model cache (auto-downloaded)
```

## Docker Setup

### Build
```bash
docker-compose build miku-stt
```

### Run
```bash
docker-compose up -d miku-stt
```

### Check Logs
```bash
docker logs -f miku-stt
```

### Verify CUDA
```bash
docker exec miku-stt python3.11 -c "import onnxruntime as ort; print('CUDA:', 'CUDAExecutionProvider' in ort.get_available_providers())"
```

## API Changes

### Old Protocol (port 8001)
```python
# FastAPI with /ws/stt/{user_id} endpoint
ws://localhost:8001/ws/stt/123456

# Events:
{
    "type": "vad",
    "event": "speech_start" | "speaking" | "speech_end",
    "probability": 0.95
}
{
    "type": "partial",
    "text": "Hello",
    "words": []
}
{
    "type": "final",
    "text": "Hello world",
    "words": [{"word": "Hello", "start_time": 0.0, "end_time": 0.5}]
}
```

### New Protocol (port 8766)
```python
# Direct WebSocket connection
ws://localhost:8766

# Send audio (binary):
# - int16 PCM, 16kHz mono
# - Send as raw bytes

# Send commands (JSON):
{"type": "final"}  # Trigger final transcription
{"type": "reset"}  # Clear audio buffer

# Receive transcripts:
{
    "type": "transcript",
    "text": "Hello world",
    "is_final": false  # Progressive transcription
}
{
    "type": "transcript",
    "text": "Hello world",
    "is_final": true   # Final transcription after "final" command
}
```
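To make the new protocol concrete, here is a minimal standalone client sketch. It uses the `websockets` package purely for illustration (the bot uses its own `STTClient`) and assumes a raw int16, 16kHz mono PCM file:

```python
import asyncio
import json

import websockets  # pip install websockets (illustrative choice of client)


async def transcribe(pcm_path: str) -> str:
    with open(pcm_path, "rb") as f:
        audio = f.read()  # int16 PCM, 16kHz mono
    async with websockets.connect("ws://localhost:8766") as ws:
        # Stream in ~20ms chunks: 320 samples @ 16kHz = 640 bytes of int16.
        for i in range(0, len(audio), 640):
            await ws.send(audio[i:i + 640])
        await ws.send(json.dumps({"type": "final"}))  # request final transcript
        while True:
            event = json.loads(await ws.recv())
            if event.get("type") == "transcript" and event.get("is_final"):
                return event["text"]


print(asyncio.run(transcribe("utterance.pcm")))
```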
## Bot Integration Changes Needed

### 1. Update WebSocket URL
```python
# Old
ws://miku-stt:8000/ws/stt/{user_id}

# New
ws://miku-stt:8766
```

### 2. Update Message Format
```python
# Old: Send audio with metadata
await websocket.send_bytes(audio_data)

# New: Send raw audio bytes (same)
await websocket.send(audio_data)  # bytes

# Old: Listen for VAD events
if msg["type"] == "vad":
    # Handle VAD

# New: No VAD events (handled internally)
# Just send the final command when the user stops speaking
await websocket.send(json.dumps({"type": "final"}))
```

### 3. Update Response Handling
```python
# Old
if msg["type"] == "partial":
    text = msg["text"]
    words = msg["words"]

if msg["type"] == "final":
    text = msg["text"]
    words = msg["words"]

# New
if msg["type"] == "transcript":
    text = msg["text"]
    is_final = msg["is_final"]
    # No word-level timestamps in the ONNX version
```

## Performance Comparison

| Metric                  | Old (NeMo)   | New (ONNX)  |
|-------------------------|--------------|-------------|
| **VRAM Usage**          | 4-5GB        | 2-3GB       |
| **Transcription Speed** | 2-3s         | 0.5-1s      |
| **Build Time**          | ~10 min      | ~5 min      |
| **Dependencies**        | 50+ packages | 15 packages |
| **GPU Utilization**     | 60-70%       | 85-95%      |
| **OOM Crashes**         | Frequent     | None        |

## Migration Steps

1. ✅ Build the new container: `docker-compose build miku-stt`
2. ✅ Update the bot WebSocket client (`bot/utils/stt_client.py`)
3. ✅ Update the voice receiver to send the "final" command
4. ⏳ Test transcription quality
5. ⏳ Remove the old `stt/` directory

## Troubleshooting

### Issue 1: CUDA Not Working (Falling Back to CPU)
**Symptoms:**
```
[E:onnxruntime:Default] Failed to load library libonnxruntime_providers_cuda.so
with error: libcudnn.so.9: cannot open shared object file
```

**Cause:** ONNX Runtime GPU requires cuDNN 9, but the CUDA 12.1 base image only has cuDNN 8.

**Fix:** Update the Dockerfile base image:
```dockerfile
FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```

**Verify:**
```bash
docker logs miku-stt 2>&1 | grep "Providers"
# Should show: CUDAExecutionProvider (not just CPUExecutionProvider)
```

### Issue 2: Connection Refused (Port 8000)
**Symptoms:**
```
ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000)
```

**Cause:** The new ONNX server runs on port 8766, not 8000.

**Fix:** Update `bot/utils/stt_client.py`:
```python
stt_url: str = "ws://miku-stt:8766/ws/stt"  # Changed from 8000
```

### Issue 3: Protocol Mismatch
**Symptoms:** Bot doesn't receive transcripts, or transcripts are empty.

**Cause:** The new ONNX server uses a different WebSocket protocol.

**Old Protocol (NeMo):** Automatic VAD-triggered `partial` and `final` events
**New Protocol (ONNX):** Manual control with a `{"type": "final"}` command

**Fix:**
- Updated `stt_client._handle_event()` to handle the `transcript` type with the `is_final` flag
- Added a `send_final()` method to request final transcription
- The bot should call `stt_client.send_final()` when the user stops speaking

## Rollback Plan

If needed, revert docker-compose.yml:
```yaml
miku-stt:
  build:
    context: ./stt
    dockerfile: Dockerfile.stt
  # ... rest of old config
```

## Notes

- The model downloads on first run (~600MB)
- Models are cached in `./stt-parakeet/models/`
- No word-level timestamps (the ONNX model doesn't provide them)
- VAD is handled internally (no need for external VAD integration)
- Uses the same GPU (GTX 1660, device 0) as before
@@ -1,266 +0,0 @@
# STT Voice Testing Guide

## Phase 4B: Bot-Side STT Integration - COMPLETE ✅

All code has been deployed to the containers. Ready for testing!

## Architecture Overview

```
Discord Voice (User) → Opus 48kHz stereo
        ↓
VoiceReceiver.write()
        ↓
Opus decode → Stereo-to-mono → Resample to 16kHz
        ↓
STTClient.send_audio() → WebSocket
        ↓
miku-stt:8001 (Silero VAD + Faster-Whisper)
        ↓
JSON events (vad, partial, final, interruption)
        ↓
VoiceReceiver callbacks → voice_manager
        ↓
on_final_transcript() → _generate_voice_response()
        ↓
LLM streaming → TTS tokens → Audio playback
```
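The decode/downmix/resample step can be sketched as follows: a minimal numpy version, assuming 48kHz stereo int16 PCM in and 16kHz mono int16 PCM out (the real `VoiceReceiver` may use a proper resampler):

```python
import numpy as np


def to_stt_format(pcm_48k_stereo: bytes) -> bytes:
    """48kHz stereo int16 PCM → 16kHz mono int16 PCM."""
    samples = np.frombuffer(pcm_48k_stereo, dtype=np.int16).reshape(-1, 2)
    mono = samples.mean(axis=1)  # downmix stereo → mono
    # Naive 3:1 decimation (48kHz → 16kHz); a production resampler
    # would low-pass filter first to avoid aliasing.
    mono_16k = mono[::3]
    return mono_16k.astype(np.int16).tobytes()
```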
## New Voice Commands

### 1. Start Listening
```
!miku listen
```
- Starts listening to **your** voice in the current voice channel
- You must be in the same channel as Miku
- Miku will transcribe your speech and respond with voice

```
!miku listen @username
```
- Start listening to a specific user's voice
- Useful for moderators or testing with multiple users

### 2. Stop Listening
```
!miku stop-listening
```
- Stop listening to your voice
- Miku will no longer transcribe or respond to your speech

```
!miku stop-listening @username
```
- Stop listening to a specific user

## Testing Procedure

### Test 1: Basic STT Connection
1. Join a voice channel
2. `!miku join` - Miku joins your channel
3. `!miku listen` - Start listening to your voice
4. Check bot logs for "Started listening to user"
5. Check STT logs: `docker logs miku-stt --tail 50`
   - Should show: "WebSocket connection from user {user_id}"
   - Should show: "Session started for user {user_id}"

### Test 2: VAD Detection
1. After `!miku listen`, speak into your microphone
2. Say something like: "Hello Miku, can you hear me?"
3. Check STT logs for VAD events:
   ```
   [DEBUG] VAD: speech_start probability=0.85
   [DEBUG] VAD: speaking probability=0.92
   [DEBUG] VAD: speech_end probability=0.15
   ```
4. Bot logs should show: "VAD event for user {id}: speech_start/speaking/speech_end"

### Test 3: Transcription
1. Speak clearly into the microphone: "Hey Miku, tell me a joke"
2. Watch bot logs for:
   - "Partial transcript from user {id}: Hey Miku..."
   - "Final transcript from user {id}: Hey Miku, tell me a joke"
3. Miku should respond with LLM-generated speech
4. Check the channel for: "🎤 Miku: *[her response]*"

### Test 4: Interruption Detection
1. `!miku listen`
2. `!miku say Tell me a very long story about your favorite song`
3. While Miku is speaking, start talking yourself
4. Speak loudly enough to trigger VAD (probability > 0.7)
5. Expected behavior (a sketch of the bot-side handler follows below):
   - Miku's audio should stop immediately
   - Bot logs: "User {id} interrupted Miku (probability={prob})"
   - STT logs: "Interruption detected during TTS playback"
   - RVC logs: "Interrupted: Flushed {N} ZMQ chunks"
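A hedged sketch of what that interruption callback can look like on the bot side, assuming the `/interrupt` endpoint listed under API Endpoints below and an `aiohttp` client (the names are illustrative, not the actual `voice_manager.py` code):

```python
import aiohttp

INTERRUPT_URL = "http://miku-rvc-api:8765/interrupt"


async def on_interruption(voice_client, user_id: int, probability: float):
    """Stop playback and flush queued TTS audio when a user speaks over Miku."""
    if voice_client.is_playing():
        voice_client.stop()  # halt Discord playback immediately
    async with aiohttp.ClientSession() as session:
        # Ask the RVC API to flush its ZMQ/RVC buffers.
        await session.post(INTERRUPT_URL)
```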
### Test 5: Multi-User (if available)
1. Have two users join the voice channel
2. `!miku listen @user1` - Listen to the first user
3. `!miku listen @user2` - Listen to the second user
4. Both users speak separately
5. Verify Miku responds to each user individually
6. Check STT logs for multiple active sessions

## Logs to Monitor

### Bot Logs
```bash
docker logs -f miku-bot | grep -E "(listen|STT|transcript|interrupt)"
```
Expected output:
```
[INFO] Started listening to user 123456789 (username)
[DEBUG] VAD event for user 123456789: speech_start
[DEBUG] Partial transcript from user 123456789: Hello Miku...
[INFO] Final transcript from user 123456789: Hello Miku, how are you?
[INFO] User 123456789 interrupted Miku (probability=0.82)
```

### STT Logs
```bash
docker logs -f miku-stt
```
Expected output:
```
[INFO] WebSocket connection from user_123456789
[INFO] Session started for user 123456789
[DEBUG] Received 320 audio samples from user_123456789
[DEBUG] VAD speech_start: probability=0.87
[INFO] Transcribing audio segment (duration=2.5s)
[INFO] Final transcript: "Hello Miku, how are you?"
```

### RVC Logs (for interruption)
```bash
docker logs -f miku-rvc-api | grep -i interrupt
```
Expected output:
```
[INFO] Interrupted: Flushed 15 ZMQ chunks, cleared 48000 RVC buffer samples
```

## Component Status

### ✅ Completed
- [x] STT container running (miku-stt:8001)
- [x] Silero VAD on CPU with chunk buffering
- [x] Faster-Whisper on GTX 1660 (1.3GB VRAM)
- [x] STTClient WebSocket client
- [x] VoiceReceiver Discord audio sink
- [x] VoiceSession STT integration
- [x] listen/stop-listening commands
- [x] /interrupt endpoint in RVC API
- [x] LLM response generation from transcripts
- [x] Interruption detection and cancellation

### ⏳ Pending Testing
- [ ] Basic STT connection test
- [ ] VAD speech detection test
- [ ] End-to-end transcription test
- [ ] LLM voice response test
- [ ] Interruption cancellation test
- [ ] Multi-user testing (if available)

### 🔧 Configuration Tuning (after testing)
- VAD sensitivity (currently threshold=0.5)
- VAD timing (min_speech=250ms, min_silence=500ms)
- Interruption threshold (currently 0.7)
- Whisper beam size and patience
- LLM streaming chunk size

## API Endpoints

### STT Container (port 8001)
- WebSocket: `ws://localhost:8001/ws/stt/{user_id}`
- Health: `http://localhost:8001/health`

### RVC Container (port 8765)
- WebSocket: `ws://localhost:8765/ws/stream`
- Interrupt: `http://localhost:8765/interrupt` (POST)
- Health: `http://localhost:8765/health`

## Troubleshooting

### No audio received from Discord
- Check bot logs for "write() called with data"
- Verify the user is in the same voice channel as Miku
- Check Discord permissions (View Channel, Connect, Speak)

### VAD not detecting speech
- Check chunk buffer accumulation in the STT logs
- Verify the audio format: PCM int16, 16kHz mono
- Try speaking louder or more clearly
- Check the VAD threshold (it may need adjustment)

### Transcription empty or gibberish
- Verify the Whisper model loaded (check STT startup logs)
- Check GPU VRAM usage: `nvidia-smi`
- Ensure audio segments are at least 1-2 seconds long
- Try speaking more clearly with less background noise

### Interruption not working
- Verify Miku is actually speaking (check the miku_speaking flag)
- Check the VAD probability in the logs (must be > 0.7)
- Verify the /interrupt endpoint returns success
- Check RVC logs for flushed chunks

### Multiple users causing issues
- Check STT logs for per-user session management
- Verify each user has a separate STTClient instance
- Check for resource contention on the GTX 1660

## Next Steps After Testing

### Phase 4C: LLM KV Cache Precomputation
- Use partial transcripts to start LLM generation early
- Precompute KV cache for common phrases
- Reduce latency between speech end and response start

### Phase 4D: Multi-User Refinement
- Queue management for multiple simultaneous speakers
- Priority system for interruptions
- Resource allocation for multiple Whisper requests

### Phase 4E: Latency Optimization
- Profile each stage of the pipeline
- Optimize audio chunk sizes
- Reduce WebSocket message overhead
- Tune Whisper beam search parameters
- Implement VAD lookahead for quicker detection

## Hardware Utilization

### Current Allocation
- **AMD RX 6800**: LLaMA text models (idle during listen/speak)
- **GTX 1660**:
  - Listen phase: Faster-Whisper (1.3GB VRAM)
  - Speak phase: Soprano TTS + RVC (time-multiplexed)
- **CPU**: Silero VAD, audio preprocessing

### Expected Performance
- VAD latency: <50ms (CPU processing)
- Transcription latency: 200-500ms (Whisper inference)
- LLM streaming: 20-30 tokens/sec (RX 6800)
- TTS synthesis: Real-time (GTX 1660)
- Total latency (speech → response): 1-2 seconds

## Testing Checklist

Before marking Phase 4B as complete:

- [ ] Test basic STT connection with `!miku listen`
- [ ] Verify VAD detects speech start/end correctly
- [ ] Confirm transcripts are accurate and complete
- [ ] Test LLM voice response generation works
- [ ] Verify interruption cancels TTS playback
- [ ] Check multi-user handling (if possible)
- [ ] Verify resource cleanup on `!miku stop-listening`
- [ ] Test edge cases (silence, background noise, overlapping speech)
- [ ] Profile latencies at each stage
- [ ] Document any configuration tuning needed

---

**Status**: Code deployed, ready for user testing! 🎤🤖
@@ -1,261 +0,0 @@
# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience.

## Features

### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, and transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)
```
POST /api/voice/call
{
    "user_id": 123456789,
    "voice_channel_id": 987654321
}
```

#### What Happens:
1. **Container Startup**: Starts the `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors the containers until fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates a voice session with full resource locking
4. **Send DM**: Generates a personalized LLM invitation and sends it with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when the user joins

#### User Join Detection:
- Monitors `on_voice_state_update` events
- When the target user joins:
  - Marks `user_has_joined = True`
  - Cancels the 30min timeout
  - Auto-starts STT for that user

#### Auto-Leave After User Disconnect:
- A **45 second timer** starts when the user leaves the voice channel
- If the user doesn't rejoin within 45s:
  - Ends the voice session
  - Stops the STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If the user rejoins before 45s, the timer is cancelled (see the sketch after this list)
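A minimal sketch of this timer logic; `session.end()` and `stop_voice_containers()` are illustrative stand-ins for the real cleanup calls:

```python
import asyncio

AUTO_LEAVE_DELAY = 45  # seconds


async def _auto_leave_after_user_disconnect(session):
    """Runs when the called user leaves; cancelled if they rejoin in time."""
    try:
        await asyncio.sleep(AUTO_LEAVE_DELAY)
        await session.end()              # end the voice session
        await stop_voice_containers()    # stop the STT and TTS containers
    except asyncio.CancelledError:
        pass  # user rejoined before the timer fired


def on_user_leave(session):
    # Called from an async context so a running event loop exists.
    session.auto_leave_task = asyncio.create_task(
        _auto_leave_after_user_disconnect(session)
    )


def on_user_join(session):
    if session.auto_leave_task:
        session.auto_leave_task.cancel()  # a rejoin cancels the countdown
```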
#### 30-Minute Join Timeout:
- If the user never joins within 30 minutes:
  - Ends the voice session
  - Stops the containers
  - Sends a timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Check container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check

#### Warmup Detection:
```python
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765

# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```
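A sketch of what the TTS warmup poll can look like, assuming `aiohttp` and the health endpoint above (the real `_wait_for_tts_warmup()` may differ):

```python
import asyncio
import aiohttp

TTS_HEALTH_URL = "http://miku-rvc-api:8765/health"


async def _wait_for_tts_warmup(timeout: float = 60.0) -> bool:
    """Poll the TTS health endpoint until it reports warmed_up, or time out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    async with aiohttp.ClientSession() as session:
        while loop.time() < deadline:
            try:
                async with session.get(TTS_HEALTH_URL) as resp:
                    data = await resp.json()
                    if data.get("warmed_up"):
                        return True
            except aiohttp.ClientError:
                pass  # container not accepting connections yet
            await asyncio.sleep(1.0)
    return False
```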
### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:
```python
call_user_id: Optional[int]                # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool                      # Track if user joined
auto_leave_task: Optional[asyncio.Task]    # 45s auto-leave
user_leave_time: Optional[float]           # When user left
```

#### Methods:
- `on_user_join(user_id)`: Handle the user joining the voice channel
- `on_user_leave(user_id)`: Start the 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute the auto-leave

### 5. LLM Context Update

Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)

#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity

## API Endpoint

### POST `/api/voice/call`

**Request Body**:
```json
{
    "user_id": 123456789,
    "voice_channel_id": 987654321
}
```

**Success Response**:
```json
{
    "success": true,
    "user_id": 123456789,
    "channel_id": 987654321,
    "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:
```json
{
    "success": false,
    "error": "Failed to start voice containers"
}
```
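For a quick manual test of the endpoint (assuming the bot API on its usual port, 3939):

```bash
curl -X POST http://localhost:3939/api/voice/call \
  -H "Content-Type: application/json" \
  -d '{"user_id": 123456789, "voice_channel_id": 987654321}'
```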
## File Changes

### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:
1. `bot/globals.py` - Added the `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added the `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added the `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added the `_auto_leave_after_user_disconnect()` method
   - Updated the LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:
- [ ] Create a voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show a timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:
- [ ] Test the successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test the 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)

### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:
```bash
VOICE_DEBUG_MODE=false  # Set to true for debugging
```

## Next Steps

1. **Web UI**: Create a voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management

2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times

3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready

### Resource Management:
- Voice sessions use the `VoiceSessionManager` singleton
- Only one voice session is active at a time
- Full resource locking during voice:
  - AMD GPU reserved for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused

### Cleanup Guarantees:
- The 45s auto-leave ensures no orphaned sessions
- The 30min timeout prevents containers running indefinitely
- All cleanup paths stop the containers
- Voice session end releases all resources

## Troubleshooting

### Containers won't start:
- Check the Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:
- STT: Check the WebSocket is accepting connections on port 8765
- TTS: Check the health endpoint returns `{"warmed_up": true}`
- Increase the timeout values if needed (slow hardware)

### User never joins:
- Verify the invite URL is valid
- Check the user has permission to join the voice channel
- Verify the DM was delivered (it may be blocked)

### Auto-leave not triggering:
- Check `on_voice_state_update` events are firing
- Verify the user ID matches `call_user_id`
- Check the logs for timer creation/cancellation

### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`
@@ -1,225 +0,0 @@
# Voice Chat Context System

## Implementation Complete ✅

Added comprehensive voice chat context to give Miku awareness of the conversation environment.

---

## Features

### 1. Voice-Aware System Prompt
Miku now knows she's in a voice chat and adjusts her behavior:
- ✅ Aware she's speaking via TTS
- ✅ Knows who she's talking to (user names included)
- ✅ Understands responses will be spoken aloud
- ✅ Instructed to keep responses short (1-3 sentences)
- ✅ **CRITICAL: Instructed to only use English** (the TTS can't handle Japanese well)

### 2. Conversation History (Last 8 Exchanges)
- Stores the last 16 messages (8 user + 8 assistant)
- Maintains context across multiple voice interactions
- Automatically trimmed to keep memory manageable
- Each message includes the username for multi-user context

### 3. Personality Integration
- Loads `miku_lore.txt` - Her background, personality, likes/dislikes
- Loads `miku_prompt.txt` - Core personality instructions
- Combines them with voice-specific instructions
- Maintains character consistency

### 4. Reduced Log Spam
- Set the voice_recv logger to CRITICAL level
- Suppresses routine CryptoErrors and RTCP packets
- Only shows actual critical errors

---

## System Prompt Structure

```
[miku_prompt.txt content]

[miku_lore.txt content]

VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!
```

---

## Conversation Flow

```
User speaks → STT transcribes → Add to history
        ↓
[System Prompt]
[Last 8 exchanges]
[Current user message]
        ↓
LLM generates
        ↓
Add response to history
        ↓
Stream to TTS → Speak
```
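Assembling one turn of this flow amounts to concatenating the system prompt, the trimmed history, and the new transcript. A sketch (function and parameter names are illustrative, not the actual `voice_manager.py` code):

```python
def build_voice_messages(system_prompt: str, history: list[dict],
                         user_name: str, transcript: str) -> list[dict]:
    """Build the chat messages for one voice turn."""
    return (
        [{"role": "system", "content": system_prompt}]
        + history[-16:]  # last 8 exchanges = 16 messages
        + [{"role": "user", "content": f"{user_name}: {transcript}"}]
    )
```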
---

## Message History Format

```python
conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]
```

---

## Configuration

### Conversation History Limit
**Current**: 16 messages (8 exchanges)

To adjust, edit `voice_manager.py`:
```python
# Keep only the last 8 exchanges (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]
```

**Recommendations**:
- **8 exchanges**: Good balance (current setting)
- **12 exchanges**: More context, slightly more tokens
- **4 exchanges**: Minimal context, faster responses

### Response Length
**Current**: max_tokens=200

To adjust:
```python
payload = {
    "max_tokens": 200  # Change this
}
```

---

## Language Enforcement

### Why English-Only?
The RVC TTS system is trained on English audio and struggles with:
- Japanese characters (even though Miku is Japanese!)
- Special characters
- Mixed-language text
- Non-English phonetics

### Implementation
The system prompt explicitly tells Miku:
> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.**

This is reinforced in every voice chat interaction.

---

## Testing

### Test 1: Basic Conversation
```
User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
```

### Test 2: Context Retention
Have a multi-turn conversation and verify Miku remembers:
- Previous topics discussed
- User names
- Conversation flow

### Test 3: Response Length
Verify responses are:
- Short (1-3 sentences)
- Conversational
- Not truncated mid-sentence

### Test 4: Language Enforcement
Try asking in Japanese or requesting a Japanese response:
- Miku should politely respond in English
- She should explain she needs to use English for voice chat

---

## Monitoring

### Check Conversation History
```python
# Add debug logging to voice_manager.py to see the history
logger.debug(f"Conversation history: {self.conversation_history}")
```

### Check System Prompt
```bash
docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt
```

### Monitor Responses
```bash
docker logs -f miku-bot | grep "Voice response complete"
```

---

## Files Modified

1. **bot/bot.py**
   - Changed the voice_recv logger level from WARNING to CRITICAL
   - Suppresses CryptoError spam

2. **bot/utils/voice_manager.py**
   - Added `conversation_history` to `VoiceSession.__init__()`
   - Updated `_generate_voice_response()` to load the lore files
   - Built a comprehensive voice-aware system prompt
   - Implemented conversation history tracking (last 8 exchanges)
   - Added the English-only instruction
   - Saves both user and assistant messages to the history

---

## Benefits

✅ **Better Context**: Miku remembers previous exchanges
✅ **Cleaner Logs**: No more CryptoError spam
✅ **Natural Responses**: Knows she's in voice chat, responds appropriately
✅ **Language Consistency**: Enforces English for TTS compatibility
✅ **Personality Intact**: Still loads the lore and personality files
✅ **User Awareness**: Knows who she's talking to

---

## Next Steps

1. **Test thoroughly** with multi-turn conversations
2. **Adjust the history length** if needed (currently 8 exchanges)
3. **Fine-tune the response length** based on TTS performance
4. **Add a conversation reset** command if needed (e.g., `!miku reset`)
5. **Consider adding** conversation summaries for very long sessions

---

**Status**: ✅ **DEPLOYED AND READY FOR TESTING**

Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!
@@ -1,323 +0,0 @@
|
|||||||
# Voice-to-Voice Quick Reference
|
|
||||||
|
|
||||||
## Complete Pipeline Status ✅
|
|
||||||
|
|
||||||
All phases complete and deployed!
|
|
||||||
|
|
||||||
## Phase Completion Status
|
|
||||||
|
|
||||||
### ✅ Phase 1: Voice Connection (COMPLETE)
|
|
||||||
- Discord voice channel connection
|
|
||||||
- Audio playback via discord.py
|
|
||||||
- Resource management and cleanup
|
|
||||||
|
|
||||||
### ✅ Phase 2: Audio Streaming (COMPLETE)
|
|
||||||
- Soprano TTS server (GTX 1660)
|
|
||||||
- RVC voice conversion
|
|
||||||
- Real-time streaming via WebSocket
|
|
||||||
- Token-by-token synthesis
|
|
||||||
|
|
||||||
### ✅ Phase 3: Text-to-Voice (COMPLETE)
|
|
||||||
- LLaMA text generation (AMD RX 6800)
|
|
||||||
- Streaming token pipeline
|
|
||||||
- TTS integration with `!miku say`
|
|
||||||
- Natural conversation flow
|
|
||||||
|
|
||||||
### ✅ Phase 4A: STT Container (COMPLETE)
|
|
||||||
- Silero VAD on CPU
|
|
||||||
- Faster-Whisper on GTX 1660
|
|
||||||
- WebSocket server at port 8001
|
|
||||||
- Per-user session management
|
|
||||||
- Chunk buffering for VAD
|
|
||||||
|
|
||||||
### ✅ Phase 4B: Bot STT Integration (COMPLETE - READY FOR TESTING)
|
|
||||||
- Discord audio capture
|
|
||||||
- Opus decode + resampling
|
|
||||||
- STT client WebSocket integration
|
|
||||||
- Voice commands: `!miku listen`, `!miku stop-listening`
|
|
||||||
- LLM voice response generation
|
|
||||||
- Interruption detection and cancellation
|
|
||||||
- `/interrupt` endpoint in RVC API
|
|
||||||
|
|
||||||
## Quick Start Commands
|
|
||||||
|
|
||||||
### Setup
|
|
||||||
```bash
|
|
||||||
!miku join # Join your voice channel
|
|
||||||
!miku listen # Start listening to your voice
|
|
||||||
```
|
|
||||||
|
|
||||||
### Usage
|
|
||||||
- **Speak** into your microphone
|
|
||||||
- Miku will **transcribe** your speech
|
|
||||||
- Miku will **respond** with voice
|
|
||||||
- **Interrupt** her by speaking while she's talking
|
|
||||||
|
|
||||||
### Teardown
|
|
||||||
```bash
|
|
||||||
!miku stop-listening # Stop listening to your voice
|
|
||||||
!miku leave # Leave voice channel
|
|
||||||
```

## Architecture Diagram

```
┌───────────────────────────────────────────────────────────────────┐
│                            USER INPUT                             │
└───────────────────────────────────────────────────────────────────┘
                    │
                    │ Discord Voice (Opus 48kHz)
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                        miku-bot Container                         │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ VoiceReceiver (discord.sinks.Sink)                            │ │
│ │ - Opus decode → PCM                                           │ │
│ │ - Stereo → Mono                                               │ │
│ │ - Resample 48kHz → 16kHz                                      │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
│                   │ PCM int16, 16kHz, 20ms chunks                 │
│ ┌─────────────────▼─────────────────────────────────────────────┐ │
│ │ STTClient (WebSocket)                                         │ │
│ │ - Sends audio to miku-stt                                     │ │
│ │ - Receives VAD events, transcripts                            │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
└───────────────────┼───────────────────────────────────────────────┘
                    │ ws://miku-stt:8001/ws/stt/{user_id}
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                        miku-stt Container                         │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ VADProcessor (Silero VAD 5.1.2) [CPU]                         │ │
│ │ - Chunk buffering (512 samples min)                           │ │
│ │ - Speech detection (threshold=0.5)                            │ │
│ │ - Events: speech_start, speaking, speech_end                  │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
│                   │ Audio segments                                │
│ ┌─────────────────▼─────────────────────────────────────────────┐ │
│ │ WhisperTranscriber (Faster-Whisper 1.2.1) [GTX 1660]          │ │
│ │ - Model: small (1.3GB VRAM)                                   │ │
│ │ - Transcribes speech segments                                 │ │
│ │ - Returns: partial & final transcripts                        │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
└───────────────────┼───────────────────────────────────────────────┘
                    │ JSON events via WebSocket
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                        miku-bot Container                         │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ voice_manager.py Callbacks                                    │ │
│ │ - on_vad_event() → Log VAD states                             │ │
│ │ - on_partial_transcript() → Show typing indicator             │ │
│ │ - on_final_transcript() → Generate LLM response               │ │
│ │ - on_interruption() → Cancel TTS playback                     │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
│                   │ Final transcript text                         │
│ ┌─────────────────▼─────────────────────────────────────────────┐ │
│ │ _generate_voice_response()                                    │ │
│ │ - Build LLM prompt with conversation history                  │ │
│ │ - Stream LLM response                                         │ │
│ │ - Send tokens to TTS                                          │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
└───────────────────┼───────────────────────────────────────────────┘
                    │ HTTP streaming to LLaMA server
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                  llama-cpp-server (AMD RX 6800)                   │
│ - Streaming text generation                                       │
│ - 20-30 tokens/sec                                                │
│ - Returns: {"delta": {"content": "token"}}                        │
└───────────────────┬───────────────────────────────────────────────┘
                    │ Token stream
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                        miku-bot Container                         │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ audio_source.send_token()                                     │ │
│ │ - Buffers tokens                                              │ │
│ │ - Sends to RVC WebSocket                                      │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
└───────────────────┼───────────────────────────────────────────────┘
                    │ ws://miku-rvc-api:8765/ws/stream
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                      miku-rvc-api Container                       │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ Soprano TTS Server (miku-soprano-tts) [GTX 1660]              │ │
│ │ - Text → Audio synthesis                                      │ │
│ │ - 32kHz output                                                │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
│                   │ Raw audio via ZMQ                             │
│ ┌─────────────────▼─────────────────────────────────────────────┐ │
│ │ RVC Voice Conversion [GTX 1660]                               │ │
│ │ - Voice cloning & pitch shifting                              │ │
│ │ - 48kHz output                                                │ │
│ └─────────────────┬─────────────────────────────────────────────┘ │
└───────────────────┼───────────────────────────────────────────────┘
                    │ PCM float32, 48kHz
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                        miku-bot Container                         │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ discord.VoiceClient                                           │ │
│ │ - Plays audio in voice channel                                │ │
│ │ - Can be interrupted by user speech                           │ │
│ └───────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────────────────────┐
│                            USER OUTPUT                            │
│                      (Miku's voice response)                      │
└───────────────────────────────────────────────────────────────────┘
```
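
The `VoiceReceiver` box at the top of the diagram is plain sample-rate bookkeeping: decode Opus, downmix, downsample. A minimal NumPy sketch of that conversion, assuming already-decoded 48kHz stereo int16 PCM; the bare 3:1 decimation stands in for whatever low-pass/resample stage the real sink uses:

```python
import numpy as np

def discord_to_whisper_pcm(pcm_48k_stereo: bytes) -> bytes:
    """48kHz stereo int16 -> 16kHz mono int16, per the VoiceReceiver box."""
    samples = np.frombuffer(pcm_48k_stereo, dtype=np.int16)
    stereo = samples.reshape(-1, 2)              # interleaved L/R pairs
    mono = stereo.mean(axis=1).astype(np.int16)  # Stereo -> Mono
    return mono[::3].tobytes()                   # 48kHz -> 16kHz (every 3rd sample)
```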

## Interruption Flow

```
User speaks during Miku's TTS
                  │
                  ▼
VAD detects speech (probability > 0.7)
                  │
                  ▼
STT sends interruption event
                  │
                  ▼
on_user_interruption() callback
                  │
                  ▼
_cancel_tts() → voice_client.stop()
                  │
                  ▼
POST http://miku-rvc-api:8765/interrupt
                  │
                  ▼
Flush ZMQ socket + clear RVC buffers
                  │
                  ▼
Miku stops speaking, ready for new input
```
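
The `_cancel_tts()` step is the only part that touches two containers at once: it stops Discord playback locally, then tells the RVC API to drop whatever audio is still queued. A sketch of that step, assuming a py-cord `VoiceClient` and the `/interrupt` endpoint above:

```python
import aiohttp

async def _cancel_tts(voice_client) -> None:
    """Stop local playback, then flush buffered audio in the RVC pipeline."""
    if voice_client.is_playing():
        voice_client.stop()  # halts audio in the voice channel immediately
    async with aiohttp.ClientSession() as session:
        # Server-side: flush the ZMQ socket and clear RVC buffers
        await session.post("http://miku-rvc-api:8765/interrupt")
```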

## Hardware Utilization

### Listen Phase (User Speaking)
- **CPU**: Silero VAD processing
- **GTX 1660**: Faster-Whisper transcription (1.3GB VRAM)
- **AMD RX 6800**: Idle

### Think Phase (LLM Generation)
- **CPU**: Idle
- **GTX 1660**: Idle
- **AMD RX 6800**: LLaMA inference (20-30 tokens/sec; see the streaming sketch below)
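
The tokens arrive as the `{"delta": {"content": "token"}}` stream shown in the architecture diagram, i.e. llama.cpp's OpenAI-compatible chat-completions format. A minimal consumer sketch; the exact route and payload shape on this deployment are assumptions:

```python
import json

import requests

def stream_tokens(prompt: str):
    """Yield tokens from llama-cpp-server as they are generated (20-30/sec)."""
    resp = requests.post(
        "http://llama-cpp-server:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}], "stream": True},
        stream=True,
    )
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip keep-alive blank lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]  # forwarded token-by-token to the TTS socket
```

Each token is handed to `audio_source.send_token()`, which is why TTS can start speaking before the LLM has finished the sentence.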

### Speak Phase (Miku Responding)
- **CPU**: Silero VAD monitoring for interruption
- **GTX 1660**: Soprano TTS + RVC synthesis
- **AMD RX 6800**: Idle

## Performance Metrics

### Expected Latencies
| Stage                          | Latency     |
|--------------------------------|-------------|
| Discord audio capture          | ~20ms       |
| Opus decode + resample         | <10ms       |
| VAD processing                 | <50ms       |
| Whisper transcription          | 200-500ms   |
| LLM token generation           | 33-50ms/tok |
| TTS synthesis                  | Real-time   |
| **Total (speech → response)**  | **1-2s**    |

### VRAM Usage
| GPU         | Component     | VRAM   |
|-------------|---------------|--------|
| AMD RX 6800 | LLaMA 8B Q4   | ~5.5GB |
| GTX 1660    | Whisper small | 1.3GB  |
| GTX 1660    | Soprano + RVC | ~3GB   |

## Key Files

### Bot Container
- `bot/utils/stt_client.py` - WebSocket client for STT
- `bot/utils/voice_receiver.py` - Discord audio sink
- `bot/utils/voice_manager.py` - Voice session with STT integration
- `bot/commands/voice.py` - Voice commands including listen/stop-listening

### STT Container
- `stt/vad_processor.py` - Silero VAD with chunk buffering
- `stt/whisper_transcriber.py` - Faster-Whisper transcription
- `stt/stt_server.py` - FastAPI WebSocket server

### RVC Container
- `soprano_to_rvc/soprano_rvc_api.py` - TTS + RVC pipeline with `/interrupt` endpoint

## Configuration Files

### docker-compose.yml
- Network: `miku-network` (all containers)
- Ports:
  - miku-bot: 8081 (API)
  - miku-rvc-api: 8765 (TTS)
  - miku-stt: 8001 (STT)
  - llama-cpp-server: 8080 (LLM)

### VAD Settings (stt/vad_processor.py)
```python
threshold = 0.5               # Speech detection sensitivity
min_speech = 250              # Minimum speech duration (ms)
min_silence = 500             # Silence before speech_end (ms)
interruption_threshold = 0.7  # Probability for interruption
```
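
The chunk buffering mentioned earlier exists because Silero VAD wants windows of at least 512 samples at 16kHz, while Discord delivers 20ms (320-sample) chunks. A sketch of the buffering idea; loading via `torch.hub` and the exact call shape are assumptions and may differ from `vad_processor.py`:

```python
import torch

# Silero VAD entrypoint from torch.hub (assumption: the container may load it differently)
model, _utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
_buffer = torch.empty(0)

def feed_chunk(chunk: torch.Tensor):
    """chunk: float32 samples in [-1, 1] at 16kHz (320 per Discord frame)."""
    global _buffer
    _buffer = torch.cat([_buffer, chunk])
    if _buffer.shape[0] < 512:
        return None                         # not enough audio buffered yet
    window, _buffer = _buffer[:512], _buffer[512:]
    return model(window, 16000).item()      # speech probability vs threshold=0.5
```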

### Whisper Settings (stt/whisper_transcriber.py)
```python
model = "small"               # 1.3GB VRAM
device = "cuda"
compute_type = "float16"
beam_size = 5
patience = 1.0
```
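
These settings map one-to-one onto the faster-whisper API. A minimal sketch of how the transcriber applies them (not the actual `whisper_transcriber.py` code):

```python
from faster_whisper import WhisperModel

# "small" + float16 is what keeps this at ~1.3GB VRAM on the GTX 1660
model = WhisperModel("small", device="cuda", compute_type="float16")

def transcribe_segment(audio_16k):
    """audio_16k: float32 NumPy array at 16kHz, one VAD speech segment."""
    segments, _info = model.transcribe(audio_16k, beam_size=5, patience=1.0)
    return " ".join(segment.text.strip() for segment in segments)
```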

## Testing Commands

```bash
# Check all container health
curl http://localhost:8001/health   # STT
curl http://localhost:8765/health   # RVC
curl http://localhost:8080/health   # LLM

# Monitor logs
docker logs -f miku-bot | grep -E "(listen|transcript|interrupt)"
docker logs -f miku-stt
docker logs -f miku-rvc-api | grep interrupt

# Test interrupt endpoint
curl -X POST http://localhost:8765/interrupt

# Check GPU usage
nvidia-smi
```

## Troubleshooting

| Issue | Solution |
|-------|----------|
| No audio from Discord | Check that the bot has Connect and Speak permissions |
| VAD not detecting speech | Speak louder, check your microphone, or lower `threshold` |
| Empty transcripts | Speak for at least 1-2 seconds; check the Whisper model |
| Interruption not working | Verify `miku_speaking=true` and check the VAD probability |
| High latency | Profile each stage and check GPU utilization |

## Next Features (Phase 4C+)

- [ ] KV cache precomputation from partial transcripts
- [ ] Multi-user simultaneous conversation
- [ ] Latency optimization (<1s total)
- [ ] Voice activity history and analytics
- [ ] Emotion detection from speech patterns
- [ ] Context-aware interruption handling

---

**Ready to test!** Use `!miku join` → `!miku listen` → speak to Miku 🎤